Closed: anjelinejeline closed this issue 8 months ago.
Does this go away if you call `library(sf)` at the top of your script? Sorry, I'm away from a computer so I can't test this myself, but that should work.
Hello @anjelinejeline 👋
What are you expecting to get back when applying `unnest()` here? I don't know the rsample packages as well as @mikemahoney218 does, but I don't see that as something that these packages support.
@mikemahoney218 No, unfortunately it does not go away. BTW @EmilHvitfeldt, I am trying to unlist the column with the fold data. I am also struggling to create spatial clusters of equal size. I need equal-sized folds to use the predict function of a spatial regression, as it is not possible to predict on a dataset of a different size. Can you help me with that too? Is there a function in this package I could use?
Sorry, Emil was more careful than I was and understood the actual problem better :)
So the key issue here is that there isn't really a column that contains "the fold data" in the way you might expect. If you're interested, I wrote a blog post a while back about the internals of the objects in rsample and spatialsample, but the key thing is that the `splits` column doesn't actually contain the data assigned to each fold. Instead, each split stores your original data along with row indices that define its analysis and assessment sets. So "unnesting" here doesn't make much sense, because you don't want to unnest those indices; you want (I think!) a record of which row belongs to which assessment set.
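To see this structure directly, here is a minimal sketch that uses rsample's `vfold_cv()` on the built-in `mtcars` data as a stand-in for a spatial resample (the data set and variable names are illustrative assumptions, not part of the original example). It pulls one split out with `get_rsplit()` and shows that the split carries integer row indices rather than a separate copy of each fold's data:

```r
library(rsample)

set.seed(123)
folds <- vfold_cv(mtcars, v = 4)

# One rsplit: the original data plus integer row indices
split <- get_rsplit(folds, 1)
analysis_rows <- split$in_id                               # rows used for fitting
assessment_rows <- as.integer(split, data = "assessment")  # rows held out

# assessment() materializes the held-out rows from the stored data on demand
held_out <- assessment(split)
nrow(held_out) == length(assessment_rows)
```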
So the easiest way to get that, assuming I understand what you're looking for, is to get each assessment set separately, give it an identifier, and then combine those into a single table.
For example, say we've got some `rset` object that looks like this:
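```r
set.seed(123)
library(spatialsample)
# North Carolina counties, shipped with sf
nc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
# 10 folds built by clustering the counties (k-means by default)
cluster_folds <- spatial_clustering_cv(nc, v = 10)
autoplot(cluster_folds)
```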
We could use the following code to pull out which row belongs to which fold (and, obviously, drop the ggplot2 code if you just want the output data frame):
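```r
# For each fold: extract its assessment set and tag it with the fold number,
# then stack everything into one sf data frame and map it
lapply(
  seq_len(nrow(cluster_folds)),
  function(fold) {
    get_rsplit(cluster_folds, fold) |>
      assessment() |>
      dplyr::mutate(fold = fold)
  }
) |>
  do.call(what = rbind) |>
  ggplot2::ggplot(ggplot2::aes(fill = factor(fold))) +
  ggplot2::geom_sf()
```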
Created on 2024-02-02 with reprex v2.0.2
Let me know if that isn't what you're trying to accomplish, but I think this is how you get what you're looking for.
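If you only need the row-to-fold mapping (no geometry, no plot), a lighter variant is to collect each fold's assessment row indices instead of the assessment data itself. A minimal sketch, again using `vfold_cv()` on `mtcars` as a stand-in; `fold_lookup` is a hypothetical name, and with the `cluster_folds` object from above you would pass that instead of `folds`:

```r
library(rsample)

set.seed(123)
folds <- vfold_cv(mtcars, v = 4)

# For each fold, record which original rows it holds out for assessment
fold_lookup <- do.call(rbind, lapply(seq_len(nrow(folds)), function(i) {
  data.frame(
    row = as.integer(get_rsplit(folds, i), data = "assessment"),
    fold = i
  )
}))

# One entry per observation, sorted back into the original row order
fold_lookup <- fold_lookup[order(fold_lookup$row), ]
head(fold_lookup)
```

In ordinary V-fold resampling every row lands in exactly one assessment set, so `fold_lookup$fold` lines up with the rows of the original data; it is worth checking `anyDuplicated(fold_lookup$row)` before relying on that for other resampling schemes.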
As for:

> create spatial clusters with equal size
This isn't something we currently support in spatialsample directly. Would you be able to link the package (or paper, and so on) that you're using that has this restriction? What happens if the number of data points is a prime number, and so can't be divided evenly into folds?
What you could do is pass a custom function to the `cluster_function` argument. That function can use whatever logic you want to enforce that all folds are the same size. Hopefully the function documentation (especially the Details section) describes what that function needs to accept and return, but let me know if it doesn't and if I can help clarify anything.
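As one possible starting point, here is a naive sketch of an equal-size clustering function. It is hypothetical (not part of spatialsample) and assumes the custom function receives the pairwise distances between observations plus `v`, and must return one fold label per observation; please check the Details section of `?spatial_clustering_cv` for the actual contract. It orders observations along their first principal coordinate (classical MDS via `stats::cmdscale()`) and cuts that ordering into `v` contiguous runs, so fold sizes differ by at most one when the number of observations isn't divisible by `v`:

```r
# Hypothetical helper, not part of spatialsample.
# ASSUMPTION: called as cluster_function(dists, v, ...) with pairwise
# distances between observations; returns one fold label per observation.
equal_size_clusters <- function(dists, v, ...) {
  dists <- stats::as.dist(dists)  # accepts a dist object or a square matrix
  n <- attr(dists, "Size")
  if (v > n) stop("`v` cannot exceed the number of observations.")

  # Score each observation along the first principal coordinate,
  # so that observations close together get similar scores
  scores <- stats::cmdscale(dists, k = 1)[, 1]

  # Walk through observations in score order and cut that ordering into
  # v contiguous runs of size floor(n / v) or ceiling(n / v)
  labels <- integer(n)
  labels[order(scores)] <- (seq_len(n) * v - 1L) %/% n + 1L
  labels
}

# Example: 23 random points (a prime number) split into 5 folds
set.seed(1)
coords <- matrix(runif(46), ncol = 2)
folds <- equal_size_clusters(dist(coords), v = 5)
table(folds)  # fold sizes 4, 5, 4, 5, 5
```

Assuming that contract matches, it would be passed as `spatial_clustering_cv(nc, v = 10, cluster_function = equal_size_clusters)`. Projecting onto a single axis produces stripe-shaped folds rather than compact clusters, so a more careful approach (for example, running k-means and then rebalancing points between clusters) may suit spatial data better.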
I'm going to go ahead and close this out -- please feel free to open a new issue if we didn't wind up fixing the core problem here!
Hi, I would like to unnest the `rsplit` object, but I am not able to do it. This is my code: