tidymodels / spatialsample

Create and summarize spatial resampling objects 🗺
https://spatialsample.tidymodels.org

Wrap clustering_cv() #126

Closed mikemahoney218 closed 1 year ago

mikemahoney218 commented 1 year ago

This fixes #120 and fixes #104 by wrapping rsample::clustering_cv().

There are three big (breaking) changes here that I'm aware of:

  1. spatial_clustering_cv() no longer handles non-sf objects, because I think they'd be better handled via rsample::clustering_cv().
  2. Distances are now calculated between edges, not centroids, of non-point geometries. This is how the rest of the package works, and it makes the most sense to me; if you have polygons which touch, you'd probably assume their data has some amount of spatial relationship, regardless of where the midpoint of each polygon is.
  3. Because the new distance_function argument defaults to a function, and that function gets assigned as an attribute of the resulting rset, the distance_function attribute winds up carrying a somewhat complex environment, which is non-intuitive:
library(spatialsample)

clust <- spatial_clustering_cv(boston_canopy, v = 2)
lobstr::obj_sizes(
  boston_canopy,
  environment(attr(clust, "distance_function"))
)
#> * 984.07 kB
#> *  60.74 kB

ls(environment(attr(clust, "distance_function")))
#>  [1] "buffer"            "cluster_function"  "cv_att"           
#>  [4] "data"              "distance_function" "n"                
#>  [7] "radius"            "repeats"           "rset"             
#> [10] "v"

Created on 2022-12-08 by the reprex package (v2.0.1)

I'm not sure if there's a good way to "zero out" that environment, so that we aren't accidentally dragging extra data along.
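For illustration, here is a minimal base R sketch of why the environment gets captured, and one possible way to drop it. `make_rset()` is a hypothetical stand-in, not the actual spatialsample code, and resetting the environment to `baseenv()` is only an assumption about what might work: it is safe only if the function refers to everything else through namespaced calls such as `stats::dist()`.

```r
# Hypothetical stand-in for a resampling function whose default
# `distance_function` is written in the argument list.
make_rset <- function(data, distance_function = function(x) stats::dist(x)) {
  rset <- list(n = nrow(data))
  # Default arguments are evaluated inside this call's frame, so the
  # closure's environment is the frame itself: `data`, `rset`, and friends.
  attr(rset, "distance_function") <- distance_function
  rset
}

big <- matrix(runif(1e5), ncol = 2)
res <- make_rset(big)

fn <- attr(res, "distance_function")
# Names dragged along with the function
captured <- ls(environment(fn))

# One possible fix: point the closure at the base environment instead.
# This breaks any function that relies on variables from the old frame.
environment(fn) <- baseenv()
attr(res, "distance_function") <- fn
```

After the reset, the function still works because `stats::dist()` is looked up through its namespace, but `data` is no longer reachable from the attribute. `emptyenv()` would not work here, since `::` itself has to be found somewhere.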

Otherwise, this function should work the same as it always has.
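To make the second change above concrete, here is a small sketch comparing edge-to-edge and centroid-to-centroid distances with sf. The two squares are made-up geometries with no CRS, and `sq()` is a hypothetical helper used only for this example.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is (xmin, 0).
sq <- function(xmin) {
  st_polygon(list(rbind(
    c(xmin, 0), c(xmin + 1, 0), c(xmin + 1, 1), c(xmin, 1), c(xmin, 0)
  )))
}

# Two squares that share the edge x = 1
polys <- st_sfc(sq(0), sq(1))

# Distance between the geometries themselves (their nearest points)
edge_dist <- st_distance(polys)

# Distance between the midpoints of the same polygons
centroid_dist <- st_distance(st_centroid(polys))
```

The polygons touch, so `edge_dist[1, 2]` is 0, while `centroid_dist[1, 2]` is 1. Clustering on the first matrix treats touching polygons as neighbours, which is the behaviour I'd expect.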
