tidymodels / rsample

Classes and functions to create and summarize resampling objects
https://rsample.tidymodels.org

Function for returning the original data frame with fold assignments appended? #468

Open mikemahoney218 opened 8 months ago

mikemahoney218 commented 8 months ago

Feature

Over on spatialsample, there have been a few requests (https://github.com/tidymodels/spatialsample/issues/158, https://github.com/tidymodels/spatialsample/issues/157) for a function that basically works like this:

library(rsample)
library(magrittr)
library(generics)

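# S3 method for generics::augment(): stack each fold's assessment set and
# label its rows with the fold index.
augment.rset <- function(rset, ..., fold_column = "fold") {
  purrr::list_rbind(
    purrr::map(
      seq_len(nrow(rset)),
      function(fold) {
        # Rows held out in this fold
        fold_members <- get_rsplit(rset, fold) %>%
          assessment()
        # Record which fold held them out
        fold_members[[fold_column]] <- fold
        fold_members
      }
    )
  )
}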

vfold_cv(Orange) %>%
  augment()
#>    Tree  age circumference fold
#> 1     1 1004           115    1
#> 2     3 1231           115    1
#> 3     5 1004           125    1
#> 4     5 1231           142    1
#> 5     2  118            33    2
#> 6     2 1231           172    2
#> 7     4  664           112    2
#> 8     5  484            49    2
#> 9     1  118            30    3
#> 10    2 1582           203    3
#> 11    3  118            30    3
#> 12    4 1231           179    3
#> 13    1  484            58    4
#> 14    1 1582           145    4
#> 15    4 1004           167    4
#> 16    5  118            30    4
#> 17    1  664            87    5
#> 18    2 1004           156    5
#> 19    3  484            51    5
#> 20    5 1372           174    5
#> 21    1 1372           142    6
#> 22    2  664           111    6
#> 23    4 1372           209    6
#> 24    1 1231           120    7
#> 25    3 1582           140    7
#> 26    4  118            32    7
#> 27    2 1372           203    8
#> 28    3 1372           139    8
#> 29    5  664            81    8
#> 30    2  484            69    9
#> 31    3 1004           108    9
#> 32    5 1582           177    9
#> 33    3  664            75   10
#> 34    4  484            62   10
#> 35    4 1582           214   10

Created on 2024-03-06 with reprex v2.0.2

I think people want this for two reasons: as an "escape hatch" from spatialsample, so these CV objects can be used with models that aren't (yet?) built into the tidymodels framework, and to make fold assignments easier to visualize. The above is basically how autoplot.spatial_rset gets fold assignments for its own visualizations.
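To make the "escape hatch" use case concrete, here is a minimal sketch of driving a plain `lm()` cross-validation from nothing but a fold column. It builds the fold-labelled data frame by hand with base R, so it does not depend on the proposed method existing; the names `assigned` and `rmse_by_fold` and the formula `circumference ~ age` are illustrative assumptions, not anything from rsample.

```r
library(rsample)

set.seed(1)
folds <- vfold_cv(Orange, v = 5)

# Stack each fold's assessment set and label its rows with the fold index
# (same idea as the function above, written with base R only).
assigned <- do.call(rbind, lapply(seq_len(nrow(folds)), function(i) {
  held_out <- assessment(folds$splits[[i]])
  held_out$fold <- i
  held_out
}))

# Cross-validate a model outside tidymodels using only the fold column:
# fit on every other fold, score on the held-out fold.
rmse_by_fold <- sapply(sort(unique(assigned$fold)), function(k) {
  train <- assigned[assigned$fold != k, ]
  test <- assigned[assigned$fold == k, ]
  fit <- lm(circumference ~ age, data = train)
  sqrt(mean((test$circumference - predict(fit, newdata = test))^2))
})
```

Because each row of `Orange` is held out exactly once in non-repeated V-fold CV, `assigned` has the same number of rows as the original data, which is what makes the fold column usable as a simple grouping variable.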

Would it make sense to add a function like this to rsample?

mikemahoney218 commented 8 months ago

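Thinking about this for a second longer -- the implementation above wouldn't work with repeated CV (or nested, I think). With repeats, each row lands in one assessment set per repeat, so the output would repeat every row under a running fold index that doesn't record which repeat it belongs to.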
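One possible way around the repeated-CV problem is to carry the rset's own id columns (`id`, and `id2` when there are repeats) instead of a bare row index. The sketch below assumes that approach; `assessment_with_ids` is a hypothetical helper name, not an rsample function, and it has not been checked against nested CV.

```r
library(rsample)

# Hypothetical helper (not part of rsample): stack the assessment set of
# every split and copy over the rset's id columns so repeats stay distinct.
assessment_with_ids <- function(rset) {
  # Assumption: split identifiers live in columns named "id", "id2", ...
  # A data column whose name also starts with "id" would be overwritten.
  id_cols <- grep("^id", names(rset), value = TRUE)
  per_split <- lapply(seq_len(nrow(rset)), function(i) {
    held_out <- assessment(rset$splits[[i]])
    for (col in id_cols) {
      held_out[[col]] <- rset[[col]][[i]]
    }
    held_out
  })
  do.call(rbind, per_split)
}

set.seed(1)
folds <- vfold_cv(Orange, v = 5, repeats = 2)
assigned <- assessment_with_ids(folds)
```

With two repeats the result is in long format: every original row appears once per repeat, and the pair of `id` and `id2` says which repeat and fold held it out.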