Open juliasilge opened 2 years ago
Related to the consistency of arguments, should the newly-added repeats
argument for group_vfold_cv()
come after v
in the arg list (i.e., before balance
), as is the case for its non-grouped sibling vfold_cv()
?
It borks the positional arg matching with the current implementation:
library(rsample)
vfold_cv(mtcars, 5, 2)
#> # 5-fold cross-validation repeated 2 times
#> # A tibble: 10 × 3
#> splits id id2
#> <list> <chr> <chr>
#> 1 <split [25/7]> Repeat1 Fold1
#> 2 <split [25/7]> Repeat1 Fold2
#> 3 <split [26/6]> Repeat1 Fold3
#> 4 <split [26/6]> Repeat1 Fold4
#> 5 <split [26/6]> Repeat1 Fold5
#> 6 <split [25/7]> Repeat2 Fold1
#> 7 <split [25/7]> Repeat2 Fold2
#> 8 <split [26/6]> Repeat2 Fold3
#> 9 <split [26/6]> Repeat2 Fold4
#> 10 <split [26/6]> Repeat2 Fold5
group_vfold_cv(mtcars, cyl, 5, 2)
#> Error in `group_vfold_cv()`:
#> ! `balance` must be a character vector, not a number.
Related to the consistency of arguments, should the newly-added
repeats
argument forgroup_vfold_cv()
come afterv
in the arg list (i.e., beforebalance
), as is the case for its non-grouped siblingvfold_cv()
?
I opened #334 for this. That makes a lot of sense to me (and balance
hasn't hit CRAN yet so I feel fine just doing that)
There is a lot of nice consistency between the new grouped variants implemented in #313, #315, and #316, but the original, older grouped resampling function
group_vfold_cv()
doesn't have the same default as its sister functionvfold_cv()
. With the default ofv = NULL
, it currently returns what is also called "leave-group-out" CV.As @mattwarkentin correctly points out in #324:
Unfortunately, making these defaults consistent would be a breaking change, and also unfortunately it's the kind of breaking change we don't have good options for to communicate with users about. On the other hand, we know that
group_vfold_cv()
isn't a heavily used function in rsample.🎯 What is the best option to pursue in this situation?