tdhock / mlr3resampling

Resampling algorithms for mlr3 framework in R

VariableSizeTrain enhanced strata support #5

Closed: tdhock closed this issue 8 months ago

tdhock commented 8 months ago

Strata are already supported when deciding folds, but not when choosing the random order in which to go through the train data. This would be especially useful for multi-class problems. It would be nice for each sample to have the maximum-likelihood stratum counts according to the multinomial distribution.
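One way to get a stratified random order is to shuffle the rows within each stratum, then build the order one row at a time. At each step, take the next row from the stratum that is furthest below its proportional target for the current prefix size. This is a minimal sketch of that greedy idea (similar to largest-remainder apportionment). It is not guaranteed to match the multinomial mode for every prefix size, and it is not mlr3resampling code. `stratified_order` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def stratified_order(labels, seed=0):
    """Permutation of row indices whose prefixes track the stratum proportions.

    Hypothetical helper (a greedy deficit rule), not part of mlr3resampling.
    """
    rng = random.Random(seed)
    # Group row indices by stratum, then shuffle within each stratum.
    groups = defaultdict(list)
    for i, y in enumerate(labels):
        groups[y].append(i)
    for idx in groups.values():
        rng.shuffle(idx)
    n_total = len(labels)
    taken = {k: 0 for k in groups}
    order = []
    for step in range(1, n_total + 1):
        candidates = [k for k in groups if taken[k] < len(groups[k])]
        # Deficit = target count (step * share) minus rows already taken,
        # scaled by n_total so the comparison uses exact integers.
        best = max(candidates,
                   key=lambda k: step * len(groups[k]) - taken[k] * n_total)
        order.append(groups[best][taken[best]])
        taken[best] += 1
    return order
```

With 10% class A and 90% class B, every prefix of this order stays within one row of the proportional counts. For example, the first 100 rows contain 10 A and 90 B.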

tdhock commented 8 months ago

min_train_data could be interpreted as the minimum number of samples for the least frequent stratum, and then we would take the other strata proportionally. For example, we could add a test for the following: 10% class A, 90% class B, 2000 samples overall in train. With min_train_data=10, the smallest size would be 10 A / 90 B, and the largest would be 200 A / 1800 B.
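The proportional rule can be sketched as scaling every stratum by the same factor, min_train_data divided by the size of the least frequent stratum. `stratum_sizes` is a hypothetical helper for illustration, not mlr3resampling API. Rounding each stratum to the nearest integer is an assumption, because the proposal does not say how to handle non-integer counts.

```python
from fractions import Fraction


def stratum_sizes(counts, min_train_data):
    """Per-stratum train sizes when min_train_data applies to the least frequent stratum.

    Hypothetical helper illustrating the proposal, not mlr3resampling API.
    counts: dict mapping stratum -> number of rows of that stratum in train.
    """
    smallest = min(counts.values())
    if not 0 < min_train_data <= smallest:
        raise ValueError("min_train_data must be between 1 and the smallest stratum size")
    # Same scale factor for every stratum, so class proportions are preserved;
    # Fraction keeps the arithmetic exact before rounding.
    factor = Fraction(min_train_data, smallest)
    return {k: round(factor * n) for k, n in counts.items()}


train_counts = {"A": 200, "B": 1800}  # 10% A, 90% B, 2000 rows in train
print(stratum_sizes(train_counts, 10))   # {'A': 10, 'B': 90}
print(stratum_sizes(train_counts, 200))  # {'A': 200, 'B': 1800}
```

These two calls reproduce the smallest and largest sizes from the example, so their outputs could serve directly as expected values in the proposed test.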