Yes. Stratified CV is done automatically for binary and categorical outcomes in the devel branch of sl3 (which will eventually be merged to master), so if you install that branch, then you don’t need to do anything else to achieve this.
Otherwise, you can create the folds yourself using the make_folds function from the origami R package, and you can specify the strata you want to stratify by (i.e., your outcome) in that function. You can then pass the returned object to the folds argument of make_sl3_Task.
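A minimal sketch of the manual approach described above, in case it helps. The data here are simulated, the column names (W1, W2, Y) are placeholders for your own, and V = 10 is an arbitrary choice. It assumes the strata_ids argument of origami's make_folds is used to stratify on the outcome.

```r
library(origami)
library(sl3)

set.seed(1)
n <- 1000

# Simulated stand-in for the real data: two covariates and a sparse binary outcome.
dat <- data.frame(
  W1 = rnorm(n),
  W2 = rnorm(n),
  Y  = rbinom(n, 1, 0.06)
)

# Build V-fold CV folds, stratified on the outcome so that each fold
# gets roughly the same share of positive cases.
folds <- make_folds(
  n = nrow(dat),
  fold_fun = folds_vfold,
  V = 10,
  strata_ids = dat$Y
)

# Sanity check: number of positive outcomes in each validation set.
val_positives <- sapply(folds, function(f) sum(dat$Y[f$validation_set]))
print(val_positives)

# Hand the stratified folds to the task via its folds argument.
task <- make_sl3_Task(
  data = dat,
  covariates = c("W1", "W2"),
  outcome = "Y",
  folds = folds
)
```

Any learner or Super Learner trained on this task should then use these folds for its cross-validation, rather than generating unstratified ones.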
Thank you so much :)
My pleasure! This is a common Q, and now we have an answer to refer to when it comes up again. Thanks for filing the issue ☺️
Hello,
The outcome in my dataset is binary and quite sparse: I have 2700 outcomes in a dataset of 44,500 with 28 covariates. Is there a way to make sure that the cross-validation folds are stratified by the outcome, in the same way as can be done in glmnet? For some of the learners I was able to add the option stratify_cv = TRUE, like below:
Lrnr_glmnet$new(stratify_cv = TRUE, family = "binomial", alpha = 1, use_min = TRUE)
When I extract the predictions from the tmle fit object and then convert them into classes, I get only ~500 positive outcomes, and the rest are classified as negative.
Many thanks,
Ofran