topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

Discussion/Request: Optimizing probability thresholds for class imbalances using CV in caret::train() #1360

Open leowerne opened 5 months ago

leowerne commented 5 months ago

As mentioned on topepo.github.io, custom models can be specified to find an optimal probability threshold in caret::train() when class imbalance calls for one. The example there (using rf) works by creating submodels and a loop. This cannot easily be applied to models that already use submodels and a loop, such as gbm and xgb. How can I optimize the threshold using CV in those cases? I tried simply looping through all combinations, but I have been unable to make even this inefficient option work. Could an explanation, or even a feature, be added for those cases? The alternative I would otherwise go for is to train models using CV without optimizing the threshold and then optimize the threshold afterwards using caret::thresholder(). But even if I implemented some makeshift CV in the post-training thresholding, potentially optimal models could already have been discarded in caret::train() because they are inferior to other models at the default threshold. Thank you and best regards

leowerne commented 5 months ago

Maybe I misunderstood something, but would I achieve the same result if I used thresholder() with final = FALSE?
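
For reference, a minimal sketch of what the `final = FALSE` route might look like for gbm. It is not a confirmed answer to the question. The simulated data, the small tuning grid, the threshold sequence, and the choice of Youden's J as the selection statistic are all assumptions made for illustration. It relies on `classProbs = TRUE` and `savePredictions = "all"` so that the held-out predictions for every tuning combination are kept, and it assumes the gbm package is installed.

```r
library(caret)

set.seed(1)
# Simulated two-class data with imbalance (illustrative only)
dat <- twoClassSim(1000, intercept = -10)

# Keep held-out class probabilities for every tuning combination
ctrl <- trainControl(
  method = "cv", number = 5,
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  savePredictions = "all"
)

# Small hypothetical grid; gbm's n.trees values are handled as submodels
grid <- expand.grid(
  n.trees = c(50, 100),
  interaction.depth = c(1, 3),
  shrinkage = 0.1,
  n.minobsinnode = 10
)

fit <- train(
  Class ~ ., data = dat,
  method = "gbm",
  metric = "ROC",          # threshold-free metric for train()'s own selection
  trControl = ctrl,
  tuneGrid = grid,
  verbose = FALSE
)

# Resampled statistics for every tuning combination at every threshold
thr <- seq(0.05, 0.95, by = 0.05)
res <- thresholder(
  fit, threshold = thr, final = FALSE,
  statistics = c("Sensitivity", "Specificity", "J")
)

# Row with the best resampled J across parameters and thresholds jointly
best <- res[which.max(res$J), ]
best
```

Under these assumptions, `res` has one row per tuning combination and threshold, so no candidate is dropped before the threshold is considered. If the best row uses different tuning parameters than `fit$bestTune`, the final model would presumably need to be refit with those parameters, since `fit$finalModel` only reflects the combination chosen by `train()`.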