Open lcrmorin opened 3 days ago
Basically, we are also working on this topic in scikit-learn. As a milestone, we want to have an example that shows the effect of `sample_weight` and `class_weight` in scikit-learn, and then I would like to revamp the documentation of imbalanced-learn.
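As a rough illustration of the kind of example mentioned above, here is a minimal sketch (my own, not from the issue) showing that `class_weight="balanced"` and the equivalent explicit `sample_weight` lead to the same fitted model in scikit-learn:

```python
# Sketch: class_weight="balanced" vs. the equivalent per-sample weights.
# Both formulations define the same weighted loss, so the fitted
# coefficients should (numerically) coincide.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_sample_weight

# Imbalanced toy data: ~90% negatives, ~10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Variant 1: let the estimator reweight classes internally.
clf_cw = LogisticRegression(class_weight="balanced").fit(X, y)

# Variant 2: pass the same "balanced" weights explicitly per sample.
sw = compute_sample_weight("balanced", y)
clf_sw = LogisticRegression().fit(X, y, sample_weight=sw)

print(np.allclose(clf_cw.coef_, clf_sw.coef_, atol=1e-4))
```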
Describe the issue linked to the documentation
There is some discussion going on about the usefulness of some (if not all) of the over-/under-sampling methods implemented in the imbalanced-learn package.
In particular, there is some doubt about the usefulness of SMOTE:
Basically it seems that:
I think it is a problem that these discussions are not more visible to newcomers (and that more experienced people have to deal with this on a weekly basis).
Suggest a potential alternative/fix
It would be nice to have
1) a clearer demonstration in the doc, because for the moment only the usage is described:
It shows that the data are oversampled, but not that the method works, either in terms of ranking (AUC) or probability calibration (ECE / calibration curve).
Could the doc be upgraded with a better example?
2) a visible user warning regarding the discussions on usefulness of these methods.
While one of the authors has changed his mind about the usefulness of these methods, a younger crowd still seems very eager to jump on these shiny methods. I think it would be helpful for the DS community to take a clearer stance.
I would suggest at least a very visible warning in the doc, like a red banner ('There is some discussion about the usefulness of these methods. See: XXX. Use with caution.').
This could be complemented with a UserWarning... maybe a bit brutal, but it could prevent a lot of trouble.
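For concreteness, such a warning could be emitted with the standard `warnings` machinery; the helper and message below are hypothetical, not actual imbalanced-learn behavior:

```python
# Hypothetical sketch of the suggested UserWarning (not real
# imbalanced-learn code): warn once when a sampler is used.
import warnings

def _warn_about_resampling(sampler_name):
    # Hypothetical helper: point users at the discussion before they
    # rely on the sampler improving their model.
    warnings.warn(
        f"{sampler_name}: there is some discussion about the usefulness "
        "of this method for ranking (AUC) and probability calibration; "
        "see the documentation. Use with caution.",
        UserWarning,
    )

# Demonstrate that the warning is raised and capturable.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _warn_about_resampling("SMOTE")

print(caught[0].message)
```

Users who disagree could still silence it with `warnings.filterwarnings("ignore", ...)`, which keeps the nudge visible by default without blocking anyone.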
Edit: not sure why it added the good first issue label automatically... but I'll take it.