scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License
1.46k stars 252 forks

[ENH] Include importance history in the output #87

Closed davidfstein closed 3 years ago

davidfstein commented 3 years ago

The R package provides the ImpHistory object in the output of the fitted feature selector. This allows for a more granular interpretation of feature importance beyond accepted, tentative, and rejected. I think it would be worthwhile to include this feature in this package as well.

danielhomola commented 3 years ago

Kinda sounds like you're looking for reasons/ways to include badly performing features. I'd stay away from anything that's not confirmed.

davidfstein commented 3 years ago

No, not at all. I just want a more granular look at the accepted features. I'm hoping for something like this: [image]

The inclusion of the importance history in the R package allows you to go beyond the accepted features and to look at a summary of their importances across the runs.

I'm working on a biological classification problem and it would be nice to have a look at which features provide the most explanatory value.
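To make the request concrete, here is a rough sketch of what an importance history could look like. It is not boruta_py's API and not the full Boruta algorithm (there is no statistical test and no accept/reject step). It only records, for each run, the importance of every real feature next to summary statistics of shuffled "shadow" copies. The function names (`importance_history`, `summarize`, `abs_correlation_importance`) and the `shadowMax`/`shadowMean`/`shadowMin` keys are my own choices for illustration, loosely modeled on the idea behind ImpHistory. The absolute-correlation importance is a stand-in so the example runs with the standard library only.

```python
import random
from statistics import median


def abs_correlation_importance(columns, y):
    """Stand-in importance: |Pearson correlation| of each column with y.

    Hypothetical helper for illustration only; Boruta itself relies on
    importances from a tree ensemble such as a random forest.
    """
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y)
    scores = []
    for col in columns:
        mean_c = sum(col) / n
        var_c = sum((v - mean_c) ** 2 for v in col)
        cov = sum((a - mean_c) * (b - mean_y) for a, b in zip(col, y))
        denom = (var_c * var_y) ** 0.5
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    return scores


def importance_history(columns, y, names, n_runs=20,
                       importance_fn=abs_correlation_importance, seed=0):
    """Return one dict per run: real-feature importances plus shadow summaries."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_runs):
        # Shadow features: each real column shuffled independently,
        # which breaks any relationship with the target.
        shadows = []
        for col in columns:
            shadow = list(col)
            rng.shuffle(shadow)
            shadows.append(shadow)

        # Score real and shadow columns together in a single call.
        scores = importance_fn(list(columns) + shadows, y)
        real_scores = scores[:len(columns)]
        shadow_scores = scores[len(columns):]

        row = dict(zip(names, real_scores))
        row["shadowMax"] = max(shadow_scores)
        row["shadowMean"] = sum(shadow_scores) / len(shadow_scores)
        row["shadowMin"] = min(shadow_scores)
        history.append(row)
    return history


def summarize(history):
    """Median importance per column across runs, highest first."""
    keys = list(history[0].keys())
    summary = {k: median(row[k] for row in history) for k in keys}
    return sorted(summary.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: one informative feature and one pure-noise feature.
data_rng = random.Random(42)
y = [data_rng.randint(0, 1) for _ in range(200)]
signal = [label + data_rng.gauss(0, 0.5) for label in y]
noise = [data_rng.gauss(0, 1) for _ in y]

history = importance_history([signal, noise], y, ["signal", "noise"])
for name, med in summarize(history):
    print(f"{name}: {med:.3f}")
```

With a per-run table like `history`, the accepted features can be ranked or box-plotted against the shadow baseline. To get closer to what Boruta does, `importance_fn` could be swapped for a function that fits a scikit-learn random forest and returns its `feature_importances_`.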