scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License
1.5k stars, 256 forks

idea: early stopping based on % tentative #56

Closed: angelosarto closed this issue 5 years ago

angelosarto commented 5 years ago

I have a feature idea: it might be possible to stop early once the number of tentative features drops to a threshold (a percentage of the full feature set, a specific number, or, if we want to get fancy, a function parameter that returns a boolean).

Why? I noticed that in one instance Boruta had fewer than 5% of my features marked as tentative after fewer than 10 rounds, but it can then take many more rounds to classify that remaining 5%. In a lot of cases I would be fine just calling all of these confirmed.
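As a rough illustration of the three threshold forms mentioned above, here is a minimal sketch of such a stopping check. It is not BorutaPy code: `should_stop_early`, its parameters, and the convention that an `int` means an absolute count while a `float` means a fraction of all features are assumptions made only for this example.

```python
from typing import Callable, Union

# Hypothetical threshold type: absolute count, fraction of all features,
# or a user-supplied predicate taking (n_tentative, n_features).
Threshold = Union[int, float, Callable[[int, int], bool]]


def should_stop_early(n_tentative: int, n_features: int, threshold: Threshold) -> bool:
    """Return True when few enough features are still tentative."""
    if callable(threshold):
        # "Fancy" variant: the caller decides.
        return bool(threshold(n_tentative, n_features))
    if isinstance(threshold, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("threshold must be an int, a float, or a callable")
    if isinstance(threshold, int):
        # Absolute number of tentative features that is acceptable.
        return n_tentative <= threshold
    if isinstance(threshold, float):
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("a fractional threshold must lie in [0, 1]")
        # Fraction of the full feature set.
        return n_tentative <= threshold * n_features
    raise TypeError("threshold must be an int, a float, or a callable")


# Example: 4 of 100 features tentative, stop once at most 5% remain.
print(should_stop_early(4, 100, 0.05))  # True
```

A check like this would have to be evaluated once per iteration inside the selection loop; where exactly it would hook into the library is left open here.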

I could work on a PR for this, but thought I would ask before I start working on it.

danielhomola commented 5 years ago

Thanks for the suggestion! Unfortunately, I don't think this is widely applicable, or that it could be implemented in a way that would work for many datasets. Quite often those tentative features will be discarded (and rightly so) in subsequent iterations, so confirming them out of impatience is the wrong thing to do. Also, if you don't want to wait, simply terminate the run and re-run Boruta with a smaller number of iterations.

If you do implement this on your fork and find it useful on other datasets, do let me know and I'll reconsider a PR for this, but currently I can't see how this feature would be useful for a wide range of datasets.
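To make the trade-off concrete: after a shorter run, the confirmed and tentative features can either be kept apart or merged. The following is a minimal sketch, assuming the fitted selector exposes two boolean masks comparable to BorutaPy's `support_` (confirmed) and `support_weak_` (tentative); `select_features` and `keep_tentative` are hypothetical names for this example, not library API.

```python
from typing import List, Sequence


def select_features(
    confirmed: Sequence[bool],
    tentative: Sequence[bool],
    keep_tentative: bool = False,
) -> List[int]:
    """Return the indices of the features to keep.

    By default only confirmed features are kept; with keep_tentative=True
    the still-undecided features are accepted as well.
    """
    if len(confirmed) != len(tentative):
        raise ValueError("masks must have the same length")
    return [
        i
        for i, (is_confirmed, is_tentative) in enumerate(zip(confirmed, tentative))
        if is_confirmed or (keep_tentative and is_tentative)
    ]


# Made-up masks for four features after a short run.
confirmed = [True, False, False, True]
tentative = [False, True, False, False]

print(select_features(confirmed, tentative))                       # [0, 3]
print(select_features(confirmed, tentative, keep_tentative=True))  # [0, 1, 3]
```

Keeping the default at `keep_tentative=False` reflects the caution above: accepting tentative features is an explicit choice by the caller rather than something that happens automatically.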