scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

Results vary between the R and Python implementations #28

Closed sisupalan closed 6 years ago

sisupalan commented 7 years ago

It is mentioned that "the two_step parameter has to be set to False, then (with perc=100) BorutaPy behaves exactly as the R version." In spite of doing this, the results vary significantly. Is there a way to replicate the results of the R version of the package exactly?

mbq commented 7 years ago

Boruta, in general, can be unstable when the number of trees in the forest is too low, because the importance scores cannot converge; you may want to check whether that is the case here.
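One rough way to check this is to refit a forest several times with different seeds and see how much the importance scores move between runs. The sketch below is only an illustration under stated assumptions: it uses scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_` on a synthetic dataset, not Boruta's own importance source, and the helper name `importance_spread` and the tree counts (5 vs. 300) are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): 3 informative features out of 10.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)


def importance_spread(n_trees, seeds=range(5)):
    """Mean per-feature standard deviation of importances across seeds.

    Hypothetical helper: a smaller value means the importance scores are
    more reproducible from one run to the next.
    """
    runs = []
    for seed in seeds:
        forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        forest.fit(X, y)
        runs.append(forest.feature_importances_)
    # Rows are runs, columns are features; take std over runs, then average.
    return float(np.std(np.array(runs), axis=0).mean())


spread_few = importance_spread(5)     # very small forest
spread_many = importance_spread(300)  # larger forest

print(f"spread with   5 trees: {spread_few:.4f}")
print(f"spread with 300 trees: {spread_many:.4f}")
```

If the spread is still large at the tree count used in the real comparison, differences between the R and Python selections may simply reflect run-to-run noise rather than a difference between the implementations.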

danielhomola commented 6 years ago

Closing this now! As always, thanks a lot, Miron, for chiming in!