scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

Overwrites model's random state #67

Open: MichaelRFox opened this issue 4 years ago

MichaelRFox commented 4 years ago

It seems that Boruta passes a RandomState(MT19937) object to the model it is fitting, regardless of the model's own parameters. This doesn't bother a random forest model, but it causes an XGBoost model to fail with the following error:

ValueError: Please check your X and y variable. The provided estimator cannot be fitted to your data. Invalid Parameter format for seed expect int but value='RandomState(MT19937)'
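A minimal, self-contained sketch of this failure mode, assuming only NumPy. IntSeedEstimator is a hypothetical stand-in (not XGBoost's real class) whose fit() rejects any non-integer seed, mimicking the check behind the error above. A NumPy RandomState object converts to the string RandomState(MT19937), which is exactly the value the error reports.

```python
import numbers

import numpy as np


class IntSeedEstimator:
    """Hypothetical estimator that, like the XGBoost wrapper here, only accepts an int seed."""

    def __init__(self, random_state=0):
        self.random_state = random_state

    def set_params(self, **params):
        # Follow the scikit-learn convention: set attributes and return self.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Reject anything that is not an integer, as the error message suggests.
        if not isinstance(self.random_state, numbers.Integral):
            raise ValueError(
                "Invalid Parameter format for seed expect int "
                f"but value='{self.random_state}'"
            )
        return self


# Hand the RandomState object itself to the model, as described in the report.
rng = np.random.RandomState(42)
model = IntSeedEstimator().set_params(random_state=rng)
try:
    model.fit(X=None, y=None)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

A random forest that accepts either an int or a RandomState would not raise here; only the integer-only check does.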

DreHar commented 4 years ago

Try changing line 283 in boruta/boruta_py.py to something like:

self.estimator.set_params(random_state=self.random_state.randint(10**9))

If you care about the exact seed, you could pass something nicer; this is just a hack.
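The same workaround as a standalone function, as a sketch assuming NumPy's legacy RandomState. seed_from_random_state is a hypothetical helper name, and randint is used here because NumPy has deprecated random_integers. Two generators started from the same seed yield the same integer, so runs stay reproducible.

```python
import numbers

import numpy as np


def seed_from_random_state(rng, high=10**9):
    """Hypothetical helper: draw a plain Python int seed from a NumPy RandomState."""
    # randint's upper bound is exclusive, so the seed lies in [0, high).
    return int(rng.randint(high))


rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)
seed_a = seed_from_random_state(rng_a)
seed_b = seed_from_random_state(rng_b)

# Same starting state gives the same integer seed.
print(seed_a == seed_b, isinstance(seed_a, numbers.Integral))
```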

I'm not sure this is the best way to do it, but instead of passing the RandomState object, maybe Boruta should pass a seeded integer like this so it's compatible with more models (e.g. XGBoost). I would suggest accepting either a seed integer or, if a RandomState object is given, generating the first integer from that object.

sskarkhanis commented 4 years ago

I kept wondering what I was doing wrong when using XGBoost with Boruta. I hope there is a fix soon.