scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

Can I somehow speed up the BorutaPy process? #108

Closed. VEZcoding closed this issue 1 year ago

VEZcoding commented 1 year ago

Hello, is there any way to parallelize the iterations? Or is there a parameter like n_jobs? It is very slow with 3k features.

Wuuzzaa commented 1 year ago

Hi, it is not possible to parallelize the iterations because each iteration needs the results of the previous one. You can pass an estimator with n_jobs=-1 to Boruta if you want, so that each iteration's model fit uses all cores. To speed things up further, you can try early_stopping=True or reduce n_estimators. It may also be a good idea to prefilter your features with simpler checks first, such as removing duplicate or constant features.
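As a rough sketch of the suggestions above (not an official recipe): the first helper drops constant and exact-duplicate columns with NumPy before Boruta ever sees them, and the second builds a BorutaPy selector around a forest with n_jobs=-1. The helper names `drop_constant_and_duplicate_columns` and `make_selector` are made up for illustration, the values for n_estimators, max_depth and max_iter are arbitrary, and early_stopping is assumed to be available in your installed BorutaPy version.

```python
import numpy as np


def drop_constant_and_duplicate_columns(X):
    """Return (X_reduced, kept_idx) without constant or exact-duplicate columns.

    Hypothetical helper for illustration; assumes a numeric 2-D array
    with at least one row. Columns containing NaN are never treated as
    constant, because NaN != NaN.
    """
    X = np.asarray(X)
    # A column is non-constant if any value differs from its first-row value.
    non_constant = np.flatnonzero((X != X[0]).any(axis=0))
    # np.unique(axis=1) finds distinct columns; `first` is the position of
    # the first occurrence of each distinct column.
    _, first = np.unique(X[:, non_constant], axis=1, return_index=True)
    # Sort so the surviving columns keep their original order.
    kept_idx = non_constant[np.sort(first)]
    return X[:, kept_idx], kept_idx


def make_selector(random_state=0):
    """Build a BorutaPy selector with the speed-related settings from above."""
    # Imported lazily so the prefilter works even without boruta installed.
    from boruta import BorutaPy
    from sklearn.ensemble import RandomForestClassifier

    # n_jobs=-1 parallelizes each forest fit; iterations stay sequential.
    forest = RandomForestClassifier(
        n_jobs=-1, max_depth=5, random_state=random_state
    )
    return BorutaPy(
        forest,
        n_estimators=200,     # fewer trees than the default, illustrative value
        max_iter=50,          # illustrative value
        early_stopping=True,  # assumed present in the installed version
        random_state=random_state,
    )
```

Typical use would be `X_small, kept = drop_constant_and_duplicate_columns(X)` followed by `make_selector().fit(X_small, y)`; `kept` maps the selected columns back to the original feature indices.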

How many samples do you have? If there are a lot, use a subsample of around 10k. I am sure that running Boruta on 10k versus 1 million samples will make no big difference to the selected features.
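A minimal sketch of that subsampling step, assuming X and y are array-likes of equal length. The helper name `subsample_rows` and the fixed seed are made up for illustration; 10,000 is just the figure mentioned above.

```python
import numpy as np


def subsample_rows(X, y, n_samples=10_000, seed=0):
    """Return a random subset of at most n_samples aligned rows of X and y."""
    X = np.asarray(X)
    y = np.asarray(y)
    if len(X) <= n_samples:
        # Already small enough, nothing to do.
        return X, y
    rng = np.random.default_rng(seed)
    # Draw row indices without replacement and keep the original row order.
    idx = np.sort(rng.choice(len(X), size=n_samples, replace=False))
    return X[idx], y[idx]
```

With rare classes, a plain random draw can leave a class nearly empty; a stratified split (for example scikit-learn's train_test_split with its stratify argument) may be the safer choice there.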

VEZcoding commented 1 year ago

Thanks for your answer, I will check out early stopping :)