scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

How to speed up the algorithm? #128

Open kkckk1110 opened 3 days ago

kkckk1110 commented 3 days ago

I find the process extremely slow when the dataset has many features. How can I speed it up? Can I use the GPU to accelerate it?

Wuuzzaa commented 3 days ago

GPU is not supported. Make sure your estimator uses all CPU cores, if it doesn't already. Consider using a faster or simpler estimator. If the dataset is very large, subsample the rows to speed things up: feature importances should not change much between 10k rows and 10M rows. If all of the above fails, filter out some features with simpler methods first and run Boruta afterward. Hope this helps.
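A rough sketch of how these suggestions could be combined, not a tuned recipe. The helper names (`subsample_rows`, `drop_low_variance`, `run_boruta`), the 10k-row sample size, the variance pre-filter, and the forest settings (`max_depth=5`, `max_iter=50`) are illustrative assumptions, so adjust them to your data. `n_jobs=-1` asks the scikit-learn forest to use all CPU cores.

```python
import numpy as np


def subsample_rows(X, y, n_rows=10_000, seed=0):
    """Return a random subset of at most n_rows rows (sampled without replacement)."""
    rng = np.random.default_rng(seed)
    n = min(n_rows, X.shape[0])
    idx = rng.choice(X.shape[0], size=n, replace=False)
    return X[idx], y[idx]


def drop_low_variance(X, threshold=0.0):
    """Cheap pre-filter (an assumed example of a 'simpler method'):
    keep only columns whose variance exceeds threshold."""
    keep = X.var(axis=0) > threshold
    return X[:, keep], keep


def run_boruta(X, y, max_iter=50, seed=0):
    """Run Boruta with a shallow random forest that uses all CPU cores."""
    # Imported here so the helpers above work without boruta installed.
    from boruta import BorutaPy
    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(
        n_jobs=-1,        # use all CPU cores
        max_depth=5,      # shallow trees train faster (assumed setting)
        random_state=seed,
    )
    selector = BorutaPy(
        forest,
        n_estimators="auto",
        max_iter=max_iter,  # fewer iterations than the default of 100
        random_state=seed,
    )
    selector.fit(X, y)      # BorutaPy expects NumPy arrays, not DataFrames
    return selector.support_
```

Typical usage would be `X_s, y_s = subsample_rows(X, y)`, then `X_f, kept = drop_low_variance(X_s)`, then `support = run_boruta(X_f, y_s)`. The `kept` mask maps the selected columns back to the original feature indices.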