scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

Misleading documentation regarding similarity to original R Boruta #91

Closed. DSKritzinger closed this issue 3 years ago.

DSKritzinger commented 3 years ago

Isn't it misleading to say that BorutaPy is "exactly the same ..." as the original Boruta algorithm implemented in R? BorutaPy uses the estimator's native feature importance scores, which for random forests are derived from Gini impurity, whereas the original R version still uses the "mean decrease in accuracy" (MDA) feature importance, a permutation-based approximation.

Gini impurity importance scores are known to be biased towards features with high cardinality, and the results of the Gini impurity and MDA approaches are vastly different.
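The cardinality bias can be reproduced with scikit-learn. The sketch below uses synthetic data chosen for illustration: one informative binary feature and one pure-noise continuous feature. It compares the forest's built-in Gini importance (`feature_importances_`) with `sklearn.inspection.permutation_importance` on a held-out split. Permutation importance here stands in for R's MDA, which randomForest computes on out-of-bag samples rather than a separate test set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n_samples = 2000

# Informative, low-cardinality feature: binary, agrees with the label 80% of the time.
informative = rng.randint(0, 2, size=n_samples)
flip = rng.rand(n_samples) < 0.2
y = np.where(flip, 1 - informative, informative)

# Pure-noise, high-cardinality feature: ~n_samples distinct continuous values.
noise = rng.rand(n_samples)

X = np.column_stack([informative, noise])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Impurity-based (Gini / MDI) importance, accumulated from the training-time splits.
gini_importance = forest.feature_importances_

# Permutation importance: mean drop in accuracy on held-out data when a column is shuffled.
perm = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
mda_importance = perm.importances_mean

for name, g, m in zip(["informative", "noise"], gini_importance, mda_importance):
    print(f"{name:12s} gini={g:.3f}  permutation={m:.3f}")
```

With fully grown trees, the noise feature is used to fit the flipped labels, so it tends to receive the larger Gini share, while its permutation importance on unseen data stays close to zero.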

This should either be stated more clearly in the documentation, or the necessary methods should be added to BorutaPy.
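The second option could look roughly like the following sketch. It assumes, as the issue describes, that BorutaPy takes its scores from the fitted estimator's `feature_importances_`. `PermutationImportanceForest` is a hypothetical name, not part of BorutaPy or scikit-learn, and it swaps in scikit-learn's `permutation_importance` on a 30% held-out split rather than the out-of-bag MDA computed by R's randomForest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


class PermutationImportanceForest(RandomForestClassifier):
    """Hypothetical random forest whose ``feature_importances_`` reports
    held-out permutation importance (MDA-style) instead of Gini importance."""

    def fit(self, X, y):
        # sample_weight is not supported in this sketch.
        # Hold out part of the data so importances are measured on unseen samples
        # (R's randomForest uses out-of-bag samples instead).
        X_fit, X_val, y_fit, y_val = train_test_split(
            X, y, test_size=0.3, random_state=self.random_state
        )
        super().fit(X_fit, y_fit)
        # Default scoring uses the classifier's accuracy (estimator.score).
        result = permutation_importance(
            self, X_val, y_val, n_repeats=5, random_state=self.random_state
        )
        self._permutation_importances = result.importances_mean
        return self

    @property
    def feature_importances_(self):
        # Overrides the impurity-based property defined on scikit-learn forests.
        return self._permutation_importances
```

Compatibility with BorutaPy is an assumption of this sketch. Note that it spends 30% of the data on validation, and that permutation importances can be slightly negative, unlike Gini scores.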

danielhomola commented 3 years ago

Changed the readme.