scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

How to use boruta_py with BaggingClassifier? #75

Closed BlackArbsCEO closed 4 years ago

BlackArbsCEO commented 4 years ago

Does it work with ensembles besides RandomForest?

a-berg commented 4 years ago

This I'd like to know too. From the README:

A supervised learning estimator, with a 'fit' method that returns the feature_importances_ attribute. Important features must correspond to high absolute values in the feature_importances_.

However, I tried using RReliefF and it raises an error saying that the estimator needs a "max_depth" parameter (so it apparently can only work with random forests).

    371     def _get_tree_num(self, n_feat):
--> 372         depth = self.estimator.get_params()['max_depth']
    373         if depth == None:
    374             depth = 10

I'd urge the author to fix the README so as not to mislead people into believing this works with algorithms that are not tree-based.

UTUnex commented 4 years ago

Hi, I also ran into the problem that a-berg mentioned. It seems that the model must explicitly have the parameter 'max_depth' to be usable in BorutaPy. I'm working with 'AdaBoostClassifier' and 'RUSBoostClassifier', neither of which has 'max_depth' in its parameter list, and when either of them is passed to BorutaPy, it fails with KeyError: 'max_depth'. I wonder whether this problem can be solved.

danielhomola commented 4 years ago

This is the only place we rely on the max_depth param of the estimator https://github.com/scikit-learn-contrib/boruta_py/blob/master/boruta/boruta_py.py#L398

It'd be trivial to add a try/except statement here and inform the user that the estimator does not have a max_depth property and that, as a consequence, we cannot automatically estimate the number of trees to use. Would any of you like to submit a PR for this?

Thanks
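The try/except fallback suggested above could look roughly like the sketch below. It is not code from boruta_py: `get_depth_or_default` is a hypothetical helper, the default of 10 simply mirrors the `depth = 10` fallback visible in the traceback, and `ForestLike` and `BoostLike` are minimal stand-ins for scikit-learn estimators so the example runs without extra dependencies.

```python
import warnings


def get_depth_or_default(estimator, default_depth=10):
    """Return the estimator's max_depth, or default_depth if it is missing or None.

    Hypothetical helper for illustration; not part of boruta_py.
    """
    try:
        depth = estimator.get_params()["max_depth"]
    except KeyError:
        # Tell the user that the tree count cannot be derived from max_depth.
        warnings.warn(
            f"{type(estimator).__name__} has no 'max_depth' parameter; "
            f"cannot estimate the number of trees from it, "
            f"falling back to depth={default_depth}.",
            UserWarning,
        )
        return default_depth
    # Same fallback as in the traceback: an unset depth is treated as 10.
    return default_depth if depth is None else depth


# Stand-ins for real estimators (assumption: only get_params() matters here).
class ForestLike:
    def __init__(self, max_depth=None):
        self.max_depth = max_depth

    def get_params(self, deep=True):
        return {"max_depth": self.max_depth}


class BoostLike:
    def get_params(self, deep=True):
        return {"n_estimators": 50}


if __name__ == "__main__":
    print(get_depth_or_default(ForestLike(max_depth=5)))
    print(get_depth_or_default(ForestLike()))
    print(get_depth_or_default(BoostLike()))  # also emits a UserWarning
```

Whether falling back to a default depth is appropriate for non-tree estimators is a separate design question; raising a clearer error would be an equally reasonable choice.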

UTUnex commented 4 years ago

@danielhomola, hi, thank you for your reply. Will you consider modifying the code to support estimators like AdaBoost, which does not explicitly have the max_depth parameter itself but whose base_estimator (e.g. a DecisionTree) does? I wonder whether it would be possible to make BorutaPy able to extract such an implicit max_depth in the future.

Besides, I also tried to put the XGBoostClassifier and LightGBMClassifier in BorutaPy, and again an error occurred, this time about the random state: 'TypeError: Unknown type of parameter:random_state, got:RandomState'. I don't know how to solve this, could you please help? Do you plan to improve the compatibility of BorutaPy with other popular tree-based models besides RandomForest?
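As a rough illustration of the "implicit max_depth" idea for wrapped estimators, the sketch below looks for a nested depth parameter. It relies on the scikit-learn convention that `get_params(deep=True)` exposes nested parameters as `<component>__<parameter>`. `find_max_depth` is a hypothetical helper, not part of boruta_py, and `TreeLike` and `AdaBoostLike` are simplified stand-ins for real estimators so the example is self-contained.

```python
def find_max_depth(estimator):
    """Return max_depth from the estimator or a nested estimator, else None.

    Hypothetical helper for illustration; not part of boruta_py.
    """
    params = estimator.get_params(deep=True)
    # Prefer a top-level max_depth when the estimator has one.
    if params.get("max_depth") is not None:
        return params["max_depth"]
    # Otherwise look for a nested one, e.g. "base_estimator__max_depth".
    for name, value in params.items():
        if name.endswith("__max_depth") and value is not None:
            return value
    return None


# Stand-ins that mimic how nested parameters are reported (assumption).
class TreeLike:
    def __init__(self, max_depth=None):
        self.max_depth = max_depth

    def get_params(self, deep=True):
        return {"max_depth": self.max_depth}


class AdaBoostLike:
    def __init__(self, base_estimator=None):
        self.base_estimator = base_estimator

    def get_params(self, deep=True):
        params = {"base_estimator": self.base_estimator, "n_estimators": 50}
        if deep and self.base_estimator is not None:
            for key, value in self.base_estimator.get_params().items():
                params[f"base_estimator__{key}"] = value
        return params


if __name__ == "__main__":
    print(find_max_depth(AdaBoostLike(TreeLike(max_depth=3))))
    print(find_max_depth(AdaBoostLike()))
```

Matching on the `__max_depth` suffix keeps the helper independent of what the wrapper calls its inner estimator. When no inner estimator is set explicitly, nothing is found and the caller still needs a fallback.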

danielhomola commented 4 years ago

?? No, that's why I asked you to submit a PR. If you do so, I'll happily review and merge it, but I don't have time to actively do dev on this repo any more.