scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

can't reproduce the example #31

Closed Issam28 closed 6 years ago

Issam28 commented 6 years ago

I can't reproduce your example with my dataset. Here's the error I'm getting:

 File "/home/imahmoudi/python/lib/python3.6/site-packages/boruta/boruta_py.py", line 380, in _get_imp
    self.estimator.fit(X, y)
  File "/home/imahmoudi/python/lib/python3.6/site-packages/sklearn/ensemble/forest.py", line 272, in fit
    y, expanded_class_weight = self._validate_y_class_weight(y)
  File "/home/imahmoudi/python/lib/python3.6/site-packages/sklearn/ensemble/forest.py", line 493, in _validate_y_class_weight
    % self.class_weight)
ValueError: Valid presets for class_weight include "balanced" and "balanced_subsample". Given "auto".

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "FSelection.py", line 31, in <module>
    feat_selector.fit(X[:5000], y[:5000])
  File "/home/imahmoudi/python/lib/python3.6/site-packages/boruta/boruta_py.py", line 201, in fit
    return self._fit(X, y)
  File "/home/imahmoudi/python/lib/python3.6/site-packages/boruta/boruta_py.py", line 285, in _fit
    cur_imp = self._add_shadows_get_imps(X, y, dec_reg)
  File "/home/imahmoudi/python/lib/python3.6/site-packages/boruta/boruta_py.py", line 408, in _add_shadows_get_imps
    imp = self._get_imp(np.hstack((x_cur, x_sha)), y)
  File "/home/imahmoudi/python/lib/python3.6/site-packages/boruta/boruta_py.py", line 383, in _get_imp
    'estimator cannot be fitted to your data.\n' + e)
TypeError: must be str, not ValueError

Any idea how to solve this?

tagomatech commented 6 years ago

Fortunately, neither of these issues is serious.

The first one is due to a mistake in the authors' example code: 'auto' is not a valid value for class_weight. You have to replace it with a valid one, that is, 'balanced' or 'balanced_subsample' (the error message you get is explicit about this). E.g.: rf = RandomForestClassifier(n_jobs=-1, class_weight='balanced', max_depth=5)

See my pull request.
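A minimal sketch of the corrected estimator setup, for anyone who wants to check the fix in isolation. The data here is synthetic (the reporter's X and y are not shown in this thread), and n_estimators and random_state are added only to keep the run small and repeatable; they are assumptions, not part of the project's example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 200 samples, 5 features,
# with the label determined by the first feature.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

# 'balanced' (or 'balanced_subsample') replaces the rejected 'auto' preset.
rf = RandomForestClassifier(
    n_estimators=10,          # assumption: small forest for a quick check
    n_jobs=-1,
    class_weight='balanced',
    max_depth=5,
    random_state=0,           # assumption: fixed seed for repeatability
)

# This is the call that failed inside _get_imp() when class_weight='auto'.
rf.fit(X, y)
```

If this fits without error, the same rf object should be usable as the estimator passed to the feature selector in FSelection.py.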

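As for your second point, it is a small coding error in the definition of the _get_imp() function: the error handler concatenates a str with the exception object e, which is what raises the TypeError and hides the real message. See my other pull request.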
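The second error can be reproduced without boruta at all. The sketch below is not the content of the pull request; fit_or_explain and failing_fit are hypothetical stand-ins for the try/except in _get_imp() and for an estimator whose fit() raises. It only shows why `'...' + e` fails and how converting the exception with str() keeps the original message visible.

```python
def failing_fit(X, y):
    """Hypothetical stand-in for an estimator whose fit() rejects its settings."""
    raise ValueError('Valid presets for class_weight include "balanced" '
                     'and "balanced_subsample". Given "auto".')


def fit_or_explain(fit, X, y):
    """Hypothetical stand-in for the error handling around estimator.fit()."""
    try:
        fit(X, y)
    except Exception as e:
        # Concatenating a str with the exception object itself raises
        # TypeError; str(e) turns it into text first.
        raise ValueError('estimator cannot be fitted to your data.\n' + str(e))


# The buggy pattern: str + exception object is a TypeError.
try:
    'estimator cannot be fitted to your data.\n' + ValueError('boom')
    concat_failed = False
except TypeError:
    concat_failed = True

# The fixed pattern: the original message survives in the re-raised error.
try:
    fit_or_explain(failing_fit, None, None)
    message = None
except ValueError as err:
    message = str(err)

print(concat_failed)
print(message)
```

With the str() conversion in place, the user sees the scikit-learn complaint about class_weight directly instead of the unrelated TypeError.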

Once you have made these changes, it should work.

danielhomola commented 6 years ago

Sorry, scikit-learn changed the RF API; this used to be 'auto' for sure. Thanks a lot for the fix, @tagomatech.