scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

Example Madalon_Data_Set.ipynb does not work #29

Closed flaviozamponi closed 6 years ago

flaviozamponi commented 6 years ago

Dear all, I'm getting acquainted with boruta and I have tried to execute the example Madalon_Data_Set.ipynb. Everything works fine until

feat_selector.fit(X,y),

where I get the following error message:

    TypeError Traceback (most recent call last)
    in ()
    ----> 1 feat_selector.fit(X,y)

    /usr/local/lib/python3.5/dist-packages/boruta/boruta_py.py in fit(self, X, y)
        199 """
        200
    --> 201 return self._fit(X, y)
        202
        203 def transform(self, X, weak=False):

    /usr/local/lib/python3.5/dist-packages/boruta/boruta_py.py in _fit(self, X, y)
        283
        284 # add shadow attributes, shuffle them and train estimator, get imps
    --> 285 cur_imp = self._add_shadows_get_imps(X, y, dec_reg)
        286
        287 # get the threshold of shadow importances we will use for rejection

    /usr/local/lib/python3.5/dist-packages/boruta/boruta_py.py in _add_shadows_get_imps(self, X, y, dec_reg)
        396 # find features that are tentative still
        397 x_cur_ind = np.where(dec_reg >= 0)[0]
    --> 398 x_cur = np.copy(X[:, x_cur_ind])
        399 x_cur_w = x_cur.shape[1]
        400 # deep copy the matrix for the shadow matrix

    /usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in __getitem__(self, key)
       1962 return self._getitem_multilevel(key)
       1963 else:
    -> 1964 return self._getitem_column(key)
       1965
       1966 def _getitem_column(self, key):

    /usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in _getitem_column(self, key)
       1969 # get column
       1970 if self.columns.is_unique:
    -> 1971 return self._get_item_cache(key)
       1972
       1973 # duplicate columns & possible reduce dimensionality

    /usr/local/lib/python3.5/dist-packages/pandas/core/generic.py in _get_item_cache(self, item)
       1641 """Return the cached item, item represents a label indexer."""
       1642 cache = self._item_cache
    -> 1643 res = cache.get(item)
       1644 if res is None:
       1645 values = self._data.get(item)

    TypeError: unhashable type: 'slice'

Is there a simple solution? I'm working on a Linux machine with Python 3.5. My Python packages are up to date.

Many thanks in advance, Flavio

tagomatech commented 6 years ago

Replace `feat_selector.fit(X,y)` with `feat_selector.fit(X.values,y.values)`. You will see that this generates another error, caused by the argument `'auto'` in `rf = RandomForestClassifier(n_jobs=-1, class_weight='auto', max_depth=3)`. Replace it with a valid one, that is `'balanced'` or `'balanced_subsample'`. See my pull request.
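For anyone hitting the same traceback, here is a minimal sketch of both fixes. It assumes a small synthetic data set in place of the Madelon data, with hypothetical column names `f0` to `f4`, and it reproduces only the NumPy-style indexing step shown in the traceback, not the whole Boruta run. The exact exception raised by the DataFrame indexing may differ between pandas versions, so the sketch catches it generically.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the example data (assumption: not the real Madelon set).
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.rand(60, 5), columns=["f0", "f1", "f2", "f3", "f4"])
y = pd.Series(rng.randint(0, 2, size=60))

# Column positions, similar in spirit to x_cur_ind in the traceback.
cols = np.array([0, 2, 4])

# NumPy-style X[:, cols] is not valid on a DataFrame: pandas treats the
# (slice, array) pair as a column label and fails to look it up.
try:
    X[:, cols]
    dataframe_indexing_ok = True
except Exception as exc:  # exception type depends on the pandas version
    dataframe_indexing_ok = False
    print("DataFrame indexing failed with", type(exc).__name__)

# Fix 1: pass plain NumPy arrays, on which X[:, cols] is well defined.
X_arr, y_arr = X.values, y.values
subset = X_arr[:, cols]

# Fix 2: use a valid class_weight value instead of 'auto'.
rf = RandomForestClassifier(
    n_jobs=-1, class_weight="balanced", max_depth=3, random_state=0
)
rf.fit(X_arr, y_arr)

# With boruta installed, the selector would then be fitted on the arrays:
# from boruta import BorutaPy
# feat_selector = BorutaPy(rf, n_estimators="auto", random_state=1)
# feat_selector.fit(X_arr, y_arr)
```

Converting with `.values` keeps the notebook otherwise unchanged; the column names can still be read from `X.columns` afterwards to label the selected features.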

flaviozamponi commented 6 years ago

Hi tagomatech, it works fine now. Thanks. Flavio