scikit-learn-contrib / boruta_py

Python implementations of the Boruta all-relevant feature selection method.
BSD 3-Clause "New" or "Revised" License

Fix TypeError: iteration over a 0-d array error when no features are rejected #46

Closed guitarmind closed 5 years ago

guitarmind commented 5 years ago

Hi there,

This PR fixes the error TypeError: iteration over a 0-d array raised from the self._nanrankdata function. It relates to past tickets #16 and #43.

In my test cases, the error is triggered when all features are judged important, which leaves the not_selected variable as an empty array [] with a shape of (0,):

# all rejected features are sorted by importance history
not_selected = np.setdiff1d(np.arange(n_feat), selected)

After that, imp_history_rejected, an empty 2-D array (zero columns) with a shape of (len(imp_history)-1, 0), is passed to the self._nanrankdata function, which triggers the error.

# large importance values should rank higher = lower ranks -> *(-1)
imp_history_rejected = imp_history[1:, not_selected] * -1

# calculate ranks in each iteration, then median of ranks across feats
iter_ranks = self._nanrankdata(imp_history_rejected, axis=1)
......
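
The shapes described above can be checked in isolation. This is a minimal sketch, not code from the PR: n_feat, selected and the all-ones imp_history are made-up stand-ins chosen so that every feature is selected.

```python
import numpy as np

n_feat = 4
selected = np.arange(n_feat)        # assumption: every feature was selected
imp_history = np.ones((6, n_feat))  # stand-in importance history, 6 iterations

# Same two expressions as in the snippets above
not_selected = np.setdiff1d(np.arange(n_feat), selected)
imp_history_rejected = imp_history[1:, not_selected] * -1

print(not_selected.shape)          # (0,)
print(imp_history_rejected.shape)  # (5, 0)
print(imp_history_rejected.ndim)   # 2
```

The slice keeps its five rows but has no columns, so there is nothing for the ranking step to work on.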

The fix here is to check that every dimension of not_selected is greater than zero before ranking. If not, it assumes that all features should be kept as important and sets support_ to True for all of them.

# all are selected, thus we set feature supports to True
# (the np.bool alias was removed in NumPy 1.24; the builtin bool is equivalent)
self.support_ = np.ones(n_feat, dtype=bool)
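
As a rough illustration of the guard, here is a self-contained sketch. The helper name support_and_ranks is hypothetical and not part of boruta_py, and the double argsort is only a simple stand-in for self._nanrankdata (it ignores ties and NaNs).

```python
import numpy as np

def support_and_ranks(imp_history, selected, n_feat):
    """Hypothetical helper, not the boruta_py method itself."""
    not_selected = np.setdiff1d(np.arange(n_feat), selected)

    support = np.zeros(n_feat, dtype=bool)
    support[selected] = True

    # Guard: if nothing was rejected, there is nothing to rank
    if not_selected.shape[0] == 0:
        return support, None

    # large importance values should rank higher = lower ranks -> *(-1)
    imp_history_rejected = imp_history[1:, not_selected] * -1
    # ordinal ranks per iteration (stand-in for self._nanrankdata)
    iter_ranks = np.argsort(np.argsort(imp_history_rejected, axis=1), axis=1) + 1
    # median rank of each rejected feature across iterations
    return support, np.median(iter_ranks, axis=0)

hist = np.arange(12, dtype=float).reshape(3, 4)
print(support_and_ranks(hist, np.arange(4), 4))      # all selected: no ranking done
print(support_and_ranks(hist, np.array([0, 1]), 4))  # features 2 and 3 get ranked
```

When every feature is selected, the helper returns an all-True support mask and skips the ranking step entirely, which is the case that used to raise the TypeError.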

danielhomola commented 5 years ago

Thanks a lot for your contribution, I really appreciate it!

guitarmind commented 5 years ago

@danielhomola Thanks for your acceptance! I'm glad to help! 👍