Closed guitarmind closed 5 years ago
Hi @guitarmind, I tried the Madalon_Data_Set.ipynb notebook provided with the package. It failed with the error "TypeError: unhashable type: 'slice'", raised from pandas/core/generic.py, line 2487: res = cache.get(item). I am using Python 3.6.4 and pandas 0.23.4. Could you share the package versions from your test environment?
Hi @freshnemo,
Are you using my forked version? The error TypeError: unhashable type: 'slice' is actually what I was trying to fix in the Madalon_Data_Set notebook. It happens because X needs to be a NumPy array for the slicing on line 402 of boruta_py.py to work:
x_cur = np.copy(X[:, x_cur_ind])
In the PR I made a change in the notebook to get X in numpy format:
y = data.pop('target')
X = data.copy().values
Note that this PR is not merged yet, so the changes have not been applied. Could you share the full stack trace so I can see more details? Thanks.
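To illustrate the point about X needing to be a NumPy array, here is a minimal sketch. The toy DataFrame and the value of x_cur_ind are made up for illustration; only the pop/copy/values lines and the np.copy slicing come from the discussion above. The exact exception type raised by the DataFrame case can differ between pandas versions, so the sketch catches a generic Exception.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the notebook's dataset.
data = pd.DataFrame({
    "f0": [1.0, 2.0, 3.0],
    "f1": [4.0, 5.0, 6.0],
    "target": [0, 1, 0],
})

y = data.pop("target")
X_df = data.copy()   # still a DataFrame
X = X_df.values      # plain NumPy array, as in the notebook change

x_cur_ind = [0, 1]   # hypothetical column indices

# NumPy-style 2-D slicing on a DataFrame is not supported by its [] operator.
try:
    X_df[:, x_cur_ind]
    dataframe_slicing_failed = False
except Exception as exc:
    dataframe_slicing_failed = True
    print(type(exc).__name__, exc)

# The same expression works once X is a NumPy array.
x_cur = np.copy(X[:, x_cur_ind])
print(x_cur.shape)
```

Converting once, before calling fit, keeps the library code unchanged and leaves the original DataFrame available for column names.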
Test environment:
Yes. When I used the fork you provided, Boruta_py runs at first but soon stops. With 100 iterations, it usually stops around iteration 45 and shows the error from #47: the line "if not_selected.shape[0] > 0 and not_selected.shape[1] > 0:" fails with "tuple index out of range". My Python version is 3.6.4 and pandas is 0.23.4. In addition, I found an example on Kaggle that uses Boruta_py and works, which is why I suspect Python 3.6.4 might have a bug.
So what is the stack trace of the error? The line "if not_selected.shape[0] > 0 and not_selected.shape[1] > 0:" has been changed in the PR.
Oh, sorry, I did not check the "Files changed" tab. I modified the code with the changes you provided. Thanks for your help, the code runs now.
Good to know that 👍
Edit: Seems like it's working. 😄
Hi @guitarmind,
When I change line 336 to if not_selected.shape[0] > 0: I get this error:
IndexError Traceback (most recent call last)
<timed eval> in <module>()
<ipython-input-40-4c6a084e678c> in fit(self, X, y)
199 """
200
--> 201 return self._fit(X, y)
202
203 def transform(self, X, weak=False):
<ipython-input-40-4c6a084e678c> in _fit(self, X, y)
312 tentative = np.where(dec_reg == 0)[0]
313 # ignore the first row of zeros
--> 314 tentative_median = np.median(imp_history[1:, tentative], axis=0)
315 # which tentative to keep
316 tentative_confirmed = np.where(tentative_median
IndexError: too many indices for array
Before the change, it only raised a tuple index error at the end, but the function otherwise worked properly. Do you have any idea what is going on?
This PR relates to #47, a bug I introduced in #46. We should only check the size of the first dimension of the not_selected array, as it would already be 0 if all features are relevant. This time I have also double-checked that the unit test case passes (and fixed a small issue in it as well).
Here is the output log of unit test:
While testing for correctness, I also discovered some compatibility issues with Python (I'm using 3.6.5) and pandas in the example notebook; it should work well with the current versions as well!
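The reasoning about checking only the first dimension of not_selected can be sketched as follows. This assumes not_selected is a 1-D index array built with np.setdiff1d, which is an assumption about the library internals; old_check and new_check are hypothetical helper names that wrap the two conditions quoted in this thread.

```python
import numpy as np

n_feat = 5  # hypothetical number of features


def old_check(not_selected):
    # Condition from #46: also inspects a second dimension.
    return not_selected.shape[0] > 0 and not_selected.shape[1] > 0


def new_check(not_selected):
    # Condition from this PR: only the first dimension.
    return not_selected.shape[0] > 0


# Some features left unselected: a non-empty 1-D array with shape (3,).
some_left = np.setdiff1d(np.arange(n_feat), np.array([0, 2]))
# All features relevant: an empty 1-D array with shape (0,).
none_left = np.setdiff1d(np.arange(n_feat), np.arange(n_feat))

# A 1-D array has no shape[1], so the old condition raises as soon as
# the first half of the "and" is true.
try:
    old_check(some_left)
    old_check_failed = False
except IndexError as exc:
    old_check_failed = True
    print("old check:", exc)

print("new check, some left:", new_check(some_left))
print("new check, none left:", new_check(none_left))
```

Under this assumption, the old condition only survives when the array is empty, because the short-circuiting "and" never reaches shape[1]; the single-dimension check handles both cases.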