ralphhaygood / sklearn-gbmi

scikit-learn gradient-boosting-model interactions
MIT License

import error ModuleNotFoundError: No module named 'sklearn.ensemble.partial_dependence' #1

Closed: nemar3 closed this issue 2 years ago

nemar3 commented 4 years ago

Hi

Thanks for creating this package. I just installed it and got an import error:

File "D:\Anaconda3\envs\py37\lib\site-packages\sklearn_gbmi\sklearn_gbmi.py", line 13, in <module>
    import sklearn.ensemble.partial_dependence as partial_dependence

ModuleNotFoundError: No module named 'sklearn.ensemble.partial_dependence'

A quick fix is to replace

    import sklearn.ensemble.partial_dependence as partial_dependence

with

    import sklearn.inspection.partial_dependence as partial_dependence

ralphhaygood commented 4 years ago

Thanks for calling this problem to my attention. Unfortunately, the quick fix isn't enough, because they didn't just move the function to a different module. Very annoyingly, they completely redefined the function, so it now takes different arguments and returns different results. It's going to take me a while to sort this mess out. I'll post here when I've done so.

Vasile-Ciorna commented 4 years ago

Please do as this is really annoying!

ralphhaygood commented 4 years ago

Unfortunately, I'm not going to get to this until July, because it's nontrivial, and I have a deadline to meet for paid work at the end of June.

What makes it nontrivial is that the people responsible for Scikit-learn's partial_dependence function didn't just move it to a new module and didn't just rearrange its arguments and results but actually abolished the entire operating mode my package used, namely evaluation on a specified grid. Compare the old version,

https://docs.w3cub.com/scikit_learn/modules/generated/sklearn.ensemble.partial_dependence.partial_dependence/ ,

with the new one,

https://scikit-learn.org/stable/modules/generated/sklearn.inspection.partial_dependence.html ,

and you'll see that the argument specifying "[t]he grid of target_variables values for which the partial dependecy [sic] should be evaluated" has vanished. The old version optionally chose its own grid; the new version always does.

In my opinion, this change is gratuitous. Looking into the source code, it seems to me they could easily have continued supporting both operating modes. Frankly, it's obnoxious to break backward compatibility without a truly compelling reason, and I don't see one here.

I could work around the problem by, essentially, rewriting their function to include the omitted operating mode. However, it would be calling functions that aren't in the public API and hence could, in principle, be changed without notice at any time. It would be safer to create my own complete replacement, but that's somewhat more complicated. So, nontrivial! I'll get to it, but not this month.
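For anyone who needs evaluation on a specified grid in the meantime, here is a minimal brute-force sketch that relies only on a model's public prediction method. It is not the tree-traversal computation the old Scikit-learn function used, so it is slower and its values can differ from what the old function returned. The function name `partial_dependence_on_grid` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def partial_dependence_on_grid(predict, X, feature_inds, grid):
    """Brute-force partial dependence at caller-specified grid points.

    predict      -- any callable mapping an (n, p) array to n predictions
    X            -- (n, p) background data to average over
    feature_inds -- list of column indices of the target features
    grid         -- (k, len(feature_inds)) array of points to evaluate;
                    a 1-D array is treated as k points for one feature
    """
    X = np.asarray(X, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if grid.ndim == 1:
        grid = grid[:, np.newaxis]
    pd_vals = np.empty(grid.shape[0])
    for i, grid_row in enumerate(grid):
        # Pin the target features to this grid point in every row,
        # leaving the other columns as observed, then average.
        X_pinned = X.copy()
        X_pinned[:, feature_inds] = grid_row
        pd_vals[i] = np.mean(predict(X_pinned))
    return pd_vals
```

With a fitted gradient-boosting regressor `gbm`, one could pass `gbm.predict` as `predict`. Each grid point costs one full prediction pass over `X`, which is why this approach gets slow on large data sets.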

Vasile-Ciorna commented 4 years ago

Thank you very much! I tried implementing something on my own, but for now it is very slow!

Halmari commented 4 years ago

Thank you for developing the package! I'm also interested in the fix for this issue.

janphhe commented 4 years ago

Thank you also from my side, a very useful implementation! After having used it successfully with scikit-learn 0.20, I am now (after upgrading) running into problems similar to the others':

    199 def compute_f_vals(gbm, model_inds, arr, inds):
    200     feat_vals, feat_val_counts = unique_rows_with_counts(arr[:, inds])
--> 201     uncentd_f_vals = partial_dependence.partial_dependence(gbm, model_inds[(inds,)], grid = feat_vals)[0][0]
    202     mean_uncentd_f_val = np.dot(feat_val_counts, uncentd_f_vals)/arr.shape[0]
    203     f_vals = uncentd_f_vals-mean_uncentd_f_val

AttributeError: 'function' object has no attribute 'partial_dependence'
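
For readers following the traceback, the failing lines can be read roughly as: evaluate the partial dependence at each distinct observed combination of the selected features, then subtract the count-weighted mean. A rough sketch of that centering step follows. It assumes the package's `unique_rows_with_counts` helper behaves like `np.unique(..., axis=0, return_counts=True)`, and the function name `centered_f_vals` and the `uncentered_fn` callable are hypothetical stand-ins for illustration.

```python
import numpy as np


def centered_f_vals(uncentered_fn, arr, inds):
    """Center grid-evaluated values by their count-weighted mean.

    uncentered_fn -- callable mapping a (k, len(inds)) grid to k values
                     (stand-in for the grid-based partial dependence call)
    arr           -- (n, p) data array
    inds          -- list of column indices of the features of interest
    """
    arr = np.asarray(arr, dtype=float)
    # Distinct observed rows of the selected columns, with multiplicities.
    feat_vals, feat_val_counts = np.unique(
        arr[:, inds], axis=0, return_counts=True
    )
    uncentd_f_vals = np.asarray(uncentered_fn(feat_vals), dtype=float)
    # Weight each distinct row by how often it occurs in the data.
    mean_uncentd_f_val = np.dot(feat_val_counts, uncentd_f_vals) / arr.shape[0]
    return feat_vals, feat_val_counts, uncentd_f_vals - mean_uncentd_f_val
```

The grid here is the set of observed feature rows, which is why the call needs a function that accepts a caller-supplied grid.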

ralphhaygood commented 4 years ago

I've attempted to solve the problem, I hope successfully. The package no longer depends on either the defunct sklearn.ensemble.partial_dependence.partial_dependence or the new sklearn.inspection.partial_dependence, as I've more or less copied the appropriate parts of the former into this package.

For performance, those parts include some Cython code, which gets converted into C code, which gets compiled when the package is installed. This process is more delicate than installing a standard, pure Python package. Per Cython's recommendation, the distributed version of this package includes the C code, so a user doesn't need to have installed or know anything about Cython in order to install the package ... unless something goes wrong. What I've seen under Python 2.7 on a machine running Ubuntu 18.04 LTS is that the C code compiles but crashes when run. (Under Python 3.6 on the same machine, there's no problem.) In such cases, the user needs to regenerate the C code as part of installation. The README file of this package now includes instructions for doing so.

Please leave a comment here if you try the new version (1.0.3) and it doesn't work for you.