predict-idlab / powershap

A power-full Shapley feature selection method.

Can't Pickle #19

Closed jmrichardson closed 2 years ago

jmrichardson commented 2 years ago

Hi,

With version 0.0.7, I am receiving the following error when trying to pickle (i.e., with joblib):

_pickle.PicklingError: Can't pickle <function PowerShap.__init__.<locals>._infinite_splitter.<locals>.split at 0x0000020167F551F0>: it's not found as powershap.powershap.PowerShap.__init__.<locals>._infinite_splitter.<locals>.split

PowerShap works great; the only problem is saving the fit for future use with pickle. I have to use pickle because joblib relies on it, and joblib is caching my functions using "Memory".

Here's a reproducible example:

        self.selector = PowerShap(model=CatBoostClassifier(n_estimators=250, verbose=0, use_best_model=True), cv=cv, verbose=True)
        self.selector.fit(X, y)
        import joblib
        joblib.dump(self, "self.job")

Thanks in advance for any help.

*** Update: this is not a major issue. I worked around the problem by storing the feature names instead of the fit, and I am happy to close the issue if this is not a priority.

jvdd commented 2 years ago

Hi @jmrichardson,

I think joblib / pickle is limited as it cannot serialize functions / closures that are defined in local scope (as we do in the constructor). A solution for this is to use dill instead of pickle / joblib to serialize your feature selector.

In [9]: import dill

In [10]: with open("test.pkl", "wb") as f:
    ...:     dill.dump(selector, f, recurse=True)
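
The limitation itself can be reproduced with the standard library alone. The snippet below is a minimal sketch, not PowerShap's actual code: `make_splitter`, `Holder`, and `can_pickle` are hypothetical names that only mimic a `split` function defined inside another function and stored on an object.

```python
import pickle


def make_splitter():
    """Return a function defined in local scope (stand-in for a nested split())."""
    def split(indices):
        # Placeholder logic: first half / second half.
        mid = len(indices) // 2
        return indices[:mid], indices[mid:]
    return split


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Depending on the Python version, a local function surfaces as
        # either of these exceptions; both mean pickle could not serialize it.
        return False


class Holder:
    """Hypothetical estimator-like object that keeps a local function as an attribute."""
    def __init__(self):
        self.split = make_splitter()


if __name__ == "__main__":
    print(can_pickle(make_splitter()))  # the local function itself
    print(can_pickle(Holder()))         # an object holding it
    print(can_pickle([1, 2, 3]))        # plain data for comparison
```

pickle stores functions by reference (module plus qualified name), and a name containing `<locals>` cannot be looked up again at load time, so any object holding such a function fails to pickle as a whole.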

FYI: this is what we do under the hood in tsflex (see here). I might add a .serialize() method to PowerShap as well.

jmrichardson commented 2 years ago

Hi @jvdd ,

Yes, thank you, I did see that dill is a solution for this problem. Unfortunately, my framework uses joblib's Memory to cache my class methods, and that uses pickle under the hood. However, I was able to work around the problem by not caching the fit and instead caching the feature columns from the transform. I am going to close this issue as I don't see this as a PowerShap issue.

tsflex is next on my list to check out in the future. I really appreciate the libraries you offer. Keep up the great work! :)