online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License

Getting RecursionError when attempting to Pickle model #1358

Status: Open. robme-l opened this issue 12 months ago.

robme-l commented 12 months ago

As the title says, I get RecursionError: maximum recursion depth exceeded while pickling an object when I try to pickle a model as described in the FAQ (in this case the model is actually a pipeline: StandardScaler() | KNNClassifier()). I suspect this may be caused by the dist_func parameter of the KNNClassifier, but I have not been able to confirm it yet.
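One quick way to test the dist_func suspicion is to try pickling the distance function on its own. A functools.partial over a module-level function (like the l1_dist used later in this thread) is pickled by reference and round-trips fine, whereas a lambda or a function defined inside another function usually cannot be pickled. This is a minimal sketch: minkowski_distance and is_picklable are hypothetical helpers, not River's implementations.

```python
import functools
import pickle


def minkowski_distance(a, b, p):
    """Hypothetical stand-in for a distance function: L_p distance between two sparse dicts."""
    keys = a.keys() | b.keys()
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) ** p for k in keys) ** (1 / p)


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises a typical pickling error."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError, RecursionError):
        return False
    return True


# Module-level function wrapped in a partial: pickled by reference to its name.
l1_dist = functools.partial(minkowski_distance, p=1)

# A lambda has no importable name, so pickle cannot look it up and fails.
l1_lambda = lambda a, b: minkowski_distance(a, b, p=1)

print(is_picklable(l1_dist))
print(is_picklable(l1_lambda))
```

If dist_func passes this check but the whole model still fails to pickle, the problem most likely lies elsewhere in the model's learned state.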

I am also wondering whether there is a method similar to clone() that, instead of returning a new model, returns a Python dictionary of the model's hyperparameters. I ask because sometimes we don't need to persist model weights, only hyperparameters, and a JSON-serializable object of just names and values makes it far easier to reload models.

So far, each model's _get_params() method seems to do the trick. However, since some of these models take objects and functions as parameters, we run into the same problem as with pickling. Any advice is appreciated, since I am unsure how to proceed.
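A minimal sketch of one way to make such a hyperparameter dict JSON-serializable even when it contains functions: only callables listed in an explicit registry are written (by name), and functools.partial objects are stored as the registered function plus its keyword arguments. The params dict below is an assumed layout modeled on the pipeline in this thread, not the actual output of River's _get_params(); REGISTRY, encode and decode are hypothetical helpers.

```python
import functools
import json


def minkowski_distance(a, b, p):
    """Hypothetical stand-in for a distance function used as a hyperparameter."""
    keys = a.keys() | b.keys()
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) ** p for k in keys) ** (1 / p)


# Hypothetical registry: the only callables we agree to save and restore by name.
REGISTRY = {"minkowski_distance": minkowski_distance}


def encode(obj):
    """json.dumps fallback: turn registered functions and partials into plain dicts."""
    if isinstance(obj, functools.partial):
        if obj.args:
            raise TypeError("only keyword arguments of partials are supported")
        return {"__partial__": encode(obj.func), "kwargs": obj.keywords}
    for name, func in REGISTRY.items():
        if obj is func:
            return {"__func__": name}
    raise TypeError(f"cannot serialize {obj!r}")


def decode(dct):
    """json.loads object_hook: rebuild functions and partials from those dicts."""
    if "__func__" in dct:
        return REGISTRY[dct["__func__"]]
    if "__partial__" in dct:
        return functools.partial(dct["__partial__"], **dct["kwargs"])
    return dct


# Assumed hyperparameter layout for a SWINN engine with an L1 distance.
params = {
    "engine": {"dist_func": functools.partial(minkowski_distance, p=1), "seed": 42},
}

text = json.dumps(params, default=encode, indent=2)
restored = json.loads(text, object_hook=decode)
print(text)
```

Unlike pickle, loading this JSON never executes arbitrary code: decode only hands back functions already in REGISTRY. The trade-off is that an unregistered callable raises TypeError at save time, which makes missing entries explicit.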

robme-l commented 12 months ago

Following up: as a hacky workaround, I now save my model information with the _get_params() method that River's models provide. Saving weights wasn't important for me, only hyperparameters, so this works for the time being. I'm keeping this issue open in case someone has something else to add.

smastelini commented 12 months ago

Hi @robme-l. Can you share a minimal working example (MWE) to help us track down this potential bug?

MaxHalford commented 12 months ago

Indeed, I'm not able to reproduce this locally. Please provide some code :)

gabrivoy commented 4 months ago

Hi @MaxHalford, hope everything is great with you.

I've run into the same pickling error as @robme-l with the KNN/SWINN model. I'm running the example from the documentation followed by a simple pickle round-trip. Here is the behaviour I observed:

1st - Creating the model, then pickling and loading it without training:

import functools
import pickle

from river import datasets
from river import evaluate
from river import metrics
from river import neighbors
from river import preprocessing
from river import utils

dataset = datasets.Phishing()

l1_dist = functools.partial(utils.math.minkowski_distance, p=1)
model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        engine=neighbors.SWINN(
            dist_func=l1_dist,
            seed=42
        )
    )
)

# Round-trip the untrained pipeline through pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

model

This works just fine: the loaded pipeline is displayed as expected.

2nd - However, when I first train the model with evaluate.progressive_val_score(...), pickling raises the recursion error:

import functools
import pickle

from river import datasets
from river import evaluate
from river import metrics
from river import neighbors
from river import preprocessing
from river import utils

dataset = datasets.Phishing()

l1_dist = functools.partial(utils.math.minkowski_distance, p=1)
model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        engine=neighbors.SWINN(
            dist_func=l1_dist,
            seed=42
        )
    )
)

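# Progressive validation: predict on each sample, then learn from it
evaluate.progressive_val_score(dataset, model, metrics.Accuracy())

# After training, this step fails with RecursionError on River 0.19.0
with open('model.pkl', 'wb') as f: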
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

model

This leads to the same RecursionError reported at the top of this issue.

For reference, I'm using Python 3.11.8 and River 0.19.0, because I need compatibility with pandas 1.x and River 0.20+ requires pandas 2.x or later.
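The thread does not say what inside the trained SWINN index triggers the error, so the following only sketches the general mechanism, under the assumption that training builds a long chain of objects that reference each other directly. Python's pickle follows such references recursively, so an empty (untrained) structure pickles fine while a large one can exceed the recursion limit. Node, Graph and build_chain are hypothetical names, and flattening references into integer ids in __getstate__ is one common workaround, not River's actual fix.

```python
import pickle


class Node:
    """Hypothetical graph vertex that stores direct references to its neighbours."""

    def __init__(self, uid):
        self.uid = uid
        self.neighbors = []


def build_chain(n):
    """Link n nodes into a path, each pointing at its left and right neighbour."""
    nodes = [Node(i) for i in range(n)]
    for left, right in zip(nodes, nodes[1:]):
        left.neighbors.append(right)
        right.neighbors.append(left)
    return nodes


class Graph:
    """Hypothetical container that pickles its nodes as flat id lists, not nested references."""

    def __init__(self, nodes):
        self.nodes = nodes

    def __getstate__(self):
        # Only ints and lists of ints: pickle never has to follow node-to-node links.
        return {
            "uids": [node.uid for node in self.nodes],
            "edges": [[nb.uid for nb in node.neighbors] for node in self.nodes],
        }

    def __setstate__(self, state):
        # Recreate every node first, then reconnect neighbours by id.
        by_uid = {uid: Node(uid) for uid in state["uids"]}
        for uid, neighbor_ids in zip(state["uids"], state["edges"]):
            by_uid[uid].neighbors = [by_uid[i] for i in neighbor_ids]
        self.nodes = [by_uid[uid] for uid in state["uids"]]


nodes = build_chain(100_000)

# Pickling one node directly recurses through the whole chain.
try:
    pickle.dumps(nodes[0])
    print("pickled without error")
except RecursionError as exc:
    print("RecursionError:", exc)

# The flattened Graph pickles and restores the same structure.
restored = pickle.loads(pickle.dumps(Graph(nodes)))
print(len(restored.nodes))
```

Raising sys.setrecursionlimit is sometimes suggested instead, but it only moves the threshold and, depending on the Python version, may not affect pickle's C-level recursion check at all.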

smastelini commented 4 months ago

Hi @gabrivoy, this was indeed a bug, and the fix for the recursion error was released in River 0.21.1.

Unfortunately, I don't have a solution for your setup, given the pandas version constraint. 😞