robme-l opened this issue 12 months ago
Following up with an update: a hackish way I have resorted to for saving my model information is the `_get_params()` method that ships with River's models. Saving weights was not important for me, but hyperparameters were, so this works for the time being. I am keeping this issue open in case someone has something else to add.
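The workaround above can be sketched without River itself. The helper below is an assumption on my part (not an official River API): it walks a params dict shaped like what `_get_params()` returns and replaces anything `json` can't handle (objects, functions, partials) with its `repr`, which is enough when only hyperparameter *values* need to persist.

```python
import json

def jsonable_params(params):
    """Convert a nested hyperparameter dict into JSON-serializable form."""
    def convert(value):
        if isinstance(value, dict):
            return {k: convert(v) for k, v in value.items()}
        if isinstance(value, (list, tuple)):
            return [convert(v) for v in value]
        if isinstance(value, (str, int, float, bool)) or value is None:
            return value
        # Fallback for objects, functions, functools.partial, etc.
        return repr(value)
    return convert(params)

# Stand-in dict shaped like a pipeline's hyperparameters; `min` stands in
# for a non-serializable value such as a distance-function partial.
params = {
    "StandardScaler": {"with_std": True},
    "KNNClassifier": {"n_neighbors": 5, "dist_func": min},
}
print(json.dumps(jsonable_params(params)))
```

The obvious limitation is that `repr` strings are one-way: they document the hyperparameters but cannot be turned back into live objects without extra bookkeeping.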
Hi @robme-l. Can you share an MWE to speed up tracking down this potential bug?
Indeed, I'm not able to reproduce this locally. Please provide some code :)
Hi @MaxHalford, hope everything is great with you.
I've hit the same error that @robme-l reported with the pickling process on the KNN/SWINN model. I'm running the example from the documentation plus a simple pickle round-trip. Here are the two behaviours I've observed:
1st - creating the model and pickling/unpickling it right away:
```python
import functools
import pickle

from river import datasets
from river import evaluate
from river import metrics
from river import neighbors
from river import preprocessing
from river import utils

dataset = datasets.Phishing()

l1_dist = functools.partial(utils.math.minkowski_distance, p=1)

model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        engine=neighbors.SWINN(
            dist_func=l1_dist,
            seed=42
        )
    )
)

with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

model
```
This works just fine: the pickle round-trip completes without error.
2nd - However, when I train the model using `evaluate.progressive_val_score(...)` and then pickle it, the recursion error happens:
```python
import functools
import pickle

from river import datasets
from river import evaluate
from river import metrics
from river import neighbors
from river import preprocessing
from river import utils

dataset = datasets.Phishing()

l1_dist = functools.partial(utils.math.minkowski_distance, p=1)

model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        engine=neighbors.SWINN(
            dist_func=l1_dist,
            seed=42
        )
    )
)

evaluate.progressive_val_score(dataset, model, metrics.Accuracy())

with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

model
```
Leading to the following error: `RecursionError: maximum recursion depth exceeded while pickling an object`.
For reference, I'm using Python 3.11.8 and River 0.19.0 (I need compatibility with pandas 1.x, and River 0.20+ requires pandas 2.x+).
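For anyone curious why pickling can blow the stack at all, here is a minimal, River-free illustration. Pickle serializes an object graph recursively, so a sufficiently deep chain of references exceeds Python's recursion limit. It is plausible (my assumption, not confirmed from River's source here) that SWINN's trained node graph creates similar depth, which would explain why only the *trained* model fails.

```python
import pickle

class Node:
    """A trivial object that holds a reference to the next node."""
    def __init__(self, nxt=None):
        self.next = nxt

# Build a chain far deeper than the default recursion limit (~1000).
head = None
for _ in range(50_000):
    head = Node(head)

try:
    pickle.dumps(head)
except RecursionError as exc:
    print("RecursionError:", exc)
```

Raising `sys.setrecursionlimit(...)` can sometimes work around this, but it only postpones the problem and risks a hard crash, so a fix in the library (as happened in 0.21.1) is the right resolution.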
Hi @gabrivoy, the recursion error was indeed a bug; the fix was released in 0.21.1.
Unfortunately, I don't have a solution for your pandas version constraint. 😞
As mentioned, I get

`RecursionError: maximum recursion depth exceeded while pickling an object`

when attempting to pickle a model as per the FAQs (which in this case is actually a pipeline: `StandardScaler() | KNNClassifier()`). I suspect this may be due to the `dist_func` parameter of the `KNNClassifier`, but I have not been able to confirm that yet.

Furthermore, I am wondering if there is a method similar to `clone()` that can be used on models, except that it returns a Python dictionary of their hyperparameters. I ask because sometimes we may not need persistent model weights, but we may need persistent hyperparameters, and having a JSON-serializable object of just names and values makes it far easier to reload models.

So far each model's `_get_params()` method seems to do the trick; however, since some of these models take objects and functions as parameters, we run into the same problem as when pickling. Advice is appreciated since I am unsure how to proceed.
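On the suspicion that `dist_func` is the culprit: a quick stdlib check suggests otherwise, since a `functools.partial` over an importable, module-level function pickles fine on its own. (The example below uses the builtin `pow` as a stand-in for `minkowski_distance`; it only shows that partials are not inherently unpicklable, not that River's objects are.)

```python
import functools
import pickle

# pow(2, x) -- a partial over an importable callable, like the l1_dist
# partial in the issue's example.
p = functools.partial(pow, 2)

# Round-trip through pickle works because both `functools.partial`
# and the underlying function are picklable by reference.
restored = pickle.loads(pickle.dumps(p))
print(restored(5))  # → 32
```

This points the finger at the model's internal state after training rather than at the distance-function parameter itself.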