online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License

model.to_dict()? #856

Closed: vsoch closed this issue 2 years ago

vsoch commented 2 years ago

Hiya! I am adding a view that quickly returns a summary of a model, and I'm wondering if there could be some kind of model.to_dict(). For example, it looks like a basic version of this could be unwrapped from the pipeline steps:

In [5]: dict(model.steps)
Out[5]: 
{'StandardScaler': StandardScaler (
   with_std=True
 ),
 'LinearRegression': LinearRegression (
   optimizer=SGD (
     lr=Constant (
       learning_rate=0.01
     )
   )
   loss=Squared ()
   l2=0.
   intercept_init=0.
   intercept_lr=Constant (
     learning_rate=0.01
   )
   clip_gradient=1e+12
   initializer=Zeros ()
 )}

This would be separate from an endpoint that downloads the model itself.

MaxHalford commented 2 years ago

Hey there. Every River object (not just estimators) has a _get_params and a _set_params method, meaning you can do this:

from river import compose
from river import linear_model
from river import preprocessing

# Build a two-step pipeline: scale the features, then fit a logistic regression.
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    linear_model.LogisticRegression()
)

# Extract the nested parameters of every step, then feed them back in.
params = model._get_params()
model._set_params(params)

This is what params looks like:

{'LogisticRegression': {'clip_gradient': 1000000000000.0,
                        'initializer': (<class 'river.optim.initializers.Zeros'>,
                                        {}),
                        'intercept_init': 0.0,
                        'intercept_lr': (<class 'river.optim.schedulers.Constant'>,
                                         {'learning_rate': 0.01}),
                        'l2': 0.0,
                        'loss': (<class 'river.optim.losses.Log'>,
                                 {'weight_neg': 1.0, 'weight_pos': 1.0}),
                        'optimizer': (<class 'river.optim.sgd.SGD'>,
                                      {'lr': (<class 'river.optim.schedulers.Constant'>,
                                              {'learning_rate': 0.01})})},
 'StandardScaler': {'with_std': True}}

I don't think you can use this directly in an API route, because the class objects in it are not JSON serializable. You could get around this by replacing each class with its __name__ attribute.
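A minimal sketch of the __name__ idea, not River's own API: the helper name to_jsonable is hypothetical, and the Constant, SGD and Zeros classes are empty stand-ins so the snippet runs without River installed. The params dict only imitates the shape of the _get_params() output shown above.

```python
import json


# Stand-in classes (hypothetical) that play the role of River classes
# such as the SGD optimizer or the Constant scheduler.
class Constant: ...
class SGD: ...
class Zeros: ...


def to_jsonable(obj):
    """Recursively replace class objects with their __name__."""
    if isinstance(obj, type):
        return obj.__name__
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (tuple, list)):
        # Tuples become lists, which is what JSON would do with them anyway.
        return [to_jsonable(item) for item in obj]
    return obj


# Same nesting as the output of model._get_params(): (class, params) tuples.
params = {
    "StandardScaler": {"with_std": True},
    "LinearRegression": {
        "optimizer": (SGD, {"lr": (Constant, {"learning_rate": 0.01})}),
        "l2": 0.0,
        "initializer": (Zeros, {}),
    },
}

summary = to_jsonable(params)
print(json.dumps(summary, indent=4))
```

With a real pipeline, the assumption is that to_jsonable(model._get_params()) would work the same way, since the structure contains only dicts, tuples, classes and plain values.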

Let me know if I can do anything else for you. I checked out the project you're working on, it looks great. I love that it's a plugin to Django.

vsoch commented 2 years ago

That worked like a charm! 🥳

Created model adorable-platanos-1903
{
    "StandardScaler": {
        "with_std": true
    },
    "LinearRegression": {
        "optimizer": [
            "SGD",
            {
                "lr": [
                    "Constant",
                    {
                        "learning_rate": 0.01
                    }
                ]
            }
        ],
        "loss": [
            "Squared"
        ],
        "l2": 0.0,
        "intercept_init": 0.0,
        "intercept_lr": [
            "Constant",
            {
                "learning_rate": 0.01
            }
        ],
        "clip_gradient": 1000000000000.0,
        "initializer": [
            "Zeros"
        ]
    }
}

I think that's probably sufficient for "show me a json dump of the model" and really anyone that wants further detail can download it directly.
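For anyone wanting a similar dump, here is one possible sketch using only the standard library; it is not necessarily how the output above was produced. json.dumps accepts a default callable for objects it cannot serialize, so class objects can be mapped to their names at dump time. The class_name helper and the Squared and Zeros stand-in classes are hypothetical, used so the snippet runs without River installed.

```python
import json


# Stand-in classes (hypothetical) in place of River's loss and initializer classes.
class Squared: ...
class Zeros: ...


def class_name(obj):
    """Fallback for json.dumps: serialize a class object as its name."""
    if isinstance(obj, type):
        return obj.__name__
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Same (class, params) tuple layout that _get_params() returns.
params = {
    "LinearRegression": {
        "loss": (Squared, {}),
        "initializer": (Zeros, {}),
        "l2": 0.0,
    }
}

# Tuples are written as JSON arrays; classes go through class_name.
dumped = json.dumps(params, indent=4, default=class_name)
print(dumped)
```

Unlike the dump above, this version keeps the empty parameter dicts, so a parameterless class appears as a name followed by an empty object rather than as a one-element list.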

> Let me know if I can do anything else for you. I checked out the project you're working on, it looks great. I love that it's a plugin to Django.

Will do! I'm working on it in little bits in the evenings. I should have some time tonight to push new changes.

Thanks for your help!