neptune-ai / neptune-sklearn

Experiment tracking for scikit-learn. 🧩 Log, organize, visualize and compare model metrics, parameters, dataset versions, and more.
https://docs.neptune.ai/integrations/sklearn/
Apache License 2.0
6 stars 2 forks source link

Hande AssertionError in create regression summary #20

Closed radlfabs closed 10 months ago

radlfabs commented 12 months ago

Hey guys! In 'create_regressor_summary' the regressor is pickled. https://github.com/neptune-ai/neptune-sklearn/blob/9ad683cc1b3b3c64237fc593e52569df4dd23d96/src/neptune_sklearn/impl/__init__.py#L138 In 'get_pickled_model' the regressor is checked if it is an sklearn model. https://github.com/neptune-ai/neptune-sklearn/blob/9ad683cc1b3b3c64237fc593e52569df4dd23d96/src/neptune_sklearn/impl/__init__.py#L335 Handling an AssertioNErrorfor the call or accepting a kwarg "log_regressor" in create_regressor_summary would allow to use the create_regressor_summary with other regression models and in many more contexts. That would be useful in cases where the regressor summary function would just generate useful plots but the class does not match the sklearn API perfectly. A summary function could well be separated from the check for sklearn API. Also by the name of the function I would not suspect it to pickle the model by default without a possible prevention by the caller. Is there any other reason why the function should check for is_estimator that I have not thought of or could it be made optional?

SiddhantSadangi commented 11 months ago

Hey @radlfabs 👋

Thanks for pointing this out. What you've mentioned definitely makes sense, and I am checking with the wider team if we can expand get_pickled_model() to accept non-sklearn models/objects too.

I'll keep you updated and reach out in case we need any further details. 🤝

SiddhantSadangi commented 10 months ago

Hey @radlfabs,

I am looking into this feature request. I noticed that while you have pointed out only get_pickled_model(), create_regressor_summary() itself checks for the estimator to be a scikit-learn regressor in L133: https://github.com/neptune-ai/neptune-sklearn/blob/9ad683cc1b3b3c64237fc593e52569df4dd23d96/src/neptune_sklearn/impl/__init__.py#L133

As such, using create_regressor_summary() on non-scikit-learn-like models won't work. If you just want to upload the model as a pickle file, you can use Neptune's File.as_pickle() method to do the same. get_pickled_model() itself uses this method, so the end-result would be the same, minus the assertion regarding the estimator type.

Would this work for you?

radlfabs commented 10 months ago

Hi @SiddhantSadangi thanks for looking into this request! The use case is a method comparison loop, where the user is able to pass a couple of models and each model's summary and plots would be logged with the sklearn integration.

I'm saving models as pickle or as json (e.g. for xgboost) in a different module, i.e. with the File.as_pickle() solution that you mentioned. But pickling the model is not what I would use create_regressor_summary() for in my case. I am after using this function only for plotting and summary purposes.

So where I run into problems is with models, that behave like sklearn-regressors most of the time but won't pass the is_regressor assert statement, or which are not serializable by as_pickle(). Still, it would be nice to be able to use the existing plotting functionality in those cases.

What would help my use case a lot, would be making the logging of the model (and the assert is_regressor()) optional, e.g. by using a keyword argument as a user flag that defaults to logging the model and which optionally deactivates the model pickling. Therefore, existing code won't break and folks wo just want to use the function for plots would be able to do so.

I really appreciate your time and thoughts on that matter! Best wishes! 👍

SiddhantSadangi commented 10 months ago

Thanks for the quick response!

From what I understand, you are not really interested in get_pickled_model(), but rather in create_regressor_summary()'s plotting and summarization functions?

If that is the case, you can completely avoid using create_regressor_summary() (and consequently, avoid the is_regressor() asssertion) , and just use the individual plotting and helper methods directly. All of those functions are public and documented: https://docs.neptune.ai/api/sklearn/

This might involve adding a few more lines to your code but will give you the flexibility of logging exactly what you want and where you want it.

Would this be feasible for you?

radlfabs commented 10 months ago

Hey Siddhant, thanks for the help on that topic! You're correct with your understanding and that's a great hint to solve my purpose. I think this is totally fine and we can close this 😊