Properly restore encoders defined and saved in notebooks

qdrant / quaterion

Blazing fast framework for fine-tuning similarity learning models

https://quaterion.qdrant.tech/

Apache License 2.0


Properly restore encoders defined and saved in notebooks #38

Open monatis opened 2 years ago

monatis commented 2 years ago

Needs attention and discussion. It's particularly important when users work in environments such as Colab.

monatis commented 2 years ago

Instead of just saving the names of the module and the class to import them, what about using dill to pickle the class directly?

WDYT? @generall and @joein
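For context on the trade-off, here is a minimal sketch of how the standard `pickle` module already records a class: by reference (module name plus qualified name), much like saving the names and importing them later. The helper `referenced_names` is hypothetical and only inspects the pickle stream. A class defined in a notebook cell would record `__main__` as its module, which is why it cannot be resolved outside that notebook; `dill`, by contrast, can serialize such classes by value.

```python
import pickle
import pickletools
from collections import OrderedDict
from typing import List


def referenced_names(payload: bytes) -> List[str]:
    """Return the string arguments stored in a pickle stream.

    Hypothetical helper for illustration: for a pickled class these strings
    are the module name and the qualified class name, not the class body.
    """
    names = []
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE"):
            names.append(arg)
    return names


# A standard-library class is used so the example is importable anywhere.
# A class defined in a notebook cell would show "__main__" as its module.
payload = pickle.dumps(OrderedDict, protocol=4)
print(referenced_names(payload))
```

Loading such a payload only works where the recorded module can be imported and still contains the class, which is the limitation this issue is about.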

generall commented 2 years ago

Just to clarify the behavior we want to achieve:

Define all classes, encoders, and trainable models in Colab
Use save_servable after training
Be able to restore from the same Colab notebook afterwards

Is that correct?

monatis commented 2 years ago

Not from the same notebook, actually. My initial idea was to define encoders in a notebook, e.g., on Colab, train the model, save the servable, and then use it elsewhere, outside the notebook. But that does not seem to be easy either way.

monatis commented 2 years ago

Another idea might be to give users a simple utility that creates boilerplate, e.g., quaterion new project-name could generate a basic template with dependencies defined, encoders.py, training.py, inference.py, notebook.ipynb, etc. This may help users structure their projects correctly and run experiments and inference quickly.

This may sound like overkill, but documenting the correct project structure, emphasizing its importance, and answering questions about the resulting problems in the future may be much more difficult.
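A rough sketch of what such a scaffolding step could do, using only the standard library. The function name `create_project`, the placeholder file contents, and the `requirements.txt` file are assumptions made for illustration; no `quaterion new` command is implied to exist.

```python
import json
from pathlib import Path
from typing import List

# File names taken from the proposal above; requirements.txt is an assumed
# stand-in for "dependencies defined".
PYTHON_FILES = ("encoders.py", "training.py", "inference.py")


def create_project(name: str, root: Path) -> List[Path]:
    """Create a project skeleton under root/name and return the created files."""
    project_dir = root / name
    project_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite

    created = []

    requirements = project_dir / "requirements.txt"
    requirements.write_text("quaterion\n", encoding="utf-8")
    created.append(requirements)

    for filename in PYTHON_FILES:
        path = project_dir / filename
        path.write_text(f'"""{filename} for project {name}."""\n', encoding="utf-8")
        created.append(path)

    # An empty but valid notebook in nbformat 4.
    notebook = project_dir / "notebook.ipynb"
    empty_nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
    notebook.write_text(json.dumps(empty_nb, indent=1), encoding="utf-8")
    created.append(notebook)

    return created
```

Keeping encoders in encoders.py rather than in notebook cells means the saved servable can reference an importable module instead of `__main__`.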

generall commented 2 years ago

A cookie-cutter template is a good idea, actually. I like it.

monatis commented 2 years ago

It would also be a good competitive advantage over similar projects, and easily reproducible projects may help accelerate adoption.

Raising a separate issue for this.

monatis commented 2 years ago

I guess we can do something with cell magics for this issue.

Other alternatives such as class serialization etc. are neither reliable nor safe.
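One way the cell-magic idea could look, as a sketch only: write the cell's source to a real module file and import it, so classes defined in the cell carry an importable module name instead of `__main__`. The helper `save_cell_as_module` and the magic name `%%module` are hypothetical; the magic is registered only when the code runs inside an IPython session.

```python
import importlib
import sys
from pathlib import Path


def save_cell_as_module(module_name: str, source: str, directory: str = "."):
    """Write source to <directory>/<module_name>.py and import it as a module."""
    target_dir = Path(directory).resolve()
    path = target_dir / f"{module_name}.py"
    path.write_text(source, encoding="utf-8")

    # Make the directory importable and drop stale finder caches.
    if str(target_dir) not in sys.path:
        sys.path.insert(0, str(target_dir))
    importlib.invalidate_caches()

    if module_name in sys.modules:
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)


try:
    from IPython import get_ipython
    from IPython.core.magic import register_cell_magic

    _shell = get_ipython()  # None when not running under IPython
except ImportError:
    _shell = None

if _shell is not None:

    @register_cell_magic
    def module(line, cell):
        """Usage: put `%%module encoders` on the first line of a cell."""
        mod = save_cell_as_module(line.strip(), cell)
        # Expose the module's public names in the notebook namespace.
        public = {k: v for k, v in vars(mod).items() if not k.startswith("_")}
        _shell.user_ns.update(public)
```

With this approach the generated .py file still has to be shipped alongside the saved servable for it to be restored elsewhere.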