univieCUBE / deepnog

Protein orthologous group assignment with deep learning
BSD 3-Clause "New" or "Revised" License
Model versioning #17

Open VarIr opened 4 years ago

VarIr commented 4 years ago

Currently, deepnog ships one model per eggNOG level and network architecture. If we ever retrain certain models, users have to come up with their own strategies to tell models apart or to pin a specific model (e.g., for reproducibility), such as manually moving files around or renaming them. Retraining, however, could sometimes make sense. For example, we might want to use different data splits, or increase the share of training sequences relative to test sequences to squeeze a little more performance out of the model.

We should at least introduce some versioning, model identifiers, etc., that are stored with the model. Could be a simple string inside the model_dict. This could even be "backported" to existing models.
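A minimal sketch of what "a simple string inside the model_dict" could look like. The key names `model_version` and `model_id`, both helper functions, and the example values are assumptions for illustration, not existing deepnog fields; a plain `dict` stands in for the saved model dictionary.

```python
def tag_model_dict(model_dict, version, identifier):
    """Return a copy of model_dict with version metadata added.

    The keys 'model_version' and 'model_id' are hypothetical names.
    """
    tagged = dict(model_dict)  # shallow copy, stored weights are not duplicated
    tagged["model_version"] = version
    tagged["model_id"] = identifier
    return tagged


def get_model_version(model_dict, default="unversioned"):
    """Read the version string; models saved before versioning get a default."""
    return model_dict.get("model_version", default)


# Example: an existing model without metadata, then the "backported" variant.
old_model = {"model_state_dict": {}, "classes": ["COG0001", "COG0002"]}
new_model = tag_model_dict(old_model, version="v2", identifier="example-level-v2")
```

Reading the version through a default means existing models without the field still load, which is what would make backporting optional rather than required.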

Ideally, automatic model download should also be version-aware. Currently, a user who has already downloaded a model will not receive any updated model.
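One way a version-aware download check could work is sketched below, assuming version tags of the form `v1`, `v2`, ... and assuming the client can learn the newest remote tag somehow. The functions `parse_version` and `needs_download` are hypothetical and not part of deepnog.

```python
def parse_version(tag):
    """Turn a tag such as 'v3' into the integer 3 (the 'vN' scheme is assumed)."""
    if not tag.startswith("v") or not tag[1:].isdigit():
        raise ValueError(f"Unrecognized version tag: {tag!r}")
    return int(tag[1:])


def needs_download(local_tag, remote_tag):
    """Decide whether a model should be fetched.

    local_tag is None when no model has been downloaded yet.
    """
    if local_tag is None:
        return True
    return parse_version(remote_tag) > parse_version(local_tag)
```

Comparing parsed integers rather than raw strings avoids treating `v10` as older than `v2`.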

VarIr commented 3 years ago

To summarize some key points of the recent discussion:

Models will receive a metadata field that holds the following information,

Model filenames will carry a version hint, e.g., the date, or v1, v2, etc., and a "latest" pointer will refer to the most up-to-date version.
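To illustrate the filename idea, here is a rough sketch that resolves "latest" from a list of versioned filenames. The `<stem>_v<N>.pth` naming pattern, the `.pth` extension, and the function `latest_model` are assumptions made for this example only.

```python
import re

# Assumed naming scheme: <stem>_v<number>.pth
VERSIONED = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)\.pth$")


def latest_model(filenames, stem):
    """Return the filename with the highest version number for the given stem.

    Returns None if no filename matches the stem.
    """
    best = None  # (version number, filename) of the best match so far
    for name in filenames:
        match = VERSIONED.match(name)
        if match and match.group("stem") == stem:
            num = int(match.group("num"))
            if best is None or num > best[0]:
                best = (num, name)
    return best[1] if best else None


available = ["deepnog_v1.pth", "deepnog_v10.pth", "deepnog_v2.pth", "other_v99.pth"]
```

With date-based hints instead of `vN`, the same approach would apply as long as the hint sorts unambiguously (e.g., ISO dates).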

The client subcommand deepnog infer will take a use_latest boolean flag to select the latest model (otherwise, the currently installed one is used). A warning or info message could be issued to users when new models are available.
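A possible shape for the flag, sketched with `argparse`. The option spelling `--use-latest`, the parser layout, and the `select_model` helper are assumptions; the actual deepnog command-line interface may be organized differently.

```python
import argparse


def build_parser():
    """Build a toy parser with an 'infer' subcommand (layout is hypothetical)."""
    parser = argparse.ArgumentParser(prog="deepnog")
    subparsers = parser.add_subparsers(dest="command")
    infer = subparsers.add_parser("infer")
    infer.add_argument(
        "--use-latest",
        dest="use_latest",
        action="store_true",
        help="Use the latest available model instead of the installed one.",
    )
    return parser


def select_model(installed, latest, use_latest):
    """Pick a model version and optionally produce a notice for the user."""
    if use_latest:
        return latest, None
    notice = None
    if installed != latest:
        notice = f"A newer model ({latest}) is available; using installed {installed}."
    return installed, notice


args = build_parser().parse_args(["infer", "--use-latest"])
```

Defaulting to the installed model keeps existing runs reproducible, while the notice tells users an update exists without switching models silently.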