Strategies for Managing Machine Learning Model Metadata and Lineage

Snippet

Keeping track of models and their associated metadata.

Discussion

I am starting to accumulate a large number of models for a project I am working on. Many of them are old models that I am keeping for archival purposes, and many are fine-tuned from other models. Is there an industry-standard way of dealing with this? In particular, I am looking for a way to track the following:

Information about the parameters used to train the model
Datasets used to train the model
Other metadata about the model (e.g. which objects an object detection model was trained to detect)
Model performance
Model lineage (which model it was fine-tuned from)
Model progression (whether this model is a direct upgrade of some other model, such as being fine-tuned from the same base model but with better hyperparameters)
Model source (not sure about this, but I'm thinking of some way of linking the model to the Python script that was used to train it; not crucial, but something like this would be nice)

Are there any tools or services that could help me achieve some of this functionality? Also, if this is not the right sub for this question, could I get some pointers in the right direction? Thanks!
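To make the requirements above concrete, here is a minimal sketch of the kind of record that any of the tracking tools suggested in the comments would need to store. It uses only the Python standard library and is not the API of any particular tool. Every name in it (ModelRecord, lineage, the field names, and the example model IDs, datasets, and script name) is hypothetical and chosen purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelRecord:
    """One entry in a hypothetical hand-rolled model registry."""
    model_id: str
    params: Dict[str, float] = field(default_factory=dict)   # training hyperparameters
    datasets: List[str] = field(default_factory=list)        # dataset names or versions
    labels: List[str] = field(default_factory=list)          # e.g. detection classes
    metrics: Dict[str, float] = field(default_factory=dict)  # evaluation results
    parent_id: Optional[str] = None      # lineage: model this was fine-tuned from
    supersedes_id: Optional[str] = None  # progression: model this directly upgrades
    source: Optional[str] = None         # training script path or commit hash


def lineage(registry: Dict[str, ModelRecord], model_id: str) -> List[str]:
    """Follow parent_id links from a model back to its root base model."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = model_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in lineage at {current}")
        seen.add(current)
        chain.append(current)
        current = registry[current].parent_id
    return chain


# Example data is made up for illustration only.
records = [
    ModelRecord("base-v1", datasets=["detection-set-a"], labels=["car", "person"]),
    ModelRecord("ft-v1", params={"lr": 1e-3}, parent_id="base-v1", source="train.py"),
    ModelRecord("ft-v2", params={"lr": 1e-4}, parent_id="base-v1",
                supersedes_id="ft-v1", source="train.py"),
]
registry = {record.model_id: record for record in records}

print(lineage(registry, "ft-v2"))
print(json.dumps(asdict(registry["ft-v2"]), indent=2))
```

Keeping lineage (parent_id) separate from progression (supersedes_id) lets two models share the same base while one is marked as the direct upgrade of the other, which matches the distinction drawn in the list above. A dedicated experiment tracker would typically store the same information as run parameters, metrics, and tags.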

Original Reddit Discussion

Comments

u/fiftyfourseventeen

Weights & Biases (wandb)

Gardienss

Aren't you just describing TensorBoard with some added metadata?

u/qalis

MLflow is built literally for this purpose

Material_Policy6327

MLflow is what we use

metric_logger

My colleague wrote a blog post on how to use Comet (an experiment tracking solution that does everything you described) for object detection use cases.

Compare Object Detection Models from Torchvision

gdpoc

In addition to wandb, Comet, MLflow, and Neptune, (plug) State Farm just open-sourced a package called ThingStore for general process logging and tracking.
