iterative / gto

🏷️ Git Tag Ops. Turn your Git repository into Artifact Registry or Model Registry.
https://dvc.org/doc/gto
Apache License 2.0

Feature idea: track artifacts (co-)dependencies #260

Open tibor-mach opened 2 years ago

tibor-mach commented 2 years ago

I think it would be useful to track dependencies between artifacts. For example, if I want to deploy a new version of artifact foo to prod, and foo depends on artifacts bar and baz, then GTO would warn me if bar and baz are not yet in prod.

Reasoning:

In a scenario with coupled data and model versions (see here for why that might be a good idea), it would be nice to link them together explicitly.

I can look at the DVC pipeline of my foo model in the foo_training repository and see that it depends on foo_data, rev:v0.1.0. I can even add this info to the annotation of foo_model in GTO in the model registry, so that I don't have to keep going back to the model to check. But with this workflow, I also want to make sure I cannot deploy a model to production if it depends on a dataset that is not yet in production, i.e. one produced by a data pipeline that does not run in production. In that case, the integration of the model service and the data preparation service would fail.

So a useful feature would be for GTO to warn me (or block the assignment unless I use something like gto assign --force) when I assign prod (or any specified reserved stage name) to the model while its dataset dependency is not also in prod.
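
To make the idea concrete, here is a minimal Python sketch of the check I have in mind. It is not GTO's API: the `Artifact` class, the in-memory `registry` dict, and the `missing_dependencies` and `assign` functions are all hypothetical names used only for illustration, and the `force` argument just mimics the "something like `--force`" escape hatch mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical registry entry; not a GTO class."""
    name: str
    depends_on: list[str] = field(default_factory=list)
    stages: set[str] = field(default_factory=set)


class MissingDependencyError(Exception):
    """Raised when a dependency is not yet in the target stage."""


def missing_dependencies(registry: dict[str, Artifact], name: str, stage: str) -> list[str]:
    """Return the direct dependencies of `name` that are not in `stage`."""
    missing = []
    for dep in registry[name].depends_on:
        dep_artifact = registry.get(dep)
        # An unregistered dependency counts as missing as well.
        if dep_artifact is None or stage not in dep_artifact.stages:
            missing.append(dep)
    return missing


def assign(registry: dict[str, Artifact], name: str, stage: str, force: bool = False) -> list[str]:
    """Assign `stage` to `name`, refusing if dependencies lag behind.

    With force=True the assignment goes through anyway, and the list of
    missing dependencies is returned so the caller can print a warning.
    """
    missing = missing_dependencies(registry, name, stage)
    if missing and not force:
        raise MissingDependencyError(
            f"{name} depends on {missing}, not yet in stage {stage!r}"
        )
    registry[name].stages.add(stage)
    return missing


# Example: foo depends on bar and baz, but only bar is in prod.
registry = {
    "foo": Artifact("foo", depends_on=["bar", "baz"]),
    "bar": Artifact("bar", stages={"prod"}),
    "baz": Artifact("baz"),
}
```

With this registry, `assign(registry, "foo", "prod")` raises `MissingDependencyError` because baz is not in prod, while `assign(registry, "foo", "prod", force=True)` goes through and returns the missing dependencies. The sketch only checks direct dependencies; whether transitive ones should be checked too is an open design question.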

aguschin commented 1 year ago

Potential feature for Studio, I think.