tibor-mach opened 1 year ago
@tibor-mach what is the high-level scenario we are trying to solve here? What sequence of actions leads to confusing results in Studio? Or is it about allowing multiple models in the same stage at the same time?
@shcheklein I am thinking mostly about auditability here.
Currently there is no persistent way to make GTO behave one way or the other. By default, it is assumed that no more than a single version can be assigned to a stage at any given time. But this default is only materialised in Studio; otherwise, it depends on how you call `gto show`.
I am imagining a scenario where you have multiple projects in an organisation. Some of the projects decide to organise the model registry so that multiple model versions can be assigned to the same stage at the same time (which might even make sense for some use cases), while others use the default.
Without explicit stage unassignments, it is not clear from the git history whether the default is being used or whether people actually intended to have several models in the same stage. You basically have to ask them.
If there were an option in `.gto` to specify how `gto show` works, this setting could be enforced across the organisation, and you could always tell from the config which repositories use the multiple-models-per-stage approach. Studio could then also react to that and visualise things accordingly.
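To make the idea of organisation-wide enforcement concrete, here is a minimal Python sketch of an audit over several repositories. It assumes each repository's `.gto` file has already been parsed into a dictionary and that the proposed setting is called `show_mode`; both the key name and its values are hypothetical, since GTO has no such option today.

```python
from typing import Dict, List, Mapping

# Hypothetical key for the proposed setting; GTO has no such option today.
SHOW_MODE_KEY = "show_mode"


def audit_show_modes(configs: Mapping[str, Mapping[str, object]]) -> Dict[str, List[str]]:
    """Group repositories by declared mode; repos without the key are 'undeclared'."""
    report: Dict[str, List[str]] = {}
    for repo, config in sorted(configs.items()):
        mode = str(config.get(SHOW_MODE_KEY, "undeclared"))
        report.setdefault(mode, []).append(repo)
    return report


# Example input: already-parsed `.gto` contents per repository (made-up names).
configs = {
    "fraud-model": {"show_mode": "multiple_per_stage"},
    "churn-model": {"stages": ["dev", "staging", "prod"]},
    "ranking-model": {"show_mode": "one_per_stage"},
}
print(audit_show_modes(configs))
```

A repository that shows up under `undeclared` relies on the implicit default, which is exactly the case that is hard to tell apart today.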
As I said, I think this is a minor issue, but it makes the GTO model registry slightly ambiguous in these scenarios.
One of the most powerful features of GTO is the ability to audit the model lifecycle and have its history tightly coupled with your git repository.
But the result of GTO is, in a sense, not completely immutable right now, since changing the parameters of `gto show` can lead to different interpretations of the model lifecycle history. In Studio we currently only allow the default, and most users will probably stick with it, but I think it would be good to let users make the default explicit, and also to let them switch to a non-default mode (potentially with multiple models per stage) and make that explicit too.
The use-case is probably best illustrated with the following two images:
Both show a simple git history with two model versions. With the default GTO settings, both produce the history shown in the following picture, but with the non-default setting only the second one corresponds to it.
This creates some uncertainty in auditing: what did the author of the repository intend? In an organisation, I might want to make sure everyone follows the same standard.
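The ambiguity can be illustrated with a small Python sketch. The two histories below are hypothetical (they are not the histories from the images), and `versions_in_stage` is a simplified model of the two readings, not GTO's actual resolution logic: the one-per-stage reading keeps only the most recently assigned version that is still active, while the multiple-per-stage reading keeps every version that has not been explicitly unassigned.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StageEvent:
    """Hypothetical, simplified stand-in for a GTO stage assignment or unassignment tag."""
    version: str
    stage: str
    assigned: bool  # True = assigned to the stage, False = unassigned from it


def versions_in_stage(events: Iterable[StageEvent], stage: str, one_per_stage: bool = True) -> List[str]:
    """Replay events in chronological order and return the versions in `stage`."""
    active: List[str] = []  # versions still in the stage, oldest assignment first
    for event in events:
        if event.stage != stage:
            continue
        if event.version in active:
            active.remove(event.version)  # drop the earlier entry; re-added below if assigned
        if event.assigned:
            active.append(event.version)
    if one_per_stage:
        return active[-1:]  # only the most recent active assignment counts
    return active  # every version not explicitly unassigned counts


# History A: v2 is assigned to prod without unassigning v1.
history_a = [StageEvent("v1", "prod", True), StageEvent("v2", "prod", True)]
# History B: v1 is explicitly unassigned before v2 is assigned.
history_b = [
    StageEvent("v1", "prod", True),
    StageEvent("v1", "prod", False),
    StageEvent("v2", "prod", True),
]

for name, history in [("A", history_a), ("B", history_b)]:
    print(name, versions_in_stage(history, "prod"), versions_in_stage(history, "prod", one_per_stage=False))
# A ['v2'] ['v1', 'v2']
# B ['v2'] ['v2']
```

Both histories look identical under the one-per-stage reading; only the one with an explicit unassignment keeps that meaning under the multiple-per-stage reading, which is why a declared mode would remove the guesswork.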
So I propose having `gto show` first check `.gto` for settings. That way it is explicit to everyone which setup is used. If nothing is specified in `.gto`, I would still fall back to the current default. If you pass parameters manually when running `gto show`, the `.gto` config would be ignored. But if you are doing an audit and want to make sure things are done in a standardised way, you would be able to specify in `.gto` not just the standard stages but also a standard "model registry mode".
At the same time, I think the default is reasonable enough, and most people won't even know there are alternatives, so this is a nice-to-have feature. Still, if you don't think it's a waste of time, I would try to take it on and create a contribution (although probably a slow-moving one, given the relatively low priority).
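The proposed lookup order (explicit arguments first, then `.gto`, then the current default) could look roughly like the following minimal Python sketch. The `show_mode` key, its values and the single `cli_mode` argument are hypothetical stand-ins; real `gto show` options are not modelled here, and the `.gto` file is assumed to be already parsed into a dictionary.

```python
from typing import Mapping, Optional

# Hypothetical setting name and values; GTO has no such option today.
SHOW_MODE_KEY = "show_mode"
DEFAULT_SHOW_MODE = "one_per_stage"  # today's implicit default, made explicit
ALLOWED_SHOW_MODES = {"one_per_stage", "multiple_per_stage"}


def resolve_show_mode(cli_mode: Optional[str], gto_config: Mapping[str, object]) -> str:
    """Choose how `gto show` interprets stages: CLI argument, then `.gto`, then default."""
    if cli_mode is not None:
        mode = cli_mode  # explicit command-line choice overrides the repo config
    else:
        mode = str(gto_config.get(SHOW_MODE_KEY, DEFAULT_SHOW_MODE))
    if mode not in ALLOWED_SHOW_MODES:
        raise ValueError(f"unknown show mode: {mode!r}")
    return mode


print(resolve_show_mode(None, {}))  # one_per_stage
print(resolve_show_mode(None, {"show_mode": "multiple_per_stage"}))  # multiple_per_stage
print(resolve_show_mode("one_per_stage", {"show_mode": "multiple_per_stage"}))  # one_per_stage
```

Keeping the built-in default as the last fallback means repositories without the new setting would behave exactly as they do now, while rejecting unknown values keeps a typo in `.gto` from silently falling back to the default.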