Closed dberenbaum closed 2 years ago
Thanks for your feedback, @dberenbaum!
I think we can do that. One question: let's suppose you have some artifacts: one of type "model", another of type "dataset" and several without any type. You could expect to list models with gto show --type model
or even gto show model
, but how would you expect listing artifacts without type?
My take is that they should be defined by names. It makes git tags look beautiful. Besides that, I never seen git tags like path/to/something@v1.0.0
. Finally, from technical perspective, git tags that contain slashes /
may cause some issues (I would add such a restriction on artifact names, but it's not in place right now).
The downside of using names is that you cannot take existing commit and add a path to the artifact, if you didn't commit artifacts.yaml
in advance. The question is do you need this path at all? You could hardcode it in your CI as a workaround.
2a. Path could not required for virtual artifact, I agree. WDYT about these examples?
$ gto add rf --type model # (no path) adds enrichment for virtual artifact
$ gto add rf --path path/to/it # (with path) adds enrichment for virtual artifact
$ gto add rf --path path/to/it --must-exist # (with path) checks that path is committed, then adds enrichment for NON-virtual artifact
^ by default we assume artifact is virtual, until you say it's not. Then only we verify existence.
other option is to just check if path is supplied, check that path is committed, and put virtual
flag automatically:
$ gto add rf --path path/to/it # if path exists, then write it's not a virtual artifact. If don't exists, write it's a virtual.
It's a technical issue, I'll resolve that, #84
Sounds reasonable, let's call them labels
.
cc @dmpetrov
1. I think we can do that. One question: let's suppose you have some artifacts: one of type "model", another of type "dataset" and several without any type. You could expect to list models with `gto show --type model` or even `gto show model`, but how would you expect listing artifacts without type?
I think gto show
should include those artifacts but gto show --type
should always exclude them.
2. My take is that they should be defined by names. It makes git tags look beautiful. Besides that, I never seen git tags like `path/to/something@v1.0.0`. Finally, from technical perspective, git tags that contain slashes `/` may cause some issues (I would add such a restriction on artifact names, but it's not in place right now). The downside of using names is that you cannot take existing commit and add a path to the artifact, if you didn't commit `artifacts.yaml` in advance. The question is do you need this path at all? You could hardcode it in your CI as a workaround.
I agree that the workflow of having to make an extra commit to start tracking an artifact path feels awkward. It reminds me of the mlflow experiment tracking problem where you need to log the commit as part of your run and then save the results of the run in a separate commit.
Should the commit history of artifacts.yaml
matter? What if you always use artifacts.yaml
from the current workspace to describe the model? This makes it easy to update paths, descriptions, etc. without having to change the commit history. This is similar to how dvc plots
works, where dvc uses the git history to collect the data but renders the plots based on the current plot configuration. The only downside to this approach has been that it can make things slightly confusing in Studio, where there's no current workspace (but often the head of the default branch is sufficient).
2a. Path could not required for virtual artifact, I agree. WDYT about these examples?
$ gto add rf --type model # (no path) adds enrichment for virtual artifact
I think allowing this is probably enough for now. The other ideas about how to handle --path
for virtual artifacts seem like they don't change enough to be worth worrying about for now.
Ok, first part of this issue was implemented in #130.
@dberenbaum
Now, about using artifacts.yaml
from HEAD. While it makes sense when you want to display something, as with dvc plots
, or in Studio / GTO CLI (to show path
/description
of the HEAD/latest version first), I'm not sure about how broadly we should extend this idea.
I guess when you call gto show --type model
, we can TAL at the latest registered version or HEAD. But that doesn't mean when you want to, let's say, deploy your model from HEAD^10, you can take path
from HEAD - it may changed already.
One more question. If you do gto history --type model
and you have an artifact that was of type cnn
in some commits and model
in others, do you expect to see part of the history when it was model
? Or do you expect to see all its history? Do your expectations change if it's model
in the latest version (or HEAD), or if it's a cnn
?
Thanks, @aguschin. Let's forget about using artifacts.yaml
from HEAD for now. I was thinking it could be useful for certain ephemeral info like descriptions, but I don't think it's necessary and probably adds more confusion for now.
@dmpetrov mentioned the ability to register a default model/dataset without a name. Is that implemented, and how does/will that work? Is it gto register model
?
I'm not clear on whether that causes any issues with the suggested changes here to make names the unique artifact keys and make types optional.
Almost, but it's not artifact "type=model without name", instead it's "name=model without type". Which will be treated specially. Basically, Studio will join output of gto show --type model
and gto show --name model
to show in Model Registry.
(Just want to note here that we have related discussion going on in #127)
Okay, let's close and discuss over there
Just found this. Interesting discussion
I never seen git tags like path/to/something@v1.0.0. Finally, from technical perspective, git tags that contain slashes / may cause some issues
You can use hierarchical tags. In fact I think that that at least GTO tags should at least be grouped into gto/alias@version
format. This could be useful for other systems to avoid clutter when displaying Git tags (hypothetically).
Then we can recommend that the entire refs/gto/
dir. gets protected in Github, Gitlab, etc.
Some questions/feedback on
gto add
andartifacts.yaml
:My biggest question is whether the
gto add TYPE NAME PATH
really needs all 3 required arguments. It's a lot to remember and type to add an artifact, and it's not clear why all 3 are required and how each is useful. Related questions:--type
option instead?--virtual
is probably not even necessary since external artifacts could be added without paths.artifacts.yaml
and also under each key in thename
field? Are both needed?One smaller unrelated question: