Hi @rnoxy. Do you work with your artifacts from the CLI or from Studio? We made Studio support both formats (old and new), but for the CLI I assumed you would use either the old GTO CLI or the new DVC API, not both at once.
There is this script that moves annotations from `artifacts.yaml` to `dvc.yaml`, but moving annotations for existing commits would require rewriting the Git history, which is a complex task.
If you need to work with both the old and the new format, I guess the easiest option is a `try...except` construction: try the new format first, and if the annotation doesn't exist there, fall back to the old one.
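A minimal sketch of that fallback, assuming both files have already been parsed into dicts (e.g. with `yaml.safe_load`) and that a file missing at the given revision is passed as `None`. The function name `get_annotation` is hypothetical:

```python
from typing import Any, Dict, Optional


def get_annotation(
    name: str,
    dvc_yaml: Optional[Dict[str, Any]],
    artifacts_yaml: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Return the annotation for `name`, preferring the new `dvc.yaml` format
    and falling back to the old GTO `artifacts.yaml` format."""
    try:
        # New format: annotations live under the top-level "artifacts" key.
        return dvc_yaml["artifacts"][name]
    except (KeyError, TypeError):
        # KeyError: key missing; TypeError: the file was None (not present).
        pass
    try:
        # Old format: annotations live at the root of artifacts.yaml.
        return artifacts_yaml[name]
    except (KeyError, TypeError):
        return None
```

Catching `TypeError` as well as `KeyError` lets the same code handle a file that is absent at a given commit, which is the common case when old and new commits are mixed.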
Hope this is helpful. You can also ask other GTO users who participated in https://github.com/iterative/gto/issues/337; they may have better ideas on how to handle this.
Hi @aguschin, I use plain GTO with DVC, only through the CLI and, more recently, the Python API. We use DVC only for data versioning, not for experiments, pipelines, etc. We don't use Studio either.
I really liked the approach with `gto describe` and `gto annotate`, and I don't understand why these commands were removed.
I think I could implement them myself with Python's `git.Repo`.
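For the `gto annotate` part, a minimal sketch of the same idea against the new format, assuming annotations now live in the `artifacts:` section of `dvc.yaml` with the fields `path`, `type`, `desc` and `labels`. The `annotate` helper and its label-merging rule are hypothetical, not GTO's behaviour; reading and writing the file (e.g. `yaml.safe_load` / `yaml.safe_dump`) and committing it (e.g. `git.Repo(...).index.add` and `.index.commit`) are left out:

```python
import copy
from typing import Any, Dict, Iterable, Optional


def annotate(
    dvc_yaml: Optional[Dict[str, Any]],
    name: str,
    *,
    path: Optional[str] = None,
    artifact_type: Optional[str] = None,
    desc: Optional[str] = None,
    labels: Iterable[str] = (),
) -> Dict[str, Any]:
    """Return a copy of a parsed `dvc.yaml` with the annotation of `name` updated.

    Only the fields that are passed are changed; new labels are appended
    without duplicates (an assumed rule, chosen for illustration).
    """
    result = copy.deepcopy(dvc_yaml) if dvc_yaml else {}
    if not result.get("artifacts"):
        result["artifacts"] = {}
    entry = result["artifacts"].get(name) or {}
    # Overwrite only the scalar fields that were explicitly given.
    for key, value in (("path", path), ("type", artifact_type), ("desc", desc)):
        if value is not None:
            entry[key] = value
    # Merge labels, keeping existing order and skipping duplicates.
    current = list(entry.get("labels") or [])
    for label in labels:
        if label not in current:
            current.append(label)
    if current:
        entry["labels"] = current
    result["artifacts"][name] = entry
    return result
```

Returning a modified copy keeps the parsed input untouched, so the caller decides when to dump and commit the result.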
For example, here is a first approach that searches for both `artifacts.yaml` and `dvc.yaml`:
```python
from typing import Optional

import yaml
from dvc.api import DVCFileSystem
from dvc.scm import RevError


def _get_dvc_artifact_path(fs: DVCFileSystem, artifact_name: str) -> Optional[str]:
    """
    Load the artifacts YAML file from the DVC repository and return the path to the artifact.

    This method is compatible with both the GTO v0.2.x (`artifacts.yaml`) and the
    GTO v0.3.x (`dvc.yaml`) format of the artifacts YAML file.
    In case of any error (e.g. the file is not found, the artifact is not found, etc.)
    None is returned.

    Args:
        fs: The DVCFileSystem object.
        artifact_name: The name of the artifact to load.
    """
    for artifacts_filename in ["artifacts.yaml", "dvc.yaml"]:
        try:
            with fs.open(artifacts_filename) as f:
                artifacts = yaml.safe_load(f)
            # In `artifacts.yaml` the artifacts are at the root level.
            # In `dvc.yaml` the artifacts are under the "artifacts" key.
            if artifacts_filename == "dvc.yaml":
                artifacts = artifacts["artifacts"]
            return artifacts[artifact_name].get("path")
        except (KeyError, TypeError, AttributeError, FileNotFoundError):
            # Missing file or artifact, or an empty YAML document/entry
            # (safe_load returns None): try the next file.
            continue
        except RevError:
            # The revision itself cannot be resolved, so stop searching.
            break
    return None
```
Assume we have a repo with artifacts registered with GTO `v0.2.x` (with the file `artifacts.yaml`). After upgrading GTO to `v0.3.x`, one should switch from `artifacts.yaml` to `dvc.yaml` and start using `dvc.api` (in Python) to get the `gto describe` or `gto annotate` functionality.

The question is how to use the new `dvc.api` with old artifacts registered with GTO `v0.2.x`. The problem is that `dvc.api` expects the file `dvc.yaml` in the `repo.Index`.

How do we migrate all registered artifacts? Should we rebase all commits and re-register the artifacts? Any script for this process would be helpful.
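A rough, unofficial sketch of the per-commit conversion step only, assuming the GTO v0.2.x field names `path`, `type`, `labels`, `description` and the `dvc.yaml` field names `path`, `type`, `labels`, `desc` (check both against your own files). `migrate_annotations` and `FIELD_MAP` are hypothetical names. Applying it to existing commits still means rewriting history, which this does not do:

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed mapping from GTO v0.2.x `artifacts.yaml` fields to `dvc.yaml` fields.
FIELD_MAP = {"path": "path", "type": "type", "labels": "labels", "description": "desc"}


def migrate_annotations(
    old: Optional[Dict[str, Any]], dvc_yaml: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of `dvc_yaml` with the artifacts from `old` merged into its
    `artifacts:` section, plus the names skipped because they have no `path`.
    Entries already present in `dvc.yaml` are left untouched."""
    result = dict(dvc_yaml)
    artifacts = dict(result.get("artifacts") or {})
    skipped: List[str] = []
    for name, fields in (old or {}).items():
        fields = fields or {}
        if "path" not in fields:
            # e.g. GTO "virtual" artifacts without a path; handle them manually.
            skipped.append(name)
            continue
        artifacts.setdefault(
            name,
            {new: fields[old_key] for old_key, new in FIELD_MAP.items() if old_key in fields},
        )
    result["artifacts"] = artifacts
    return result, skipped
```

The inputs would typically come from `yaml.safe_load` on both files, and the result could be written back with `yaml.safe_dump(result, sort_keys=False)` to keep the key order.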