iesahin closed this issue 1 year ago.
Why do metrics need to be cached for Studio?
They don't need to be cached (plots and metrics). Model files I would expect to be cached and not gitignored. Intermediate artifacts - it depends. It's probably fine to avoid caching 70K images and to ignore them, but we should really fix DVC's performance for this. The repo becomes suboptimal and artificial because of the limitations that we have.
I would also rename the ticket - it's not about Studio; in any scenario I would expect model files to be DVC-tracked (and maybe some important intermediate artifacts).
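As a sketch of that policy, a minimal `dvc.yaml` stage (stage, script, and file names here are hypothetical, not from the repo): metrics and plots stay uncached so they live in Git, while the model file is DVC-cached.

```yaml
stages:
  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/images
    outs:
      - models/model.h5        # cached and DVC-tracked, not hand-gitignored
    metrics:
      - metrics.json:
          cache: false         # small text file, Git-tracked, visible to Studio
    plots:
      - plots/confusion.csv:
          cache: false
```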
In the code, I see some dead code:

```python
# print(f"Training Dataset Shape: {training_images.shape}")
# print(f"Testing Dataset Shape: {testing_images.shape}")
# print(f"Training Labels: {training_labels}")
# print(f"Testing Labels: {testing_labels}")
```
We should not keep dead code around. Also, let's run linters and the other regular tools to keep it clean, please.
Is there a reason that `data/images` has `cache: false` in `dvc.yaml`? Was caching the output causing some issue?
`data/images` is extracted from `data/images.tar.gz` and contains 70K small files. `dvc push`/`dvc pull` takes considerable time when we cache it. (We might need a "cache but don't send to remote" setting for files and dirs, but that's a separate discussion.)
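For reference, the setup in question looks roughly like this (stage name assumed; the archive paths are from the discussion). `cache: false` on the extracted directory avoids caching and transferring 70K small files, while the archive itself stays cached:

```yaml
stages:
  extract:
    cmd: tar -xzf data/images.tar.gz -C data/
    deps:
      - data/images.tar.gz     # the single archive is cached and pushed
    outs:
      - data/images:
          cache: false         # 70K small files; skip the cache to keep push/pull fast
```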
> We might need a "cache but don't send to remote" setting for files and dirs, but that's a separate discussion.
That's https://github.com/iterative/dvc/issues/2095, and it's probably pretty easy now that we have https://github.com/iterative/dvc/pull/6486. We just need a way to specify "none".
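(If I recall correctly, recent DVC versions support a per-output `push: false` flag that does exactly this - cache locally but exclude from `dvc push`; a hedged sketch, assuming a DVC version with that flag:)

```yaml
stages:
  extract:
    cmd: tar -xzf data/images.tar.gz -C data/
    outs:
      - data/images:
          push: false          # kept in the local cache, excluded from dvc push
```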
What do we need to make https://github.com/iterative/example-dvc-experiments useful for the Studio?
Originally posted by @shcheklein in https://github.com/iterative/example-repos-dev/issues/79#issuecomment-911902501