iesahin closed this issue 1 year ago.
Why do metrics need to be cached for Studio?
They don't need to be cached (plots and metrics). Model files I would expect to be cached and not gitignored. Intermediate artifacts - it depends. It's probably fine to avoid caching 70K images and to ignore them, but we should really fix DVC's performance for this. The repo becomes suboptimal and artificial because of the limitations that we have.
I would also rename the ticket - it's not about Studio; in any scenario I would expect model files to be DVC-tracked (and maybe some important intermediate artifacts).
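As a sketch of that policy, a minimal `dvc.yaml` stage (stage, script, and file names here are hypothetical, not from the repo): metrics and plots stay uncached so they live in Git, while the model file is DVC-cached.

```yaml
stages:
  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/images
    outs:
      - models/model.h5        # cached and DVC-tracked, not hand-gitignored
    metrics:
      - metrics.json:
          cache: false         # small text file, Git-tracked, visible to Studio
    plots:
      - plots/confusion.csv:
          cache: false
```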
In the code, I see some dead code:

```python
# print(f"Training Dataset Shape: {training_images.shape}")
# print(f"Testing Dataset Shape: {testing_images.shape}")
# print(f"Training Labels: {training_labels}")
# print(f"Testing Labels: {testing_labels}")
```
We should not keep dead code around. Also, let's run linters and the other regular tools to keep it clean, please.
Is there a reason that `data/images` has `cache: false` in `dvc.yaml`? Was caching the output causing some issue?
`data/images` is extracted from `data/images.tar.gz` and contains 70K small files. `dvc push`/`dvc pull` takes considerable time when we cache it. (We might need a "cache but don't send to remote" setting for files and dirs, but that's a separate discussion.)
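For reference, the setup in question looks roughly like this (stage name assumed; the archive paths are from the discussion). `cache: false` on the extracted directory avoids caching and transferring 70K small files, while the archive itself stays cached:

```yaml
stages:
  extract:
    cmd: tar -xzf data/images.tar.gz -C data/
    deps:
      - data/images.tar.gz     # the single archive is cached and pushed
    outs:
      - data/images:
          cache: false         # 70K small files; skip the cache to keep push/pull fast
```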
> We might need a "cache but don't send to remote" setting for files and dirs, but that's a separate discussion.
That's https://github.com/iterative/dvc/issues/2095, and it's probably pretty easy now that we have https://github.com/iterative/dvc/pull/6486. We just need a way to specify "none".
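(If I recall correctly, recent DVC versions support a per-output `push: false` flag that does exactly this - cache locally but exclude from `dvc push`; a hedged sketch, assuming a DVC version with that flag:)

```yaml
stages:
  extract:
    cmd: tar -xzf data/images.tar.gz -C data/
    outs:
      - data/images:
          push: false          # kept in the local cache, excluded from dvc push
```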
What do we need to make https://github.com/iterative/example-dvc-experiments useful for the Studio?
Originally posted by @shcheklein in https://github.com/iterative/example-repos-dev/issues/79#issuecomment-911902501