Closed casperdcl closed 1 year ago
I think we're still waiting to see if repro is going to be deprecated in an upcoming release.
Rel https://github.com/iterative/dvc/issues/7866#issuecomment-1151842420
Nothing in use-cases/experiment-tracking nor user-guide/experiment-management seems to tell existing dvc repro users why they should bother with/what are the use cases of dvc exp.
We do mention exp run vs. repro specifically in several places like https://dvc.org/doc/user-guide/experiment-management/experiments-overview#basic-workflow, https://dvc.org/doc/user-guide/experiment-management/running-experiments#running-the-pipelines, and https://dvc.org/doc/command-reference/exp/run.
None of those links make it remotely clear what the difference is.
The closest near-miss to being potentially helpful is:
📖 dvc exp run is an experiment-specific alternative to dvc repro.
What are the use cases? When would you use one over another? Are there any examples? Does the description meaningfully reduce a confused user's frustration?
very few users want to be using software. Instead, they want to do the things that software enables. [...] Users don’t want to buy your software, and they don’t want to read your documentation—they just want to have their problems solved
and http://mkremins.github.io/blog/doors-headaches-intellectual-need/
A hammer (numerous dvc subcommands) seems pointless if you’ve never seen a nail (what are the different problems?)
I think I'm missing the point of the question, or I also have some bias.
exp is captured repro. exp enables a higher-level use case of "experiments" on top of some low-level building blocks like pipelines (including repro), etc. Do we need a separate command like dvc repro? I don't know. I don't like it personally "aesthetically" (that it's disconnected from dvc stage, that it overlaps with exp, etc.). I also don't like dvc run, which will hopefully be replaced at last by dvc stage add. But it feels that some low-level "make"-file-like interface has its place.
Can I come up with a use case where dvc exp run won't solve the problem? Don't know tbh, feels like no, so again it will be only some aesthetics, or some edge cases. Maybe some automation, when it's clear that you don't want to deal with the overhead (no matter how small it is) of dvc exp run. Maybe we can rename it to dvc stage run --all to make it cleaner.
Nothing in use-cases/experiment-tracking nor user-guide/experiment-management seems to tell existing dvc repro users why they should bother with/what are the use cases of dvc exp.
The whole point was not to complicate this and not to bother users of dvc exp with low-level details like dvc repro. Why should they care? Why do you think it's important for people who come to experiments to know about some strange alternative?
It doesn't seem clear to users what's the difference between stage/repro (i.e. pipelines) and exp (i.e. experiments).
as I mentioned, what you call pipelines is just one of the building blocks for experiments
Should there be a page clearly describing the difference between stages and experiments?
I can only see it from the perspective of a single command (repro vs exp run), what else? stage add does not compete at all with experiments.
In case I wasn't clear earlier: I also wish this topic was clearer, but there's ambiguity in the product itself, and the docs are reflecting that. Deprecating repro or even exp is constantly chattered about, for example. @casperdcl do you have a suggestion on how to clarify this?
exp is captured repro [...] low level "make"-file like interface
I like this. exp builds on top of repro and the latter becomes more of a "helper" (kind of how we expose fetch even when it's part of pull). Good notes for the cmd ref as @shcheklein points out.
why do you think it's important for people who come to experiments to know about some strange alternative?
Yes, we consciously decided not to do this. In fact we have a pending task to remove all or most "pipeline" info from https://dvc.org/doc/user-guide/experiment-management/running-experiments (see https://github.com/iterative/dvc.org/issues/2768).
CLI discussion at https://github.com/iterative/dvc/issues/7866 is a prerequisite to docs.
These two clarification points I've found in various places (the latter one from @SoyGema) have been very useful for me as a user:
exp produces a git ref; that is how it stores its state.
Some additional feedback.
From @mvshmakov:
We’ve recently discovered that dvc repro is not really suitable for CI if the user wants live experiments in Studio to be enabled. As dvc repro does not create a new experiment, we don’t log params to the Studio, thus the experiment will be displayed only partially.
From https://discord.com/channels/485586884165107732/1065577177007018015/1065630078668648458:
I guess I was confused because when I checked the difference in docu, dvc exp run has the comment "Provides a way to execute and track experiments in your project without polluting it with unnecessary commits, branches, directories, etc." so I thought dvc exp is only "experimental" mode for stuff I don't want to have tracked (which I wanted). A remark about legacy in dvc run docs could be preventing further newbies like me asking stupid questions 🙂
Some features often underused/misunderstood/unknown could be helped by better docs/messaging/onboarding clarity.
Should there be a page clearly describing the difference between stages and experiments?
Nothing in use-cases/experiment-tracking nor user-guide/experiment-management seems to tell existing dvc repro users why they should bother with/what are the use cases of dvc exp.
It doesn't seem clear to users what's the difference between stage/repro (i.e. pipelines) and exp (i.e. experiments).