iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0
13.96k stars 1.19k forks source link

`dvc exp list` (and other commands): always show full unique experiment hash #9485

Open shcheklein opened 1 year ago

shcheklein commented 1 year ago

I hit this issue with duplicated experiments (don't know yet why, any ideas? did we change the logic behind the naming recently @daavoo @dberenbaum ?):

@shcheklein ➜ /workspaces/ultralytics (dvclive) $ dvc exp list -A
21122ee:                                                              
        armed-fuzz
        testy-leno
d0f99f9:
        level-tugs
        mothy-jeer
        weepy-haet
        armed-fuzz
        testy-leno
        nosed-mold
dvclive:
        testy-leno
        withy-keek
5a2d27b:
        nosed-mold
        armed-fuzz
        testy-leno
        withy-keek
fd8435a:
        testy-leno
@shcheklein ➜ /workspaces/ultralytics (dvclive) $ dvc exp remove armed-fuzz
ERROR: Ambiguous name 'armed-fuzz' refers to multiple experiments. Use one of the following full refnames instead:

        refs/exps/5a/2d27b1aeb5575916e3480b1b77900b155002a1/armed-fuzz
        refs/exps/d0/f99f94555e5488a2f64cd7680b2419d921eae6/armed-fuzz
        refs/exps/21/122eefcbe711de67eea087c4f93e0566ce3dbd/armed-fuzz

dvc exp show -A doesn't show all commits in this case (can be orphaned commits and experiments in this case - I'm doing a lot of rebases, etc).

There is not easy way to disambiguate those from the listing (I don't know how to remove a specific one under a specific commit).

We should always everywhere show a Git hash for an experiment. Name should serve as tag, or as a human readable name, etc. Collisions should be fine.

daavoo commented 1 year ago

did we change the logic behind the naming recently

Name duplications (the pair of name, baseline_sha is not allowed to collide), started to be allowed when we introduced the human-readable names.

daavoo commented 1 year ago

dvc exp show -A doesn't show all commits in this case (can be orphaned commits and experiments in this case - I'm doing a lot of rebases, etc).

There is not easy way to disambiguate those from the listing (I don't know how to remove a specific one under a specific commit).|

TBH, I was not aware that orphan experiments were displayed in exp list

Apart from this issue (which I think makes sense to show the sha), we are deprecating exp gc so maybe we need something like exp remove --orphan.

dberenbaum commented 1 year ago

Name duplications (the pair of name, baseline_sha is not allowed to collide), started to be allowed when we introduced the human-readable names.

I think exp names should be unique (see #9361). It solves a lot of these problems and I don't see much downside. Is it important to be able to give two experiments with different baselines the same name?

Apart from this issue (which I think makes sense to show the sha), we are deprecating exp gc so maybe we need something like exp remove --orphan.

It could also be part of exp clean. I would rather avoid having to explain the whole concept of orphaned refs if we can.

dberenbaum commented 1 year ago

We should always everywhere show a Git hash for an experiment.

From a quick look at the code, it appears to be simple for the local repo, but not sure how easy it is to do for remote repos.

pmrowla commented 1 year ago

It's not straightforward to determine what experiments are "orphaned". By definition, exp commits are never orphaned since there is at least one git ref pointing to the commit (i.e. our exp ref). We could walk backwards from every git tag and branch head to get a list of commits and then clean exps that derive from other commits, but there could be other non-branch/tag refs pointing to those commits as well.

In particular, with remote git repos there could be things like github pull request or deleted branch refs pointing to those commits, in which case it is not clear whether or not we should be deleting exps that derive from those commits.

daavoo commented 1 year ago

@pmrowla could we do the equivalent to exp gc with all options enabled? That would be quite conservative, right?

pmrowla commented 1 year ago

Yes we can do that, but gc --all-commits has the same problem I was describing.

DVC's --all-commits is not actually all non-orphaned commits in the git repo. It's just commits that are reachable from git branch heads and tags (which again does not include commits that may be reachable from things like github PR's or backups of deleted github branches). --all-commits is good enough for the majority of users in the local use case, but may not be what we want when we get into remote repos and exps that have been shared via exp push

dberenbaum commented 1 year ago

If we think this gives enough reason to keep dvc exp gc, I'm fine to reverse and keep it.