Open mstrupp opened 1 year ago
Hi @mstrupp, could you try upgrading to the latest DVC version?
Hi @daavoo, thank you for the response. I upgraded dvc but the problem still exists.
$ dvc doctor
DVC version: 2.51.0 (pip)
-------------------------
Platform: Python 3.10.8 on Windows-10-10.0.19045-SP0
Subprojects:
dvc_data = 0.44.1
dvc_objects = 0.21.1
dvc_render = 0.3.1
dvc_task = 0.2.0
scmrepo = 0.1.17
Supports:
http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\5db899e06b13bbca5a630f6ac0c2cbfd
The workaround here would be to remove the exec ref with
git update-ref -d refs/exps/exec/EXEC_BASELINE
The issue is that we have logic to account for when HEAD has moved during experiment execution, where exp show will then show experiments derived from EXEC_BASELINE instead of HEAD. We could consider updating the logic to check whether there is also an active workspace run (and clean up the ref when there is not), but this would also introduce additional overhead into every dvc command that uses resolve_rev.
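To make the proposed check concrete, here is a rough sketch of the decision it implies. This is not DVC's actual code: choose_baseline, BaselineDecision, and the run_active flag are hypothetical names, and how DVC would detect an active workspace run is left out entirely.

```python
from typing import NamedTuple, Optional


class BaselineDecision(NamedTuple):
    rev: str           # revision that exp show / exp list should start from
    cleanup_ref: bool  # True if a stale EXEC_BASELINE ref should be deleted


def choose_baseline(
    head: str, exec_baseline: Optional[str], run_active: bool
) -> BaselineDecision:
    """Hypothetical helper: pick the baseline rev for experiment listing.

    head          -- the commit HEAD currently points to
    exec_baseline -- value of refs/exps/exec/EXEC_BASELINE, or None if unset
    run_active    -- whether a workspace experiment run is in progress
                     (assumed to be known by the caller)
    """
    if exec_baseline is None:
        # No exec ref at all: nothing special to do, use HEAD.
        return BaselineDecision(head, False)
    if run_active:
        # HEAD may have moved during the run, so keep using the exec ref.
        return BaselineDecision(exec_baseline, False)
    # The ref was left behind by a killed run: fall back to HEAD and clean up.
    return BaselineDecision(head, True)
```

The extra cost mentioned above would come from determining run_active on every command that resolves HEAD, which this sketch simply takes as an input.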
@pmrowla Is it needed for anything besides exp list and exp show? Can we do it only in those commands?
@dberenbaum it's needed for every DVC command that has any kind of parameter that can be set to (or defaults to) HEAD (so any diff/show command)
Should also note that if we drop checkpoints support we could also consider just dropping this behavior as well. HEAD is still moved for regular experiments but we restore it shortly afterwards when the experiment run ends. The main issue here is that for checkpoints, HEAD is moved to the most recently generated checkpoint commit. (We may not actually be able to drop this entirely though, since tools like vscode could still try to run DVC commands before HEAD is restored at the end of a regular exp run)
Thanks for the suggested workaround @pmrowla.
Unfortunately, the user doesn't realize when the problem occurs and that the workaround should be applied. DVC happily shows the experiments before EXEC_BASELINE. The user expects to see the new experiments but never realizes why they are not shown.
Bug Report
Description
When the terminal is killed while dvc exp run is executing, the ref .git/refs/exps/exec/EXEC_BASELINE is not removed. Then when a git commit is made, git might pack the references to optimize performance. Now, dvc exp list is stuck with the list of experiments before the commit and will not update when new experiments are run.
This also affects the experiments table in the vscode extension.
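Because a packed ref no longer exists as a file under .git/refs, checking only for that file can miss the leftover ref. As a rough illustration (not part of DVC; find_ref is a hypothetical helper), the following sketch looks for the ref both as a loose file and in .git/packed-refs, relying only on git's documented packed-refs layout of one "<sha> <refname>" entry per line, with "#" header lines and "^" peeled-tag lines.

```python
from pathlib import Path
from typing import Optional

EXEC_BASELINE = "refs/exps/exec/EXEC_BASELINE"


def find_ref(git_dir: Path, ref: str = EXEC_BASELINE) -> Optional[str]:
    """Return the stored value of `ref`, or None if it is not present.

    Checks the loose ref file first (git gives it precedence), then
    falls back to scanning the packed-refs file.
    """
    loose = git_dir / ref
    if loose.is_file():
        return loose.read_text().strip()

    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip blank lines, the "# pack-refs with:" header,
            # and "^<sha>" lines that record peeled tags.
            if not line or line[0] in "#^":
                continue
            sha, _, name = line.partition(" ")
            if name == ref:
                return sha
    return None
```

If find_ref(Path(".git")) returns a value while no dvc exp run is in progress, the ref is a leftover and the workaround from this thread applies.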
Reproduce
git init
dvc init
dvc stage add -n prepare -d prepare.py python prepare.py
(prepare.py contains time.sleep(10))
git add .
git commit -m "commit 1"
dvc exp run (kill the terminal while this is executing)
git add .
git commit -m "commit 2"
git pack-refs --all (when committing, git sometimes runs "git pack-refs" for optimization, and it can happen right here; running git pack-refs --all simulates the automatic packing)
dvc exp run
dvc exp list
Expected
dvc exp list should show the experiment from 13. Instead, it returns nothing. The experiment is only shown with dvc exp list -A
Environment information
Output of dvc doctor: