iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

`push -a/-A` don't work #10617

Closed ryan-williams closed 2 weeks ago

ryan-williams commented 2 weeks ago

Bug Report

dvc push's -a (all branches) and -A (all commits) flags don't seem to do anything. Both dvc push -a and dvc push -A result in only blobs from the HEAD commit being pushed to my default remote.

Reproduce

# Init `dvc-test` dir and Git repo
mkdir dvc-test
cd dvc-test
git init

# Init DVC with default remote "remote" pointing at local dir `remote/`
dvc init
dvc config core.autostage true
dvc remote add remote "$PWD/remote"
dvc remote default remote
git commit -m 'dvc init'

# Add+Commit a DVC file, leave branch `branch` pointing at this Git commit
echo aaa > 1.txt
dvc add 1.txt
git commit -m 'echo aaa > 1.txt'
git branch branch

# Modify DVC file, commit
echo AAA > 1.txt
dvc add 1.txt
git commit -m 'echo AAA > 1.txt'

# ✅ 2 versions present in local cache
tree .dvc/cache/files/md5
# .dvc/cache/files/md5
# ├── 5c
# │   └── 9597f3c8245907ea71a89d9d39d08e
# └── 88
#     └── 80cd8c1fb402585779766f681b868b
#
# 3 directories, 2 files

# ❌ Push "all commits" (`-A`) to remote, but only latest version of 1.txt is pushed
dvc push -A
# Collecting
# Pushing
# 1 file pushed

# ❌ Confirming: current version of 1.txt was pushed to remote, but previous version was not:
tree remote/files/md5
# remote/files/md5
# └── 88
#     └── 80cd8c1fb402585779766f681b868b
#
# 2 directories, 1 file

# ❌ Push "all branches" (`-a`) no-ops; previous version of 1.txt still missing
dvc push -a
# Collecting
# Pushing
# Everything is up to date.

# Confirming: branch `branch` points at previous commit
git --no-pager log --oneline --decorate
# c61818b (HEAD -> main) echo AAA > 1.txt
# 9efabca (branch) echo aaa > 1.txt
# 8298b00 dvc init

# Confirming: `branch:1.txt` was never pushed
git --no-pager show branch:1.txt.dvc
# outs:
# - md5: 5c9597f3c8245907ea71a89d9d39d08e
#   size: 4
#   hash: md5
#   path: 1.txt

Expected

Both versions of 1.txt should have been pushed to the remote, but only the HEAD version was.
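The expectation above can be checked mechanically. The following is a minimal sketch, assuming the local-remote layout visible in the `tree` output (`files/md5/<first two hex chars>/<remaining chars>`). The helper names `remote_path_for_md5` and `report_missing` are hypothetical and are not part of DVC.

```shell
# Map an md5 hash to its relative path inside a local DVC remote.
# Assumption: the files/md5/<2 chars>/<rest> layout seen in the tree output.
remote_path_for_md5() {
  local md5 rest prefix
  md5="$1"
  rest="${md5#??}"          # hash without its first two characters
  prefix="${md5%"$rest"}"   # the first two characters
  printf 'files/md5/%s/%s\n' "$prefix" "$rest"
}

# Usage: report_missing REMOTE_DIR MD5...
# Prints "missing <md5>" for every hash whose blob is absent from the
# remote directory, and returns non-zero if at least one is missing.
report_missing() {
  local remote_dir md5 status
  remote_dir="$1"
  shift
  status=0
  for md5 in "$@"; do
    if [ ! -e "$remote_dir/$(remote_path_for_md5 "$md5")" ]; then
      echo "missing $md5"
      status=1
    fi
  done
  return "$status"
}
```

For the repro above, this could be called as `report_missing "$PWD/remote" 5c9597f3c8245907ea71a89d9d39d08e 8880cd8c1fb402585779766f681b868b`. In the state shown, where only the `88/...` blob is in the remote, it would report the `5c95...` hash as missing.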

Environment information

dvc doctor

```
DVC version: 3.56.0 (brew)
--------------------------
Platform: Python 3.13.0 on macOS-15.1-arm64-arm-64bit-Mach-O
Subprojects:
    dvc_data = 3.16.6
    dvc_objects = 5.1.0
    dvc_render = 1.0.2
    dvc_task = 0.40.2
    scmrepo = 3.3.8
Supports:
    azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.19.0),
    gdrive (pydrive2 = 1.20.0),
    gs (gcsfs = 2024.10.0),
    hdfs (fsspec = 2024.10.0, pyarrow = 17.0.0),
    http (aiohttp = 3.10.10, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.10.10, aiohttp-retry = 2.8.3),
    oss (ossfs = 2023.12.0),
    s3 (s3fs = 2024.10.0, boto3 = 1.35.36),
    ssh (sshfs = 2024.9.0),
    webdav (webdav4 = 0.10.0),
    webdavs (webdav4 = 0.10.0),
    webhdfs (fsspec = 2024.10.0)
Config:
    Global: /Users/ryan/Library/Application Support/dvc
    System: /opt/homebrew/share/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: local
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /opt/homebrew/var/cache/dvc/repo/387da8edde78dc277f79c374061cb146
```

shcheklein commented 2 weeks ago

Hmm, I can't reproduce it. I'm getting:

(.venv) √ Projects/dvc-test-10617 % dvc push -A
Collecting                                                                                                                                                                                 |3.00 [00:00, 1.16kentry/s]
Pushing
2 files pushed
(.venv) √ Projects/dvc-test-10617 % tree /tmp/remote-10617
/tmp/remote-10617
└── files
    └── md5
        ├── 5c
        │   └── 9597f3c8245907ea71a89d9d39d08e
        └── 88
            └── 80cd8c1fb402585779766f681b868b

shcheklein commented 2 weeks ago

When you run `git commit -m 'dvc init'`, are you sure you committed all DVC config changes?

Could you also try running it with DVC installed via pip and with the remote outside of $PWD (e.g. /tmp/remote), just to see whether that affects the behavior?

Also, could you try deleting /opt/homebrew/var/cache/dvc/repo/387da8edde78dc277f79c374061cb146?
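
One way to answer the first question (whether all DVC config changes were committed) is to compare the working copy of `.dvc/config` with what is recorded at `HEAD`. The following is a minimal sketch using plain Git commands only. The function name `file_matches_head` is hypothetical, and it assumes it is run from the repository root.

```shell
# Return 0 if PATH in the working tree is identical to the version
# committed at HEAD; return non-zero if it has staged or unstaged
# changes, or if it does not exist at HEAD at all.
# Assumption: called from the root of the Git repository.
file_matches_head() {
  local path
  path="$1"
  # Fail early if HEAD has no such file (or there is no commit yet).
  git cat-file -e "HEAD:$path" 2>/dev/null || return 1
  # Compare the working tree against HEAD; exit status 1 means "differs".
  git diff --quiet HEAD -- "$path"
}
```

Running `file_matches_head .dvc/config || git diff HEAD -- .dvc/config` right after the `git commit -m 'dvc init'` step would then show any remote or `core.autostage` settings that did not make it into the commit.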

ryan-williams commented 2 weeks ago

Here is a repro in a GitHub Action.

Here's a repro in a python:3.11.8 Docker image:

git clone https://github.com/ryan-williams/dvc-push-bug
cd dvc-push-bug
docker build -t dvc-push-bug .
docker run --rm dvc-push-bug

Dockerfile, dvc-test.sh.

Output:

...
❌ missing /tmp/remote/files/md5/5c/9597f3c8245907ea71a89d9d39d08e
Full output

```
+ mkdir dvc-test
+ cd dvc-test
+ git init
Initialized empty Git repository in /src/dvc-test/.git/
+ dvc init
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|                                                                     |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation:
- Get help and share ideas:
- Star us on GitHub:
+ dvc config core.autostage true
+ remote=/tmp/remote
+ dvc remote add remote /tmp/remote
+ dvc remote default remote
+ git commit -m 'dvc init'
[main (root-commit) a975465] dvc init
 3 files changed, 6 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvcignore
+ echo aaa
+ dvc add 1.txt
+ git commit -m 'echo aaa > 1.txt'
[main 841e736] echo aaa > 1.txt
 2 files changed, 6 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 1.txt.dvc
+ git branch branch
+ echo AAA
+ dvc add 1.txt
+ git commit -m 'echo AAA > 1.txt'
[main 6e92bab] echo AAA > 1.txt
 1 file changed, 1 insertion(+), 1 deletion(-)
.dvc/cache/files/md5
├── 5c
│   └── 9597f3c8245907ea71a89d9d39d08e
└── 88
    └── 80cd8c1fb402585779766f681b868b

3 directories, 2 files
+ tree .dvc/cache/files/md5
+ dvc push -A
1 file pushed
+ tree /tmp/remote/files/md5
/tmp/remote/files/md5
└── 88
    └── 80cd8c1fb402585779766f681b868b

2 directories, 1 file
+ dvc push -a
Everything is up to date.
+ git --no-pager log --oneline --decorate
6e92bab (HEAD -> main) echo AAA > 1.txt
841e736 (branch) echo aaa > 1.txt
a975465 dvc init
+ git --no-pager show branch:1.txt.dvc
outs:
- md5: 5c9597f3c8245907ea71a89d9d39d08e
  size: 4
  hash: md5
  path: 1.txt
❌ missing /tmp/remote/files/md5/5c/9597f3c8245907ea71a89d9d39d08e
+ missing_path=/tmp/remote/files/md5/5c/9597f3c8245907ea71a89d9d39d08e
+ '[' -e /tmp/remote/files/md5/5c/9597f3c8245907ea71a89d9d39d08e ']'
+ echo '❌ missing /tmp/remote/files/md5/5c/9597f3c8245907ea71a89d9d39d08e'
```

Putting the remote under /tmp, or a subdir of the Git/DVC workdir, doesn't seem to matter, e.g. both of these fail:

docker run --rm dvc-push-bug                 # Default: /tmp/remote
docker run --rm -e /src/remote dvc-push-bug  # Alternate remote location

shcheklein commented 2 weeks ago

First of all, thanks for the amazing work making this reproducible. GH Actions is 🔥! :)

Here is what seems to be the fix for this behavior: https://github.com/shcheklein/dvc-push-bug/commit/8e5c2046988bdddb4731e40bef0612309fe811d0. I think it is what I mentioned above: when you run `git commit -m 'dvc init'`, are you sure you committed all DVC config changes?

Please take a look and close the ticket if that works for you.

ryan-williams commented 2 weeks ago

Nice catch, GHA passed here with that fix 🙏

I also ran:

git add .dvc/config       # `.dvc/config` required
git commit -m 'dvc init'  # OK to leave out `.dvc{,/.git}ignore`

which also passed. Confirming my understanding:

I originally hit this issue in a larger project context, where I believe all relevant DVC configs were properly committed… I'll try that again and report back, but will close this for now. tysm for your help!