iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

checkout and fetch: .dvc stages are not resolved correctly when using --with-deps #9543

Open sjawhar opened 1 year ago

sjawhar commented 1 year ago

Bug Report

Description

Reproduce

How about something like this:

git init
dvc init
mkdir deps
echo 'foo' > deps/foo
echo 'bar' > deps/bar
dvc add deps
dvc remote add -d remote-upstream /tmp/dvc-remote-upstream
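# Write dvc.yaml (quoted heredoc so the shell does not expand ${item} or the backticks)
cat > dvc.yaml <<'EOF'
stages:
  dummy:
    foreach:
      - foo
      - bar
    do:
      cmd: 'mkdir -p outs && echo "dep: `cat deps/${item}`" > outs/${item}'
      deps:
        - deps/${item}
      outs:
        - outs/${item}
EOF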
dvc repro
git add .
git commit -m "init"
dvc push deps.dvc dvc.yaml

rm -rf .dvc/cache deps
dvc fetch --with-deps dummy@foo
dvc checkout --with-deps dummy@foo

You can then dvc import from this dummy repo to see the behavior for imported assets instead of added ones.

git init
dvc init
dvc import ../upstream outs
dvc remote add -d remote-downstream /tmp/dvc-remote-downstream
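# Write dvc.yaml (quoted heredoc so the shell does not expand ${item} or the backticks)
cat > dvc.yaml <<'EOF'
stages:
  dummy:
    foreach:
      - foo
      - bar
    do:
      cmd: 'mkdir -p next && echo "out: `cat outs/${item}`" > next/${item}'
      deps:
        - outs/${item}
      outs:
        - next/${item}
EOF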
dvc repro
git add .
git commit -m "init"
dvc push outs.dvc dvc.yaml

rm -rf .dvc/cache outs
dvc fetch --with-deps dummy@foo
dvc checkout --with-deps dummy@foo

Expected

deps/foo (and not deps/bar) is fetched from the remote and checked out in the workspace
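
To make the expected state easy to verify after running the fetch and checkout commands, here is a minimal sketch of a workspace check. The function name check_partial_checkout is hypothetical and not part of DVC; it only inspects the workspace (not the cache) and assumes the paths from the first repro script.

```shell
# Hypothetical helper, not part of DVC: succeeds only if the wanted path
# exists under the workspace and the unwanted path does not.
check_partial_checkout() {
    workspace="$1"
    wanted="$2"
    unwanted="$3"
    if [ ! -e "$workspace/$wanted" ]; then
        echo "missing: $wanted" >&2
        return 1
    fi
    if [ -e "$workspace/$unwanted" ]; then
        echo "unexpected: $unwanted" >&2
        return 1
    fi
    echo "ok: only $wanted is present"
}
```

After dvc checkout --with-deps dummy@foo in the first repo, running check_partial_checkout . deps/foo deps/bar should succeed if the behavior matches the expectation above.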

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.58.1 (pip)
-------------------------
Platform: Python 3.8.16 on Linux-6.2.6-76060206-generic-x86_64-with-glibc2.2.5
Subprojects:
        dvc_data = 0.51.0
        dvc_objects = 0.22.0
        dvc_render = 0.3.1
        dvc_task = 0.2.1
        scmrepo = 1.0.1
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.3.0, boto3 = 1.24.59),
        ssh (sshfs = 2023.4.1)
Config:
        Global: /home/kernel/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: overlay on overlay
Caches: local
Remotes: local
Workspace directory: overlay on overlay
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/eef9a7bf3be924c283619f6ae6f6a95e

Additional Information (if any):

$ dvc fetch --verbose dummy@foo
2023-06-05 23:49:15,320 DEBUG: v2.58.1 (pip), CPython 3.8.16 on Linux-6.2.6-76060206-generic-x86_64-with-glibc2.2.5
2023-06-05 23:49:15,320 DEBUG: command: /usr/local/bin/dvc fetch --verbose dummy@foo
2023-06-05 23:49:15,677 DEBUG: Checking if stage 'dummy@foo' is in 'dvc.yaml'
2023-06-05 23:49:15,699 DEBUG: Preparing to transfer data from '/tmp/dvc-remote' to '/tmp/tmp.AFBUiLE9S0/.dvc/cache'
2023-06-05 23:49:15,699 DEBUG: Preparing to collect status from '/tmp/tmp.AFBUiLE9S0/.dvc/cache'
2023-06-05 23:49:15,699 DEBUG: Collecting status from '/tmp/tmp.AFBUiLE9S0/.dvc/cache'
2023-06-05 23:49:15,701 DEBUG: Preparing to collect status from '/tmp/dvc-remote'          
2023-06-05 23:49:15,701 DEBUG: Collecting status from '/tmp/dvc-remote'
1 file fetched                                                                             
2023-06-05 23:49:15,724 DEBUG: Analytics is enabled.                                       
2023-06-05 23:49:15,745 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpduxxmkgt']'
2023-06-05 23:49:15,746 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpduxxmkgt']'

sjawhar commented 1 year ago

CCing @efiop since I think it's highly related to his work on using Index for all the things. I just tested this with 3.0.0a2 and the unwanted behavior is still there for fetch and checkout.

efiop commented 1 year ago

@sjawhar Sorry, but I don't have time to look deep into it right now. Probably related to how --with-deps is resolved in targets_view, but one would need to take a closer look.

sjawhar commented 1 year ago

> @sjawhar Sorry, but I don't have time to look deep into it right now. Probably related to how --with-deps is resolved in targets_view, but one would need to take a closer look.

Totally understandable. I have some ideas on how to address this, but I don't know what other work is being done that might conflict with my approach or render my fix irrelevant, like the last PR I opened :sweat_smile: I'm motivated to fix this, though, so any advice you can offer would be much appreciated.

efiop commented 1 year ago

@sjawhar targets_view is not going to change significantly any time soon. checkout has also already been migrated to the new architecture. If you look into those, I don't think your effort is going to be wasted.

sjawhar commented 10 months ago

@efiop I see the last couple of DVC releases include some bug fixes specifically addressing imports. Is this issue resolved now? Can I finally upgrade to DVC 3?!

Never mind, I just tested using the repro script above. Still the same problem :(