iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

bug: running exp with `--queue` always takes the workspace version of Python deps #10419

Closed · alekseik1 closed this issue 1 month ago

alekseik1 commented 1 month ago

Bug Report

Issue name

exp run --queue: experiment runs with the workspace version of the code, ignoring changes made in the experiment.

Description

When using imports with dvc exp run --queue, the imported file always has its workspace version, regardless of its state when the experiment was queued.

Reproduce

  1. Create an empty git repo.
  2. Run dvc init.
  3. Create two files, main.py and dep.py.

main.py:

from dep import my_str

print(my_str)

dep.py:

my_str = 'main'

  4. Create dvc.yaml:
stages:
  main:
    cmd: python /home/aleksei/git/dvc-simple/main.py
    deps:
      - /home/aleksei/git/dvc-simple/main.py
      - /home/aleksei/git/dvc-simple/dep.py

Use absolute imports for --queue to work properly.

  5. Run dvc repro; you'll see main printed to stdout, as expected.
  6. Change dep.py to my_str = 'queue'.
  7. Run dvc exp run --name 'bug' --queue
  8. Change dep.py back to my_str = 'main'.
  9. Make sure git status reports dep.py as unmodified.
  10. Run dvc queue start
  11. Check the experiment's logs.
  12. You'll see main printed to stdout, even though the experiment was queued with my_str = 'queue' in dep.py.

Expected

Expected stdout to be "queue", not "main".

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.50.1 (pip)
-------------------------
Platform: Python 3.10.13 on Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.2
Supports:
        http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3)
Config:
        Global: /home/aleksei/.config/dvc
        System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sdb
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/44d5031bae74ad1636ec6d61087b9602


dberenbaum commented 1 month ago

"Use absolute imports for --queue to work properly."

This is the opposite of what --queue (and dvc in general) expects. dvc is built to work relative to your repo. Your dvc.yaml should look like this:

stages:
  main:
    cmd: python main.py
    deps:
      - main.py
      - dep.py

That should fix your problem. To run a queued experiment, dvc makes a copy of the repo in a temporary directory, so relative dependency paths resolve to the copies inside that directory. With absolute paths, as in your current setup, you always read the original workspace files, which makes the temporary directory pointless.
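The mechanism described above can be illustrated with a toy simulation. This is not DVC's implementation: the queued run is modelled as a plain copy of the workspace made with shutil.copytree, and read_my_str, resolve_dep and simulate_queued_run are hypothetical helpers written only for this sketch. It replays the reproduction steps and shows that a relative dep resolves to the snapshot taken at queue time, while an absolute dep still points at the (reverted) workspace.

```python
import shutil
import tempfile
from pathlib import Path


def read_my_str(dep_file: Path) -> str:
    """Execute a dep.py-style file and return its my_str (toy stand-in for `import dep`)."""
    namespace: dict = {}
    exec(dep_file.read_text(), namespace)
    return namespace["my_str"]


def resolve_dep(run_dir: Path, declared: str) -> Path:
    """Resolve a path as written in dvc.yaml against the directory the stage runs in.

    pathlib discards run_dir when `declared` is absolute, so an absolute
    path ignores where the experiment is actually executed.
    """
    return run_dir / declared


def simulate_queued_run() -> tuple[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        workspace = root / "workspace"
        workspace.mkdir()
        dep = workspace / "dep.py"

        # Set my_str = 'queue' and queue the experiment (modelled as a copy).
        dep.write_text("my_str = 'queue'\n")
        exp_dir = root / "exp_copy"
        shutil.copytree(workspace, exp_dir)

        # Revert the workspace to my_str = 'main' before the queue starts.
        dep.write_text("my_str = 'main'\n")

        relative = read_my_str(resolve_dep(exp_dir, "dep.py"))
        absolute = read_my_str(resolve_dep(exp_dir, str(dep)))
        return relative, absolute


if __name__ == "__main__":
    relative, absolute = simulate_queued_run()
    print(f"relative dep -> {relative!r}")  # relative dep -> 'queue'
    print(f"absolute dep -> {absolute!r}")  # absolute dep -> 'main'
```

The key design point is pathlib's join semantics: run_dir / declared silently drops run_dir when declared is absolute, mirroring how an absolute path in dvc.yaml bypasses the temporary directory.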

alekseik1 commented 1 month ago

Thanks for the reply! Indeed, I changed the paths to relative ones and now everything works as expected. I still have problems with my original setup, though. It uses poetry and dvc; perhaps the problem is that I use cmd: poetry run python my_script.py, and poetry uses files from the "local package" rather than from the file system (since I did not pass the --no-root option). These "local package" files are taken from the workspace (symlinks, maybe?). I'm going to do some more digging and will come back.

alekseik1 commented 1 month ago

@dberenbaum I found a rather subtle bug when using imports with poetry; it seems to be the root of the problem. The setup is quite complicated, so I pushed it to a small repo here with steps to reproduce.

It seems like dvc updates PYTHONPATH in a way that breaks package-level imports like import my_package.my_module as m, but works fine with import my_module as m.

Could you please check out this repo and see whether the bug reproduces on your machine?

dberenbaum commented 1 month ago

This looks like the expected behavior. import my_package.my_module is loading the package from its installed location. import my_module is loading the local package relative to your current directory.
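This explanation comes down to how Python's path-based import finder walks its search path. The sketch below calls importlib's PathFinder directly so that it does not touch the real sys.path. The project layout (my_package/my_module.py, with the script living inside my_package/) and the idea that the root package is installed in editable style, via a path entry such as a .pth line pointing at the original workspace, are assumptions for illustration, not details confirmed in this thread; make_project, load_my_str and compare_imports are hypothetical helpers.

```python
import importlib.util
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def make_project(root: Path, value: str) -> None:
    """Create root/my_package/{__init__.py, my_module.py} with my_str = value."""
    pkg = root / "my_package"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "my_module.py").write_text(f"my_str = {value!r}\n")


def load_my_str(fullname: str, search_path: list[str]) -> str:
    """Find `fullname` on `search_path` with the standard path finder and return its my_str."""
    parent, _, _ = fullname.rpartition(".")
    if parent:
        # A dotted import resolves the top-level package first, then searches
        # only inside that package's directory for the submodule.
        pkg_spec = PathFinder.find_spec(parent, search_path)
        search_path = list(pkg_spec.submodule_search_locations)
    spec = PathFinder.find_spec(fullname, search_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.my_str


def compare_imports() -> tuple[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp, "workspace")
        exp_copy = Path(tmp, "exp_copy")
        make_project(workspace, "main")  # reverted workspace
        make_project(exp_copy, "queue")  # snapshot used by the queued experiment
        # Assumed search path for `python my_package/script.py` run inside the
        # copy: the script's own directory first, then an editable-install
        # entry pointing at the original workspace.
        search_path = [str(exp_copy / "my_package"), str(workspace)]
        return (
            load_my_str("my_module", search_path),
            load_my_str("my_package.my_module", search_path),
        )


if __name__ == "__main__":
    flat, dotted = compare_imports()
    print(f"import my_module            -> {flat!r}")    # 'queue'
    print(f"import my_package.my_module -> {dotted!r}")  # 'main'
```

Under these assumptions, import my_module is found in the script's own directory inside the copy, while my_package is only found through the workspace entry, so its submodule comes from the workspace too.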

For this to work as you expect, you would need to run poetry install inside your dvc pipeline, so that the current local packages are installed as part of the experiment.
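To check that the poetry install step actually took effect, a stage script could verify where a package is loaded from before doing any work. The helper imported_from below is hypothetical and uses only importlib.util.find_spec; it assumes the stage runs with the experiment's copy of the repo as its working directory.

```python
import importlib.util
from pathlib import Path


def imported_from(module_name: str, root: Path) -> bool:
    """Return True if `module_name` would be loaded from a file located under `root`.

    Built-in or frozen modules (no file location) and missing modules return False.
    Note: for a dotted name, find_spec imports the parent package first.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        return False
    return Path(spec.origin).resolve().is_relative_to(root.resolve())
```

For example, near the top of the stage script: `if not imported_from("my_package", Path.cwd()): raise SystemExit("my_package is not loaded from this experiment's directory")`, where my_package stands for whatever local package the project defines.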