iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0
13.61k stars 1.18k forks source link

dvc push hangs silently #10483

Open davies-w opened 2 months ago

davies-w commented 2 months ago

Bug Report

dvc pull hangs silently

Description

We make some changes, that require a push. There's only a few files involved. dvc push just hangs silently.

(ds) mac:~/repos/candidate_seniority_patch/model-interfaces$ dvc status
measurement_pipelines/generate_report/dvc.yaml:capture_main_hash:                                                                                                                                  
        always changed
measurement_pipelines/generate_OC_report/dvc.yaml:capture_main_hash:
        always changed
measurement_pipelines/OneTenRegressionTest_OC/dvc.yaml:regression_test:
        changed outs:
                modified:           measurement_pipelines/OneTenRegressionTest_OC/data/industry_to_colour.pkl
data_pipelines/LightcastJobData/dvc.yaml:query_data:
        changed outs:
                not in cache:       data_pipelines/LightcastJobData/data/focus_job_posting

With --verbose on, ^C interrupts and shows this

(ds) mac:~/repos/candidate_seniority_patch/model-interfaces$ dvc push --verbose
2024-07-15 11:49:32,243 DEBUG: v3.51.2 (pip), CPython 3.11.6 on macOS-14.5-arm64-arm-64bit
2024-07-15 11:49:32,243 DEBUG: command: /Users/winton/.pyenv/versions/ds/bin/dvc push --verbose
^C2024-07-15 11:49:34,204 ERROR: interrupted by the user              
Traceback (most recent call last):
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
                            ^^^^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/dvc/repo/push.py", line 103, in push
    used_run_cache = self.stage_cache.push(remote) if run_cache else []
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/dvc/stage/cache.py", line 281, in push
    return self.transfer(self.repo.cache.legacy, dest_odb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/dvc/stage/cache.py", line 264, in transfer
    if to_fs.exists(key) and first(to_fs.find(key)):
       ^^^^^^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 453, in exists
    return self.fs.exists(path)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/ds/lib/python3.11/site-packages/fsspec/asyn.py", line 91, in sync
    if event.wait(1):
       ^^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/3.11.6/lib/python3.11/threading.py", line 629, in wait
    signaled = self._cond.wait(timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/winton/.pyenv/versions/3.11.6/lib/python3.11/threading.py", line 331, in wait
    gotit = waiter.acquire(True, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

2024-07-15 11:49:34,211 DEBUG: Analytics is enabled.
2024-07-15 11:49:34,266 DEBUG: Trying to spawn ['daemon', 'analytics', '/var/folders/nr/spbjfgvs0sj47g0rvpm_kkqh0000gp/T/tmpqtuqji0u', '-v']
2024-07-15 11:49:34,272 DEBUG: Spawned ['daemon', 'analytics', '/var/folders/nr/spbjfgvs0sj47g0rvpm_kkqh0000gp/T/tmpqtuqji0u', '-v'] with pid 59917

Reproduce

I doubt this can be reproduced except in our enviroment. The problem is it works sometimes.

Expected

I expect some clear status to be shown.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 3.51.2 (pip)
-------------------------
Platform: Python 3.11.6 on macOS-14.5-arm64-arm-64bit
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.6
Supports:
        http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
Config:
        Global: /Users/winton/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s3s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/875740855e4fead72d9038d90f145e70

Additional Information (if any):

Happens on both an ARM Mac and a Unix x86 box. We're trying to push to S3, but nothing suggests it get there.

shcheklein commented 2 months ago

I see that it's push (should we update the title of the issue)?

could you also try to run it like "dvc push --no-run-cache" and see if that helps?

davies-w commented 2 months ago

ok it finally ran - but I have no idea what was taking so long...

2024-07-15 12:01:27,055 DEBUG: command: /Users/winton/.pyenv/versions/ds/bin/dvc push --verbose Collecting |194 [00:00, 2.98kentry/s] 2024-07-15 12:12:41,544 DEBUG: Preparing to transfer data from '/Users/winton/repos/candidate_seniority_patch/model-interfaces/.dvc/cache/files/md5' to 's3://adeptid-dvc/dvcstore/files/md5' 2024-07-15 12:12:41,544 DEBUG: Preparing to collect status from 'adeptid-dvc/dvcstore/files/md5' .... 2024-07-15 12:12:49,264 DEBUG: Analytics is enabled. 2024-07-15 12:12:49,328 DEBUG: Trying to spawn ['daemon', 'analytics', '/var/folders/nr/spbjfgvs0sj47g0rvpm_kkqh0000gp/T/tmp77woewuz', '-v'] 2024-07-15 12:12:49,334 DEBUG: Spawned ['daemon', 'analytics', '/var/folders/nr/spbjfgvs0sj47g0rvpm_kkqh0000gp/T/tmp77woewuz', '-v'] with pid 61108

You can see it just sort of "hung" for 11 minutes between the start of the execution and it actually getting to log work being done.

davies-w commented 2 months ago

dvc push --no-run-cache seems to run instantly, but dvc push still hangs.

shcheklein commented 2 months ago

and if you run now it the second time, w/o --no-run-cache - is it still taking a long time to run?

davies-w commented 2 months ago

Yep still taking a long time!

shcheklein commented 2 months ago

I think it's a duplicate of https://github.com/iterative/dvc/issues/10449 . Please put a comment there that you are also hitting it.

davies-w commented 2 months ago

There's maybe two seperate issues here.

1) why is the cache taking so long 2) why's there no debugging information when it takes a long time.

shcheklein commented 2 months ago

why's there no debugging information when it takes a long time.

ideally it should not be taking so long time, but you are right some progress bur for pushing the cache is needed anyways