iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

dvc push: No progress bar or text displayed, even with --verbose #9666

Closed Taytay closed 1 year ago

Taytay commented 1 year ago

Bug Report

dvc push: No progress bar or text displayed, even with --verbose

Description

I am trying to push about 300MB of files to S3, and I'm on a slow connection. When I run dvc push, there is no progress notification of any sort. If I use --verbose, I get more output, but once bits start moving, there is no text output.

2023-06-27 00:08:47,449 DEBUG: v3.2.2 (pip), CPython 3.11.2 on macOS-12.6-arm64-arm-64bit
2023-06-27 00:08:47,449 DEBUG: command: /Users/taytay/my_project/env/bin/dvc push models/ynab/ai_categorize --verbose
2023-06-27 00:08:47,763 DEBUG: Checking if stage 'models/ynab/ai_categorize' is in 'dvc.yaml'
2023-06-27 00:08:47,840 DEBUG: Preparing to transfer data from '/Users/taytay/my_project/.dvc/cache/files/md5' to 'my_remote/dvc/files/md5'
2023-06-27 00:08:47,841 DEBUG: Preparing to collect status from 'my_remote/dvc/files/md5'
2023-06-27 00:08:47,841 DEBUG: Collecting status from 'my_remote/dvc/files/md5'
2023-06-27 00:08:47,842 DEBUG: Querying 6 oids via object_exists
2023-06-27 00:08:48,350 DEBUG: Querying 1 oids via object_exists
2023-06-27 00:08:49,614 DEBUG: Estimated remote size: 4096 files
2023-06-27 00:08:49,615 DEBUG: Querying '38' oids via traverse
2023-06-27 00:08:49,836 DEBUG: Preparing to collect status from '/Users/taytay/my_project/.dvc/cache/files/md5'
2023-06-27 00:08:49,837 DEBUG: Collecting status from '/Users/taytay/my_project/.dvc/cache/files/md5'

It appears to have hung at this point, but I can see via my network monitor that it's sending files.

This is on a Mac using the Fish shell. The remote is R2 from Cloudflare.

Reproduce

(Not sure what is necessary about the environment.)

  1. dvc push

Expected

Progress of some sort, via a progress bar or text, or some notification of what dvc is up to.

Environment information

Output of dvc doctor:

dvc doctor
DVC version: 3.2.2 (pip)
------------------------
Platform: Python 3.11.2 on macOS-12.6-arm64-arm-64bit
Subprojects:
    dvc_data = 2.3.0
    dvc_objects = 0.23.0
    dvc_render = 0.5.3
    dvc_task = 0.3.0
    scmrepo = 1.0.4
Supports:
    http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
    s3 (s3fs = 2023.6.0, boto3 = 1.26.76)
Config:
    Global: /Users/taytay/Library/Application Support/dvc
    System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/421e053060dadb20df81a448aaccef85

pmrowla commented 1 year ago

Can you please try doing

DVC_IGNORE_ISATTY=1 dvc push

and see if you get progress bars?
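For context on why this variable might matter: progress bars are commonly suppressed when the output stream is not an interactive terminal, and an environment override can force them back on. The following is a minimal sketch of that general pattern, not DVC's actual code. The helper name `progress_disabled` and the set of accepted truthy values are assumptions made for illustration.

```python
import os
import sys


def progress_disabled(stream=None, env=None):
    """Return True when a progress bar should be suppressed.

    Hypothetical helper: bars are hidden when the stream is not a TTY,
    unless DVC_IGNORE_ISATTY is set to a truthy value.
    """
    stream = sys.stderr if stream is None else stream
    env = os.environ if env is None else env
    # Assumed set of truthy spellings; the real tool may accept others.
    ignore_isatty = env.get("DVC_IGNORE_ISATTY", "").lower() in {"1", "true", "yes", "y"}
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    return not (ignore_isatty or is_tty)
```

If a check like this were the cause, setting the variable would bring the bars back, so a negative result points away from TTY detection.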

Taytay commented 1 year ago

I'm afraid that didn't help. I've switched from fish shell to zsh to see if that made any difference, and it doesn't appear to. For what it's worth, I do see a progress bar on the right of the terminal when performing other actions, like "querying remote cache" and "checking cache". It's when it actually starts to upload files that it seems to go completely silent. It's weird.

daavoo commented 1 year ago

Maybe related to https://github.com/iterative/dvc-data/pull/401

shcheklein commented 1 year ago

Hitting the same with a basic S3 in Codespaces:

https://github.com/iterative/dvc/assets/3659196/0cdc7d69-ab23-49ab-9f60-22237491bf72

dberenbaum commented 1 year ago

I get the same on dvc pull. To reproduce, set credentials for the AWS sandbox environment, clone https://github.com/dberenbaum/object-detection, and then run dvc pull. You should get a frozen state showing something like this:

[Image: Screenshot 2023-07-31 at 9 28 45 AM]

dberenbaum commented 1 year ago

Not sure if this should be in a separate issue, but it's not only a lack of progress, but also very slow operations. Doing dvc pull on this cats-dogs dataset of 2800 images took over 5 minutes.

profile.zip

[Image: Screenshot 2023-07-31 at 11 22 43 AM]

dberenbaum commented 1 year ago

Example of hanging during dvc push:

https://github.com/iterative/dvc/assets/2308172/e71a8cc5-4cf9-490a-afb4-a1eb253a1baf

To reproduce:

  1. Download the dataset from https://www.kaggle.com/datasets/jessicali9530/celeba-dataset?resource=download.
  2. Add it to a dvc repo with remote storage (I used s3)
  3. Push

mtiller commented 1 year ago

Out of curiosity, how have you (or anybody else) configured dvc to use R2? I couldn't get it to work.

shcheklein commented 1 year ago

@mtiller what error do you see? I haven't tried it myself, but recently I've seen someone solve it with region = auto in the config.

mtiller commented 1 year ago

IIRC, the issue was around authentication, but I suspect it was using the wrong endpoint. The strange thing about dvc (and you can see this in the example they have for using DigitalOcean Spaces) is that even when you specify an endpointurl, it tacks the bucket onto the hostname. At least, that is the only explanation I have for why the DigitalOcean example works, because the endpointurl used in that example is not the one DO gives me. I suspect that R2 doesn't work because it doesn't follow this convention (and perhaps dvc is tacking the bucket name on when it shouldn't?).
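For readers unfamiliar with the convention being described: S3-compatible clients can address a bucket either as a subdomain of the endpoint (virtual-hosted style) or as the first path segment (path style). The sketch below only illustrates the difference between the two URL shapes. It does not show what dvc or its S3 backend does, which this thread does not confirm. The function name, the `s3.example.com` endpoint, and the bucket and key in the usage lines are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint, bucket, key, virtual_hosted=True):
    """Build an object URL in one of the two S3 addressing styles."""
    parts = urlsplit(endpoint)
    if virtual_hosted:
        # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
        netloc = f"{bucket}.{parts.netloc}"
        path = f"/{key}"
    else:
        # Path style: the bucket is the first path segment.
        netloc = parts.netloc
        path = f"/{bucket}/{key}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


# Hypothetical endpoint and bucket, for illustration only.
print(object_url("https://s3.example.com", "my-bucket", "files/md5/ab/cdef"))
print(object_url("https://s3.example.com", "my-bucket", "files/md5/ab/cdef", virtual_hosted=False))
```

A provider that expects one style will typically reject requests made in the other, which would be consistent with an endpoint working on one service and failing on another.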

Taytay commented 1 year ago

I'm using it with R2 actually. It's been a while since I configured it, but I don't recall anything being too odd about it. If you're still having trouble, I can look at my config.