iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

pull: S3FileSystem._get_file() got an unexpected keyword argument 'max_concurrency' #10358

Closed: odulcy-mindee closed this issue 3 months ago

odulcy-mindee commented 3 months ago

Bug Report

Description

Using the latest version of dvc, I was unable to dvc pull my files from my S3 bucket.

Reproduce

docker run -it python:3.10 bash
pip3 install -U pip && pip3 install -U 'dvc[s3]'

Then, after changing to my project directory and running dvc pull, I got:

Collecting |9.44k [00:59, 159entry/s]
ERROR: failed to transfer '4e21b8b771445349f04f860c536a7c11' - S3FileSystem._get_file() got an unexpected keyword argument 'max_concurrency'
ERROR: failed to transfer 'fe8020d3a3adfc2be7159a66ae4372fd' - S3FileSystem._get_file() got an unexpected keyword argument 'max_concurrency'
ERROR: failed to transfer 'f9e6306f0a4a9d11988d741ca1a13a2c' - S3FileSystem._get_file() got an unexpected keyword argument 'max_concurrency'
ERROR: failed to transfer '2601da591e1620c0d71ba07d1ae8a1dd' - S3FileSystem._get_file() got an unexpected keyword argument 'max_concurrency'
ERROR: failed to transfer '10ff65c8ec49f1117d8e766f8c331722' - S3FileSystem._get_file() got an unexpected keyword argument 'max_concurrency'
...

Note: Downgrading to dvc==3.42.0, dvc-s3==3.0.1, and s3fs==2023.12.2 (the setup I have on another computer) fixed the issue.

Expected

Retrieve my files from my S3 bucket using dvc pull.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.48.4 (pip)
-------------------------
Platform: Python 3.10.13 on Linux-5.15.0-87-generic-x86_64-with-glibc2.36
Subprojects:
        dvc_data = 3.14.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.1
        dvc_task = 0.3.0
        scmrepo = 3.3.0
Supports:
        http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.3.0, boto3 = 1.34.51)
Config:
        Global: /root/.config/dvc
        System: /etc/xdg/dvc

With the working environment:

DVC version: 3.42.0 (pip)
-------------------------
Platform: Python 3.10.13 on Linux-5.15.0-87-generic-x86_64-with-glibc2.36
Subprojects:
        dvc_data = 3.8.0
        dvc_objects = 3.0.6
        dvc_render = 1.0.1
        dvc_task = 0.3.0
        scmrepo = 2.1.1
Supports:
        http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.12.2, boto3 = 1.34.51)
Config:
        Global: /root/.config/dvc
        System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/1d244e453b3cc19b4fd3b9e53662b62f

diegopso commented 3 months ago

Got the same problem; downgrading to a previous working setup also "fixed" it.

Broken setup:

DVC version: 3.48.4 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.14.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.1
        dvc_task = 0.3.0
        scmrepo = 3.3.0
Supports:
        http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.3.0, boto3 = 1.34.51)
Config:
        Global: /home/diego/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdc
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/sdc
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/072e8e18abc0775da40fdf3953b2eca3

skshetry commented 3 months ago

Duplicate of https://github.com/iterative/dvc-s3/issues/80.

skshetry commented 3 months ago

Please pin s3fs to <=2024.2 until it is fixed upstream. We already have a fix waiting to be merged: https://github.com/fsspec/s3fs/pull/863.
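
To check whether an environment already respects that pin, something like the following may help. It is only a rough sketch: the helper names (release_tuple, satisfies_pin, installed_s3fs_ok) are made up for illustration and are not part of dvc or s3fs. It also assumes s3fs versions are plain dotted numbers such as 2024.3.0, so pre-release or local version suffixes are not handled.

```python
from importlib import metadata


def release_tuple(version, width=3):
    """Turn '2024.3.0' into (2024, 3, 0), padding short versions with zeros.

    Assumption: the first `width` dot-separated fields are plain integers.
    """
    parts = [int(p) for p in version.split(".")[:width]]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies_pin(version, ceiling="2024.2"):
    """Return True when `version` is at or below the suggested ceiling."""
    return release_tuple(version) <= release_tuple(ceiling)


def installed_s3fs_ok():
    """Check the s3fs installed in this environment; None if it is absent."""
    try:
        return satisfies_pin(metadata.version("s3fs"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # True: within the pin, False: newer than the pin, None: s3fs not installed
    print(installed_s3fs_ok())
```

With the versions reported in this thread, satisfies_pin accepts 2023.12.2 (the working setup) and rejects 2024.3.0 (the broken one). If the check fails, reinstalling with the requirement s3fs<=2024.2 next to dvc[s3] should bring the environment back within the pin.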