iterative / dvc-ssh

SSH/SFTP plugin for dvc
Apache License 2.0
1 star, 3 forks

SSH: ValueError: Can't create any SFTP connections! #16

Closed gcoter closed 1 year ago

gcoter commented 2 years ago

Bug Report

I open this issue as a follow up to https://github.com/iterative/dvc/issues/6138

Description

dvc push raises an error when trying to push to an SFTP remote. It used to work with older versions. The SFTP remote I use is a personal Raspberry Pi server. I did not change anything on the server.

Reproduce

Unfortunately, since I use a private server, I don't know whether it would be easy to reproduce.

After updating dvc, I tried to run dvc push and I got these logs:

$ dvc push -v -a
2021-10-03 11:59:51,299 DEBUG: Preparing to transfer data from '.dvc/cache' to 'ssh://<REMOTE URL>/gcoter/music-generation-v2.dvc'
2021-10-03 11:59:51,300 DEBUG: Preparing to collect status from 'ssh://<REMOTE URL>/gcoter/music-generation-v2.dvc'
2021-10-03 11:59:51,318 DEBUG: Collecting status from 'ssh://<REMOTE URL>/gcoter/music-generation-v2.dvc'
2021-10-03 11:59:51,918 DEBUG: Querying 38 hashes via object_exists
2021-10-03 12:00:30,881 ERROR: unexpected error - Can't create any SFTP connections!                                                                                         
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/command/data_sync.py", line 57, in run
    processed_files_count = self.repo.push(
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/repo/push.py", line 48, in push
    pushed += self.cloud.push(obj_ids, jobs, remote=remote, odb=odb)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 85, in push
    return transfer(
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/transfer.py", line 221, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 160, in compare_status
    dest_exists, dest_missing = status(
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 122, in status
    exists = hashes.intersection(
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 36, in _indexed_dir_hashes
    indexed_dir_exists.update(odb.list_hashes_exists(indexed_dirs))
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 420, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 411, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 96, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 480, in _exists
    await self._info(path)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/usr/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!
------------------------------------------------------------
2021-10-03 12:00:31,971 DEBUG: Version info for developers:
DVC version: 2.7.2 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.4.0-88-generic-x86_64-with-glibc2.29
Supports:
    http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
    https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
    ssh (sshfs = 2021.9.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda5
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/sda5
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-10-03 12:00:31,973 DEBUG: Analytics is enabled.
2021-10-03 12:00:32,040 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpkmytybwl']'
2021-10-03 12:00:32,043 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpkmytybwl']'

Expected

Since I did not change the configuration of my server and it used to work, I would expect dvc push to work.

Environment information

$ dvc doctor
DVC version: 2.7.2 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Supports:
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
        ssh (sshfs = 2021.9.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda5
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/sda5
Repo: dvc, git
efiop commented 2 years ago

@gcoter Maybe you could try to reproduce it with a Docker image using the same config options? So far we have not been able to reproduce it ourselves.

gcoter commented 2 years ago

@efiop Yes, that's a good idea. In fact, I deployed the SFTP server as a Docker container on my Raspberry Pi. However, the image I used was built for ARM. I will try to reproduce the error locally on my computer with the original Docker image.

sjawhar commented 2 years ago

I created a quick SSH server in a Docker container, mounted my DVC cache into it, and set it as the remote for my local project. Running dvc pull didn't result in an error, but it did open over 1000 connections (according to netstat -tn). Is that expected?
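One rough way to verify a connection count like this (a Linux-only sketch, not anything DVC or sshfs provides) is to count established TCP connections to the SSH port by parsing `/proc/net/tcp`, which is what `netstat -tn` reads under the hood:

```python
# Linux-only sketch: count established TCP connections to a given
# remote port by parsing /proc/net/tcp. State code "01" means
# ESTABLISHED; addresses and ports are hex-encoded in this file.
def count_established(port=22, path="/proc/net/tcp"):
    count = 0
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            remote, state = fields[2], fields[3]
            remote_port = int(remote.split(":")[1], 16)
            if state == "01" and remote_port == port:
                count += 1
    return count

if __name__ == "__main__":
    print(count_established())
```

Running this in a loop while `dvc pull` is querying the remote would show whether the connection count really climbs into the hundreds.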

sjawhar commented 2 years ago

Also potentially relevant: when I get this error, it is always during the step that queries the remote cache. If I retry enough times and get past that step, the actual uploading/downloading always succeeds.

karajan1001 commented 2 years ago

I created a quick SSH server in a Docker container, mounted my DVC cache into it, and set it as the remote for my local project. Running dvc pull didn't result in an error, but it did open over 1000 connections (according to netstat -tn). Is that expected?

Did you set any --jobs-related config? And how many cores does your computer have?

sjawhar commented 2 years ago

Did you set any --jobs-related config? And how many cores does your computer have?

No, I didn't use the --jobs flag. 8 cores, 16 threads.

karajan1001 commented 2 years ago

Looks like we need to add some limits on the concurrency of the status query.
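A cap like that could look something like the following sketch (hypothetical, not DVC's actual implementation): bound the number of in-flight existence checks with an `asyncio.Semaphore` sized to the SFTP channel pool, so the pool can never be exhausted no matter how many hashes are queried at once.

```python
# Hypothetical sketch (not DVC's actual code): cap concurrent
# exists() queries with a semaphore sized to the SFTP channel pool.
import asyncio

MAX_SFTP_CHANNELS = 8  # assumed pool size

async def bounded_exists(sem, path, stats):
    # Acquire a slot before touching the (simulated) SFTP channel.
    async with sem:
        stats["now"] += 1
        stats["peak"] = max(stats["peak"], stats["now"])
        await asyncio.sleep(0.001)  # stand-in for the real SFTP stat call
        stats["now"] -= 1
        return True  # pretend the object exists

async def query_hashes(paths):
    sem = asyncio.Semaphore(MAX_SFTP_CHANNELS)
    stats = {"now": 0, "peak": 0}
    results = await asyncio.gather(
        *(bounded_exists(sem, p, stats) for p in paths)
    )
    return results, stats["peak"]

if __name__ == "__main__":
    results, peak = asyncio.run(query_hashes([f"hash{i}" for i in range(100)]))
    print(f"checked {len(results)} hashes, peak concurrency {peak}")
```

With the semaphore in place, 100 queued queries never hold more than `MAX_SFTP_CHANNELS` channels at a time; without it, all 100 would race for channels simultaneously, which matches the behavior reported above.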

gcoter commented 2 years ago

Hi all, sorry for my late response! As I was about to try to reproduce this issue locally (as proposed by @efiop), I upgraded dvc to the latest version (2.9.2) and now it works 🙂

@sjawhar Maybe you can try it and confirm whether it solves the issue for you as well?

sjawhar commented 2 years ago

Unfortunately, this is still an issue on 2.9.3.

$ dvc pull --verbose --recursive pipelines/finger_tapping/
2021-12-29 21:15:54,657 DEBUG: Adding '/home/user/app/.dvc/config.local' to gitignore file.
2021-12-29 21:15:54,679 DEBUG: Adding '/home/user/app/.dvc/tmp' to gitignore file.
2021-12-29 21:15:54,679 DEBUG: Adding '/home/user/app/.dvc/cache' to gitignore file.
2021-12-29 21:15:54,687 DEBUG: Checking if stage 'pipelines/finger_tapping/' is in 'dvc.yaml'
2021-12-29 21:15:55,608 DEBUG: Preparing to transfer data from '/usr/data/project/dvc' to '/home/user/app/.dvc/cache'
2021-12-29 21:15:55,608 DEBUG: Preparing to collect status from '/home/user/app/.dvc/cache'
2021-12-29 21:15:55,619 DEBUG: Collecting status from '/home/user/app/.dvc/cache'
2021-12-29 21:15:55,735 DEBUG: Preparing to collect status from '/usr/data/project/dvc'                                                                                                                                                                                                                                                                                                                                   
2021-12-29 21:15:55,740 DEBUG: Collecting status from '/usr/data/project/dvc'
2021-12-29 21:15:55,856 DEBUG: Querying 126 hashes via object_exists
2021-12-29 21:16:14,510 ERROR: unexpected error - Can't create any SFTP connections!                                                                                                                                                                                                                                                                                                                                                              
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/data_sync.py", line 30, in run
    stats = self.repo.pull(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/pull.py", line 29, in pull
    processed_files_count = self.fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 67, in fetch
    d, f = _fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 114, in pull
    return transfer(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 164, in compare_status
    src_exists, src_missing = status(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 122, in status
    exists = hashes.intersection(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 36, in _indexed_dir_hashes
    indexed_dir_exists.update(odb.list_hashes_exists(indexed_dirs))
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 421, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 412, in exists_with_progress
    ret = self.fs.exists(fs_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 91, in exists
    return self.fs.exists(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 549, in _exists
    await self._info(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/usr/local/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!
------------------------------------------------------------
2021-12-29 21:16:14,984 DEBUG: Adding '/home/user/app/.dvc/config.local' to gitignore file.
2021-12-29 21:16:14,990 DEBUG: Adding '/home/user/app/.dvc/tmp' to gitignore file.
2021-12-29 21:16:14,990 DEBUG: Adding '/home/user/app/.dvc/cache' to gitignore file.
2021-12-29 21:16:14,991 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 18] Invalid cross-device link
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/data_sync.py", line 30, in run
    stats = self.repo.pull(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/pull.py", line 29, in pull
    processed_files_count = self.fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 67, in fetch
    d, f = _fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 114, in pull
    return transfer(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 164, in compare_status
    src_exists, src_missing = status(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 122, in status
    exists = hashes.intersection(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 36, in _indexed_dir_hashes
    indexed_dir_exists.update(odb.list_hashes_exists(indexed_dirs))
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 421, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 412, in exists_with_progress
    ret = self.fs.exists(fs_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 91, in exists
    return self.fs.exists(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 549, in _exists
    await self._info(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/usr/local/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
    func(from_path, to_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/local.py", line 148, in reflink
    System.reflink(from_info, to_info)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/system.py", line 112, in reflink
    System._reflink_linux(source, link_name)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/system.py", line 96, in _reflink_linux
    fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 18] Invalid cross-device link

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
    return _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
    raise OSError(
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
    _try_links([link], from_fs, from_file, to_fs, to_file)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
    raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2021-12-29 21:16:14,992 DEBUG: Removing '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'
2021-12-29 21:16:14,992 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'hardlink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 18] Invalid cross-device link: '/home/user/app/.dvc/cache/.RhJXijpmS46m4MKMHUFkXk.tmp' -> '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/data_sync.py", line 30, in run
    stats = self.repo.pull(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/pull.py", line 29, in pull
    processed_files_count = self.fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 67, in fetch
    d, f = _fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 114, in pull
    return transfer(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 164, in compare_status
    src_exists, src_missing = status(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 122, in status
    exists = hashes.intersection(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 36, in _indexed_dir_hashes
    indexed_dir_exists.update(odb.list_hashes_exists(indexed_dirs))
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 421, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 412, in exists_with_progress
    ret = self.fs.exists(fs_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 91, in exists
    return self.fs.exists(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 549, in _exists
    await self._info(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/usr/local/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
    func(from_path, to_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/local.py", line 141, in hardlink
    System.hardlink(from_info, to_info)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/system.py", line 39, in hardlink
    os.link(src, link_name)
OSError: [Errno 18] Invalid cross-device link: '/home/user/app/.dvc/cache/.RhJXijpmS46m4MKMHUFkXk.tmp' -> '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
    return _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
    raise OSError(
OSError: [Errno 95] 'hardlink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
    _try_links([link], from_fs, from_file, to_fs, to_file)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
    raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2021-12-29 21:16:14,993 DEBUG: Removing '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'
2021-12-29 21:16:14,993 DEBUG: Removing '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'
2021-12-29 21:16:14,993 DEBUG: Removing '/home/user/app/.dvc/cache/.RhJXijpmS46m4MKMHUFkXk.tmp'
2021-12-29 21:16:15,000 DEBUG: Version info for developers:
DVC version: 2.9.3 (pip)
---------------------------------
Platform: Python 3.8.8 on Linux-5.15.8-76051508-generic-x86_64-with-glibc2.2.5
Supports:
        hdfs (fsspec = 2021.11.1, pyarrow = 4.0.1),
        webhdfs (fsspec = 2021.11.1),
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        ssh (sshfs = 2021.11.2)
Cache types: symlink
Cache directory: ext4 on /dev/mapper/data-root
Caches: local
Remotes: ssh, ssh
Workspace directory: ext4 on /dev/mapper/data-root
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-12-29 21:16:15,002 DEBUG: Analytics is enabled.
2021-12-29 21:16:15,085 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpiyk0nprz']'
2021-12-29 21:16:15,088 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpiyk0nprz']'
efiop commented 2 years ago

@sjawhar Could you try --jobs 1?

Also, how many .dvc files do you have in pipelines/finger_tapping/? Could you run:

find pipelines/finger_tapping -type f -name '*.dvc' | wc -l
mistermult commented 2 years ago

I had the same error. Unfortunately, it is hard to replicate.

I also have directories with tens of thousands of files in the repository, but they were pushed in the past without any error.

@gcoter @sjawhar Maybe you can try it with a much older version, e.g. 1.2.x. In the past I performed exactly the same steps that now result in this error with the newest version, and with 1.2.x they worked.

gcoter commented 2 years ago

Hi @mistermult, thanks for your feedback 🙂 Indeed, using an older version worked for me as well; I only encountered this issue with more recent versions.

But since I have upgraded DVC (https://github.com/iterative/dvc-ssh/issues/16), I don't have this issue anymore. Which version of DVC are you using? In my case, I think the issue disappeared after version 2.9.2.

gcoter commented 2 years ago

But it is weird because upgrading did not work for @sjawhar šŸ™

sjawhar commented 2 years ago

@sjawhar Could you try --jobs 1?

Also, how many dvc files do you have in pipelines/finger_tapping/? Could you run

find pipelines/finger_tapping -type f -name '*.dvc' | wc -l

I've tried with --jobs 1, still fails intermittently. Upgrading to 2.8.x helped a bit, it fails less often, but still intermittently. I can't yet upgrade to 2.9.x because that completely breaks on our infrastructure and I haven't had time to figure out why.

There are quite a few outputs in this repo, which might be why I have the issue. There are 18 or so .dvc files, which each track a directory that contains several files. Then each of those 18 directories gets processed through 10 or so stages (foreach), each of which also outputs a directory. So, lots of files, lots of directories.

sjawhar commented 2 years ago

I'm getting this error still with 2.9.5

ilankor commented 2 years ago

Hi, I am having the same issue with "unexpected error - Can't create any SFTP connections!" when running dvc push/pull.

Would appreciate any help!

pared commented 2 years ago

@ilankor reached out to us on discord:

stack trace:

2022-03-07 16:39:45,515 DEBUG: Preparing to transfer data from 'ssh://server/' to '.dvc/cache'
2022-03-07 16:39:45,516 DEBUG: Preparing to collect status from '.dvc/cache'
2022-03-07 16:39:46,512 DEBUG: Collecting status from '.dvc/cache'
2022-03-07 16:40:01,213 DEBUG: Preparing to collect status from 'ssh://server/'
2022-03-07 16:40:02,125 DEBUG: Collecting status from 'ssh://server'
2022-03-07 16:40:02,126 DEBUG: Querying 128 hashes via object_exists
2022-03-07 16:40:05,930 ERROR: unexpected error - Can't create any SFTP connections!
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/command/data_sync.py", line 41, in run
    glob=self.args.glob,
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/pull.py", line 38, in pull
    run_cache=run_cache,
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/fetch.py", line 72, in fetch
    odb=odb,
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/data_cloud.py", line 121, in pull
    verify=odb.verify,
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/status.py", line 167, in compare_status
    src, obj_ids, index=src_index, **kwargs
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/status.py", line 123, in status
    _indexed_dir_hashes(odb, index, dir_objs, name, cache_odb)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/status.py", line 48, in _indexed_dir_hashes
    dir_exists.update(odb.list_hashes_exists(dir_hashes - dir_exists))
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/db/base.py", line 415, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/db/base.py", line 406, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/fs/fsspec_wrapper.py", line 136, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/home/eden/.local/lib/python3.6/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/eden/.local/lib/python3.6/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/eden/.local/lib/python3.6/site-packages/fsspec/asyn.py", line 555, in _exists
    await self._info(path)
  File "/home/eden/.local/lib/python3.6/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/home/eden/.local/lib/python3.6/site-packages/sshfs/compat.py", line 23, in __aenter__
    return await self.gen.__anext__()
  File "/home/eden/.local/lib/python3.6/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!
------------------------------------------------------------
2022-03-07 16:40:06,057 DEBUG: Version info for developers:
DVC version: 2.8.1 (pip)
---------------------------------
Platform: Python 3.6.9 on Linux-4.15.0-169-generic-x86_64-with-Ubuntu-18.04-bionic
Supports:
        webhdfs (fsspec = 2022.1.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        ssh (sshfs = 2021.11.2)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda1
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/sda1
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-03-07 16:40:06,058 DEBUG: Analytics is enabled.
2022-03-07 16:40:06,083 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpjgmju351']'
2022-03-07 16:40:06,094 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpjgmju351']'

discussion: https://discord.com/channels/485586884165107732/485596304961962003/950780018999562250

dberenbaum commented 2 years ago

@dtrifiro You might want to keep an eye on this one.

ilankor commented 2 years ago

Thanks! The strange thing is that other users can run dvc push/pull from their own accounts on the same server without this issue.

ulie50 commented 1 year ago

I am having the same issue with dvc 2.41.1 (dvc-ssh 2.20.0) when I try to push two CSV files (each about 50 MB) to the SSH server. The server runs OpenSSH_7.9p1, and I can transfer files with scp to the folder I specified in dvc remote add ('/opt/textminer/text_miner_dvc' in my case). Here is the stack trace:

2023-01-11 10:33:38,297 DEBUG: Preparing to transfer data from '/home/usr/git/textminer_api/.dvc/cache' to '/opt/textminer/text_miner_dvc'
2023-01-11 10:33:38,298 DEBUG: Preparing to collect status from '/opt/textminer/text_miner_dvc'
2023-01-11 10:33:38,298 DEBUG: Collecting status from '/opt/textminer/text_miner_dvc'
2023-01-11 10:33:38,794 ERROR: unexpected error - Can't create any SFTP connections!                                                                                                                   
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/commands/data_sync.py", line 59, in run
    processed_files_count = self.repo.push(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/repo/push.py", line 92, in push
    result = self.cloud.push(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/data_cloud.py", line 143, in push
    return self.transfer(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/data_cloud.py", line 124, in transfer
    return transfer(src_odb, dest_odb, objs, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_data/hashfile/transfer.py", line 190, in transfer
    status = compare_status(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 179, in compare_status
    dest_exists, dest_missing = status(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 151, in status
    odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 367, in oids_exist
    remote_size, remote_oids = self._estimate_remote_size(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 244, in _estimate_remote_size
    remote_oids = set(iter_with_pbar(oids))
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 234, in iter_with_pbar
    for oid in oids:
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 200, in _oids_with_limit
    for oid in self._list_oids(prefix):
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 190, in _list_oids
    for path in self._list_paths(prefix):
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 174, in _list_paths
    yield from self.fs.find(self.fs.path.join(*parts), prefix=bool(prefix))
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/fs/base.py", line 366, in find
    yield from self.fs.find(path)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 113, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 98, in sync
    raise return_result
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 735, in _find
    async for _, dirs, files in self._walk(path, maxdepth, detail=True, **kwargs):
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 607, in _walk
    listing = await self._ls(path, detail=True, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/sshfs/spec.py", line 197, in _ls
    async with self._pool.get() as channel:
  File "/usr/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!
------------------------------------------------------------
2023-01-11 10:33:38,985 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-01-11 10:33:38,985 DEBUG: Removing '/home/usr/git/.EXtcfySXBt3evPxN663izM.tmp'
2023-01-11 10:33:38,985 DEBUG: Removing '/home/usr/git/.EXtcfySXBt3evPxN663izM.tmp'
2023-01-11 10:33:38,985 DEBUG: Removing '/home/usr/git/.EXtcfySXBt3evPxN663izM.tmp'
2023-01-11 10:33:38,986 DEBUG: Removing '/home/usr/git/textminer_api/.dvc/cache/.GLXQSdg6vWyNowTkYfQpk2.tmp'
2023-01-11 10:33:38,989 DEBUG: Version info for developers:
DVC version: 2.41.1 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.15.0-56-generic-x86_64-with-glibc2.29
Subprojects:
        dvc_data = 0.29.0
        dvc_objects = 0.14.1
        dvc_render = 0.0.17
        dvc_task = 0.1.9
        dvclive = 1.3.2
        scmrepo = 0.1.5
Supports:
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        ssh (sshfs = 2022.6.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git

Cnly commented 1 year ago

Same here on dvc 2.43.1. Setting the log level to DEBUG2 for the ssh server gives lines like these:

debug1: channel 8: new [server-session]
debug2: session_new: allocate (allocated 8 max 10)
debug1: session_new: session 8
debug1: session_open: channel 8
debug1: session_open: session 8: link with channel 8
debug1: server_input_channel_open: confirm session
debug1: server_input_channel_open: ctype session rchan 9 win 2097152 max 32768
debug1: input_session_request
debug1: channel 9: new [server-session]
debug2: session_new: allocate (allocated 9 max 10)
debug1: session_new: session 9
debug1: session_open: channel 9
debug1: session_open: session 9: link with channel 9
debug1: server_input_channel_open: confirm session
debug1: server_input_channel_open: ctype session rchan 10 win 2097152 max 32768
debug1: input_session_request
debug2: channel: expanding 20
debug1: channel 10: new [server-session]
debug1: session_open: channel 10
error: no more sessions
debug1: session open failed, free channel 10
debug1: channel 10: free: server-session, nchannels 11
debug1: server_input_channel_open: failure session

A workaround is to increase the value of MaxSessions in sshd_config (default is 10).
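Concretely, the workaround is a one-line change in the server's sshd configuration (a sketch; the exact value to use depends on how many concurrent sessions your clients open, and you need to restart sshd afterwards, e.g. with `sudo systemctl restart sshd`):

```
# /etc/ssh/sshd_config  (on the server)
# MaxSessions caps the number of multiplexed sessions per SSH connection;
# the OpenSSH default is 10.
MaxSessions 30
```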

efiop commented 1 year ago

@Cnly Thanks for the research and detailed report! Looks like we might need to tweak the max_sessions option for sshfs, or maybe we should lower the default (currently 10) in https://github.com/fsspec/sshfs/blob/b912e88d4a81d15cc660f3cb2f3a52480306d277/sshfs/spec.py#L28, or we might want to switch to SFTPHardChannelPool by default. The latter seems to be the best option. Maybe you could try adjusting it (you just need to pass pool_type=SFTPHardChannelPool in https://github.com/iterative/dvc-ssh/blob/a0233830e777ccd5a8d3e2e66edc5d828dace067/dvc_ssh/__init__.py#L116) and contribute a patch if it works for you?

Cnly commented 1 year ago

@efiop Thanks for the quick response! Unfortunately pool_type=SFTPHardChannelPool doesn't seem to fix the issue.

...
  File "xxx/venv/lib/python3.8/site-packages/dvc_objects/executors.py", line 134, in batch_coros
    result = fut.result()
  File "xxx/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 568, in _exists
    await self._info(path)
  File "xxx/venv/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "xxx/venv/lib/python3.8/site-packages/sshfs/spec.py", line 125, in _info
    async with self._pool.get() as channel:
  File "xxx/.pyenv/versions/3.8.12/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "xxx/venv/lib/python3.8/site-packages/sshfs/pools/hard.py", line 28, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!

I also tried modifying _DEFAULT_MAX_SESSIONS in sshfs/spec.py, but it seems it still tried to create 10 sessions according to the ssh server logs.

efiop commented 1 year ago

Thanks, @Cnly ! šŸ™ Looks like we'll need a bit more research here.

jankrepl commented 1 year ago

We were getting the same issues and rm -rf .dvc/tmp/* solved them. Disclaimer: sorry if it has other side effects :)

Cnly commented 1 year ago

To add to my workaround above: originally I increased MaxSessions to 20, which solved the problem at first, but now it's happening again and I've had to increase it to 30.

haimat commented 1 year ago

For the record: with DVC version 2.43.2 (pip) I also encountered the "SSH: ValueError: Can't create any SFTP connections!" error message, but with the --jobs 1 flag the dvc push command worked again.

JohnAtl commented 1 year ago

I'm running dvc v2.47.0 and having this same issue. Used --jobs 1 as suggested above, and it worked. Running a plain dvc push afterwards, I get the error.

> dvc doctor
DVC version: 2.47.0 (deb)
-------------------------
Platform: Python 3.10.8 on Linux-5.10.0-21-amd64-x86_64-with-glibc2.31
Subprojects:

Supports:
    azure (adlfs = 2023.1.0, knack = 0.10.1, azure-identity = 1.12.0),
    gdrive (pydrive2 = 1.15.1),
    gs (gcsfs = 2023.3.0),
    hdfs (fsspec = 2023.3.0, pyarrow = 11.0.0),
    http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
    oss (ossfs = 2021.8.0),
    s3 (s3fs = 2023.3.0, boto3 = 1.24.59),
    ssh (sshfs = 2023.1.0),
    webdav (webdav4 = 0.9.8),
    webdavs (webdav4 = 0.9.8),
    webhdfs (fsspec = 2023.3.0)
Cache types: reflink, hardlink, symlink
Cache directory: btrfs on /dev/nvme1n1p1
Caches: local
Remotes: ssh
Workspace directory: btrfs on /dev/nvme1n1p1
Repo: dvc, git

neofetch:

john@john-OptiPlex-7040
-----------------------
OS: Debian GNU/Linux 11 (bullseye) x86_64
Host: OptiPlex 7040
Kernel: 5.10.0-21-amd64
Uptime: 3 days, 10 hours, 59 mins
Packages: 2876 (dpkg)
Shell: zsh /usr/bin/zsh: /home/john/anaconda3/envs/neurogram/lib/libtinfo.so.6: no versi
Resolution: 3840x2160
Terminal: /dev/pts/5
CPU: Intel i7-6700 (8) @ 4.000GHz
GPU: Intel HD Graphics 530
GPU: NVIDIA GeForce GT 1030
Memory: 14225MiB / 48074MiB

drozzy commented 1 year ago

I've traced the error to this dvc version: dvc==2.42.0

In DVC version 2.41.1, dvc push still works. In DVC version 2.42.0 the push no longer works, and an error is produced: ERROR: unexpected error - Can't create any SFTP connections!

Error persists in the latest (as of today) version dvc == 2.50.1.

dberenbaum commented 1 year ago

@drozzy Do you get ERROR: unexpected error - Can't create any SFTP connections! with dvc push --jobs 1?

I've only taken a quick look based on the version numbers you provided. I'm wondering if it could be related to https://github.com/iterative/dvc-objects/pull/187, which was released in dvc-objects 0.18.1 -> dvc-data 0.34.0 -> dvc 2.42.0 šŸ˜“ .

pmrowla commented 1 year ago

It's probably https://github.com/iterative/dvc-objects/pull/180. We are much more aggressive than before about trying to keep the # of active coroutines saturated in the new(er) batching behavior. I think we probably only need to cap the the default fs._JOBS value in dvc-ssh so that it aligns with the underlying sshfs max_sessions/pool size.

johnyaku commented 1 year ago

Same problem with v2.51.

Adding --jobs=1 allowed the push to go through.

But pretty soon I am going to push about 15 TB of data to a backup remote, and I'd like to be able to do this with maximum efficiency. Even if there is no fix yet, I'd like to know how to calculate what value to give --jobs in order to maximise throughput while still working.

Otterpatsch commented 1 year ago

@johnyaku Are you depending on the changes from 2.41.1 to 2.51? If not, I would suggest you simply install version 2.41.1 and then push those 15 TB.

johnyaku commented 1 year ago

The 15 TB is split across two caches. We had to move the cache when the old volume filled up, but the new volume has had gc run on it, so some of the earlier data is no longer there. It is all on a GCS remote, so we could pull it all and then push it to the SSH remote, but the egress costs would be a killer.

Instead I want to exploit --allow-missing to push to the new remote from two different instances of the registry -- one pointing at the old cache, and the other pointing at the new cache. Between them we should be able to get all of the data onto the new remote, without any egress costs. We'll keep the GCS as insurance, and for when we compute on GCP, but for on-prem compute we will use the new remote.

dberenbaum commented 1 year ago

It's probably iterative/dvc-objects#180. We are much more aggressive than before about trying to keep the # of active coroutines saturated in the new(er) batching behavior. I think we probably only need to cap the the default fs._JOBS value in dvc-ssh so that it aligns with the underlying sshfs max_sessions/pool size.

@pmrowla What's the level of effort to do this?

pmrowla commented 1 year ago

Capping the default jobs value is a one-line change, but the real issue here is that with SSH a lot of it depends on the user's network as well as the actual SSH server they are connecting to. There is not really a one size fits all default value that will work for everyone, which is why the --jobs option exists.

pmrowla commented 1 year ago

To clarify for anyone experiencing this issue, the default --jobs value is 4 times the # of CPU cores available on a user's machine. In older DVC releases, the default value for SSH remotes used to be limited to 4 (independent of the # of CPU cores). In current releases, the SSH specific limit is lifted, since 4 is too low and will artificially limit performance for a lot of users.

If you experience the "Can't create any SFTP connections" error, the suggested fix is to try running with smaller --jobs values. The suggestion in this thread to use --jobs=1 will generally fix the issue, but with significantly worse performance than higher --jobs values. Ideally, you want the largest --jobs value that does not cause issues with your particular client + SSH server setup (it will likely require some trial and error to find).

Once you've identified a value that works for you, you can also set it as a part of your remote config so that you don't need to specify --jobs on the command line each time.

$ dvc remote modify [--local] my-ssh-remote jobs 4

(the use of --local is optional, but this config option is likely specific to your particular machine and probably does not need to be git committed in the default repo-wide .dvc/config)

JohnAtl commented 1 year ago

I believe the number of sessions is set by the MaxSessions parameter in /etc/ssh/sshd_config on the server, and I believe the default is 10. Setting the --jobs parameter to 4x the number of cores is grossly over the limit of 10 for all but one- or two-core CPUs. For example, my CPU has 24 cores / 32 threads, which would be 96 or 128 simultaneous connections.

A safe default for --jobs would seem to be 8. This would allow for two other ssh/sftp connections from other applications. Those running high capacity servers could increase --jobs as they see fit, and as their MaxSessions allows.
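The arithmetic can be sketched as follows (a toy illustration of the heuristic described above, not DVC's actual code; `default_jobs` and `MAX_SESSIONS` are illustrative names):

```python
def default_jobs(cpu_cores: int) -> int:
    # DVC's default --jobs, as described above: 4 x the number of CPU cores.
    return 4 * cpu_cores

# OpenSSH's default per-connection session limit (MaxSessions).
MAX_SESSIONS = 10

# The example machine above: 24 cores / 32 threads.
print(default_jobs(24), default_jobs(32))  # 96 128
print(default_jobs(24) > MAX_SESSIONS)     # True -- well over the server limit
```

Only machines with two or fewer cores would stay under the default server limit with this heuristic.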

daavoo commented 1 year ago

Capping the default jobs value is a one-line change, but the real issue here is that with SSH a lot of it depends on the user's network as well as the actual SSH server they are connecting to. There is not really a one size fits all default value that will work for everyone, which is why the --jobs option exists.

Putting https://github.com/iterative/dvc-ssh/issues/16#issuecomment-1491774178 in the docs would be nice but, given the number of reports here and in other channels, I think we should also go back to a more conservative default number and prevent errors for the average user / default case.

daavoo commented 1 year ago

A safe default for --jobs would seem to be 8. This would allow for two other ssh/sftp connections from other applications. Those running high capacity servers could increase --jobs as they see fit, and as their MaxSessions allows.

8, or a more conservative 4, sounds good to me.

@pmrowla did we get many complaints about performance when it was capped at 4?

pmrowla commented 1 year ago

I believe the number of sessions is set by the MaxSessions parameter in /etc/ssh/sshd_config on the server, and I believe the default is 10.

This is correct, and we do have a client-level maximum sessions value, which is set to 10 regardless of your --jobs setting.

Setting the --jobs parameter to 4x the number of cores is grossly over the limit of 10 for all but one- or two-core cpus. For example, my cpu has 24 cores, 32 threads, which would be 96 or 128 simultaneous connections.

A safe default for --jobs would seem to be 8. This would allow for two other ssh/sftp connections from other applications. Those running high capacity servers could increase --jobs as they see fit, and as their MaxSessions allows.

@pmrowla did we get many complaints about performance when it was capped at 4?

These questions are related, and no, it was not changed due to performance complaints. The issue is that the way the --jobs setting works in DVC has changed from the old behavior now that we use fsspec under the hood, and it seemed unnecessary to maintain separate defaults per DVC remote type.

Previously, --jobs was a hard limit for the number of parallel threads (with a single SFTP session per thread) used for network transfers.

The way it works now is that --jobs is a limit for the number of asyncio coroutines fsspec will allow to be active at a time. This batch of active coroutines is then also throttled by the underlying filesystem, which should be using a connection pool of whatever size the filesystem implementation decides. In sshfs, the way this is supposed to work is that we divide up any network requests between sessions in our pool, which is currently SFTPSoftChannelPool(max_sessions=10).

So in theory, it should still be safe to have a relatively high number of jobs. --jobs=128 doesn't mean that we try to open 128 simultaneous SFTP sessions; it means that we keep 128 requests queued at a time, which are actually handled only up to 10 at a time (via our session pool).

In practice, it may be that there is a problem with session pool implementations in sshfs where it doesn't properly handle cases where the # of active coroutines is larger than the pool size, which is why I initially suggested that we just use fs.config.max_sessions as the allowed maximum for --jobs.
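The intended interaction between --jobs and the channel pool can be modeled with a plain asyncio semaphore (a toy sketch, not sshfs's actual pool implementation; POOL_SIZE and JOBS are illustrative values):

```python
import asyncio

POOL_SIZE = 10  # analogous to sshfs's max_sessions / the server's MaxSessions
JOBS = 128      # analogous to DVC's --jobs (queued coroutines)

async def request(pool, active, peak):
    # A coroutine must hold a "channel" from the pool to talk to the server.
    async with pool:
        active[0] += 1
        peak[0] = max(peak[0], active[0])
        await asyncio.sleep(0)  # stand-in for a network round trip
        active[0] -= 1

async def main():
    pool = asyncio.Semaphore(POOL_SIZE)
    active, peak = [0], [0]
    # JOBS requests are queued simultaneously...
    await asyncio.gather(*(request(pool, active, peak) for _ in range(JOBS)))
    # ...but at most POOL_SIZE of them ever hold a channel at once.
    return peak[0]

peak_sessions = asyncio.run(main())
print(peak_sessions)  # 10
```

In this model, queuing 128 coroutines is harmless: the pool throttles the actual concurrency to 10, which is what a correctly working channel pool should do.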

daavoo commented 1 year ago

The way it works now is that --jobs is a limit for the number of asyncio coroutines fsspec will allow to be active at a time. This batch of active coroutines is then also throttled by the underlying filesystem, which should be using a connection pool of whatever size the filesystem implementation decides. In sshfs, the way this is supposed to work is that we divide up any network requests between sessions in our pool, which is currently SFTPSoftChannelPool(max_sessions=10).

So in theory, it should still be safe to have a relatively high number of jobs. --jobs=128 doesn't mean that we try to open 128 simultaneous SFTP sessions, it means that we keep 128 queued requests at a time, that are actually only handled up to 10 at a time (via our session pool).

Thanks for the explanation šŸ™

In practice, it may be that there is a problem with session pool implementations in sshfs where it doesn't properly handle cases where the # of active coroutines is larger than the pool size, which is why I initially suggested that we just use fs.config.max_sessions as the allowed maximum for --jobs.

The code appears to have been untouched for 2 years and we have heavily changed the usage upstream; it may be worth dedicating some time to reviewing the implementation in light of that change.

pmrowla commented 1 year ago

I think we should consider exposing max_sessions as an SSH remote config option as well (assuming we look into fixing the pool behavior), given that there is now a distinct difference between --jobs and the session count, and that the server-side limit is based on the session count.

drozzy commented 1 year ago

As per my earlier comment, the only semi-viable solution is to use the old dvc version 2.41.1. This keeps the git hooks that automatically invoke dvc push (which are installed by dvc itself) working.

However, the old version of dvc 2.41.1 breaks the VS Code DVC plugin (see the attached screenshot).

dberenbaum commented 1 year ago

@drozzy Does the suggestion above to use dvc remote modify [--local] my-ssh-remote jobs 4 solve your issue? This should essentially match the behavior in 2.41.1.

drozzy commented 1 year ago

@dberenbaum Yes, dvc remote modify my-ssh-remote jobs 4 fixed the issue. Thank you.

pmrowla commented 1 year ago

There was a bug in the sshfs soft channel pool handling that caused this issue for cases where jobs exceeded the server's MaxSession count. This will be fixed in the next sshfs/dvc-ssh release.

After the fix, it should no longer be necessary for most users to set --jobs for SSH remotes (and it will be safe to use the default number of jobs even with a high CPU core count). The soft channel pool will open as many channels as allowed by the server (up to the sshfs default of 10) and then divide up to --jobs # of coroutines between available pool channels as expected.

I think it is still worth exposing max_sessions to control the pool behavior. In some situations users may want to explicitly set this to a value lower than the server's MaxSessions in order to ensure that some number of SSH sessions are not used by DVC (i.e. to leave some dedicated number of sessions available for user ssh shell connections)

efiop commented 1 year ago

@pmrowla Thank you for looking into it! šŸ”„

pmrowla commented 1 year ago

This fix will be available in the next DVC release, in the meantime users using pip installations can also get the fix with

$ pip install dvc-ssh==2.22.1

Cnly commented 1 year ago

Great work, thank you! Can confirm this fixes my case.