Open Jalmenara opened 1 year ago
Hi @Jalmenara, thank you for the detailed report!
Would you be willing to try using https://github.com/ronf/asyncssh directly to see if you could narrow it down to either asyncssh or dvc up the stack?
Hello @efiop. Thank you for the quick answer. Indeed, I think that I have found something with `asyncssh`, although we must confirm. These are the details:
1. `pip install asyncssh` on the conda environment. It does nothing, since all requirements are reported as "Requirement already satisfied" (this is the output; you can see the versions):
Requirement already satisfied: asyncssh in /home/.conda/envs/env_JAR/lib/python3.8/site-packages (2.13.1)
Requirement already satisfied: typing-extensions>=3.6 in /home/.conda/envs/env_JAR/lib/python3.8/site-packages (from asyncssh) (4.4.0)
Requirement already satisfied: cryptography>=3.1 in /home/.conda/envs/env_JAR/lib/python3.8/site-packages (from asyncssh) (38.0.4)
Requirement already satisfied: cffi>=1.12 in /home/.conda/envs/env_JAR/lib/python3.8/site-packages (from cryptography>=3.1->asyncssh) (1.15.1)
Requirement already satisfied: pycparser in /home/.conda/envs/env_JAR/lib/python3.8/site-packages (from cffi>=1.12->cryptography>=3.1->asyncssh) (2.21)
Then, I adapted some basic examples of the docs and tested them: https://asyncssh.readthedocs.io/en/stable/#client-examples
`b0` folder I mentioned in the opening post)
import asyncio, asyncssh, sys

async def run_client() -> None:
    async with asyncssh.connect('destination') as conn:
        result = await conn.run('ls /work/projects/models-bin/', check=True)
        print(result.stdout, end='')

try:
    asyncio.get_event_loop().run_until_complete(run_client())
except (OSError, asyncssh.Error) as exc:
    sys.exit('SSH connection failed: ' + str(exc))
3. Copy through SFTP: It ~~fails. We have found something~~ runs (see update).
The following runs fine (direct copy of the hash file with `sftp.get()`):
import asyncio, asyncssh, sys

async def run_client() -> None:
    async with asyncssh.connect('destination') as conn:
        async with conn.start_sftp_client() as sftp:
            await sftp.get('/work/projects/models-bin/b0/26324c6904b2a9cb4b88d6d61c81d1')

try:
    asyncio.get_event_loop().run_until_complete(run_client())
except (OSError, asyncssh.Error) as exc:
    sys.exit('SFTP operation failed: ' + str(exc))
~~However, changing the `get` line to the whole folder (recursively) throws an error:~~ (see update below)
await sftp.get('/work/projects/models-bin/', preserve=True, recurse=True) # Causes an error
```console
Exception has occurred: SystemExit
SFTP operation failed: [Errno 2] No such file or directory: b''
```
~~So it seems that there is a problem with the browsing of the folders (it looks for some weird `b''` directory, without the `0` at the end). I think this is weird, since the situation is the basic one for `dvc`.~~
OK, small update. I tested `sftp.get` without the slash at the end and it worked. Also, the `preserve` option makes no difference.
await sftp.get('/work/projects/models-bin', recurse=True)
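The trailing-slash observation can be captured in a small normalization step before calling `sftp.get()`. This is only a sketch: `strip_trailing_slash` is a hypothetical helper name, not part of asyncssh or DVC, and it only addresses the recursive `sftp.get()` failure described above.

```python
def strip_trailing_slash(path):
    """Drop trailing slashes from a remote path, but keep a bare root ("/")."""
    if not path:
        return path
    # rstrip removes every trailing "/"; an all-slash path collapses to "/".
    return path.rstrip("/") or "/"


remote_dir = strip_trailing_slash("/work/projects/models-bin/")
print(remote_dir)
```

The normalized value could then be passed as the first argument of `sftp.get(..., recurse=True)`.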
This led me to try removing the trailing slash in the `.dvc/config` file:
[core]
remote = bin-remote
['remote "bin-remote"']
url = ssh://destination/work/projects/models-bin
However, this did not solve the problem.
How exactly does `dvc` use `asyncssh`? What is the exact copying method used under the hood?
Additional info:
`scp`: it works
import asyncio, asyncssh, sys

async def run_client() -> None:
    await asyncssh.scp('destination:/work/projects/models-bin', '.', recurse=True)

try:
    asyncio.get_event_loop().run_until_complete(run_client())
except (OSError, asyncssh.Error) as exc:
    sys.exit('SFTP operation failed: ' + str(exc))
@Jalmenara Thanks for trying it out! We use asyncssh through https://github.com/fsspec/sshfs, and to get a file we use this: https://github.com/fsspec/sshfs/blob/a62fd30cfcf55ef74345a0cc398f5779a1577ffa/sshfs/spec.py#L171
But your original traceback seems to point to just `connect`ing that was failing.
@efiop, thanks for your time!
OK, so the thing is... how is it possible that connecting is failing, if both the `scp` and `sftp` methods of asyncssh work fine? What do you mean exactly by `connect`ing?
@Jalmenara I mean your initial log seems to point to the connection failing:
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/asyncssh/connection.py", line 8042, in connect
return await asyncio.wait_for(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/tasks.py", line 455, in wait_for
return await fut
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/asyncssh/connection.py", line 430, in _connect
_, session = await loop.create_connection(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/base_events.py", line 986, in create_connection
infos = await self._ensure_resolved(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/base_events.py", line 1365, in _ensure_resolved
return await loop.getaddrinfo(host, port, family=family, type=type,
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/base_events.py", line 825, in getaddrinfo
return await self.run_in_executor(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Are you sure you are still getting the same kind of error?
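The last frames of the traceback above are a plain name lookup (`socket.getaddrinfo`), which can be tried outside DVC. Below is a minimal sketch under stated assumptions: `host_and_port` and `try_resolve` are hypothetical helper names, the default port of 22 is assumed, and DVC's own URL and SSH-config handling may differ from this. Note that `socket.getaddrinfo` does not read `~/.ssh/config`, so a name that exists only as an alias there would fail this check even though the `ssh` command accepts it.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(remote_url, default_port=22):
    """Split an ssh:// URL into (host, port); default_port is an assumption."""
    parts = urlsplit(remote_url)
    return parts.hostname, parts.port or default_port


def try_resolve(host, port):
    """Return (True, addresses) if the name resolves, else (False, error text)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Same exception type as the one at the bottom of the traceback.
        return False, str(exc)
    # Each entry's sockaddr starts with the resolved address string.
    return True, sorted({info[4][0] for info in infos})
```

Calling `try_resolve(*host_and_port("ssh://destination/work/projects/models-bin"))` on the affected machine would show whether the host part of the remote URL resolves on its own.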
Ah, alright. Sorry for my bad understanding... @efiop
Yes, I am still getting the same error, even after the tests with `scp`, `sftp`, etc. This is another log from `dvc fetch -v` (I just got it). How could I debug the call to `asyncio.create_connection()` in the context of the `dvc fetch` command?
2023-05-22 18:29:08,530 DEBUG: v2.57.2 (pip), CPython 3.8.15 on Linux-5.10.0-0.bpo.7-amd64-x86_64-with-glibc2.10
2023-05-22 18:29:08,530 DEBUG: command: /home/pepito/.conda/envs/env_JAR/bin/dvc fetch -v
2023-05-22 18:29:09,384 DEBUG: Preparing to transfer data from '/work/projects/models-bin/mag_vv_cry-sdyn/' to '/projects/pepito/mag_vv_cry-sdyn/.dvc/cache'
2023-05-22 18:29:09,384 DEBUG: Preparing to collect status from '/projects/pepito/mag_vv_cry-sdyn/.dvc/cache'
2023-05-22 18:29:09,384 DEBUG: Collecting status from '/projects/pepito/mag_vv_cry-sdyn/.dvc/cache'
2023-05-22 18:29:09,390 DEBUG: Preparing to collect status from '/work/projects/models-bin/mag_vv_cry-sdyn/'
2023-05-22 18:29:09,390 DEBUG: Collecting status from '/work/projects/models-bin/mag_vv_cry-sdyn/'
2023-05-22 18:29:09,925 ERROR: unexpected error - [Errno -2] Name or service not known
Traceback (most recent call last):
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/cli/__init__.py", line 210, in main
ret = cmd.do_run()
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/cli/command.py", line 26, in do_run
return self.run()
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/commands/data_sync.py", line 84, in run
processed_files_count = self.repo.fetch(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/repo/__init__.py", line 65, in wrapper
return f(repo, *args, **kwargs)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/repo/fetch.py", line 86, in fetch
d, f = _fetch(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/repo/fetch.py", line 166, in _fetch
d, f = repo.cloud.pull(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/data_cloud.py", line 181, in pull
return self.transfer(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/data_cloud.py", line 135, in transfer
return transfer(src_odb, dest_odb, objs, **kwargs)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_data/hashfile/transfer.py", line 203, in transfer
status = compare_status(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 189, in compare_status
src_exists, src_missing = status(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 149, in status
odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 411, in oids_exist
remote_size, remote_oids = self._estimate_remote_size(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 293, in _estimate_remote_size
remote_oids = set(iter_with_pbar(oids))
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 283, in iter_with_pbar
for oid in oids:
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 249, in _oids_with_limit
for oid in self._list_oids(prefixes=prefixes, jobs=jobs):
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 236, in _list_oids
for path in self._list_prefixes(prefixes=prefixes, jobs=jobs):
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 216, in _list_prefixes
yield from self.fs.find(paths, batch_size=jobs, prefix=prefix)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/fs/base.py", line 429, in find
yield from self.fs.find(path)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/funcy/objects.py", line 50, in __get__
return prop.__get__(instance, type)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/funcy/objects.py", line 28, in __get__
res = instance.__dict__[self.fget.__name__] = self.fget(instance)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_ssh/__init__.py", line 119, in fs
return _SSHFileSystem(**self.fs_args)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/fsspec/spec.py", line 76, in __call__
obj = super().__call__(*args, **kwargs)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/sshfs/spec.py", line 66, in __init__
self._client, self._pool = self.connect(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/fsspec/asyn.py", line 113, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/fsspec/asyn.py", line 98, in sync
raise return_result
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/fsspec/asyn.py", line 53, in _runner
result[0] = await coro
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
return fut.result()
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
return await func(*args, **kwargs)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/sshfs/spec.py", line 83, in _connect
client = await self._stack.enter_async_context(_raw_client)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/contextlib.py", line 568, in enter_async_context
result = await _cm_type.__aenter__(cm)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/asyncssh/misc.py", line 274, in __aenter__
self._coro_result = await self._coro
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/asyncssh/connection.py", line 8042, in connect
return await asyncio.wait_for(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/tasks.py", line 455, in wait_for
return await fut
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/asyncssh/connection.py", line 430, in _connect
_, session = await loop.create_connection(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/base_events.py", line 986, in create_connection
infos = await self._ensure_resolved(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/base_events.py", line 1365, in _ensure_resolved
return await loop.getaddrinfo(host, port, family=family, type=type,
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/base_events.py", line 825, in getaddrinfo
return await self.run_in_executor(
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
2023-05-22 18:29:09,991 DEBUG: Version info for developers:
DVC version: 2.57.2 (pip)
-------------------------
Platform: Python 3.8.15 on Linux-5.10.0-0.bpo.7-amd64-x86_64-with-glibc2.10
Subprojects:
dvc_data = 0.51.0
dvc_objects = 0.22.0
dvc_render = 0.4.0
dvc_task = 0.2.1
scmrepo = 1.0.3
Supports:
http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
ssh (sshfs = 2023.4.1),
webdav (webdav4 = 0.9.8),
webdavs (webdav4 = 0.9.8)
Config:
Global: /home/pepito/.config/dvc
System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: ssh
Workspace directory: xfs on /dev/etherd/e1.2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/d2366a2b2ee6d6ec2fdbd02e25c33a47
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-05-22 18:29:09,992 DEBUG: Analytics is enabled.
2023-05-22 18:29:10,030 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpqj37j1ff']'
2023-05-22 18:29:10,032 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpqj37j1ff']'
If you run dvc with `--pdb` (like `dvc fetch --pdb ...`), it will drop you into a PDB shell when the exception is raised.
The `getaddrinfo` call is failing, so you should check that `host` and `port` are what you would expect given your remote and SSH config.
i.e. assuming the PDB session shows you the traceback for
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
you can just do
(Pdb) host
to see the value of `host` (and the same for the other variables at that point in the traceback).
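What the `(Pdb) host` step does can also be sketched without an interactive session: walk the exception's traceback to its innermost frame and read that frame's local variables. In this sketch `innermost_locals` and `fake_getaddrinfo` are hypothetical names, and `fake_getaddrinfo` is only a stand-in for the failing frame, not DVC or asyncssh code.

```python
def innermost_locals(exc):
    """Return the local variables of the frame where exc was raised."""
    tb = exc.__traceback__
    if tb is None:
        return {}
    # Follow the chain to the last (innermost) traceback entry.
    while tb.tb_next is not None:
        tb = tb.tb_next
    return dict(tb.tb_frame.f_locals)


def fake_getaddrinfo(host, port):
    # Hypothetical stand-in that fails the way the real lookup does.
    raise OSError(-2, "Name or service not known")


try:
    fake_getaddrinfo("destination", 22)
except OSError as exc:
    # Shows the arguments the failing call actually received.
    print(innermost_locals(exc))
```

In a real PDB session the same information is available by typing the variable names at the prompt, as described above.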
Bug Report
Description
`dvc fetch` & `pull` commands do not work with an ssh remote even though the same path works with `scp`, `rsync` and `sftp` + `get`
Reproduce
I do not see a clear way to make this problem reproducible. In fact, I tried to post this as a Discord question, but my description was too long for a message. Hopefully it can be understood properly. The situation is as follows:
My team and I are facing a problem when using dvc ssh/sftp tools. We are setting up a dvc remote shared by two companies to work on a project: clientA & contractorB (I belong to the latter). The remote is hosted by clientA at a location accessible by us through ssh. There is an intermediate proxy server, but we have sorted that out using the `ProxyJump` option in `~/.ssh/config`, like this:
The dvc remote is stored at `destination`. We have also configured the ssh keys on the servers of clientA, so that no password prompts are needed.
On the `dvc` side, we have configured the remote in our repo with the following `.dvc/config`:
However, the `dvc fetch/pull` commands fail, with the following prompt:
The first thing I did was ensuring that the paths were written correctly. For instance, I removed the `:` in the url of the `.dvc/config`, between "destination" and "/work":
This did not solve the problem.
Next, to rule out issues with the ssh/sftp connections, I decided to manually copy the `/work/projects/models-bin/` folder from clientA to my contractorB's computer using three different methods: `scp`, `rsync` and `sftp`.
The three methods work properly: the "dvc-like" files appear on the contractorB side (e.g., `models-bin/b0/26324c6904b2a9cb4b88d6d61c81d1`). This is what led me to think that the issue might be on the dvc side, and not so much on the connection or the paths. Note that the path used in the three manual methods is exactly the same (copy-pasted).
Expected
Be able to locate the files.
Environment information
Output of `dvc doctor`:
Additional Information:
Output of `dvc fetch -v`: