iterative / dvc-ssh

SSH/SFTP plugin for dvc
Apache License 2.0

import: remote file can be downloaded via `dvc get` but can't be downloaded with import #19

Open hv10 opened 1 year ago

hv10 commented 1 year ago

Bug Report

Description

When using a DVC repo as a data registry with an SSH (sshfs) remote, we ran into issues importing data into dependent repositories. Specifically, the import fails for a directory that contains several files, where the directory itself is the tracked object and was added to the remote as a stage output.

Reproduce

  1. have a repo that uses sshfs as default remote (using ssh-key auth)
  2. init new repo
  3. dvc import a directory --> leads to PermissionError
  4. dvc get the same directory --> works as expected
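
The steps above can be sketched as a small script that runs both commands against the same registry path and compares their exit codes. This is only a sketch: the registry URL and path are placeholders, `build_commands` and `run_comparison` are hypothetical helper names, and the `via_import` / `via_get` output names are arbitrary. It assumes `dvc` is on PATH and that the script is run inside an initialized DVC repository, since `dvc import` needs one.

```python
import shutil
import subprocess


def build_commands(repo_url, path):
    """Build verbose `dvc import` and `dvc get` command lines for one path.

    Distinct -o targets keep the two downloads from colliding in the
    current directory.
    """
    return {
        "import": ["dvc", "import", "-v", repo_url, path, "-o", "via_import"],
        "get": ["dvc", "get", "-v", repo_url, path, "-o", "via_get"],
    }


def run_comparison(repo_url, path):
    """Run both commands and return their exit codes and stderr output."""
    if shutil.which("dvc") is None:
        raise RuntimeError("dvc executable not found on PATH")
    results = {}
    for name, cmd in build_commands(repo_url, path).items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # Keep stderr so the tracebacks of both runs can be compared.
        results[name] = {"returncode": proc.returncode, "stderr": proc.stderr}
    return results
```

Calling `run_comparison("<registry URL>", "<directory>")` with real values would show whether only the `import` entry has a non-zero return code, which is the behaviour described in steps 3 and 4.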

Expected

dvc import and dvc get should both be able to pull the directory and files in question.

Environment information

Output of dvc doctor:

DVC version: 2.29.0 (pip)
---------------------------------
Platform: Python 3.10.6 on macOS-12.6-arm64-arm-64bit
Subprojects:
    dvc_data = 0.14.0
    dvc_objects = 0.5.0
    dvc_render = 0.0.11
    dvc_task = 0.1.3
    dvclive = 0.11.0
    scmrepo = 0.1.1
Supports:
    http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
    ssh (sshfs = 2022.6.0)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s3s1
Caches: local
Remotes: ssh, local
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git

Additional Information (if any):

I will add more information once I have tried the debugging method from the wiki. Unfortunately, I cannot publish the output of dvc import -v <directory> for now, as I would first have to strip references to our cluster. I know this is fairly vague, but I have no idea where the issue could come from, so I would be grateful for any pointers on where to look.

hv10 commented 1 year ago

A note: the issue exists on both Linux and macOS for me. I guess it's some configuration error I am unaware of; will investigate.

daavoo commented 1 year ago

Hi @hv10! Could you share some more details on the permission error you are getting?

The traceback, with the detailed exception and the point in the code where it is raised, should not contain any sensitive information.

hv10 commented 1 year ago
dvc import -v git@gitlab.com:<repo_name> <file_path>
2022-10-06 12:28:17,754 ERROR: failed to transfer 'md5: ba9dbba77caf30f81ecba107250ee945' - Permission denied: Permission denied
------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/sshfs/spec.py", line 91, in _connect
    client = await self._stack.enter_async_context(_raw_client)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/contextlib.py", line 619, in enter_async_context
    result = await _cm_type.__aenter__(cm)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/asyncssh/misc.py", line 274, in __aenter__
    self._coro_result = await self._coro
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/asyncssh/connection.py", line 7834, in connect
    return await asyncio.wait_for(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/asyncio/tasks.py", line 408, in wait_for
    return await fut
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/asyncssh/connection.py", line 447, in _connect
    await options.waiter
asyncssh.misc.PermissionDenied: Permission denied

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/hashfile/transfer.py", line 31, in wrapper
    func(path, *args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/hashfile/transfer.py", line 170, in func
    return dest.add(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/hashfile/db/__init__.py", line 105, in add
    super().add(path, fs, oid, hardlink=hardlink, callback=callback)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/db.py", line 136, in add
    generic.transfer(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 101, in transfer
    _try_links(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 64, in _try_links
    return copy(from_fs, from_path, to_fs, to_path, callback=callback)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 44, in copy
    return from_fs.get_file(from_path, tmp_file, callback=callback)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 387, in get_file
    self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/fs/dvc.py", line 317, in get_file
    dvc_fs.get_file(dvc_path, lpath, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 387, in get_file
    self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/fs.py", line 117, in get_file
    fs.get_file(path, lpath, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 387, in get_file
    self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/funcy/objects.py", line 50, in __get__
    return prop.__get__(instance, type)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_ssh/__init__.py", line 114, in fs
    return _SSHFileSystem(**self.fs_args)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/fsspec/spec.py", line 76, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/sshfs/spec.py", line 76, in __init__
    self._client, self._pool = self.connect(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/fsspec/asyn.py", line 111, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/fsspec/asyn.py", line 96, in sync
    raise return_result
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/sshfs/utils.py", line 29, in wrapper
    raise PermissionError(exc.reason) from exc
PermissionError: Permission denied
------------------------------------------------------------
2022-10-06 12:28:17,831 DEBUG: Removing '/Users/hv10/Software_Projects/my-import-repo/.dvc/cache/7d/.Mx8DLLzEwpNXvtbr3fnVZm.tmp'
2022-10-06 12:28:17,831 ERROR: failed to transfer 'md5: 7d5bb9babeffeb0ce9016e905756208b' - Permission denied: Permission denied
------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/sshfs/spec.py", line 91, in _connect
    client = await self._stack.enter_async_context(_raw_client)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/contextlib.py", line 619, in enter_async_context
    result = await _cm_type.__aenter__(cm)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/asyncssh/misc.py", line 274, in __aenter__
    self._coro_result = await self._coro
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/asyncssh/connection.py", line 7834, in connect
    return await asyncio.wait_for(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/asyncio/tasks.py", line 408, in wait_for
    return await fut
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/asyncssh/connection.py", line 447, in _connect
    await options.waiter
asyncssh.misc.PermissionDenied: Permission denied

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/hashfile/transfer.py", line 31, in wrapper
    func(path, *args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/hashfile/transfer.py", line 170, in func
    return dest.add(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/hashfile/db/__init__.py", line 105, in add
    super().add(path, fs, oid, hardlink=hardlink, callback=callback)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/db.py", line 136, in add
    generic.transfer(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 101, in transfer
    _try_links(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 64, in _try_links
    return copy(from_fs, from_path, to_fs, to_path, callback=callback)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 44, in copy
    return from_fs.get_file(from_path, tmp_file, callback=callback)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 387, in get_file
    self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/fs/dvc.py", line 317, in get_file
    dvc_fs.get_file(dvc_path, lpath, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 387, in get_file
    self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/fs.py", line 117, in get_file
    fs.get_file(path, lpath, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 387, in get_file
    self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/funcy/objects.py", line 50, in __get__
    return prop.__get__(instance, type)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_ssh/__init__.py", line 114, in fs
    return _SSHFileSystem(**self.fs_args)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/fsspec/spec.py", line 76, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/sshfs/spec.py", line 76, in __init__
    self._client, self._pool = self.connect(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/fsspec/asyn.py", line 111, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/fsspec/asyn.py", line 96, in sync
    raise return_result
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/sshfs/utils.py", line 29, in wrapper
    raise PermissionError(exc.reason) from exc
PermissionError: Permission denied
------------------------------------------------------------
2022-10-06 12:28:17,832 DEBUG: failed to upload full contents of 'md5: ba73d37455dff7222084843515b58115.dir', aborting .dir file upload
2022-10-06 12:28:17,833 DEBUG: failed to upload 'memory://.jnXQjWdbRH6yReZSCxKzAv.tmp' to '/Users/hv10/Software_Projects/my-import-repo/.dvc/cache/ba/73d37455dff7222084843515b58115.dir'
2022-10-06 12:28:17,834 ERROR: failed to import '/data/artifacts/cases_rki' from '<my-data-repo (git@host adress)>'. - 3 files failed to transfer: 3 transfer failed
------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/data_cloud.py", line 114, in transfer
    return transfer(src_odb, dest_odb, objs, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/hashfile/transfer.py", line 184, in transfer
    _do_transfer(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc_data/hashfile/transfer.py", line 119, in _do_transfer
    raise TransferError(total_fails)
dvc_data.hashfile.transfer.TransferError: 3 transfer failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/commands/imp.py", line 15, in run
    self.repo.imp(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/repo/imp.py", line 6, in imp
    return self.imp_url(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/repo/scm_context.py", line 156, in run
    return method(repo, *args, **kw)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/repo/imp_url.py", line 88, in imp_url
    stage.run(jobs=jobs, no_download=no_download)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/stage/decorators.py", line 43, in rwlocked
    return call()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/stage/__init__.py", line 554, in run
    self._sync_import(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/stage/decorators.py", line 43, in rwlocked
    return call()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/stage/__init__.py", line 580, in _sync_import
    sync_import(self, dry, force, jobs, no_download)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/stage/imports.py", line 60, in sync_import
    stage.deps[0].download(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/dependency/repo.py", line 69, in download
    self.repo.cloud.pull(objs, jobs=jobs, odb=odb)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/data_cloud.py", line 162, in pull
    return self.transfer(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/my-data-repo/lib/python3.10/site-packages/dvc/data_cloud.py", line 116, in transfer
    raise FileTransferError(exc.fails) from exc
dvc.exceptions.FileTransferError: 3 files failed to transfer
------------------------------------------------------------
2022-10-06 12:28:17,840 DEBUG: Analytics is enabled.
2022-10-06 12:28:17,895 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/jl/qpj5k6q55896ggyklz87ysb80000gn/T/tmp_0cal_wi']'
2022-10-06 12:28:17,897 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/jl/qpj5k6q55896ggyklz87ysb80000gn/T/tmp_0cal_wi']'

I had to remove some of the output - specifically the mentions of the repository. I hope that it will help nonetheless.
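
In the log above, the `PermissionError` raised in sshfs/utils.py is chained to the underlying `asyncssh.misc.PermissionDenied` via `raise ... from exc`, so the failure happens while opening the SSH connection, not while reading a file on the remote. A minimal sketch of how such a chain can be walked programmatically follows. `FakePermissionDenied`, `connect`, `wrapped_connect` and `cause_chain` are hypothetical stand-ins written for this illustration; they are not part of asyncssh, sshfs or DVC.

```python
class FakePermissionDenied(Exception):
    """Stand-in for asyncssh.misc.PermissionDenied (assumption for the demo)."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


def connect():
    # Simulates the SSH server rejecting every offered credential.
    raise FakePermissionDenied("Permission denied")


def wrapped_connect():
    # Mirrors the pattern visible in the traceback:
    #   raise PermissionError(exc.reason) from exc
    try:
        connect()
    except FakePermissionDenied as exc:
        raise PermissionError(exc.reason) from exc


def cause_chain(exc):
    """Return the exception followed by each explicit __cause__, outermost first."""
    chain = []
    while exc is not None:
        chain.append(exc)
        exc = exc.__cause__
    return chain


if __name__ == "__main__":
    try:
        wrapped_connect()
    except PermissionError as err:
        for link in cause_chain(err):
            print(type(link).__name__, "-", link)
```

The last element of the chain is the root cause, which is the part worth quoting when reporting a connection problem like this one.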

hv10 commented 1 year ago

When running the dvc get command, I now get a PermissionDenied error as well, but the files contained in the folder were downloaded anyway. Yesterday they were downloaded without the error.

hv10 commented 1 year ago

The folder contains a .gitignore file, which does not get downloaded when using dvc get; the two stage outputs contained in the folder do get downloaded.

edwardwbarber commented 1 year ago

@hv10 your traceback includes the same asyncssh.misc.PermissionDenied: Permission denied error that people reported in iterative/dvc#7702 for Windows users using ssh-agent (granted, that is supposedly a Windows-only issue). Have you tried downgrading to 2.9.5? It is not a long-term solution, but better than nothing.

hv10 commented 1 year ago

Looking at iterative/dvc#7702, the issue does indeed seem similar. When I run the snippet provided in a comment on that issue, the keys do not get picked up in my environment either.
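
For readers who want a quick first check of what credentials are visible in their environment, here is a minimal sketch. It is not the snippet from iterative/dvc#7702, and it does not reproduce how asyncssh or DVC select keys. It only reports whether an ssh-agent socket is advertised through `SSH_AUTH_SOCK` and which commonly used default key files exist. The helper name `ssh_auth_overview` and the list of key names are assumptions made for this illustration.

```python
import os
from pathlib import Path

# Common OpenSSH default private key names (an assumption, not an exhaustive list).
DEFAULT_KEY_NAMES = ("id_rsa", "id_ecdsa", "id_ed25519", "id_dsa")


def ssh_auth_overview(env=None, ssh_dir=None):
    """Summarize which SSH credentials are visible to the current process."""
    env = os.environ if env is None else env
    ssh_dir = Path.home() / ".ssh" if ssh_dir is None else Path(ssh_dir)
    sock = env.get("SSH_AUTH_SOCK")
    return {
        # Path of the agent socket, or None if no agent is advertised.
        "agent_socket": sock,
        # True only if the variable is set and the socket path exists.
        "agent_socket_exists": bool(sock) and os.path.exists(sock),
        # Default-named private keys found in the SSH directory.
        "default_keys": [n for n in DEFAULT_KEY_NAMES if (ssh_dir / n).is_file()],
    }


if __name__ == "__main__":
    for key, value in ssh_auth_overview().items():
        print(f"{key}: {value}")
```

If both the agent socket and the default key list come back empty in the environment where `dvc import` runs, a connection-time `Permission denied` is at least plausible, though this check alone cannot confirm the cause.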

I'll try downgrading next. Any major downsides to that? Any features I would dearly miss?

edwardwbarber commented 1 year ago

Interesting - you might want to comment in that thread if the ssh-agent issue is affecting you on Linux/macOS too.

2.9.5 was before dvc data status existed. It's also not compatible with the dvc VSCode extension, if you use that. I haven't done an exhaustive check of the release notes, but those are the main two that have stood out to me.

daavoo commented 1 year ago

I am not really sure this is the same issue as iterative/dvc#7702 because of:

A note: the issue exists on both Linux and macOS for me. I guess it's some configuration error I am unaware of; will investigate.

And:

dvc get the same directory --> works as expected