iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

dvc move fails when cache dir is read-only #4232

Closed. florianspecker closed this issue 6 months ago.

florianspecker commented 4 years ago
$ dvc move -v f6cf36bfd9b914e4_890986178457_00_122.png subfolder/
2020-07-18 13:12:12,250 DEBUG: fetched: [(3,)]
2020-07-18 13:12:12,696 DEBUG: Adding 'subfolder/f6cf36bfd9b914e4_890986178457_00_122.png' to 'subfolder/.gitignore'.
2020-07-18 13:12:12,700 DEBUG: Path '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/a2/d0ac500784b02d3636c4b9075a08bc' inode '224330060'
2020-07-18 13:12:12,701 DEBUG: fetched: []
2020-07-18 13:12:14,486 DEBUG: Path '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/a2/d0ac500784b02d3636c4b9075a08bc' inode '224330060'
2020-07-18 13:12:14,487 DEBUG: fetched: []
2020-07-18 13:12:14,487 DEBUG: cache '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/a2/d0ac500784b02d3636c4b9075a08bc' expected 'a2d0ac500784b02d3636c4b9075a08bc' actual 'a2d0ac500784b02d3636c4b9075a08bc'
2020-07-18 13:12:14,488 DEBUG: Path '/Users/florian/Downloads/test/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/subfolder/f6cf36bfd9b914e4_890986178457_00_122.png' inode '19290743'
2020-07-18 13:12:14,488 DEBUG: fetched: []
2020-07-18 13:12:14,523 DEBUG: Path 'subfolder/f6cf36bfd9b914e4_890986178457_00_122.png' inode '19290743'
2020-07-18 13:12:14,524 DEBUG: fetched: []
2020-07-18 13:12:14,524 DEBUG: {}
2020-07-18 13:12:14,524 DEBUG: Output 'subfolder/f6cf36bfd9b914e4_890986178457_00_122.png' didn't change. Skipping saving.
2020-07-18 13:12:14,524 DEBUG: Saving 'subfolder/f6cf36bfd9b914e4_890986178457_00_122.png' to '../../../../../tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/a2/d0ac500784b02d3636c4b9075a08bc'.
2020-07-18 13:12:14,525 DEBUG: Path '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/a2/d0ac500784b02d3636c4b9075a08bc' inode '224330060'
2020-07-18 13:12:14,525 DEBUG: fetched: [('1580290111000000000', '18671818', 'a2d0ac500784b02d3636c4b9075a08bc', '1595070734487113984')]
2020-07-18 13:12:14,525 DEBUG: cache '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/a2/d0ac500784b02d3636c4b9075a08bc' expected 'a2d0ac500784b02d3636c4b9075a08bc' actual 'a2d0ac500784b02d3636c4b9075a08bc'
2020-07-18 13:12:14,530 DEBUG: fetched: [(3,)]
2020-07-18 13:12:14,531 ERROR: unexpected error - [Errno 13] Permission denied: '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/.cache_type_test_file'
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/main.py", line 53, in main
    ret = cmd.run()
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/command/move.py", line 14, in run
    self.repo.move(self.args.src, self.args.dst)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/repo/__init__.py", line 36, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/repo/move.py", line 70, in move
    out.move(to_out)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/output/base.py", line 361, in move
    self.commit()
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/output/base.py", line 285, in commit
    self.cache.save(self.path_info, self.cache.tree, self.info)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 1168, in save
    return self._save(path_info, tree, hash_, save_link, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 1176, in _save
    return self._save_file(path_info, tree, hash_, save_link, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 1086, in _save_file
    elif self.tree.iscopy(path_info) and self._cache_is_copy(
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 1128, in _cache_is_copy
    with self.tree.open(test_cache_file, "wb") as fobj:
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/local.py", line 74, in open
    return open(path_info, mode=mode, encoding=encoding)
PermissionError: [Errno 13] Permission denied: '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/.cache_type_test_file'
------------------------------------------------------------

/Volumes/dvc is a read-only NFS mount. The issue looks very similar to https://github.com/iterative/dvc/issues/3510

$ more .dvc/config
[core]
    analytics = false
    remote = datasetmaster
['remote "datasetmaster"']
    url = s3://scandit-datasets/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral
    profile = datasets

$ more .dvc/config.local
[cache]
    dir = /tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral
    type = symlink

Background on our setup: DVC pushes images to S3. Our dataset engineers use the .dvc/config pasted above. The same config is used for our ML training runs executed on AWS. Our ML developers need the same datasets, and so do the ML training runs executed on-premises (we have our own GPUs; AWS is only used for peaks). Both of these use cases have the .dvc/config.local pasted above, in addition to the same .dvc/config as everybody else. It points to an NFS share to which we sync the contents of our S3 bucket. The share is read-only to make sure it stays consistent.

Environment:
DVC version: 1.1.10
Python version: 3.8.4
Platform: macOS-10.15.5-x86_64-i386-64bit
Binary: False
Package: brew
Supported remotes: azure, gdrive, gs, http, https, s3, ssh, oss
Repo: dvc, git

efiop commented 4 years ago

Hi @florianspecker! Could you please remove the duplicated // from the cache dir path in the config and try again? Just making sure it is not that little thing that is causing us trouble 🙂

efiop commented 4 years ago

@florianspecker What are the permissions on /Volumes and /Volumes/dvc btw? Is your user able to list those dirs at all?

florianspecker commented 4 years ago

Hi @efiop, thanks a lot for getting back to me so quickly! I went through all the details again, and embarrassingly enough, I had mixed up my different folders (the issue first showed up in a folder with sensitive customer data, so I reproduced it with a less critical dataset). Sorry about that; I've updated the output and config above.

Regarding permissions: /tmp/dvc-data is a read-only NFS mount. My user has enough permissions to list and read everything, but writing is not possible.

$ ls -l /tmp/dvc-data/ | grep eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral
drwxr-xr-x root wheel  4 KB Wed Jan 29 10:40:03 2020 eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral

It looks like the problem is caused by attempting to write .cache_type_test_file which is not needed in this case (and fails because of the NFS ro mount).
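To illustrate the point being made here, below is a minimal sketch of a link-type probe that checks whether the cache directory is writable before creating any test file in it. It is not DVC's actual code: `probe_link_type` and the `.probe-*` file names are hypothetical, and only the symlink case is shown. `os.access(..., os.W_OK)` returns False both for a read-only mount (EROFS) and for missing write permission (EACCES), which covers the root-owned directory on the NFS share above.

```python
import os
import uuid
from typing import Optional


def probe_link_type(cache_dir: str, workspace_dir: str) -> Optional[bool]:
    """Hypothetical symlink probe in the spirit of .cache_type_test_file.

    Returns None without touching the cache when cache_dir is not writable,
    so a caller could fall back to the configured cache type (e.g. symlink)
    instead of crashing. Otherwise returns True if a symlink from the cache
    into the workspace can be created, False if not.
    """
    if not os.access(cache_dir, os.W_OK):
        # Read-only mount or no write permission: skip the probe entirely.
        return None
    src = os.path.join(cache_dir, f".probe-{uuid.uuid4().hex}")
    dst = os.path.join(workspace_dir, f".probe-{uuid.uuid4().hex}")
    open(src, "wb").close()  # empty test file inside the cache
    try:
        os.symlink(src, dst)  # dst becomes a link pointing at src
        return True
    except OSError:
        return False
    finally:
        # Clean up both probe files; either may be missing if symlink failed.
        for path in (dst, src):
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
```

The design choice is to treat "cannot probe" as a separate outcome (None) rather than an error, since a read-only cache is a legitimate setup for symlink-based checkouts.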

efiop commented 4 years ago

> It looks like the problem is caused by attempting to write .cache_type_test_file which is not needed in this case (and fails because of the NFS ro mount).

@florianspecker Nice catch! Indeed, looks like the cause. We can definitely handle it better.

As a workaround, could you create that file by hand (the contents don't matter; it can even be empty) and try again?

florianspecker commented 4 years ago

@efiop Thanks! Sure, no problem. Now dvc move fails when trying to remove .cache_type_test_file:

$ dvc move -v  f6cf36bfd9b914e4_890986178457_00_121.png subfolder/
2020-07-20 14:58:23,798 DEBUG: '/Users/florian/Downloads/test/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/.dvc/tmp/updater' is outdated(
2020-07-20 14:58:23,946 DEBUG: Trying to spawn '['/usr/local/Cellar/dvc/1.1.10/libexec/bin/python3.8', '/usr/local/bin/dvc', 'daemon', '-q', 'updater']'
2020-07-20 14:58:23,948 DEBUG: Spawned '['/usr/local/Cellar/dvc/1.1.10/libexec/bin/python3.8', '/usr/local/bin/dvc', 'daemon', '-q', 'updater']'
2020-07-20 14:58:23,953 DEBUG: fetched: [(3,)]
2020-07-20 14:58:24,535 DEBUG: Adding 'subfolder/f6cf36bfd9b914e4_890986178457_00_121.png' to 'subfolder/.gitignore'.
2020-07-20 14:58:24,562 DEBUG: Path '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/eb/0c23f515f0eb928a04867bdaf3b247' inode '224330129'
2020-07-20 14:58:24,563 DEBUG: fetched: []
2020-07-20 14:58:26,210 DEBUG: Path '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/eb/0c23f515f0eb928a04867bdaf3b247' inode '224330129'
2020-07-20 14:58:26,210 DEBUG: fetched: []
2020-07-20 14:58:26,211 DEBUG: cache '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/eb/0c23f515f0eb928a04867bdaf3b247' expected 'eb0c23f515f0eb928a04867bdaf3b247' actual 'eb0c23f515f0eb928a04867bdaf3b247'
2020-07-20 14:58:26,216 DEBUG: Path '/Users/florian/Downloads/test/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/subfolder/f6cf36bfd9b914e4_890986178457_00_121.png' inode '19290708'
2020-07-20 14:58:26,216 DEBUG: fetched: []
2020-07-20 14:58:26,254 DEBUG: Path 'subfolder/f6cf36bfd9b914e4_890986178457_00_121.png' inode '19290708'
2020-07-20 14:58:26,254 DEBUG: fetched: []
2020-07-20 14:58:26,254 DEBUG: {}
2020-07-20 14:58:26,254 DEBUG: Output 'subfolder/f6cf36bfd9b914e4_890986178457_00_121.png' didn't change. Skipping saving.
2020-07-20 14:58:26,255 DEBUG: Saving 'subfolder/f6cf36bfd9b914e4_890986178457_00_121.png' to '../../../../../tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/eb/0c23f515f0eb928a04867bdaf3b247'.
2020-07-20 14:58:26,256 DEBUG: Path '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/eb/0c23f515f0eb928a04867bdaf3b247' inode '224330129'
2020-07-20 14:58:26,256 DEBUG: fetched: [('1580289796000000000', '19038923', 'eb0c23f515f0eb928a04867bdaf3b247', '1595249906210976000')]
2020-07-20 14:58:26,256 DEBUG: cache '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/eb/0c23f515f0eb928a04867bdaf3b247' expected 'eb0c23f515f0eb928a04867bdaf3b247' actual 'eb0c23f515f0eb928a04867bdaf3b247'
2020-07-20 14:58:26,264 DEBUG: Created 'symlink': ../../../../../tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/.cache_type_test_file -> subfolder/.d3nrBmmBLpeMYhqp8R6jt6
2020-07-20 14:58:26,264 DEBUG: Removing '/Users/florian/Downloads/test/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/subfolder/.d3nrBmmBLpeMYhqp8R6jt6'
2020-07-20 14:58:26,280 DEBUG: Removing '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/.cache_type_test_file'
2020-07-20 14:58:26,301 DEBUG: fetched: [(5,)]
2020-07-20 14:58:26,302 ERROR: unexpected error - [Errno 30] Read-only file system: '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/.cache_type_test_file'
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/utils/fs.py", line 127, in _unlink
    os.unlink(path)
OSError: [Errno 30] Read-only file system: '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/.cache_type_test_file'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/main.py", line 53, in main
    ret = cmd.run()
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/command/move.py", line 14, in run
    self.repo.move(self.args.src, self.args.dst)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/repo/__init__.py", line 36, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/repo/move.py", line 70, in move
    out.move(to_out)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/output/base.py", line 361, in move
    self.commit()
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/output/base.py", line 285, in commit
    self.cache.save(self.path_info, self.cache.tree, self.info)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 1168, in save
    return self._save(path_info, tree, hash_, save_link, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 1176, in _save
    return self._save_file(path_info, tree, hash_, save_link, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 1086, in _save_file
    elif self.tree.iscopy(path_info) and self._cache_is_copy(
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 1134, in _cache_is_copy
    self.tree.remove(test_cache_file)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/remote/local.py", line 121, in remove
    remove(path)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/utils/fs.py", line 139, in remove
    _unlink(path, _chmod)
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/utils/fs.py", line 129, in _unlink
    onerror(os.unlink, path, sys.exc_info())
  File "/usr/local/Cellar/dvc/1.1.10/libexec/lib/python3.8/site-packages/dvc/utils/fs.py", line 116, in _chmod
    os.chmod(p, perm)
OSError: [Errno 30] Read-only file system: '/tmp/dvc-data/eac-sot-generic-sm_960u1-1d-ean13-4k-small-30fps-no_occlusion-spiral/.cache_type_test_file'
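The second traceback shows the cleanup step (`os.unlink`, then a `chmod` retry) failing with EROFS. As a rough sketch of "handling it better", a cleanup helper could treat a read-only or permission-denied filesystem as non-fatal. `remove_if_possible` is a hypothetical name, not part of DVC, and which errno values to tolerate is an assumption.

```python
import errno
import os

# Errors meaning "this filesystem won't let us modify it" (assumed set).
_READ_ONLY_ERRNOS = {errno.EROFS, errno.EACCES, errno.EPERM}


def remove_if_possible(path: str) -> bool:
    """Hypothetical helper: delete path, but don't abort on a read-only fs.

    Returns True if the file was removed or was already absent, and False
    if the filesystem refused the removal. Any other OSError is re-raised.
    """
    try:
        os.unlink(path)
    except FileNotFoundError:
        return True
    except OSError as exc:
        if exc.errno in _READ_ONLY_ERRNOS:
            return False
        raise
    return True
```

Unlike a chmod-and-retry fallback, this does not attempt to change permissions, since a `chmod` on a read-only mount fails with the same EROFS error seen above.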

efiop commented 4 years ago

@florianspecker Thanks! So there's no workaround for now other than making it not read-only 🙁 We need to take a closer look and solve this properly. Is it blocking you right now, or are you able to use a non-read-only fs for now?

florianspecker commented 4 years ago

Our users can't switch the file system to read/write. But it's not super urgent, as there are not many move operations, and we can always move manually (i.e. move the image, move the .dvc file, remove entries in the source .gitignore, add entries to the destination .gitignore).
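The four manual steps listed above could be scripted roughly as follows. This is a sketch under stated assumptions, not an equivalent of `dvc move`: `manual_dvc_move` and its helpers are hypothetical; it assumes the `.dvc` file sits next to the data file with `path:` set to the bare file name (so its contents need no change), and that DVC's `.gitignore` entries have the form `/<name>`. Because the workspace file is a symlink into the cache with `type = symlink`, the link is re-created with an absolute target so it still resolves from the new directory.

```python
import os
import shutil
from pathlib import Path


def _move_data_file(src: Path, dst: Path) -> None:
    if src.is_symlink():
        # Re-create the link with the resolved absolute cache path, so it
        # works from its new directory even if the old link was relative.
        dst.symlink_to(os.path.realpath(src))
        src.unlink()
    else:
        shutil.move(str(src), str(dst))


def _edit_gitignore(gitignore: Path, entry: str, add: bool) -> None:
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if add:
        if entry not in lines:
            lines.append(entry)
    else:
        lines = [line for line in lines if line != entry]
    gitignore.write_text("".join(line + "\n" for line in lines))


def manual_dvc_move(src: str, dst_dir: str) -> None:
    """Hypothetical scripted version of the manual move described above."""
    src_path = Path(src)
    dst_path = Path(dst_dir)
    dst_path.mkdir(parents=True, exist_ok=True)
    name = src_path.name
    # Step 1: move the image (or the cache symlink standing in for it).
    _move_data_file(src_path, dst_path / name)
    # Step 2: move the .dvc file (assumed to live next to the data file).
    dvc_file = src_path.with_name(name + ".dvc")
    shutil.move(str(dvc_file), str(dst_path / dvc_file.name))
    # Steps 3 and 4: update the source and destination .gitignore files.
    entry = "/" + name
    _edit_gitignore(src_path.parent / ".gitignore", entry, add=False)
    _edit_gitignore(dst_path / ".gitignore", entry, add=True)
```

Nothing here writes to the cache directory, which is the point for a read-only share. Afterwards, the moved `.dvc` file and both `.gitignore` files still need to be committed with Git as usual.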

clementperon commented 2 years ago

Got the same error when doing a dvc pull:

I can try to bisect this:

$ dvc --version
2.10.1
$ dvc pull -f /home/clement/XXX/data/sample_02/sample.pcap.dvc
ERROR: unexpected error - [Errno 30] Read-only file system: '/home/clement/XXX/.dvc/remote-cache/.LntrY6EbX7ucmjTfwdnDT6.tmp'
Traceback (most recent call last):
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 89, in main
    ret = cmd.do_run()
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/commands/data_sync.py", line 31, in run
    stats = self.repo.pull(
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/repo/pull.py", line 46, in pull
    stats = self.checkout(
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/repo/checkout.py", line 98, in checkout
    result = stage.checkout(
  File "/home/clement/.local/lib/python3.9/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/stage/decorators.py", line 36, in rwlocked
    return call()
  File "/home/clement/.local/lib/python3.9/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/stage/__init__.py", line 573, in checkout
    key, outs = self._checkout(
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/stage/__init__.py", line 585, in _checkout
    result = out.checkout(**kwargs)
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/output.py", line 747, in checkout
    modified = checkout(
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/data/checkout.py", line 240, in checkout
    _checkout(
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/data/checkout.py", line 161, in _checkout
    links = test_links(cache.cache_types, cache.fs, cache.fs_path, fs, fs_path)
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/fs/utils.py", line 154, in test_links
    with from_fs.open(from_file, "wb") as fobj:
  File "/home/clement/.local/lib/python3.9/site-packages/dvc/fs/local.py", line 31, in open
    return open(path, mode=mode, encoding=encoding)
OSError: [Errno 30] Read-only file system: '/home/clement/XXX/.dvc/remote-cache/.LntrY6EbX7ucmjTfwdnDT6.tmp'

dberenbaum commented 2 years ago

@clementperon Sorry for the delay. Is '/home/clement/XXX/.dvc/remote-cache/' the DVC remote from which you are trying to pull?

clementperon commented 2 years ago

@dberenbaum Yes, it's a mount point, either SSHFS or NFS, mounted read-only.

I share data across multiple clients; each client mounts the data folder over SSHFS or NFS, and I don't want them to write to this folder.

Unfortunately, DVC tries to write a tmp file in this folder. Is it possible to change where this tmp file is created?
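The traceback shows the `.tmp` file being created by `test_links`, which opens a new file in the cache before linking it into the workspace. One conceivable way to avoid writing there, sketched below, is to probe with a cache object that already exists and create only the link, inside the workspace. This is a hypothetical alternative, not something DVC is known to do; `probe_symlink_readonly_cache` is an illustrative name, and it covers symlinks only (a hardlink from an NFS/SSHFS mount into a local workspace would fail regardless).

```python
import os
import uuid


def probe_symlink_readonly_cache(existing_cache_file: str, workspace_dir: str) -> bool:
    """Hypothetical probe that writes only inside workspace_dir.

    Instead of creating a test file in the (possibly read-only) cache, it
    symlinks a cache object that already exists, checks the link resolves,
    and removes the link again.
    """
    link = os.path.join(workspace_dir, f".link-probe-{uuid.uuid4().hex}")
    try:
        os.symlink(existing_cache_file, link)
    except OSError:
        return False
    try:
        # os.path.exists follows the link: False if the target is missing.
        return os.path.exists(link)
    finally:
        os.unlink(link)  # removes the link itself, never the cache object
```

During a pull, the object being checked out is by definition present in the cache, so it could serve as `existing_cache_file`.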

Mount points:

NFS: 10.98.X.Y:/media/raid/dvc on /home/clement/work/XXX/.dvc/remote-cache type nfs4 (ro,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.98.X.Z,local_lock=none,addr=10.98.X.Y)

SSHFS: user@10.98.X.Y:/media/raid/dvc on /home/clement/work/XXX/.dvc/remote-cache type fuse.sshfs (ro,nosuid,nodev,relatime,user_id=1000,group_id=1000)
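Both mounts above carry the `ro` option. A client-side script could detect this before running DVC, for example to warn users up front. The sketch below uses `os.statvfs`, whose `f_flag` field includes `os.ST_RDONLY` for read-only mounts (Unix only); `is_read_only_mount` is a hypothetical helper, and the assumption is that the NFS and FUSE/SSHFS clients report the `ro` flag through `statvfs` as ordinary mounts do.

```python
import os


def is_read_only_mount(path: str) -> bool:
    """Return True if the filesystem holding `path` is mounted read-only.

    os.statvfs() exposes the mount flags in f_flag; os.ST_RDONLY is set
    for 'ro' mounts. Available on Unix only.
    """
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)
```

For instance, calling `is_read_only_mount(".dvc/remote-cache")` from the repo root would be expected to return True for either of the mounts listed above.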

dberenbaum commented 2 years ago

Unfortunately I'm not aware of any workarounds at the moment.

florianspecker commented 2 years ago

For how we use dvc, staying with version 2.1.0 does the trick. It's slow, but at least it works.

dberenbaum commented 2 years ago

> For how we use dvc, staying with version 2.1.0 does the trick. It's slow, but at least it works.

@florianspecker Is there some regression you have noted after 2.1.0? Your original issue was with 1.1.10. Was it fixed at some point?

florianspecker commented 2 years ago

@dberenbaum Yes, with later versions it's not just dvc move that's broken, but also dvc pull. I didn't bother reporting it, as I was under the impression that our setup with read-only mounts was too specific to be of general interest.