Issue closed by guerrapin.
Hi @guerrapin!
I get a slightly different error, but nonetheless I can confirm the bug:
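```bash
#!/bin/bash
# Reproduce: pull a DVC-tracked directory into an empty workspace and cache.
set -e   # stop at the first failing command
set -x   # echo each command as it runs
rm -rf myrepo
mkdir myrepo
cd myrepo
git init
dvc init
# Track a directory (not a single file) with DVC
mkdir dir
echo foo > dir/foo
dvc add dir
# Use a throwaway local directory as the default remote and push to it
dvc remote add -d upstream "$(mktemp -d)"
dvc push
# Remove both the workspace copy and the local cache, then pull everything back
rm -rf dir
rm -rf .dvc/cache
dvc pull
```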
Looks like something is wrong with directory linking. Looking into it right now.
Correction: the error is the same.
Got the same error. In case it helps, here is the output of dvc pull -v:

```
Debug: fetched: [(921540, '1543417823157699328', '97eaf6e4e0cc4a84934b8f1f8f331417', '1543430639580068864')]
Debug: Inode '921540', mtime '1543417823157699328', actual mtime '1543417823157699328'.
Debug: UPDATE state SET timestamp = "1543430639622790144" WHERE inode = 921540
Debug: File '/home/vsionai/projects/vsionai/tmp/data-science-example/.dvc/cache/97/eaf6e4e0cc4a84934b8f1f8f331417', md5 '97eaf6e4e0cc4a84934b8f1f8f331417', actual '97eaf6e4e0cc4a84934b8f1f8f331417'
Checking out 'data/processed' with cache 'f05c98a20a3d3e282bd36a3e9f41278f.dir'.
Linking directory 'data/processed'.
Debug: SELECT count from state_info WHERE rowid=1
Debug: fetched: [(32,)]
Debug: UPDATE state_info SET count = 32 WHERE rowid = 1
Error: Traceback (most recent call last):
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/command/data_sync.py", line 29, in do_run
    force=self.args.force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/project.py", line 789, in pull
    self.checkout(target=target, with_deps=with_deps, force=force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/project.py", line 500, in checkout
    stage.checkout(force=force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/stage.py", line 483, in checkout
    out.checkout(force=force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/output/local.py", line 69, in checkout
    force=force)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/remote/local.py", line 329, in checkout
    if force or self._already_cached(p):
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/remote/local.py", line 349, in _already_cached
    return not self.changed_cache(current_md5)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/remote/local.py", line 112, in changed_cache
    if self.changed_cache_file(md5):
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/remote/local.py", line 95, in changed_cache_file
    if self.state.changed(cache, md5=md5):
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/state.py", line 88, in changed
    actual = self.update(path)
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/state.py", line 323, in update
    return self._do_update(path)[0]
  File "/home/vsionai/.local/lib/python3.6/site-packages/dvc/state.py", line 274, in _do_update
    if not os.path.exists(path):
  File "/usr/lib/python3.6/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
Error: Failed to pull data from the cloud: stat: path should be string, bytes, os.PathLike or integer, not NoneType
```
With data.dvc containing:
```yaml
deps:
- md5: 3dcb1133b71ee29e89a27c952c8831e3.dir
  path: data/raw
- md5: c46da892a15ff5c9425bd3fddfee1a14
  path: src/data/split_dataset.py
md5: ded22f652f899bd36c8822c9da6747f2
outs:
- cache: true
  md5: f05c98a20a3d3e282bd36a3e9f41278f.dir
  path: data/processed
```
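The last frames show state._do_update passing a path of None to os.path.exists. A minimal sketch in plain Python (not DVC's code) of why that surfaces as a TypeError rather than a False return: genericpath.exists catches OSError (and, since Python 3.8, ValueError), but not the TypeError that os.stat raises for None. The guard function below is hypothetical and only illustrates the failure mode.

```python
import os


def exists_or_false(path):
    """Hypothetical guard: treat a missing (None) path as 'does not exist'."""
    if path is None:
        return False
    return os.path.exists(path)


def reproduce():
    """Return the message of the TypeError raised by os.path.exists(None), if any."""
    try:
        # os.stat(None) raises TypeError, which genericpath.exists does not catch
        os.path.exists(None)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
    print(exists_or_false(None))
```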
Thanks for reporting the errors, @guerrapin, @stvogel, it was very helpful! :slightly_smiling_face:
I submitted a patch at https://github.com/iterative/dvc/pull/1378 and hope to release it today.
A workaround for this is to use the --force option, as long as your working directory is not dirty (i.e., it has no changes that could be overwritten or removed when the checkout happens; always be careful when using --force).
Thanks to @mroutis for a lightning fast :zap: fix! 0.21.2 is out, please upgrade. :tada:
I'm blown away. 0.21.2 really fixed it. Thanks a lot, you're lightning fast! Do you guys ever sleep? :-)
Wow, many thanks guys for fixing this so quickly! :) It works fine now.
Got the same problem. The update fixed it and everything works perfectly. Thanks 🙏🏼
I am having the same issue while using DVC version 1.1.7 with Google Drive.
Is there any resolution for this? Please help.
@imflash217 Please share the full log, and also the output of dvc version :slightly_smiling_face:
Hi @efiop,
please find the log and version below.
My data directory has a folder, data/doppler/, containing multiple .wav files (data/doppler/*.wav). I want to track all the wav files in this doppler directory.
```
$ dvc version
DVC version: 1.1.7
Python version: 3.8.3
Platform: macOS-10.15.5-x86_64-i386-64bit
Binary: False
Package: brew
Supported remotes: azure, gdrive, gs, http, https, s3, ssh, oss
```
@guerrapin Looks like an unrelated issue. Could you please create a new issue for it so we can continue there? Please also provide the full error output from dvc pull -v.
Thanks @efiop,
I will create a new issue. By the way, here is the output of dvc pull -v:
```
❯ dvc pull -v
2020-07-15 17:14:02,013 DEBUG: '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/tmp/updater' is outdated(
2020-07-15 17:14:02,139 DEBUG: Trying to spawn '['/usr/local/Cellar/dvc/1.1.7/libexec/bin/python3.8', '/usr/local/bin/dvc', 'daemon', '-q', 'updater']'
2020-07-15 17:14:02,141 DEBUG: Spawned '['/usr/local/Cellar/dvc/1.1.7/libexec/bin/python3.8', '/usr/local/bin/dvc', 'daemon', '-q', 'updater']'
2020-07-15 17:14:02,147 DEBUG: fetched: [(3,)]
2020-07-15 17:14:02,158 DEBUG: Assuming '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/cache/da/984e1320d7ec524e58cb963fb97140.dir' is unchanged since it is read-only
2020-07-15 17:14:02,159 DEBUG: Assuming '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/cache/da/984e1320d7ec524e58cb963fb97140.dir' is unchanged since it is read-only
2020-07-15 17:14:02,163 DEBUG: Preparing to download data from 'gdrive://1CIFrE2mUTFjLpIC3ZGW0gZDUg8B3iz4m'
2020-07-15 17:14:02,163 DEBUG: Preparing to collect status from gdrive://1CIFrE2mUTFjLpIC3ZGW0gZDUg8B3iz4m
2020-07-15 17:14:02,163 DEBUG: Collecting information from local cache...
2020-07-15 17:14:02,165 DEBUG: cache '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/cache/2e/69e0192deb5acc75e9de58fd61197d' expected '2e69e0192deb5acc75e9de58fd61197d' actual 'None'
2020-07-15 17:14:02,166 DEBUG: cache '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/cache/b6/099fbec776b7b0570cbcf758c3ec42' expected 'b6099fbec776b7b0570cbcf758c3ec42' actual 'None'
2020-07-15 17:14:02,176 DEBUG: Assuming '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/cache/da/984e1320d7ec524e58cb963fb97140.dir' is unchanged since it is read-only
2020-07-15 17:14:02,176 DEBUG: cache '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/cache/2e/56e6243baef4a58c986e45a51bb466' expected '2e56e6243baef4a58c986e45a51bb466' actual 'None'
2020-07-15 17:14:02,177 DEBUG: cache '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/cache/10/66debeac6e415bdd2cf665b53c2053' expected '1066debeac6e415bdd2cf665b53c2053' actual 'None'
2020-07-15 17:14:02,177 DEBUG: cache '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e' expected 'd41d8cd98f00b204e9800998ecf8427e' actual 'None'
2020-07-15 17:14:02,178 DEBUG: Collecting information from remote cache...
2020-07-15 17:14:02,178 DEBUG: Querying 1 hashes via object_exists
2020-07-15 17:14:02,510 DEBUG: GDrive remote auth with config '{'client_config_backend': 'settings', 'client_config_file': 'client_secrets.json', 'save_credentials': True, 'oauth_scope': ['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/drive.appdata'], 'save_credentials_backend': 'file', 'save_credentials_file': '/Users/imflash217/Google Drive/MS@NCState/research@NCState/research@ejlobaton/fPCG/.dvc/tmp/gdrive-user-credentials.json', 'get_refresh_token': True, 'client_config': {'client_id': '710796635688-iivsgbgsb6uv1fap6635dhvuei09o66c.apps.googleusercontent.com', 'client_secret': 'a1Fz5bwhdbndhsabdh9uTSKDLJXv', 'auth_uri': 'https://accounts.google.com/o/oauth2/auth', 'token_uri': 'https://oauth2.googleapis.com/token', 'revoke_uri': 'https://oauth2.googleapis.com/revoke', 'redirect_uri': ''}}'.
2020-07-15 17:14:04,357 DEBUG: Indexing new .dir 'da984e1320d7ec524e58cb963fb97140.dir' with '5' nested files
2020-07-15 17:14:04,361 DEBUG: Downloading 'gdrive://1CIFrE2mUTFjLpIC3ZGW0gZDUg8B3iz4m/d4/1d8cd98f00b204e9800998ecf8427e' to '.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e'
2020-07-15 17:14:04,362 DEBUG: Downloading 'gdrive://1CIFrE2mUTFjLpIC3ZGW0gZDUg8B3iz4m/2e/69e0192deb5acc75e9de58fd61197d' to '.dvc/cache/2e/69e0192deb5acc75e9de58fd61197d'
2020-07-15 17:14:04,363 DEBUG: Downloading 'gdrive://1CIFrE2mUTFjLpIC3ZGW0gZDUg8B3iz4m/b6/099fbec776b7b0570cbcf758c3ec42' to '.dvc/cache/b6/099fbec776b7b0570cbcf758c3ec42'
2020-07-15 17:14:04,368 DEBUG: Downloading 'gdrive://1CIFrE2mUTFjLpIC3ZGW0gZDUg8B3iz4m/10/66debeac6e415bdd2cf665b53c2053' to '.dvc/cache/10/66debeac6e415bdd2cf665b53c2053'
2020-07-15 17:14:04,368 DEBUG: Downloading 'gdrive://1CIFrE2mUTFjLpIC3ZGW0gZDUg8B3iz4m/2e/56e6243baef4a58c986e45a51bb466' to '.dvc/cache/2e/56e6243baef4a58c986e45a51bb466'
2020-07-15 17:14:05,163 ERROR: failed to download 'gdrive://1CIFrE2mUTFjLpIC3ZGW0gZDUg8B3iz4m/d4/1d8cd98f00b204e9800998ecf8427e' to '.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e' - <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/137HimUcCkoFRJMIVDkkfOJAZ8lHIEHhI?alt=media returned "Request range not satisfiable">
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/pydrive2/files.py", line 339, in GetContentFile
    download(fd, files.get_media(fileId=file_id))
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/pydrive2/files.py", line 329, in download
    status, done = downloader.next_chunk()
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/googleapiclient/http.py", line 749, in next_chunk
    raise HttpError(resp, content, uri=self._uri)
googleapiclient.errors.HttpError: <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/137HimUcCkoFRJMIVDkkfOJAZ8lHIEHhI?alt=media returned "Request range not satisfiable">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/dvc/remote/local.py", line 328, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 460, in download
    return self._download_file(
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/dvc/remote/base.py", line 518, in _download_file
    self._download( # noqa, pylint: disable=no-member
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/dvc/remote/gdrive.py", line 582, in _download
    self._gdrive_download_file(item_id, to_file, name, no_progress_bar)
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/funcy/flow.py", line 122, in retry
    return call()
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/dvc/remote/gdrive.py", line 392, in _gdrive_download_file
    gdrive_file.GetContentFile(to_file, callback=pbar.update_to)
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/pydrive2/auth.py", line 84, in _decorated
    return decoratee(self, *args, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/pydrive2/files.py", line 346, in GetContentFile
    raise exc
pydrive2.files.ApiRequestError: <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/137HimUcCkoFRJMIVDkkfOJAZ8lHIEHhI?alt=media returned "Request range not satisfiable">
------------------------------------------------------------
2020-07-15 17:14:08,867 DEBUG: fetched: [(101,)]
Everything is up to date.
2020-07-15 17:14:08,869 ERROR: failed to pull data from the cloud - 1 files failed to download
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/dvc/command/data_sync.py", line 26, in run
    stats = self.repo.pull(
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/dvc/repo/__init__.py", line 36, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/dvc/repo/pull.py", line 25, in pull
    processed_files_count = self._fetch( # pylint: disable=protected-access
  File "/usr/local/Cellar/dvc/1.1.7/libexec/lib/python3.8/site-packages/dvc/repo/fetch.py", line 73, in _fetch
    raise DownloadError(failed)
dvc.exceptions.DownloadError: 1 files failed to download
------------------------------------------------------------
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
```
Could this be because the dvc remote is set to a Google Drive folder? I am experiencing this only with gdrive remotes.
@imflash217 Yes, clearly you are using a gdrive remote. Is that unexpected?
No, I set up gdrive as the remote intentionally.
@efiop, it seems the issue might be that my Mac unintentionally created an empty file called Icon, which DVC was tracking, but gdrive failed to store it because it was empty. Hence, during dvc pull, DVC could not pull the Icon file because it was not in my gdrive remote. Let me test this and see if I am correct.
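One way to test this hypothesis is to list the zero-byte files inside the tracked directory before pushing. A small standard-library sketch; data/doppler is the directory mentioned earlier in the thread, and find_empty_files is a hypothetical helper, not part of DVC. On macOS the custom-folder-icon file is named Icon followed by a carriage return, which is why the names are printed with repr().

```python
import os


def find_empty_files(root):
    """Return paths (relative to root) of zero-byte regular files under root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and os.path.getsize(full) == 0:
                empty.append(os.path.relpath(full, root))
    return sorted(empty)


if __name__ == "__main__":
    # repr() makes hidden characters visible, e.g. the trailing '\r' in 'Icon\r'
    for path in find_empty_files("data/doppler"):
        print(repr(path))
```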
A similar issue happened to me: dvc pull could not download an empty file that had previously been pushed from another machine.
I can confirm that the requested file d4/1d8cd98f00b204e9800998ecf8427e exists in GDrive but has size 0.
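The object name itself points at an empty file: d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of zero bytes, and the cache and remote paths in the logs split the digest into a two-character directory plus the remainder (e.g. .dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e). A quick check with hashlib; cache_relpath is a hypothetical helper that only mirrors the layout visible in these logs.

```python
import hashlib


def cache_relpath(md5_hex):
    """Split a hex digest into '<first 2 chars>/<rest>', the layout seen in the logs."""
    return f"{md5_hex[:2]}/{md5_hex[2:]}"


empty_md5 = hashlib.md5(b"").hexdigest()
print(empty_md5)                 # d41d8cd98f00b204e9800998ecf8427e
print(cache_relpath(empty_md5))  # d4/1d8cd98f00b204e9800998ecf8427e
```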
dvc version:
```
╰─± dvc version
DVC version: 1.2.0
Python version: 3.6.10
Platform: Linux-4.18.0-25-generic-x86_64-with-debian-buster-sid
Binary: False
Package: pip
Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache: reflink - not supported, hardlink - supported, symlink - supported
Repo: dvc, git
```
dvc pull -v:
```
2020-07-26 16:43:19,259 ERROR: failed to download 'gdrive://1GH3MnX_s1TerZvTUYXCfkpMeULk00RNt/d4/1d8cd98f00b204e9800998ecf8427e' to '.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e' - <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1wB601uNCV7fYmXFULr4dUpTUSDh4l47A?alt=media returned "Request range not satisfiable">
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/pydrive2/files.py", line 339, in GetContentFile
    download(fd, files.get_media(fileId=file_id))
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/pydrive2/files.py", line 329, in download
    status, done = downloader.next_chunk()
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/googleapiclient/http.py", line 749, in next_chunk
    raise HttpError(resp, content, uri=self._uri)
googleapiclient.errors.HttpError: <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1wB601uNCV7fYmXFULr4dUpTUSDh4l47A?alt=media returned "Request range not satisfiable">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/dvc/cache/local.py", line 30, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/dvc/tree/base.py", line 430, in download
    from_info, to_info, name, no_progress_bar, file_mode, dir_mode
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/dvc/tree/base.py", line 488, in _download_file
    from_info, tmp_file, name=name, no_progress_bar=no_progress_bar
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/dvc/tree/gdrive.py", line 581, in _download
    self._gdrive_download_file(item_id, to_file, name, no_progress_bar)
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/funcy/flow.py", line 122, in retry
    return call()
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/dvc/tree/gdrive.py", line 391, in _gdrive_download_file
    gdrive_file.GetContentFile(to_file, callback=pbar.update_to)
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/pydrive2/auth.py", line 84, in _decorated
    return decoratee(self, *args, **kwargs)
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/pydrive2/files.py", line 346, in GetContentFile
    raise exc
pydrive2.files.ApiRequestError: <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1wB601uNCV7fYmXFULr4dUpTUSDh4l47A?alt=media returned "Request range not satisfiable">
------------------------------------------------------------
2020-07-26 16:43:19,263 DEBUG: fetched: [(226,)]
Everything is up to date.
2020-07-26 16:43:19,264 ERROR: failed to pull data from the cloud - 1 files failed to download
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/dvc/command/data_sync.py", line 36, in run
    run_cache=self.args.run_cache,
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/dvc/repo/__init__.py", line 34, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/dvc/repo/pull.py", line 34, in pull
    run_cache=run_cache,
  File "/home/user/.local/pipx/venvs/dvc/lib/python3.6/site-packages/dvc/repo/fetch.py", line 73, in _fetch
    raise DownloadError(failed)
dvc.exceptions.DownloadError: 1 files failed to download
------------------------------------------------------------
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
```
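Both gdrive logs fail with HTTP 416 ("Request range not satisfiable") on exactly the d41d8cd98f00b204e9800998ecf8427e object. Under RFC 7233 §2.1, a byte range is satisfiable only if its first-byte-pos is less than the representation's length, so a ranged request starting at byte 0 cannot be satisfied for a 0-byte file. This assumes, as the 416 suggests, that the chunked downloader sends such a Range header. The sketch below encodes that rule and shows one hypothetical client-side guard: create an empty object locally instead of requesting it. fetch_object and its download callback are illustrative names, not DVC or PyDrive2 APIs.

```python
import os
import tempfile

EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"  # MD5 of zero bytes


def range_satisfiable(first_byte_pos, length):
    """RFC 7233 §2.1: a byte-range-spec needs first-byte-pos < representation length."""
    return first_byte_pos < length


def fetch_object(md5_hex, dest, download):
    """Hypothetical guard: create empty objects locally, download everything else.

    `download` is any callable that writes the remote object to `dest`.
    """
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    if md5_hex == EMPTY_MD5:
        # Nothing to transfer, and a range starting at byte 0 is unsatisfiable for 0 bytes
        open(dest, "wb").close()
        return
    download(dest)


if __name__ == "__main__":
    def refuse_download(dest):
        raise RuntimeError("an empty object should never be requested")

    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "d4", "1d8cd98f00b204e9800998ecf8427e")
        fetch_object(EMPTY_MD5, dest, refuse_download)
        print(os.path.getsize(dest))  # 0
```

Special-casing the empty digest avoids any network round trip for an object whose content is fully determined by its hash.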
@imflash217 @we-taper Created https://github.com/iterative/dvc/issues/4286 . Let's move the discussion there. Thanks for the feedback!
When pulling data from remote storage, I run the following command:
```bash
dvc pull train.dvc
```
where train.dvc contains:
I then get the following error:
It seems to happen because the output of train.dvc is a directory; it works fine when the output is a file. Some info: dvc==0.21.0, installed with pip, on macOS 10.14.