huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

OSError for Consistency check failed and the `force_download=True` doesn't work. #2372

Open sci-m-wang opened 1 week ago

sci-m-wang commented 1 week ago

Describe the bug

When I download the dataset "tyqiangz/multilingual-sentiments", I get the error "OSError: Consistency check failed: file should be of size 0 but has size 6226 (multilingual-sentiments.py)." Following the suggestion in the error message, I added force_download=True to my code:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="tyqiangz/multilingual-sentiments", force_download=True, repo_type="dataset", local_dir="tyqiangz/multilingual-sentiments")

The same error then appears again:

OSError: Consistency check failed: file should be of size 0 but has size 6226 (multilingual-sentiments.py).
We are sorry for the inconvenience. Please retry with `force_download=True`.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
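
The message reads like a comparison between an expected size and the size of the file on disk. The snippet below is a minimal sketch of that kind of check, not the actual huggingface_hub implementation; `check_size` is a hypothetical helper and the 6226-byte file is a throwaway stand-in. It illustrates one possible reading of the report: if the expected size is wrongly given as 0, every fresh download of a non-empty file fails the same way, which would explain why force_download=True does not help.

```python
import os
import tempfile


def check_size(path: str, expected_size: int) -> None:
    """Raise OSError if the file on disk does not match the expected size.

    Hypothetical helper for illustration only.
    """
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        raise OSError(
            f"Consistency check failed: file should be of size {expected_size} "
            f"but has size {actual_size} ({os.path.basename(path)})."
        )


# Reproduce the shape of the reported failure with a throwaway 6226-byte file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multilingual-sentiments.py")
    with open(path, "wb") as f:
        f.write(b"x" * 6226)

    check_size(path, 6226)  # sizes agree, so no error is raised

    try:
        check_size(path, 0)  # expected size wrongly reported as 0
    except OSError as err:
        message = str(err)
        print(message)
```

Under this reading, the file itself is fine and the expected size is the suspect value, so it may be worth checking where that value comes from (for example, the configured endpoint).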

Reproduction

No response

Logs

No response

System info

- huggingface_hub version: 0.16.4
- Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.10
- Python version: 3.8.8
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /datas/wangm/.cache/huggingface/token
- Has saved token ?: True
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.0.0+cu117
- Jinja2: 2.11.3
- Graphviz: N/A
- Pydot: N/A
- Pillow: 8.2.0
- hf_transfer: N/A
- gradio: 3.36.1
- tensorboard: N/A
- numpy: 1.24.4
- pydantic: 2.0.3
- aiohttp: 3.8.4
- ENDPOINT: https://hf-mirror.com
- HUGGINGFACE_HUB_CACHE: /datas/wangm/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /datas/wangm/.cache/huggingface/assets
- HF_TOKEN_PATH: /datas/wangm/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False

Wauplin commented 1 week ago

Hi @sci-m-wang, sorry you're facing this issue 😕 The file you're trying to download does exist and is indeed 6226 bytes long. I noticed your huggingface_hub version is quite outdated (the latest version is 0.23.4). Could you update this dependency and retry? I'm asking because many bug fixes have been introduced since then.
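
To confirm which version is actually installed in the environment that runs the download, something like the following can help. It is a rough sketch using only the standard library; `parse_release` and `is_outdated` are hypothetical helpers that handle plain dotted versions such as 0.16.4, and they are not a substitute for a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Turn '0.23.4' into (0, 23, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # e.g. a 'dev0' segment: stop parsing here
        parts.append(int(digits))
    return tuple(parts)


def is_outdated(installed: str, minimum: str) -> bool:
    """Return True if the installed release is older than the minimum release."""
    return parse_release(installed) < parse_release(minimum)


try:
    installed = version("huggingface_hub")
    print(installed, "outdated:", is_outdated(installed, "0.23.4"))
except PackageNotFoundError:
    print("huggingface_hub is not installed in this environment")
```

If the check reports an outdated version, `pip install --upgrade huggingface_hub` run in the same environment should bring it up to date.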

As a side note, if your goal is to load this dataset, I would advise you to use the datasets library directly: dataset = datasets.load_dataset("tyqiangz/multilingual-sentiments")

sci-m-wang commented 1 week ago

Actually, even after updating huggingface_hub to 0.23.4, the issue still occurs. I've downloaded the dataset manually in the meantime, but the underlying problem still needs to be fixed.