huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

datasets.exceptions.DatasetNotFoundError for private dataset #7194

Closed kdutia closed 1 month ago

kdutia commented 1 month ago

Describe the bug

The following Python code tries to download a private dataset and fails with the error datasets.exceptions.DatasetNotFoundError: Dataset 'ClimatePolicyRadar/all-document-text-data-weekly' doesn't exist on the Hub or cannot be accessed. Downloading a public dataset doesn't work either.

from datasets import load_dataset
_ = load_dataset("ClimatePolicyRadar/all-document-text-data-weekly")

This seems to be an issue with my machine's configuration, as the code above works on a colleague's machine. So far I have tried:

My output of huggingface-cli whoami:

kdutia
orgs:  ClimatePolicyRadar

Steps to reproduce the bug

python
Python 3.12.2 (main, Feb  6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from datasets import load_dataset
>>> _ = load_dataset("ClimatePolicyRadar/all-document-text-data-weekly")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kalyan/Library/Caches/pypoetry/virtualenvs/open-data-cnKQNmjn-py3.12/lib/python3.12/site-packages/datasets/load.py", line 2074, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kalyan/Library/Caches/pypoetry/virtualenvs/open-data-cnKQNmjn-py3.12/lib/python3.12/site-packages/datasets/load.py", line 1795, in load_dataset_builder
    dataset_module = dataset_module_factory(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kalyan/Library/Caches/pypoetry/virtualenvs/open-data-cnKQNmjn-py3.12/lib/python3.12/site-packages/datasets/load.py", line 1659, in dataset_module_factory
    raise e1 from None
  File "/Users/kalyan/Library/Caches/pypoetry/virtualenvs/open-data-cnKQNmjn-py3.12/lib/python3.12/site-packages/datasets/load.py", line 1597, in dataset_module_factory
    raise DatasetNotFoundError(f"Dataset '{path}' doesn't exist on the Hub or cannot be accessed.") from e
datasets.exceptions.DatasetNotFoundError: Dataset 'ClimatePolicyRadar/all-document-text-data-weekly' doesn't exist on the Hub or cannot be accessed.
>>>

Expected behavior

The dataset downloads successfully.

Environment info

From huggingface-cli env:

- huggingface_hub version: 0.25.1
- Platform: macOS-14.2.1-arm64-arm-64bit
- Python version: 3.12.2
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /Users/kalyan/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: kdutia
- Configured git credential helpers: osxkeychain
- FastAI: N/A
- Tensorflow: N/A
- Torch: N/A
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 2.1.1
- pydantic: N/A
- aiohttp: 3.10.8
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /Users/kalyan/.cache/huggingface/hub
- HF_ASSETS_CACHE: /Users/kalyan/.cache/huggingface/assets
- HF_TOKEN_PATH: /Users/kalyan/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
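
The output above shows a saved token file, but not whether an environment variable also supplies a token. A minimal sketch for listing every place a token might come from, assuming the environment variables (`HF_TOKEN`, then the older `HUGGING_FACE_HUB_TOKEN`) take priority over the saved token file; the helper names `find_token_sources` and `mask` are hypothetical and not part of `huggingface_hub`:

```python
import os
from pathlib import Path


def find_token_sources(env=None, token_path=None):
    """Return (source, token) pairs in the assumed lookup order.

    Assumption: environment variables are consulted before the saved
    token file, so a stale value there would shadow a fresh login.
    """
    env = os.environ if env is None else env
    if token_path is None:
        default_path = Path.home() / ".cache" / "huggingface" / "token"
        token_path = env.get("HF_TOKEN_PATH", default_path)

    sources = []
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            sources.append((name, value))

    path = Path(token_path)
    if path.is_file():
        value = path.read_text().strip()
        if value:
            sources.append((str(path), value))
    return sources


def mask(token):
    """Show only the first few characters so tokens are not leaked in logs."""
    return token[:6] + "..." if len(token) > 6 else "..."


if __name__ == "__main__":
    for source, token in find_token_sources():
        print(f"{source}: {mask(token)}")
```

If more than one source is listed and the values differ, an old token may be the one actually sent to the Hub.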

From datasets-cli env:

- `datasets` version: 3.0.1
- Platform: macOS-14.2.1-arm64-arm-64bit
- Python version: 3.12.2
- `huggingface_hub` version: 0.25.1
- PyArrow version: 17.0.0
- Pandas version: 2.2.3
- `fsspec` version: 2024.6.1

amansingh2116 commented 1 month ago

Actually, there is no such dataset available; that is why you are getting that error.

davanstrien commented 1 month ago

Fixed with @kdutia in Slack chat. Generating a new token fixed this issue.
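
For anyone hitting the same error, one way to rule out a stale saved token is to pass the newly generated token explicitly. A minimal sketch, assuming the new token is exported as `HF_TOKEN` and that the installed `datasets` version accepts the `token` argument of `load_dataset`; the wrapper `load_private_dataset` is a hypothetical helper, not part of the library:

```python
import os


def load_private_dataset(repo_id, token=None):
    """Load a Hub dataset while passing the access token explicitly."""
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise ValueError("No token supplied; set HF_TOKEN or pass token=...")

    # Imported lazily so the token check above runs even without `datasets`.
    from datasets import load_dataset
    from datasets.exceptions import DatasetNotFoundError

    try:
        return load_dataset(repo_id, token=token)
    except DatasetNotFoundError as err:
        # The Hub reports a missing repo and an unauthorized token the same
        # way, so point at the token as a possible cause.
        raise RuntimeError(
            f"Could not access '{repo_id}'. If the dataset is private, check "
            "that the token is valid and has read access to it."
        ) from err


if __name__ == "__main__":
    ds = load_private_dataset("ClimatePolicyRadar/all-document-text-data-weekly")
    print(ds)
```

If the explicit token works while the plain `load_dataset` call does not, the saved credentials are the likely culprit, and logging in again with the new token should bring the two in line.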