huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
19.12k stars, 2.66k forks

`ImportError`: cannot import name 'insecure_hashlib' from 'huggingface_hub.utils' (.../huggingface_hub/utils/__init__.py) #6563

Closed: wasertech closed this issue 9 months ago

wasertech commented 9 months ago

Describe the bug

Yep, it's not there anymore.

+ python /home/trainer/sft_train.py --model_name cognitivecomputations/dolphin-2.2.1-mistral-7b --dataset_name wasertech/OneOS --load_in_4bit --use_peft --batch_size 4 --num_train_epochs 1 --learning_rate 1.41e-5 --gradient_accumulation_steps 8 --seq_length 4096 --output_dir output --log_with wandb
Traceback (most recent call last):
  File "/home/trainer/sft_train.py", line 22, in <module>
    from datasets import load_dataset
  File "/home/trainer/llm-train/lib/python3.8/site-packages/datasets/__init__.py", line 22, in <module>
    from .arrow_dataset import Dataset
  File "/home/trainer/llm-train/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 66, in <module>
    from .arrow_reader import ArrowReader
  File "/home/trainer/llm-train/lib/python3.8/site-packages/datasets/arrow_reader.py", line 30, in <module>
    from .download.download_config import DownloadConfig
  File "/home/trainer/llm-train/lib/python3.8/site-packages/datasets/download/__init__.py", line 9, in <module>
    from .download_manager import DownloadManager, DownloadMode
  File "/home/trainer/llm-train/lib/python3.8/site-packages/datasets/download/download_manager.py", line 31, in <module>
    from ..utils import tqdm as hf_tqdm
  File "/home/trainer/llm-train/lib/python3.8/site-packages/datasets/utils/__init__.py", line 19, in <module>
    from .info_utils import VerificationMode
  File "/home/trainer/llm-train/lib/python3.8/site-packages/datasets/utils/info_utils.py", line 5, in <module>
    from huggingface_hub.utils import insecure_hashlib
ImportError: cannot import name 'insecure_hashlib' from 'huggingface_hub.utils' (/home/trainer/llm-train/lib/python3.8/site-packages/huggingface_hub/utils/__init__.py)

Steps to reproduce the bug

Using datasets==2.16.1 and huggingface_hub==0.17.3, load a dataset with load_dataset.

Expected behavior

The dataset should be downloaded (if needed) and returned.

Environment info

trainer@a311ae86939e:/mnt$ pip show datasets
Name: datasets
Version: 2.16.1
Summary: HuggingFace community-driven open-source library of datasets
Home-page: https://github.com/huggingface/datasets
Author: HuggingFace Inc.
Author-email: thomas@huggingface.co
License: Apache 2.0
Location: /home/trainer/llm-train/lib/python3.8/site-packages
Requires: packaging, pyyaml, multiprocess, pyarrow-hotfix, pandas, pyarrow, xxhash, dill, numpy, aiohttp, tqdm, fsspec, requests, filelock, huggingface-hub
Required-by: trl, lm-eval, evaluate

trainer@a311ae86939e:/mnt$ pip show huggingface_hub
Name: huggingface-hub
Version: 0.17.3
Summary: Client library to download and publish models, datasets and other repos on the huggingface.co hub
Home-page: https://github.com/huggingface/huggingface_hub
Author: Hugging Face, Inc.
Author-email: julien@huggingface.co
License: Apache
Location: /home/trainer/llm-train/lib/python3.8/site-packages
Requires: requests, pyyaml, packaging, typing-extensions, tqdm, filelock, fsspec
Required-by: transformers, tokenizers, peft, evaluate, datasets, accelerate

trainer@a311ae86939e:/mnt$ huggingface-cli env

Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.17.3
- Platform: Linux-6.5.13-7-MANJARO-x86_64-with-glibc2.29
- Python version: 3.8.10
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/trainer/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: wasertech
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.1.2
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: 10.2.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.24.4
- pydantic: N/A
- aiohttp: 3.9.1
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /home/trainer/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /home/trainer/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/trainer/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False

wasertech commented 9 months ago

@Wauplin Do you happen to know what's up?

wasertech commented 9 months ago

Installing datasets from main did the trick, so I guess it will be fixed in the next release.

NVM https://github.com/huggingface/datasets/blob/d26abadce0b884db32382b92422d8a6aa997d40a/src/datasets/utils/info_utils.py#L5

Wauplin commented 9 months ago

@wasertech upgrading huggingface_hub to a newer version should fix your issue. Latest version is 0.20.2.
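
To confirm whether the installed huggingface_hub actually exposes the name from the traceback, a quick check along these lines may help. This is only a sketch: `can_import_name` is a hypothetical helper (not part of datasets or huggingface_hub) that roughly mirrors what `from module import name` does, including the fallback to a submodule.

```python
import importlib


def can_import_name(module_name: str, attr: str) -> bool:
    """Roughly mirror `from module_name import attr` without raising."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or failed to import.
        return False
    if hasattr(module, attr):
        return True
    # `from pkg import name` also falls back to importing the submodule pkg.name.
    try:
        importlib.import_module(f"{module_name}.{attr}")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # The name that datasets/utils/info_utils.py imports in the traceback above.
    available = can_import_name("huggingface_hub.utils", "insecure_hashlib")
    print("insecure_hashlib available:", available)
```

If this prints `False` in the environment that fails, the installed huggingface_hub is most likely too old for the installed datasets.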

wasertech commented 9 months ago

Ah yes, I had pinned tokenizers to an old version, which downgraded huggingface_hub. Note to self: keep Hugging Face packages relatively close together in terms of release date.

Wauplin commented 9 months ago

Glad to know your problem's solved!

wasertech commented 9 months ago

@Wauplin Thanks for your insight 👍

mercury1216063891 commented 7 months ago

pip install --upgrade huggingface-hub