huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

load_dataset ignores cached datasets and tries to hit HF Hub, resulting in API rate limit errors #7086

Open tginart opened 3 months ago

tginart commented 3 months ago

Describe the bug

I have been running lm-eval-harness a lot, which has resulted in API rate limit errors. This seems strange, since all of the data should be cached locally, and I have verified that it is.

Steps to reproduce the bug

  1. Be Me
  2. Run load_dataset("TAUR-Lab/MuSR")
  3. Hit rate limit error
  4. Dataset is in .cache/huggingface/datasets
  5. ???

Expected behavior

We should not run into API rate limits if we have already cached the dataset.
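A possible workaround, sketched here under the assumption that the dataset is fully present in the local cache: set the `HF_DATASETS_OFFLINE` and `HF_HUB_OFFLINE` environment variables before `datasets` is imported, so that `load_dataset` resolves from the cache rather than contacting the Hub. The helper names `enable_offline_mode` and `load_cached` are hypothetical and not part of the library, and I have not confirmed that this avoids the rate limit in every case.

```python
import os


def enable_offline_mode(env=os.environ):
    """Set the offline flags on `env` (defaults to the process environment).

    Assumption: the libraries read these flags when they are imported, so
    this needs to run before `datasets` / `huggingface_hub` are imported.
    """
    env["HF_DATASETS_OFFLINE"] = "1"
    env["HF_HUB_OFFLINE"] = "1"
    return env


def load_cached(name):
    """Hypothetical helper: load `name` using only the local cache."""
    enable_offline_mode()
    # Imported here, after the flags are set, so the flags are in place
    # when the library reads its configuration.
    from datasets import load_dataset

    return load_dataset(name)


# Usage (requires the dataset to already be in the local cache):
# ds = load_cached("TAUR-Lab/MuSR")
```

If `datasets` has already been imported elsewhere in the process, setting the variables this late may have no effect; exporting them in the shell before launching lm-eval-harness is the safer variant.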

Environment info

datasets 2.16.0, Python 3.10.4