load_dataset ignores cached datasets and tries to hit HF Hub, resulting in API rate limit errors

huggingface / datasets

load_dataset ignores cached datasets and tries to hit HF Hub, resulting in API rate limit errors #7086

Open tginart opened 3 months ago

tginart commented 3 months ago

Describe the bug

I have been running lm-eval-harness a lot, which has resulted in API rate limit errors. This seems strange, since all of the data should be cached locally, and I have verified that it is.

Steps to reproduce the bug

1. Be Me
2. Run load_dataset("TAUR-Lab/MuSR")
3. Hit rate limit error
4. Dataset is in .cache/huggingface/datasets
5. ???

Expected behavior

load_dataset should not run into API rate limits when the dataset is already cached locally.
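
For anyone hitting the same thing, here is a minimal sketch of a possible workaround, not a fix for the underlying behavior: force offline mode so that the library does not try to contact the Hub. It assumes the `HF_DATASETS_OFFLINE` and `HF_HUB_OFFLINE` environment variables are honored by the installed versions of `datasets` and `huggingface_hub`, and that they are read at import time. `load_cached` is a hypothetical helper name, not part of either library.

```python
import os

# Assumption: both variables are read when the libraries are first imported,
# so they are set before any `datasets` import happens.
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"


def load_cached(name: str):
    """Hypothetical helper: load `name` relying on the local cache only."""
    # Imported lazily so the environment variables above are already in place.
    from datasets import load_dataset

    return load_dataset(name)


# Usage (requires the dataset to be present in the local cache):
# ds = load_cached("TAUR-Lab/MuSR")
```

If the dataset really is in the cache, this should load it without any Hub requests. If it is not, the call is expected to fail locally instead of being rate limited.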

Environment info

datasets 2.16.0
python 3.10.4