huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
18.97k stars 2.62k forks

load_dataset on AWS lambda throws OSError(30, 'Read-only file system') error #7029

Open sugam-nexusflow opened 2 months ago

sugam-nexusflow commented 2 months ago

Describe the bug

I'm using AWS Lambda to run a Python application. I call the load_dataset function with cache_dir="/tmp" and it still throws the OSError(30, 'Read-only file system') error. I even updated all the HF environment variables to point to the /tmp directory, but the issue still persists. I can confirm that I can write to the /tmp directory.
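For reference, pointing the HF environment variables at /tmp could look like the sketch below. It is a minimal illustration, not a confirmed fix for this issue: the helper name `redirect_hf_caches` and the `/tmp/hf` base directory are hypothetical, and it assumes `HF_HOME`, `HF_DATASETS_CACHE`, and `HF_MODULES_CACHE` are the variables that matter for the installed versions. The variables have to be set before `datasets` is imported, since the library reads them at import time.

```python
import os


def redirect_hf_caches(base_dir="/tmp/hf"):
    """Point Hugging Face cache variables at a writable directory.

    Hypothetical helper: call it before importing `datasets`.
    """
    targets = {
        "HF_HOME": base_dir,
        "HF_DATASETS_CACHE": os.path.join(base_dir, "datasets"),
        "HF_MODULES_CACHE": os.path.join(base_dir, "modules"),
    }
    for name, path in targets.items():
        # Create the directory up front so a missing path is not the failure.
        os.makedirs(path, exist_ok=True)
        os.environ[name] = path
    return targets
```

On Lambda this would typically run at the top of the handler module, above the `datasets` import.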

Steps to reproduce the bug

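from datasets import load_dataset

# hugging_face_link, split, and token are defined elsewhere in the application
d = load_dataset(
    path=hugging_face_link,
    split=split,
    token=token,
    cache_dir="/tmp/hugging_face_cache",
)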

Expected behavior

Everything written to the file system as part of the load_dataset function should be in the /tmp directory.

Environment info

datasets version: 2.16.1
Platform: Linux-5.10.216-225.855.amzn2.x86_64-x86_64-with-glibc2.26
Python version: 3.11.9
huggingface_hub version: 0.19.4
PyArrow version: 16.1.0
Pandas version: 2.2.2
fsspec version: 2023.10.0

lhoestq commented 1 month ago

Hi! Can you share the full stack trace? This should help locate which files are not being written to the cache_dir.
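One way to collect that information is sketched below using only the standard library. The helper name `describe_os_error` is hypothetical; it wraps any callable, and on an `OSError` it returns the errno, the path the OS rejected (when the exception carries one), and the full traceback text. Whether `filename` is populated depends on how the error was raised.

```python
import errno
import traceback


def describe_os_error(func, *args, **kwargs):
    """Run func and, if it raises OSError, return a small diagnostic report.

    Returns None when func completes without an OSError.
    """
    try:
        func(*args, **kwargs)
    except OSError as exc:
        return {
            "errno": exc.errno,
            # EROFS is the 'Read-only file system' error code.
            "is_read_only_fs": exc.errno == errno.EROFS,
            # Path the failing system call was given, if available.
            "filename": exc.filename,
            "traceback": traceback.format_exc(),
        }
    return None
```

In the Lambda handler, the `load_dataset` call could be passed in through a `lambda` or `functools.partial`, and the resulting report printed so it appears in the CloudWatch logs.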