huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

CI is broken due to hf-internal-testing/dataset_with_script #6796

Closed by albertvillanova 6 months ago

albertvillanova commented 6 months ago

CI is broken for test_load_dataset_distributed_with_script. See: https://github.com/huggingface/datasets/actions/runs/8614926216/job/23609378127

FAILED tests/test_load.py::test_load_dataset_distributed_with_script[None] - assert False
 +  where False = all(<generator object test_load_dataset_distributed_with_script.<locals>.<genexpr> at 0x7f0c741de3b0>)
FAILED tests/test_load.py::test_load_dataset_distributed_with_script[force_redownload] - assert False
 +  where False = all(<generator object test_load_dataset_distributed_with_script.<locals>.<genexpr> at 0x7f0be45f6ea0>)
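The failure message above only reports `all(<generator object ...>)`, which hides which element was falsy. The test body is not shown in this thread, so the following is only a generic sketch of how such an assertion could be made more informative; `failing_items` and the sample `results` list are hypothetical and are not taken from `tests/test_load.py`.

```python
def failing_items(results):
    """Return (index, value) pairs for every falsy entry in `results`."""
    return [(i, r) for i, r in enumerate(results) if not r]


# Hypothetical per-process outcomes, standing in for whatever the real test checks.
results = [True, False, True]

# Opaque form: when this is asserted, pytest can only show the generator object.
opaque_ok = all(r for r in results)

# More informative form: materialize the failures so they appear in the message.
bad = failing_items(results)
message = f"falsy results at (index, value): {bad}"
```

Asserting `not bad, message` instead of `all(...)` would then print the offending indices directly in the CI log.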

albertvillanova commented 6 months ago

Finally:

This may be related to the hf-internal-testing/dataset_with_script dataset: https://huggingface.co/datasets/hf-internal-testing/dataset_with_script

albertvillanova commented 6 months ago

Requesting this URL: https://datasets-server.huggingface.co/parquet?dataset=hf-internal-testing/dataset_with_script returns the following error:

{"error":"The dataset viewer doesn't support this dataset because it runs arbitrary python code. Please open a discussion in the discussion tab if you think this is an error and tag @lhoestq and @severo."}

Was there a recent change on the Hub enforcing this behavior?
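The check described in this comment can be sketched in Python. The endpoint URL and the `error` key come from the request and response quoted above; the helper names `parquet_url` and `extract_error` are hypothetical, and the sketch assumes that a successful response is a JSON object without an `error` key. It makes no network call, so the response body has to be fetched separately.

```python
import json
from urllib.parse import urlencode

# Endpoint as quoted in the comment above.
PARQUET_ENDPOINT = "https://datasets-server.huggingface.co/parquet"


def parquet_url(dataset: str) -> str:
    """Build the parquet endpoint URL for a dataset repository ID."""
    # safe="/" keeps the namespace/name separator unescaped, as in the quoted URL.
    return f"{PARQUET_ENDPOINT}?{urlencode({'dataset': dataset}, safe='/')}"


def extract_error(body: str):
    """Return the `error` message from a JSON response body, or None if absent."""
    payload = json.loads(body)
    if isinstance(payload, dict):
        return payload.get("error")
    return None


url = parquet_url("hf-internal-testing/dataset_with_script")
```

Passing the JSON body quoted above to `extract_error` would return the "runs arbitrary python code" message, which a test could use to distinguish this Hub-side refusal from other failures.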

albertvillanova commented 6 months ago

OK, I just saw this PR:

Once merged and deployed, it should fix the issue.

albertvillanova commented 6 months ago

Once the script dataset has been allowed in the dataset-viewer, we should fix our test so that the CI passes.

I am addressing this.