huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Wrong exception handling when loading dataset from local disk #173

Open ganler opened 3 weeks ago

https://github.com/huggingface/alignment-handbook/blob/606d2e954fd17999af40e6fb4f712055ca11b2f0/src/alignment/data.py#L216-L221

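The exception handling at these lines does not catch the error that is actually raised when the dataset was saved to local disk. The actual exception is a ValueError: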

[rank5]: Traceback (most recent call last):
[rank5]:   File "run_sft.py", line 251, in <module>
[rank5]:     main()
[rank5]:   File "run_sft.py", line 86, in main
[rank5]:     raw_datasets = get_datasets(
[rank5]:   File "miniconda3/envs/handbook/lib/python3.10/site-packages/alignment/data.py", line 169, in get_datasets
[rank5]:     raw_datasets = mix_datasets(
[rank5]:   File "miniconda3/envs/handbook/lib/python3.10/site-packages/alignment/data.py", line 218, in mix_datasets
[rank5]:     dataset = load_dataset(ds, ds_config, split=split)
[rank5]:   File "miniconda3/envs/handbook/lib/python3.10/site-packages/datasets/load.py", line 2570, in load_dataset
[rank5]:     raise ValueError(
[rank5]: ValueError: You are trying to load a dataset that was saved using `save_to_disk`. Please use `load_from_disk` instead.

Installed `datasets` version:

❯ pip show datasets
Name: datasets
Version: 2.19.1
Summary: HuggingFace community-driven open-source library of datasets
Home-page: https://github.com/huggingface/datasets
Author: HuggingFace Inc.
Author-email: thomas@huggingface.co
License: Apache 2.0
Location: /home/ec2-user/miniconda3/envs/handbook/lib/python3.10/site-packages
Requires: aiohttp, dill, filelock, fsspec, huggingface-hub, multiprocess, numpy, packaging, pandas, pyarrow, pyarrow-hotfix, pyyaml, requests, tqdm, xxhash
Required-by: alignment-handbook, evaluate, trl

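I also tried the latest release, 2.19.2, and got the same error. The set of exceptions caught around `load_dataset` needs to be broadened to include ValueError, so that the fallback to `load_from_disk` is reached.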
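
A minimal sketch of what the broadened handling could look like, not the repository's actual code. The helper name `load_hub_or_local`, the injected loader arguments, and the `<ds>/<split>` directory layout for local datasets are assumptions made for illustration. The loaders are passed in as parameters so the fallback logic can be exercised without installing `datasets`.

```python
import os
from typing import Any, Callable, Optional, Tuple, Type


def load_hub_or_local(
    ds: str,
    ds_config: Optional[str],
    split: str,
    load_dataset: Callable[..., Any],
    load_from_disk: Callable[[str], Any],
    fallback_errors: Tuple[Type[BaseException], ...] = (ValueError,),
) -> Any:
    """Try the Hub-style loader first, then fall back to a local on-disk dataset.

    Hypothetical helper: `fallback_errors` lists the exception types that
    should trigger the fallback. ValueError is included by default because it
    is what the traceback above shows for a dataset written by `save_to_disk`.
    """
    try:
        return load_dataset(ds, ds_config, split=split)
    except fallback_errors:
        # Assumption: each split was saved under <ds>/<split> with save_to_disk.
        return load_from_disk(os.path.join(ds, split))


# Possible wiring with the real library (untested sketch, left as a comment so
# the block runs without `datasets` installed):
#
#   from datasets import load_dataset, load_from_disk
#   from datasets.builder import DatasetGenerationError
#
#   dataset = load_hub_or_local(
#       ds, ds_config, split, load_dataset, load_from_disk,
#       fallback_errors=(DatasetGenerationError, ValueError),
#   )
```

Catching ValueError this broadly would also swallow unrelated ValueErrors from `load_dataset` (for example, a bad split name), so a stricter variant might inspect the message for `load_from_disk` before falling back, or check for a local directory up front.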