Hi,
I've been using this project and it's still very helpful, thanks a lot!
Since there's been more activity in the issues lately, I'd like to document a quick fix for other users:
When loading the gold-labeled preference dataset, load_dataset crashes with the following error:
In [2]: load_dataset('tlc4418/1.4b-policy_preference_data_gold_labelled', split=["train", "validation"])
[...]
File /usr/local/lib/python3.10/dist-packages/datasets/arrow_reader.py:480, in _rel_to_abs_instr(rel_instr, name2len)
478 split = rel_instr.splitname
479 if split not in name2len:
--> 480 raise ValueError(f'Unknown split "{split}". Should be one of {list(name2len)}.')
481 num_examples = name2len[split]
482 from_ = rel_instr.from_
ValueError: Unknown split "validation". Should be one of ['train'].
This is caused by the file structure of the tlc4418/1.4b-policy_preference_data_gold_labelled dataset.
In [15]: train, val = load_dataset('1.4b-policy_preference_data_gold_labelled', split=["train","validation"])
Repo card metadata block was not found. Setting CardData to empty.
In [16]:
With the original folder structure, only the train split is detected. Downloading the dataset and putting val.json in its own subfolder fixes this issue, as the session above shows: both splits then load from the local copy.
The Hugging Face pull request API doesn't work for me, so I'll just leave this here as an issue for people who might encounter the same bug.
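For anyone who wants to script the workaround of moving val.json into its own subfolder, here is a minimal sketch. Several things in it are my assumptions rather than facts from the dataset repo: that val.json sits at the top level of the downloaded copy, that a subfolder named validation is what lets the split be detected, and the helper names move_into_subfolder and download_and_load, which are made up for illustration. snapshot_download comes from the huggingface_hub package.

```python
import shutil
from pathlib import Path


def move_into_subfolder(dataset_dir, filename="val.json", subfolder="validation"):
    """Move dataset_dir/filename into dataset_dir/subfolder/ and return the new path.

    The subfolder name "validation" is an assumption, not taken from the dataset repo.
    """
    root = Path(dataset_dir)
    source = root / filename
    target_dir = root / subfolder
    target = target_dir / filename
    if target.exists() and not source.exists():
        return target  # already moved on a previous run, nothing to do
    if not source.exists():
        raise FileNotFoundError(f"{source} not found")
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target


def download_and_load(
    repo_id="tlc4418/1.4b-policy_preference_data_gold_labelled",
    local_dir="1.4b-policy_preference_data_gold_labelled",
):
    """Download the dataset repo, apply the workaround, and load both splits."""
    # Third-party imports stay local so the helper above works without them.
    from datasets import load_dataset
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)
    # Assumes val.json is at the top level of the downloaded copy.
    move_into_subfolder(local_dir)
    return load_dataset(local_dir, split=["train", "validation"])
```

Calling train, val = download_and_load() should then mirror the working session above. If val.json lives somewhere else in your copy, pass the right directory to move_into_subfolder instead.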