uclaml / SPPO

The official implementation of Self-Play Preference Optimization (SPPO)
https://uclaml.github.io/SPPO/
Apache License 2.0
500 stars, 62 forks

ConnectionError: Couldn't reach 'synthetic_data_llama-3-8b-instruct-sppo-iter3_score' on the Hub (ConnectionError) #2

Open xinghuang2050 opened 5 months ago

xinghuang2050 commented 5 months ago

Great work!
I commented out all the push_to_hub calls in the code. Is the synthetic_data_llama-3-8b-instruct-sppo-iter3_score dataset generated by PairRM? Running run_dpo.py fails with:

rank4: Traceback (most recent call last):
rank4:   File "/training-data/huangxing/software/SPPO/sppo/run_dpo.py", line 249, in <module>
rank4:   File "/training-data/huangxing/software/SPPO/sppo/run_dpo.py", line 43, in main
rank4:     main_inner(model_args, data_args, training_args)
rank4:   File "/training-data/huangxing/software/SPPO/sppo/run_dpo.py", line 78, in main_inner
rank4:     raw_datasets = get_datasets(data_args, splits=["train"])
rank4:   File "/training-data/huangxing/software/SPPO/sppo/alignment/data.py", line 164, in get_datasets
rank4:     raw_datasets = mix_datasets(dataset_mixer, splits=splits, shuffle=shuffle)
rank4:   File "/training-data/huangxing/software/SPPO/sppo/alignment/data.py", line 189, in mix_datasets
rank4:     dataset = load_dataset(ds, split=split)
rank4:   File "/training-data/software/miniconda3/envs/mcts/lib/python3.11/site-packages/datasets/load.py", line 2129, in load_dataset
rank4:     builder_instance = load_dataset_builder(
rank4:   File "/training-data/software/miniconda3/envs/mcts/lib/python3.11/site-packages/datasets/load.py", line 1815, in load_dataset_builder
rank4:     dataset_module = dataset_module_factory(
rank4:   File "/training-data/software/miniconda3/envs/mcts/lib/python3.11/site-packages/datasets/load.py", line 1512, in dataset_module_factory
rank4:     raise e1 from None
rank4:   File "/training-data/software/miniconda3/envs/mcts/lib/python3.11/site-packages/datasets/load.py", line 1468, in dataset_module_factory
rank4:     raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({type(e).__name__})")
rank4: ConnectionError: Couldn't reach 'synthetic_data_llama-3-8b-instruct-sppo-iter3_score' on the Hub (ConnectionError)

angelahzyuan commented 5 months ago

> Great work! I commented out all the push_to_hub calls in the code. Is the synthetic_data_llama-3-8b-instruct-sppo-iter3_score dataset generated by PairRM?

Hi,

This dataset should appear in your local folder (the directory from which you launched the script) if the generation pipeline ran successfully. Please check the generation step's output for errors.
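Before relaunching training, it can help to confirm that every dataset name passed to `load_dataset` in `mix_datasets` actually exists as a local folder. As far as we understand `datasets.load_dataset`, a name that matches no local directory is treated as a Hub repository id, which is what produces the ConnectionError in the traceback. The following is a minimal stdlib-only pre-flight check; `check_local_datasets` is a hypothetical helper (not part of SPPO), and it assumes the generation step writes each dataset to a folder named after it under the launch directory.

```python
from pathlib import Path


def check_local_datasets(names, root="."):
    """Return the names from `names` that have no matching directory under `root`.

    Hypothetical pre-flight helper (not part of SPPO). It assumes the generation
    step writes each dataset to a folder named after it, under the directory the
    pipeline was launched from (hence the default root ".").
    """
    base = Path(root)
    # Preserve input order so the report lines up with the dataset mixer config.
    # A plain file with the right name does not count: only directories pass.
    return [name for name in names if not (base / name).is_dir()]


if __name__ == "__main__":
    # Example name taken from the traceback; adjust to your own dataset mixer.
    missing = check_local_datasets(["synthetic_data_llama-3-8b-instruct-sppo-iter3_score"])
    if missing:
        print("Not found locally (would be looked up on the Hub):", missing)
    else:
        print("All datasets found locally.")
```

An empty result means every name resolves to a local folder; a non-empty one points back to the generation step rather than to the training code.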

angelahzyuan commented 5 months ago

Yes. It is generated with vLLM and PairRM, and that step runs automatically as part of our pipeline.
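To make the vLLM + PairRM step a little more concrete, here is a rough sketch of how per-response scores for a single prompt could be turned into pairwise win probabilities and a chosen/rejected pair. It assumes PairRM has already assigned each vLLM-sampled response a scalar score and that the probability of response i beating response j is the sigmoid of the score difference (a Bradley-Terry style assumption). `win_prob_matrix` and `pick_pair` are hypothetical names; this is an illustration, not the repository's actual scoring code.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def win_prob_matrix(scores):
    """P[i][j]: assumed probability that response i is preferred over response j.

    Assumes a Bradley-Terry style link on per-response scores (sigmoid of the
    score difference); this is an illustration, not SPPO's exact formula.
    """
    n = len(scores)
    return [[_sigmoid(scores[i] - scores[j]) for j in range(n)] for i in range(n)]


def pick_pair(scores):
    """Hypothetical helper: choose (chosen, rejected) responses for one prompt.

    Each response's win rate is its mean win probability against all candidates
    (the diagonal contributes 0.5). Returns both indices and their win rates.
    """
    if not scores:
        raise ValueError("need at least one scored response")
    n = len(scores)
    matrix = win_prob_matrix(scores)
    win_rate = [sum(row) / n for row in matrix]
    chosen = max(range(n), key=lambda k: win_rate[k])
    rejected = min(range(n), key=lambda k: win_rate[k])
    return chosen, rejected, win_rate[chosen], win_rate[rejected]


if __name__ == "__main__":
    # Three hypothetical responses to one prompt, with made-up PairRM-style scores.
    print(pick_pair([0.0, 1.0, -1.0]))
```

With the made-up scores `[0.0, 1.0, -1.0]`, response 1 is chosen and response 2 is rejected. Averaging win probabilities over all candidates, rather than comparing only the top two scores, keeps each response's estimate tied to how it fares against the whole sampled set.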