princeton-nlp / SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward
MIT License

How to use local dataset #41

Open mazhengyufreedom opened 1 month ago

mazhengyufreedom commented 1 month ago

Since my training environment cannot connect to the internet, I downloaded the model and dataset and saved them to local disk. The arguments are:

Model path:
ModelArguments(base_model_revision=None, model_name_or_path='/home/models/huggingface/llama-3-8b--777cfbd-C11', model_revision='main', model_code_revision=None, torch_dtype=None, tokenizer_name_or_path=None, trust_remote_code=False, use_flash_attention_2=True, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)

Data path:
DataArguments(chat_template=None, dataset_mixer='/home/SimPO/data/ultrafeedback_binarized', text_column='text', dataset_splits=['train_prefs', 'test_prefs'], dataset_configs=None, preprocessing_num_workers=12, truncation_side=None, auto_insert_empty_system_msg=True)

But there is an error when running the script:

Traceback (most recent call last):
  File "/home/SimPO/scripts/run_simpo.py", line 319, in <module>
    main()
  File "/home/SimPO/scripts/run_simpo.py", line 162, in main
    raw_datasets = get_datasets(
  File "/usr/local/lib/python3.10/dist-packages/alignment/data.py", line 170, in get_datasets
    raw_datasets = mix_datasets(
  File "/usr/local/lib/python3.10/dist-packages/alignment/data.py", line 215, in mix_datasets
    for ds, frac in dataset_mixer.items():
AttributeError: 'str' object has no attribute 'items'

I think something may be wrong with how I specify the local data path. How can I fix it?
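
The last frame of the traceback shows mix_datasets calling dataset_mixer.items(), which suggests it expects a mapping rather than a plain string. Below is a minimal sketch of that behavior. The function iterate_mixer is a hypothetical stand-in for the loop in the traceback, not the library's code, and the path-to-fraction dict (with 1.0 as the fraction) is an assumption about the expected shape. Whether a local directory is accepted as a key depends on the installed alignment-handbook version and is not verified here.

```python
# Hypothetical stand-in for the loop shown in the traceback
# (alignment/data.py, mix_datasets); not the library's actual code.
def iterate_mixer(dataset_mixer):
    """Mimic `for ds, frac in dataset_mixer.items()` and collect the pairs."""
    return [(ds, frac) for ds, frac in dataset_mixer.items()]


local_path = "/home/SimPO/data/ultrafeedback_binarized"

# Passing a bare string reproduces the reported AttributeError.
try:
    iterate_mixer(local_path)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Assumed shape: a mapping of dataset path -> sampling fraction.
pairs = iterate_mixer({local_path: 1.0})

print(error_message)
print(pairs)
```

If the mapping form is what the library expects, the dataset_mixer entry in the training config would need to be written as a path-to-fraction mapping instead of a single string.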

yumeng5 commented 1 month ago

Hi,

Could you check your alignment-handbook version? We used 0.4.0.dev0 and didn't encounter similar errors in our runs.

Best, Yu

mazhengyufreedom commented 1 month ago

My version is 0.2.0. Where did you find version 0.4.0?

mazhengyufreedom commented 1 month ago

Another question is about accelerate_configs. I found three YAML files: deepspeed_zero3, fsdp, and multi_gpu. If I want to change the number of GPUs from 4 to 8, which parameter should I change? Is it 'num_processes' in deepspeed_zero3? And what is multi_gpu.yaml used for?
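
For reference, accelerate launch accepts a --num_processes option that takes precedence over the value stored in the config file, so the process count can be changed without editing the YAML. The sketch below only assembles such a command. The helper build_launch_command and the training config path are hypothetical placeholders. The script path comes from the traceback above, and the accelerate config filename is assumed from the file names mentioned in the question.

```python
import shlex


def build_launch_command(config_file, script, script_args, num_processes):
    """Assemble an `accelerate launch` command with an explicit process count."""
    return [
        "accelerate", "launch",
        "--config_file", config_file,
        "--num_processes", str(num_processes),  # one process per GPU
        script,
        *script_args,
    ]


cmd = build_launch_command(
    config_file="accelerate_configs/deepspeed_zero3.yaml",  # assumed filename
    script="scripts/run_simpo.py",
    script_args=["path/to/training_config.yaml"],  # hypothetical placeholder
    num_processes=8,
)

# Print a shell-ready version of the command for inspection.
print(shlex.join(cmd))
```

Alternatively, setting num_processes in whichever accelerate config file is passed via --config_file should have the same effect.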