Closed may012345 closed 5 months ago
If you have merged the LoRA adapters into the base model, use --model_name_or_path only. Otherwise, pass your SFT LoRA adapter to --adapter_name_or_path.
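A minimal sketch of that rule, assuming a LoRA adapter directory can be recognized by the `adapter_config.json` file named in the error message in this thread. The helper name `pick_model_args` is hypothetical and not part of LLaMA-Factory:

```python
import os

# File name taken from the ValueError quoted in this thread.
ADAPTER_CONFIG = "adapter_config.json"


def pick_model_args(base_model, finetuned):
    """Return CLI arguments depending on whether `finetuned` holds a LoRA adapter."""
    if os.path.isfile(os.path.join(finetuned, ADAPTER_CONFIG)):
        # Unmerged LoRA adapter: keep the base model and point to the adapter.
        return [
            "--model_name_or_path", base_model,
            "--adapter_name_or_path", finetuned,
        ]
    # No adapter config found: assume the weights were merged into a full model.
    return ["--model_name_or_path", finetuned]
```

For the DPO command in the original report, this would likely drop `--adapter_name_or_path` whenever FINETUNED_MODEL is a merged checkpoint rather than an adapter directory.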
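@hiyouga It seems this problem still exists. Take the file examples\lora_single_gpu\llama3_lora_ppo.yaml, change its first line to point to a local model, and save it as local_lora_ppo.yaml. Then run it directly in an ipynb notebook: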
yaml_file = "examples\lora_single_gpu\local_lora_ppo.yaml"
!llamafactory-cli train $yaml_file
This produces:
Loading checkpoint shards: 100%|██████████| 3/3 [00:26<00:00, 8.74s/it]
[INFO|modeling_utils.py:4280] 2024-05-30 20:00:57,486 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4288] 2024-05-30 20:00:57,486 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at \\wsl.localhost\Ubuntu-22.04/home/xzpwsl2/my/work/kaggle_code/deepseek-math/kaggle/input/deepseek-math.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:915] 2024-05-30 20:00:57,496 >> loading configuration file \\wsl.localhost\Ubuntu-22.04/home/xzpwsl2/my/work/kaggle_code/deepseek-math/kaggle/input/deepseek-math\generation_config.json
[INFO|configuration_utils.py:962] 2024-05-30 20:00:57,496 >> Generate config GenerationConfig {
"bos_token_id": 100000,
"eos_token_id": 100001
}
Traceback (most recent call last):
File "D:\my\env\python3.10.10\lib\site-packages\peft\config.py", line 197, in _get_peft_type
config_file = hf_hub_download(
File "D:\my\env\python3.10.10\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
validate_repo_id(arg_value)
File "D:\my\env\python3.10.10\lib\site-packages\huggingface_hub\utils\_validators.py", line 154, in validate_repo_id
raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'saves/llama3-8b/lora/reward'. Use `repo_type` argument if needed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\my\env\python3.10.10\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\my\env\python3.10.10\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\my\env\python3.10.10\Scripts\llamafactory-cli.exe\__main__.py", line 7, in <module>
sys.exit(main())
File "D:\my\work\LLM\LLaMA-Factory\LLaMA-Factory\src\llamafactory\cli.py", line 65, in main
run_exp()
File "D:\my\work\LLM\LLaMA-Factory\LLaMA-Factory\src\llamafactory\train\tuner.py", line 37, in run_exp
run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "D:\my\work\LLM\LLaMA-Factory\LLaMA-Factory\src\llamafactory\train\ppo\workflow.py", line 40, in run_ppo
reward_model = create_reward_model(model, model_args, finetuning_args)
File "D:\my\work\LLM\LLaMA-Factory\LLaMA-Factory\src\llamafactory\train\utils.py", line 123, in create_reward_model
model.pretrained_model.load_adapter(finetuning_args.reward_model, "reward")
File "D:\my\env\python3.10.10\lib\site-packages\peft\peft_model.py", line 970, in load_adapter
PeftConfig._get_peft_type(
File "D:\my\env\python3.10.10\lib\site-packages\peft\config.py", line 203, in _get_peft_type
raise ValueError(f"Can't find '{CONFIG_NAME}' at '{model_id}'")
ValueError: Can't find 'adapter_config.json' at 'saves/llama3-8b/lora/reward'
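Code version: c4f50865ad798e1e99044480e1ab05abefc30224 [c4f5086]
Is it required to run SFT and RM (reward model) training first and only then PPO, rather than running PPO directly? Is this error caused by running PPO directly?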
examples/inference/llama3_lora_sft.yaml
@sankexin This issue arises in PPO fine-tuning, and my question is also about PPO fine-tuning. I know SFT runs fine. Do you mean I need to add this adapter path? Your example is for SFT, and SFT itself is not the problem.
Reward model training is required before PPO training.
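A rough pre-flight check for the reward model path, as a sketch only. It is a simplified imitation of the behavior visible in the traceback above (a local lookup for `adapter_config.json`, then a Hub lookup that rejects ids with more than one `/`), not PEFT's actual code. The function name `explain_reward_model_path` is hypothetical:

```python
import os


def explain_reward_model_path(path):
    """Describe how a reward_model path would likely be treated (simplified)."""
    if os.path.isfile(os.path.join(path, "adapter_config.json")):
        return "local adapter found"
    if os.path.isdir(path):
        return "directory exists but has no adapter_config.json"
    if path.count("/") > 1:
        # The Hub expects 'repo_name' or 'namespace/repo_name'.
        return "not a local adapter and not a valid Hub repo id"
    return "not found locally; would be looked up on the Hub"


if __name__ == "__main__":
    # Path taken from the error message in this thread.
    print(explain_reward_model_path("saves/llama3-8b/lora/reward"))
```

If the reward model stage has not been run yet, `saves/llama3-8b/lora/reward` does not exist, which matches the two chained exceptions in the log.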
Reminder
Reproduction
I want to train with DPO:
#!/bin/bash
CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
    --stage dpo \
    --do_train \
    --model_name_or_path BASE_MODEL \
    --adapter_name_or_path FINETUNED_MODEL \
    --create_new_adapter \
    --dataset orca_rlhf \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ../../saves/LLaMA2-7B/lora/dpo \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --max_samples 1000 \
    --val_size 0.1 \
    --dpo_ftx 1.0 \
    --plot_loss \
    --fp16
Expected behavior
The command fails with an error and DPO cannot run. Should I just pass FINETUNED_MODEL to "--model_name_or_path" and remove "--adapter_name_or_path"?
System Info
ValueError: Can't find 'adapter_config.json' at '/path/to/FINETUNED_MODEL'
Others
No response