#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
    --stage dpo \
    --do_train \
    --model_name_or_path BASE_MODEL \
    --adapter_name_or_path FINETUNED_MODEL \
    --create_new_adapter \
    --dataset orca_rlhf \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ../../saves/LLaMA2-7B/lora/dpo \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --max_samples 1000 \
    --val_size 0.1 \
    --dpo_ftx 1.0 \
    --plot_loss \
    --fp16

Expected behavior

DPO training fails with an error and does not run. Should I instead pass FINETUNED_MODEL to "--model_name_or_path" and delete "--adapter_name_or_path"?

System Info

ValueError: Can't find 'adapter_config.json' at '/path/to/FINETUNED_MODEL'

Others

No response

hiyouga commented 5 months ago

If you have merged the LoRA adapter into the base model, use --model_name_or_path only. Otherwise, pass your SFT LoRA adapter to --adapter_name_or_path.
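The rule in the reply above can be sketched as a small check. This is only an illustration, not part of LLaMA-Factory: choose_model_args is a hypothetical helper, and it assumes that an unmerged LoRA directory is recognizable by the adapter_config.json file named in the error message.

```python
import os
import tempfile

# File name taken from the ValueError in the report; PEFT looks for it
# inside the directory passed as the adapter path.
ADAPTER_CONFIG = "adapter_config.json"


def choose_model_args(base_model: str, finetuned_dir: str) -> list[str]:
    """Hypothetical helper: build the model flags for the training command."""
    if os.path.isfile(os.path.join(finetuned_dir, ADAPTER_CONFIG)):
        # Unmerged LoRA adapter: load the base model and attach the adapter.
        return [
            "--model_name_or_path", base_model,
            "--adapter_name_or_path", finetuned_dir,
        ]
    # No adapter config: assume the directory holds a merged, full model.
    return ["--model_name_or_path", finetuned_dir]


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for FINETUNED_MODEL.
    with tempfile.TemporaryDirectory() as finetuned:
        print(choose_model_args("BASE_MODEL", finetuned))
```

If the directory turns out to contain a merged model, passing it to --adapter_name_or_path reproduces the "Can't find 'adapter_config.json'" error from the report, which is why the check looks at the file rather than at the directory name.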

xiezhipeng-git commented 4 months ago

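@hiyouga This problem still seems to exist. I changed the first line of examples\lora_single_gpu\llama3_lora_ppo.yaml to point to a local model, saved the file as local_lora_ppo.yaml, and then ran it directly in an ipynb notebook: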

yaml_file = "examples\lora_single_gpu\local_lora_ppo.yaml"
!llamafactory-cli train $yaml_file

This produces:

Loading checkpoint shards: 100%|██████████| 3/3 [00:26<00:00,  8.74s/it]
[INFO|modeling_utils.py:4280] 2024-05-30 20:00:57,486 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4288] 2024-05-30 20:00:57,486 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at \\wsl.localhost\Ubuntu-22.04/home/xzpwsl2/my/work/kaggle_code/deepseek-math/kaggle/input/deepseek-math.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:915] 2024-05-30 20:00:57,496 >> loading configuration file \\wsl.localhost\Ubuntu-22.04/home/xzpwsl2/my/work/kaggle_code/deepseek-math/kaggle/input/deepseek-math\generation_config.json
[INFO|configuration_utils.py:962] 2024-05-30 20:00:57,496 >> Generate config GenerationConfig {
  "bos_token_id": 100000,
  "eos_token_id": 100001
}

Traceback (most recent call last):
  File "D:\my\env\python3.10.10\lib\site-packages\peft\config.py", line 197, in _get_peft_type
    config_file = hf_hub_download(
  File "D:\my\env\python3.10.10\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "D:\my\env\python3.10.10\lib\site-packages\huggingface_hub\utils\_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'saves/llama3-8b/lora/reward'. Use `repo_type` argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\my\env\python3.10.10\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\my\env\python3.10.10\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\my\env\python3.10.10\Scripts\llamafactory-cli.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "D:\my\work\LLM\LLaMA-Factory\LLaMA-Factory\src\llamafactory\cli.py", line 65, in main
    run_exp()
  File "D:\my\work\LLM\LLaMA-Factory\LLaMA-Factory\src\llamafactory\train\tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "D:\my\work\LLM\LLaMA-Factory\LLaMA-Factory\src\llamafactory\train\ppo\workflow.py", line 40, in run_ppo
    reward_model = create_reward_model(model, model_args, finetuning_args)
  File "D:\my\work\LLM\LLaMA-Factory\LLaMA-Factory\src\llamafactory\train\utils.py", line 123, in create_reward_model
    model.pretrained_model.load_adapter(finetuning_args.reward_model, "reward")
  File "D:\my\env\python3.10.10\lib\site-packages\peft\peft_model.py", line 970, in load_adapter
    PeftConfig._get_peft_type(
  File "D:\my\env\python3.10.10\lib\site-packages\peft\config.py", line 203, in _get_peft_type
    raise ValueError(f"Can't find '{CONFIG_NAME}' at '{model_id}'")
ValueError: Can't find 'adapter_config.json' at 'saves/llama3-8b/lora/reward'

Code version: c4f50865ad798e1e99044480e1ab05abefc30224 [c4f5086]

Is it mandatory to run SFT and RM first and only then PPO? Can PPO not be run directly? Is this error caused by running PPO directly?

sankexin commented 4 months ago

examples/inference/llama3_lora_sft.yaml

adapter_name_or_path: saves/llama3-8b/lora/sft

xiezhipeng-git commented 4 months ago

@sankexin This issue arises in PPO fine-tuning, and my question is about PPO fine-tuning as well. I know SFT runs fine. Do you mean I need to add this adapter path? Your example is for SFT, and SFT itself is not the problem.

hiyouga commented 4 months ago

Reward model training is a prerequisite for PPO training.
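In the traceback above, create_reward_model calls load_adapter on 'saves/llama3-8b/lora/reward'. Judging from the trace, PEFT then tried a Hub download and rejected the string as a repo id, which is what happens when no adapter exists at that local path. A pre-flight check along these lines could surface the cause earlier. It is a sketch under that assumption, and check_reward_adapter is a hypothetical name, not a LLaMA-Factory function.

```python
import os
import tempfile


def check_reward_adapter(reward_model_path: str) -> str:
    """Hypothetical pre-flight check; returns "ok" or a short diagnosis."""
    if not os.path.isdir(reward_model_path):
        # Nothing has been saved here yet, so PPO has no reward adapter to load.
        return "missing directory: train the reward model first"
    if not os.path.isfile(os.path.join(reward_model_path, "adapter_config.json")):
        # The directory exists but does not hold a saved LoRA adapter.
        return "no adapter_config.json: not a LoRA adapter directory"
    return "ok"


if __name__ == "__main__":
    # Path copied from the traceback; the result depends on the local checkout.
    print(check_reward_adapter("saves/llama3-8b/lora/reward"))
    # An empty directory stands in for an output dir without a saved adapter.
    with tempfile.TemporaryDirectory() as empty_dir:
        print(check_reward_adapter(empty_dir))
```

Running this before launching PPO separates "the reward model was never trained" from "the path in the YAML file is wrong", which the nested tracebacks do not make obvious.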

hiyouga / LLaMA-Factory

Can't find 'adapter_config.json' #3466
