DA21S321D opened this issue 2 months ago
Sorry for the confusion. We do not use `neg_detach` but use `boundary`. I have fixed this bug; you can pull it and try running the code again.
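For reference, here is a minimal sketch of how boolean flags like these are typically declared with argparse. The flag names come from this thread, but the `store_true` style and defaults are assumptions, not the project's actual definitions — check the parser in the repo:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical declarations -- verify against the project's own parser.
# store_true flags default to False unless passed on the command line.
parser.add_argument("--boundary", action="store_true",
                    help="enable the boundary term (used in this project)")
parser.add_argument("--neg_detach", action="store_true",
                    help="detach gradients of negative samples (not used)")
args = parser.parse_args()
print(args.boundary, args.neg_detach)
```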
Yesterday, I set them both to True and ran it. I used the Step 2 fine-tuned Llama-2-7B (fine-tuned and merged) in Step 2 (referred to in the picture) to generate the Policy Refinement Data. Then I ran `./finetune/run_policy_refinement.py` and merged the LoRA weights. But after finishing Step 3, when I ran the Step 3 fine-tuned Llama-2-7B with `run_open_LLM.py` to compare it against other models, it performed far worse than the Step 2 fine-tuned Llama-2-7B. Could these two arguments be the cause?
I think you should follow the settings in my code. You may just pull the project and run it again with `llama-2-13b-chat-hf`.
I found another missing argument in `run_policy_refinement.py` at line 112: the `"REWARD": in_args.reward` entry refers to an argument that is never defined. It seems this argument is not actually used in the code; I just noticed it in passing.
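For illustration, a minimal sketch of the situation described here — a config dict referencing `in_args.reward` while no `--reward` argument is defined in the parser. Everything beyond the `"REWARD": in_args.reward` entry itself is an assumption:

```python
import argparse

parser = argparse.ArgumentParser()
# ... other arguments of the script ...
# Without this line, accessing in_args.reward raises AttributeError
# when the config dict below is built.
parser.add_argument("--reward", type=str, default=None)
in_args = parser.parse_args()

config = {
    # line ~112 of run_policy_refinement.py builds a dict like this
    "REWARD": in_args.reward,
}
```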
Thank you for your feedback. This argument is not used in the policy refinement procedure. I have fixed this bug.
`neg_detach` and `boundary` have no preset values. How should I set them, True or False?