Closed paulcx closed 11 months ago
Yes, you can merge any LLMs with the merge_llms_instruct_math_code script as long as they are fine-tuned from the same pre-trained backbone. For example, in our merge_llms_instruct_math_code script, apart from WizardMath, you can also merge WizardLM and Code Alpaca by setting merge_instruct and merge_code to True. You can even merge WizardLM, WizardMath, and Code Alpaca by setting merge_instruct, merge_math, and merge_code to True.
If you want to merge two llama models, you should 1) download the two llama model files as well as their pre-trained model files; 2) replace the original model information in the merge_llms_instruct_math_code script with your llama models' information (e.g., their names and file paths). I think this will work for your case.
a few things to confirm:
If you use Average Merging (i.e., set merging_method_name to average_merging), you only need two sft models, since this method is simple and does not use the pt model. Otherwise, you need three models (the two sft models plus their pt model).
What's the difference when merging two models without the pt model? What is the pt model used for in merging?
Some model merging methods (e.g., Task Arithmetic and TIES-Merging) use the difference between an sft model and its pt model as the task vector, and then operate on that task vector. Other methods (e.g., Average Merging) do not use the pt model at all. You can refer to the references for more details.
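To make the role of the pt model concrete, here is a minimal sketch of the two families on toy weights. It is not the code from the merge_llms_instruct_math_code script: the function names, the scaling_coefficient parameter, and the dict-of-lists "state dicts" are all assumptions made for illustration. It only shows that Average Merging works on the sft weights directly, while Task Arithmetic needs the pt weights to form task vectors (sft minus pt).

```python
# Illustrative sketch only. Toy "state dicts" map parameter names to lists of floats.
# Function names and scaling_coefficient are hypothetical, not the repository's API.


def average_merging(sft_models):
    """Average the parameters of the sft models; the pt model is not needed."""
    names = sft_models[0].keys()
    return {
        name: [
            sum(vals) / len(sft_models)
            for vals in zip(*(m[name] for m in sft_models))
        ]
        for name in names
    }


def task_arithmetic(pt_model, sft_models, scaling_coefficient=1.0):
    """Add the scaled sum of task vectors (sft - pt) back onto the pt model."""
    merged = {}
    for name, pt_vals in pt_model.items():
        # Task vector of each sft model for this parameter: sft - pt.
        task_vectors = [
            [s - p for s, p in zip(m[name], pt_vals)] for m in sft_models
        ]
        # Sum the task vectors element-wise across models.
        summed = [sum(vals) for vals in zip(*task_vectors)]
        merged[name] = [
            p + scaling_coefficient * d for p, d in zip(pt_vals, summed)
        ]
    return merged


pt = {"w": [1.0, 2.0]}
sft_a = {"w": [1.5, 2.0]}  # fine-tuning changed only the first weight
sft_b = {"w": [1.0, 3.0]}  # fine-tuning changed only the second weight

avg = average_merging([sft_a, sft_b])
ta = task_arithmetic(pt, [sft_a, sft_b], scaling_coefficient=1.0)
print(avg)
print(ta)
```

In this toy case, averaging halves each model's change relative to the pt weights, while Task Arithmetic with a coefficient of 1.0 keeps both changes in full. That is why the second family needs the pt model as a third input.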
Closing this issue now.
Please feel free to reopen it if you have any further questions.
Can the merge_llms_instruct_math_code script be used to merge LLMs other than WizardMath? For example, how would I merge two llama models?