How to reproduce the paper's metrics for merging LM & Math & Code

RDXSun commented 8 months ago

This is the command I ran:

CUDA_VISIBLE_DEVICES=1,2 nohup python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merge_code --merging_method_name task_arithmetic --use_task_arithmetic--wtight_mask_rate 0.2 --mask_apply_method task_arithmetic --tensor_parallel_size 1 &

The gsm8k accuracy I measured is 0.33813495072024263. Which parameter is wrong?

yule-BUAA commented 8 months ago

When merging the three tasks LM & Math & Code, as described in the paper, we tried both average_merging and task_arithmetic and took the better result as the final metric. The commands for these two methods are:

python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merge_code --merging_method_name average_merging --tensor_parallel_size 1

python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merge_code --merging_method_name task_arithmetic --scaling_coefficient 1.0 --tensor_parallel_size 1

The paper also reports the best performance of these two methods when combined with DARE. The commands are:

python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merge_code --merging_method_name mask_merging --use_weight_rescale --weight_mask_rate 0.5 --mask_apply_method average_merging --tensor_parallel_size 1

python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merge_code --merging_method_name mask_merging --use_weight_rescale --weight_mask_rate 0.1 --mask_apply_method task_arithmetic --scaling_coefficient 1.0 --tensor_parallel_size 1
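For readers unfamiliar with what these flags control, here is a minimal sketch of the three ideas on toy parameter lists. It is not the repository's implementation: the function names are hypothetical, parameters are plain Python lists rather than model tensors, and the way DARE is combined with averaging (drop and rescale each model's delta, add it back to the base, then average) is an assumption. The arguments `mask_rate` and `scaling_coefficient` are meant to mirror `--weight_mask_rate` and `--scaling_coefficient`, and `rescale` mirrors `--use_weight_rescale`.

```python
import random


def task_vector(finetuned, base):
    """Delta parameters: fine-tuned weights minus base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def dare(delta, mask_rate, rescale=True, rng=None):
    """Randomly zero each delta with probability mask_rate.

    With rescale=True, surviving deltas are divided by (1 - mask_rate)
    so the expected value of each delta is unchanged.
    """
    if not 0.0 <= mask_rate < 1.0:
        raise ValueError("mask_rate must be in [0, 1)")
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    keep = 1.0 - mask_rate
    out = []
    for d in delta:
        if rng.random() < mask_rate:
            out.append(0.0)
        else:
            out.append(d / keep if rescale else d)
    return out


def task_arithmetic(base, deltas, scaling_coefficient=1.0):
    """Add every (scaled) task vector onto the base weights."""
    merged = list(base)
    for delta in deltas:
        merged = [m + scaling_coefficient * d for m, d in zip(merged, delta)]
    return merged


def average_merging(models):
    """Element-wise mean of the models' weights."""
    n = len(models)
    return [sum(vals) / n for vals in zip(*models)]


def dare_then_average(base, finetuned_models, mask_rate):
    """Assumed combination: DARE each delta, restore it, then average."""
    restored = []
    for model in finetuned_models:
        sparse = dare(task_vector(model, base), mask_rate)
        restored.append([b + d for b, d in zip(base, sparse)])
    return average_merging(restored)
```

In this sketch, a mask rate of 0.1 zeroes roughly 10% of each model's deltas before merging, and a mask rate of 0.5 roughly half of them.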

RDXSun commented 8 months ago

I ran the last command, and the gsm8k accuracy I got was still low:

python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merge_code --merging_method_name mask_merging --use_weight_rescale --weight_mask_rate 0.1 --mask_apply_method task_arithmetic --scaling_coefficient 1.0 --tensor_parallel_size 1

yule-BUAA commented 8 months ago

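When I run the last command, I get 44.58 on gsm8k. The difference may be related to the library versions you are using; please try setting vllm to version 0.1.4 and transformers to version 4.33.1.
The gsm8k result reported in the paper was obtained with average_merging, so please try the first or third command to get the performance of the merged LM & Math & Code model.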
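Before rerunning the evaluation, it may help to confirm which versions are actually installed in the active environment. The following is a small sketch using the standard library's `importlib.metadata`; the `EXPECTED` mapping and the `check_versions` helper are illustrative names, not part of the repository.

```python
from importlib import metadata

# Versions suggested in this thread.
EXPECTED = {"vllm": "0.1.4", "transformers": "4.33.1"}


def check_versions(expected, get_version=metadata.version):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

If a mismatch is reported, pinning the packages with `pip install vllm==0.1.4 transformers==4.33.1` is one way to match the suggested environment.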

yule-BUAA commented 8 months ago

Closing this issue for now.

If any problems come up later, feel free to reopen it.

yule-BUAA / MergeLM

How to reproduce the paper's metrics for merging LM & Math & Code #21