pipilurj / bootstrapped-preference-optimization-BPO

code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
Apache License 2.0

About training config and time #8

Closed: Artanic30 closed this issue 2 months ago

Artanic30 commented 2 months ago

Hi, thanks for your work! I want to verify some parameters in the provided training scripts. First, the LoRA rank in the provided scripts is 32, as listed below:

--lora_r 32 \
--lora_alpha 256 \

However, the paper states a LoRA rank of 64 in Section 5.1, so I'm wondering which value is correct for reproducing the paper's results.

Secondly, I notice that the provided scripts require --pretrain_mm_mlp_adapter, which indicates that fine-tuning starts from the LLaVA-1.5 pretrained weights. However, from my understanding, training should start from the LLaVA-1.5 instruction-finetuned weights. Could you provide more information on this?
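
For context, in LLaVA-style scripts the two starting points would look roughly like this (the paths below are placeholders I'm using for illustration, not the exact ones in the repo):

# Starting from the pre-SFT backbone plus a pretrained projector, as the provided script appears to do:
--model_name_or_path lmsys/vicuna-7b-v1.5 \
--pretrain_mm_mlp_adapter ./checkpoints/llava-v1.5-7b-pretrain/mm_projector.bin \

# Versus starting from the instruction-finetuned LLaVA-1.5 checkpoint, which is what I would expect:
--model_name_or_path liuhaotian/llava-v1.5-7b \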

By the way, I'm currently running the code with the 7B model on 4 A100 GPUs, but my training time does not match the reported ~28 hours on 8 A40s. I'm wondering if anyone else has faced the same issue.

pipilurj commented 2 months ago

Hi, thanks a lot for your interest in our work! The LoRA rank is indeed 64, as mentioned in the paper, but setting the rank to 32 does not hurt performance much. The --pretrain_mm_mlp_adapter flag can indeed be deleted, and yes, we start from LLaVA after the SFT stage. Thanks a lot for helping us identify these issues!
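
Concretely, a corrected LoRA block for reproducing the paper setting would look roughly like this (a sketch only; the model path is a placeholder for your local LLaVA-1.5 SFT checkpoint):

# Start from the instruction-finetuned LLaVA-1.5 checkpoint with the paper's LoRA rank:
--model_name_or_path liuhaotian/llava-v1.5-7b \
--lora_r 64 \
--lora_alpha 256 \

with the --pretrain_mm_mlp_adapter line removed.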

Artanic30 commented 2 months ago

Thanks for the reply.