haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] Can anyone explain what pretrain_mm_mlp_adapter means in the LoRA script? #1419

Open fisher75 opened 7 months ago

fisher75 commented 7 months ago

Question

Hi guys, (1) does anyone know what pretrain_mm_mlp_adapter means in the LoRA script? (2) Also, I'm trying to LoRA-finetune the 1.6-7b models; which script should I use, finetune_task_lora.sh or finetune_lora.sh?

Thanks!

bkuster0 commented 6 months ago

(This is speculation / my understanding, not a 100% accurate answer.) 1) pretrain_mm_mlp_adapter is the file holding the multi-layer perceptron (projector) weights: the MLP adapter converts the output tokens of the CLIP encoder into "visual" tokens with the same dimensionality as the "text" tokens. The adapter can be fine-tuned separately in 1.5; in 1.6 this is different (see point 2, and the sketch at the end of this comment).

2) I believe you should use finetune_task_lora.sh, since (to my knowledge) in LLaVA-1.6 models there is no separate adapter weights file.
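
To make point 1 concrete, here is a minimal sketch, assuming a CLIP ViT-L/14 encoder (1024-dim features), a 7B LLM (4096-dim embeddings), and a hypothetical checkpoint path. It is my own illustration of what the projector does and how pretrain_mm_mlp_adapter weights would be loaded into it, not the repo's exact code:

```python
# Minimal sketch (my own illustration, not the repo's exact code) of what the
# mm MLP adapter does: project CLIP image-patch features into the LLM's
# embedding space so they can be fed in alongside text tokens.
import os
import torch
import torch.nn as nn

clip_hidden_size = 1024   # assumed: CLIP ViT-L/14 feature dimension
llm_hidden_size = 4096    # assumed: embedding dimension of a 7B LLM

# LLaVA-1.5/1.6 use a two-layer MLP projector ("mlp2x_gelu"-style), sketched here:
mm_projector = nn.Sequential(
    nn.Linear(clip_hidden_size, llm_hidden_size),
    nn.GELU(),
    nn.Linear(llm_hidden_size, llm_hidden_size),
)

# pretrain_mm_mlp_adapter points at projector weights saved after the
# stage-1 feature-alignment pretraining (typically an mm_projector.bin file).
# Hypothetical path; loading would look roughly like this:
pretrain_mm_mlp_adapter = "checkpoints/llava-pretrain/mm_projector.bin"
if os.path.exists(pretrain_mm_mlp_adapter):
    state_dict = torch.load(pretrain_mm_mlp_adapter, map_location="cpu")
    # saved keys are usually prefixed, e.g. "model.mm_projector.0.weight";
    # strip the prefix so they match this standalone nn.Sequential
    state_dict = {k.split("mm_projector.")[-1]: v for k, v in state_dict.items()}
    mm_projector.load_state_dict(state_dict)

# dummy forward pass: 576 CLIP patch tokens -> 576 "visual" tokens in LLM space
image_features = torch.randn(1, 576, clip_hidden_size)
visual_tokens = mm_projector(image_features)
print(visual_tokens.shape)  # torch.Size([1, 576, 4096])
```

In the stock training scripts this is, as far as I understand, what the --pretrain_mm_mlp_adapter flag does: it tells the trainer where to find the stage-1 projector checkpoint to initialize from.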