haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] How to use the fine-tuned model? #808

Open ZY123-GOOD opened 1 year ago

ZY123-GOOD commented 1 year ago

Question

I have two questions.

  1. I followed the instructions in scripts/v1.5 to pre-train and fine-tune the model. After pre-training I get mm_projector.bin, but after LoRA fine-tuning I only get adapter_model.bin, with no new mm_projector.bin. I want to know whether the projector is frozen during the LoRA fine-tuning process.
  2. When I run `python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./liuhaotian --model-base /home/yaozhu/LLaVA/LLaVA_codes/vicuna`, I get:
```
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at /home/yaozhu/LLaVA/LLaVA_codes/vicuna and are newly initialized: ['model.mm_projector.0.weight', 'model.mm_projector.2.bias', 'model.mm_projector.2.weight', 'model.mm_projector.0.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```

May I ask how to correctly load the mm_projector?
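
For context, this is how I inspected what each training stage actually saved (the paths are placeholders for my local checkpoint directories):

```python
import torch

# Stage 1 (pretraining) saves the projector weights on their own.
stage1 = torch.load("checkpoints/llava-pretrain/mm_projector.bin", map_location="cpu")
print(sorted(stage1))  # e.g. ['model.mm_projector.0.bias', 'model.mm_projector.0.weight', ...]

# Stage 2 (LoRA fine-tuning) saves the low-rank adapter for the
# language model in adapter_model.bin ...
adapter = torch.load("checkpoints/llava-lora/adapter_model.bin", map_location="cpu")
print(any("mm_projector" in k for k in adapter))  # the projector is not LoRA-wrapped

# ... and the other trainable modules, including the projector,
# in non_lora_trainables.bin.
non_lora = torch.load("checkpoints/llava-lora/non_lora_trainables.bin", map_location="cpu")
print([k for k in non_lora if "mm_projector" in k])
```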

ZY123-GOOD commented 1 year ago

After carefully reading the code, I have resolved the issue. I'd like to double-check my understanding. During the pretraining phase, the mm_projector is trained. In the fine-tuning phase, both the mm_projector and the language model are fine-tuned, and the mm_projector is saved in non_lora_trainables.bin. When loading the model for inference, the language model is loaded in the first stage, without the mm_projector. In the second stage, when the adapter is merged into the language model, the mm_projector is loaded from non_lora_trainables.bin. Is my understanding correct? Thanks a lot.
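
For reference, here is a condensed sketch of that loading path, based on my reading of load_pretrained_model in llava/model/builder.py (the checkpoint directory names are placeholders):

```python
import os
import torch
from transformers import AutoConfig, AutoTokenizer
from peft import PeftModel
from llava.model import LlavaLlamaForCausalLM

model_path = "checkpoints/llava-v1.5-7b-lora"  # LoRA checkpoint (placeholder)
model_base = "checkpoints/vicuna-7b-v1.5"      # base language model (placeholder)

# First stage: load the base language model with the LLaVA config, so the
# (randomly initialized) mm_projector layers exist in the module tree.
cfg = AutoConfig.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_base, use_fast=False)
model = LlavaLlamaForCausalLM.from_pretrained(model_base, config=cfg, low_cpu_mem_usage=True)

# Load the fully trained non-LoRA weights (mm_projector among them),
# stripping the PEFT "base_model."/"model." prefixes first.
non_lora = torch.load(os.path.join(model_path, "non_lora_trainables.bin"), map_location="cpu")
non_lora = {(k[11:] if k.startswith("base_model.") else k): v for k, v in non_lora.items()}
if any(k.startswith("model.model.") for k in non_lora):
    non_lora = {(k[6:] if k.startswith("model.") else k): v for k, v in non_lora.items()}
model.load_state_dict(non_lora, strict=False)

# Second stage: attach the LoRA adapter and merge it into the base weights.
model = PeftModel.from_pretrained(model, model_path)
model = model.merge_and_unload()
```

Note that, if I read builder.py correctly, this branch is only taken when the name passed to --model-path contains both "llava" and "lora" and --model-base is given; a path like ./liuhaotian skips it and leaves the projector newly initialized, which is exactly the warning above.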

Eric-is-good commented 12 months ago

How did you solve the problem? I am using my own language model, and I also run into a situation where the weights cannot be loaded.

curiousNick1 commented 11 months ago

@ZY123-GOOD Can you please take a look at my problem when loading a merged-LoRA checkpoint? The merged checkpoint seems to have the same format as Vicuna-v1.5-7b, with no information about the vision tower or the mm_projector. After I add the vision tower and projector (non_lora_trainables.bin) entries to the config file, I get an error saying 'the weight is on the meta device'. But the code runs fine with the original LLaVA projector (mm_projector.bin), so I wonder whether these two bin files have some differences?
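
For reference, this is how I compared the two files; my guess (unverified) is that the difference is just the key prefix added by the PEFT wrapper during LoRA training:

```python
import torch

# Placeholder paths -- point these at the actual checkpoint files.
pretrain_proj = torch.load("mm_projector.bin", map_location="cpu")
lora_extras = torch.load("non_lora_trainables.bin", map_location="cpu")

print(sorted(pretrain_proj))
# e.g. ['model.mm_projector.0.bias', 'model.mm_projector.0.weight', ...]
print(sorted(lora_extras))
# e.g. ['base_model.model.model.mm_projector.0.bias', ...] -- saved under
# the PEFT wrapper, hence the extra "base_model.model." prefix.
```

If that prefix is the difference, loading non_lora_trainables.bin with a plain load_state_dict would silently match zero keys unless the prefix is stripped first, as the builder does.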

yuezih commented 10 months ago

@ZY123-GOOD @Eric-is-good @curiousNick1 Hi all, FYI: https://github.com/haotian-liu/LLaVA/issues/474#issuecomment-1760890593

:)