haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] size mismatch for mm_projector #744

Open · TonyUSTC opened this issue 11 months ago

TonyUSTC commented 11 months ago

Question

When I run the command below, I get a size mismatch:

```
CUDA_VISIBLE_DEVICES=1 python -m llava.serve.cli --model-path liuhaotian/llava-v1.5-13b-lora --model-base liuhaotian/vicuna-13b-v1.5 --image-file ./images/view.jpg --load-4bit
```

```
Traceback (most recent call last):
  File "/apdcephfs/share_1157259/users/tttaozhang/tools/anaconda3/envs/llava/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/apdcephfs/share_1157259/users/tttaozhang/tools/anaconda3/envs/llava/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/apdcephfs/private_tttaozhang/llm/LLaVA/llava/serve/cli_ttao.py", line 132, in <module>
    main(args)
  File "/apdcephfs/private_tttaozhang/llm/LLaVA/llava/serve/cli_ttao.py", line 33, in main
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, args.load_8bit, args.load_4bit, device=args.device)
  File "/apdcephfs/private_tttaozhang/llm/LLaVA/llava/model/builder.py", line 72, in load_pretrained_model
    model.load_state_dict(non_lora_trainables, strict=False)
  File "/apdcephfs/share_1157259/users/tttaozhang/tools/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlavaLlamaForCausalLM:
    size mismatch for model.mm_projector.0.weight: copying a param with shape torch.Size([5120, 1024]) from checkpoint, the shape in current model is torch.Size([2621440, 1]).
    size mismatch for model.mm_projector.2.weight: copying a param with shape torch.Size([5120, 5120]) from checkpoint, the shape in current model is torch.Size([13107200, 1]).
```
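The mismatched shapes line up with 4-bit quantization: bitsandbytes packs two 4-bit values into each byte and stores a quantized weight as a flattened [numel / 2, 1] tensor, so with `--load-4bit` the projector in the freshly built model is already quantized by the time the fp16 LoRA projector weights are copied in. A quick sanity check of the arithmetic (plain Python, not LLaVA code):

```python
# Why the shapes disagree: a 4-bit-quantized weight is stored flattened,
# with two 4-bit values packed per uint8 byte, i.e. [out_dim * in_dim // 2, 1].
shapes = [(5120, 1024), (5120, 5120)]  # mm_projector.0 and mm_projector.2 from the traceback
for out_dim, in_dim in shapes:
    packed = out_dim * in_dim // 2  # two 4-bit values per byte
    print(f"checkpoint [{out_dim}, {in_dim}] -> 4-bit model [{packed}, 1]")
# checkpoint [5120, 1024] -> 4-bit model [2621440, 1]
# checkpoint [5120, 5120] -> 4-bit model [13107200, 1]
```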

TonyUSTC commented 11 months ago

With '--load-4bit' removed, the code runs fine. However, how can I modify the code to use 4-bit quantization with the LoRA weights?

LumenYoung commented 11 months ago

I have the same problem. I suspect we simply didn't pick the right model_base, but I can't find any instructions on which one to use other than the information in the technical report.

haotian-liu commented 11 months ago

> With '--load-4bit' removed, the code runs fine. However, how can I modify the code to use 4-bit quantization with the LoRA weights?

You can first create merged LoRA weights, then load them in 4-bit: https://github.com/haotian-liu/LLaVA/blob/main/docs/LoRA.md#create-merged-checkpoints

It seems that QLoRA does not support this either: https://github.com/artidoro/qlora/blob/main/examples/guanaco_7B_demo_colab.ipynb
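For reference, a minimal sketch of that merge step, assuming the load_pretrained_model signature visible in the traceback above (the repo ships this logic as scripts/merge_lora_weights.py; the save path here is illustrative):

```python
# Sketch of the merge step from docs/LoRA.md: load the LoRA checkpoint on
# top of its base model in fp16, then save the merged weights to disk.
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "liuhaotian/llava-v1.5-13b-lora"
model_base = "lmsys/vicuna-13b-v1.5"      # see the base-model discussion below
save_path = "./llava-v1.5-13b-merged"     # illustrative output directory

model_name = get_model_name_from_path(model_path)
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_base, model_name, device_map="cpu"
)
model.save_pretrained(save_path)          # writes the merged fp16 weights
tokenizer.save_pretrained(save_path)
```

The merged folder can then be passed to llava.serve.cli via --model-path, without --model-base, together with --load-4bit.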

LumenYoung commented 11 months ago

> > With '--load-4bit' removed, the code runs fine. However, how can I modify the code to use 4-bit quantization with the LoRA weights?
>
> You can first create merged LoRA weights, then load them in 4-bit: https://github.com/haotian-liu/LLaVA/blob/main/docs/LoRA.md#create-merged-checkpoints
>
> It seems that QLoRA does not support this either: https://github.com/artidoro/qlora/blob/main/examples/guanaco_7B_demo_colab.ipynb

Hello Haotian, thanks for the reply. Could you confirm which base model liuhaotian/llava-v1.5-13b-lora expects? There is no clear documentation on this. I could also open a PR for the documentation if you can give me a way to validate the model_base of LoRA models manually.

haotian-liu commented 11 months ago

Hi @LumenYoung The base model is Vicuna v1.5.

LumenYoung commented 10 months ago

> Hi @LumenYoung The base model is Vicuna v1.5.

Thanks for the reply @haotian-liu, but liuhaotian/vicuna-13b-v1.5 does not seem to exist. I checked Hugging Face, and the only similar repos are lmsys/vicuna-13b-v1.5 and liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-13b-v1.5. How exactly is the model base defined?

Apparently there is no liuhaotian/vicuna-13b-v1.5 under your Hugging Face account right now. I wonder how @TonyUSTC managed to run the command he provided.

LumenYoung commented 10 months ago

As a follow-up: the model base is lmsys/vicuna-13b-v1.5, in case anyone else is uncertain about it.
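For anyone wanting to verify this without launching the model, one hedged option is to download the LoRA repo's config from the Hub and look for a recorded base model; which fields are present varies between checkpoints, so the keys below are just candidates:

```python
# Inspect the config that the LoRA repo publishes on the Hub and print
# any base-model hints it records (field names vary by checkpoint).
import json
from huggingface_hub import hf_hub_download

config_path = hf_hub_download("liuhaotian/llava-v1.5-13b-lora", "config.json")
with open(config_path) as f:
    config = json.load(f)
for key in ("_name_or_path", "base_model_name_or_path", "model_type"):
    if key in config:
        print(key, "->", config[key])
```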

curiousNick1 commented 9 months ago

@haotian-liu Would you please look into my problem loading the merged-LoRA checkpoint? The merged checkpoint seems to have the same format as Vicuna-v1.5-7b, with no information about the vision tower or the mm_projector. After I add the vision tower and projector (non_lora_trainables.bin) settings to the config file, launching the model worker fails with 'ValueError: the weight is on the meta device, we need a `value` to put in 0'. The code does run with the original LLaVA projector (mm_projector.bin), so I wonder whether these two projector files differ. How can I load the merged-LoRA checkpoint together with the fine-tuned projector?
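One way to narrow this down is a diagnostic dump of both projector files (a sketch assuming local copies of the files; key layouts vary by checkpoint). LoRA checkpoints typically save the projector under prefixed keys such as base_model.model.model.mm_projector.0.weight, which llava/model/builder.py strips before calling load_state_dict, while mm_projector.bin uses bare model.mm_projector.* keys:

```python
# Diagnostic sketch, not a fix: print the keys, shapes, and dtypes that
# each projector file actually contains, to see how the two formats differ.
import torch

for fname in ("mm_projector.bin", "non_lora_trainables.bin"):  # local paths
    state = torch.load(fname, map_location="cpu")
    print(f"--- {fname} ---")
    for key, tensor in state.items():
        print(key, tuple(tensor.shape), tensor.dtype)
```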

zhyhome commented 6 months ago

> @haotian-liu Would you please look into my problem loading the merged-LoRA checkpoint? […]

@curiousNick1 How did you solve this problem?