microsoft / LLaVA-Med

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

Shape mismatch error when using "microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224" #51

Closed jzssz closed 3 months ago

jzssz commented 4 months ago

① When I use "openai/clip-vit-large-patch14", no error is reported. ② When I use "microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224", the error below is reported. Here image_features.shape is [bs=2, 196, 768], but self.mm_projector is 1024x4096, so the feature dimension 768 does not match the expected 1024. How can I solve this? Thanks a lot.

Traceback (most recent call last):
  File "/home/llava-med/LLaVA-Med-main/llava/train/train_mem.py", line 13, in <module>
    train()
  File "/home/llava-med/LLaVA-Med-main/llava/train/train.py", line 596, in train
    trainer.train()
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/transformers/trainer.py", line 1644, in train
    return inner_training_loop(
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/transformers/trainer.py", line 1909, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/transformers/trainer.py", line 2657, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/transformers/trainer.py", line 2689, in compute_loss
    outputs = model(**inputs)
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/llava-med/LLaVA-Med-main/llava/model/llava.py", line 315, in forward
    outputs = self.model(
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/llava-med/LLaVA-Med-main/llava/model/llava.py", line 226, in forward
    image_features = self.mm_projector(image_features)
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/llava-med/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (392x768 and 1024x4096)

tj-zhu commented 2 months ago

@jzssz Sorry to bother you, but may I ask how you solved the issue? Thank you.
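
For readers landing on this thread, the numbers in the traceback can be checked by hand. The projector was built for 1024-dimensional features (the hidden size of the ViT-L/14 encoder in "openai/clip-vit-large-patch14"), while the ViT-B/16 encoder in BiomedCLIP emits 768-dimensional features for each of its 196 patches. The sketch below is pure Python and only imitates the shape check a linear layer performs. The helper name `linear_output_shape` is hypothetical and is not part of LLaVA-Med or PyTorch. The last step assumes that sizing the projector input from the encoder's feature dimension is an acceptable fix, which this thread does not confirm.

```python
from math import prod


def linear_output_shape(input_shape, in_features, out_features):
    """Hypothetical helper: imitate the shape check of a linear layer.

    A linear layer flattens all leading dimensions into rows and requires
    the last dimension of the input to equal `in_features`.
    """
    rows = prod(input_shape[:-1])  # e.g. batch 2 * 196 patches = 392 rows
    if input_shape[-1] != in_features:
        raise ValueError(
            "mat1 and mat2 shapes cannot be multiplied "
            f"({rows}x{input_shape[-1]} and {in_features}x{out_features})"
        )
    return (*input_shape[:-1], out_features)


# Shape reported in the issue: [bs=2, 196 patches, 768 features].
image_features = (2, 196, 768)

# Projector sized for a 1024-dim vision encoder, as in the issue.
try:
    linear_output_shape(image_features, in_features=1024, out_features=4096)
except ValueError as err:
    message = str(err)
    print(message)

# Assumed fix: size the projector input from the encoder's feature dimension.
fixed = linear_output_shape(
    image_features, in_features=image_features[-1], out_features=4096
)
print(fixed)
```

In a real training run, the equivalent change would be to construct mm_projector with an input size of 768 when the BiomedCLIP vision tower is selected. A projector checkpoint pretrained for 1024-dimensional inputs would then no longer fit and would presumably need retraining.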