mbzuai-oryx / GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
https://mbzuai-oryx.github.io/GeoChat

CLIPVisionTower unable to load the model #31

Open chuznhiwu opened 3 months ago

chuznhiwu commented 3 months ago

Hello! I am fine-tuning with finetune_lora.sh and set the following flags:

    --model_name_or_path liuhaotian/llava-v1.5-7b \
    --vision_tower openai/clip-vit-large-patch14-336 \

and got this error:

File "/home/wucz/remote-sensing/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 97, in init self.clip_interpolate_embeddings(image_size=504, patch_size=14) File "/home/wucz/remote-sensing/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 34, in clip_interpolate_embeddings n, seq_length, hidden_dim = pos_embedding.shape ValueError: not enough values to unpack (expected 3, got 2)

Adding prints around line 34 of clip_encoder.py shows that the embedding tensor is empty:

    pos_embedding = state_dict['weight']
    print(pos_embedding.shape)   # prints: torch.Size([0])
    pos_embedding = pos_embedding.unsqueeze(0)
    print(pos_embedding.shape)   # prints: torch.Size([1, 0])
    n, seq_length, hidden_dim = pos_embedding.shape  # fails: shape has 2 dims, not 3
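
For context on what the unpack at line 34 expects: clip_interpolate_embeddings resizes CLIP's pretrained position embeddings from the 336px grid (24x24 patches plus one class token) to GeoChat's 504px grid (36x36 patches). A minimal sketch of that kind of interpolation, with the function name and layout assumed rather than taken from GeoChat's code, makes the expected [1, 577, 1024] shape concrete:

    import torch
    import torch.nn.functional as F

    def interpolate_pos_embedding(pos_embedding, old_grid=24, new_grid=36):
        # pos_embedding: [1, 1 + old_grid**2, hidden_dim], class token first.
        # For ViT-L/14 at 336px that is [1, 577, 1024]; an empty tensor breaks
        # the unpack in the traceback before any interpolation happens.
        n, seq_length, hidden_dim = pos_embedding.shape
        cls_token, patch_tokens = pos_embedding[:, :1], pos_embedding[:, 1:]
        # Reshape the flat patch tokens back onto the 24x24 grid...
        patch_tokens = patch_tokens.reshape(n, old_grid, old_grid, hidden_dim).permute(0, 3, 1, 2)
        # ...resize the grid to 36x36 (504 / 14) with bicubic interpolation...
        patch_tokens = F.interpolate(patch_tokens, size=(new_grid, new_grid),
                                     mode="bicubic", align_corners=False)
        # ...and flatten back to [1, 1 + 36*36, hidden_dim].
        patch_tokens = patch_tokens.permute(0, 2, 3, 1).reshape(n, new_grid * new_grid, hidden_dim)
        return torch.cat([cls_token, patch_tokens], dim=1)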

Which setting did I get wrong that prevents the model weights from being read?
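
One way to narrow this down (a sketch, assuming the standard Hugging Face transformers API rather than GeoChat's CLIPVisionTower wrapper): load the vision tower on its own and check that its position embedding actually materializes.

    import torch
    from transformers import CLIPVisionModel

    model = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
    weight = model.vision_model.embeddings.position_embedding.weight
    print(weight.shape)  # expected: torch.Size([577, 1024])

    # If this prints [577, 1024] outside the training script, the flags above
    # are likely fine. A torch.Size([0]) seen only inside finetune_lora.sh is
    # consistent with DeepSpeed ZeRO-3 partitioning the parameters before
    # clip_interpolate_embeddings reads them (a guess, worth checking your
    # --deepspeed config).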