iflytek / VLE

VLE: Vision-Language Encoder (a vision-language multimodal pre-trained model)
Apache License 2.0

KeyError: 'pi' #6

Open zhousteven opened 10 months ago

zhousteven commented 10 months ago

System environment: Ubuntu 20.04, torch 2.0.0+cu118, torchvision 0.15.1+cu118

Command line:

```
run_vqav2_ft.py --train_config_file=vqa_train_config.json
```

Error description as below:

```
/home/steven/anaconda3/envs/nlp/bin/python /home/steven/workstore/nlp/VLE-main/run_vqav2_ft.py --train_config_file=vqa_train_config.json
/home/steven/workstore/nlp/VLE-main/run_vqav2_ft.py:76: SyntaxWarning: "is" with a literal. Did you mean "=="?
  max_epochs=_config["max_epoch"] if max_steps is -1 else 1000,
2023-09-20 13:52:41.071596: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Global seed set to 0
Some weights of VLEForVQA were not initialized from the model checkpoint at hfl/vle-base and are newly initialized: ['vqa_classifier.1.bias', 'vqa_classifier.3.bias', 'vqa_classifier.3.weight', 'vqa_classifier.0.bias', 'vqa_classifier.1.weight', 'vqa_classifier.0.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/steven/workstore/nlp/VLE-main/run_vqav2_ft.py", line 107, in <module>
    main(train_config)
  File "/home/steven/workstore/nlp/VLE-main/run_vqav2_ft.py", line 25, in main
    model = VLEForVQA_PL(_config)
  File "/home/steven/workstore/nlp/VLE-main/vqav2_train_module.py", line 73, in __init__
    new_state_dict = extend_position_embedding(self.model.state_dict(), patch_size, config["image_size"])
  File "/home/steven/workstore/nlp/VLE-main/models/VLE/modeling_vle.py", line 124, in extend_position_embedding
    state_dict[keys['pi'][0]] = torch.arange(grid_after*grid_after + 1).unsqueeze(0)
KeyError: 'pi'

Process finished with exit code 1
```
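By the way, the SyntaxWarning at the top of the log points at a separate, real bug in run_vqav2_ft.py line 76: "is" compares object identity, and it only works for -1 in CPython because of small-integer caching, which the language does not guarantee. A minimal runnable sketch of the fix, with the names taken from the warning message:

```python
# Sketch of the fix for run_vqav2_ft.py line 76: compare the sentinel by
# value ("==") instead of identity ("is"); identity checks on small ints
# only work in CPython by accident of integer caching.
max_steps = -1
_config = {"max_epoch": 10}

max_epochs = _config["max_epoch"] if max_steps == -1 else 1000
print(max_epochs)  # 10
```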

When I debugged step by step, I found that the key "vision_model.embeddings.position_ids" cannot be found in the parameter list of the vle-base model; my paraphrase of the failing key scan is below. Has anybody encountered the same problem? Please kindly help to solve it. Thanks!
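Stepping into extend_position_embedding in the debugger, the suffix scan appears to work roughly like this sketch (reconstructed from the traceback, not copied from models/VLE/modeling_vle.py):

```python
# My paraphrase of the key scan in extend_position_embedding
# (models/VLE/modeling_vle.py), not the exact source code.
# A toy state dict that is missing the position_ids entry:
state_dict = {"vle.vision_model.vision_model.embeddings.patch_embedding.weight": 0}

keys = {}
for name in state_dict:
    if name.endswith("vision_model.embeddings.position_ids"):
        keys.setdefault("pi", []).append(name)

# No name matched the suffix, so "pi" was never created:
print(keys["pi"][0])  # raises KeyError: 'pi', exactly as in the traceback
```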

zhousteven commented 10 months ago

@ymcui @GoGoJoestar @airaria waiting for your response

GoGoJoestar commented 10 months ago

The 'pi' entry is only added to keys when some parameter name ends with "vision_model.embeddings.position_ids". The related parameter, model.vle.vision_model.vision_model.embeddings.position_ids, is a buffer registered in the class transformers.models.clip.modeling_clip.CLIPVisionEmbeddings. You can check whether it is present in the vle-base model as follows:

```python
for n, b in model.named_buffers():
    print(n, b.shape)
```
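If the buffer is registered, the output will include a line whose name ends with vision_model.embeddings.position_ids together with its shape; if no such line appears, the buffer is missing from the loaded model and the suffix scan in extend_position_embedding has nothing to match.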

If it is not in the model, you could check the related code of the CLIP model in transformers, or try reinstalling transformers and re-pulling the VLE repo.
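If the buffer does show up in named_buffers() but the state-dict key is still missing, one likely cause (an assumption on my side, not something I can confirm from your log) is a newer transformers release registering position_ids with persistent=False, which keeps the buffer on the module but drops it from state_dict(). Under that assumption, a workaround sketch for vqav2_train_module.py around line 73 would be to copy the buffer back before the call:

```python
# Hedged workaround sketch for vqav2_train_module.py around line 73. It
# assumes the KeyError comes from a transformers version that registers
# position_ids with persistent=False, so the buffer lives on the module
# but is dropped from state_dict(). Copy it back before the extension.
state_dict = self.model.state_dict()
for name, buf in self.model.named_buffers():
    if name.endswith("vision_model.embeddings.position_ids"):
        state_dict[name] = buf  # restore the key the suffix scan expects

new_state_dict = extend_position_embedding(state_dict, patch_size, config["image_size"])
```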