RealAntonVoronov closed this issue 2 years ago.
OK, the bug seems to be fixed by simply changing 4 to 3 in lines 85 and 91 of tasks/shared_utils.py. But I still wonder what might have caused this indexing mismatch, since I use the same Transformers version as stated in environment.yaml. I am also still not a hundred percent sure that this fix doesn't break anything later on, because even though I can now run the eval code on the test set, I am getting an accuracy of 11%, which is four times lower than reported in the paper.
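In case it helps anyone hitting the same ValueError: my understanding is that the layer number in a state_dict key always follows the literal "layer" token, so its position in key.split(".") shifts with the number of prefixes on the key. Here is a minimal sketch of the off-by-one with made-up keys (they are not the actual checkpoint keys, just an illustration):

```python
# Minimal sketch of the indexing mismatch (made-up keys, not the actual
# checkpoint contents). The layer number always follows the "layer" token,
# so its position in key.split(".") depends on the prefix depth.
key_expected = "model.bert.encoder.layer.0.attention.self.query.weight"
key_actual = "bert.encoder.layer.0.attention.self.query.weight"

print(key_expected.split(".")[4])  # '0'          -> int() succeeds
print(key_actual.split(".")[4])    # 'attention'  -> the ValueError above
print(key_actual.split(".")[3])    # '0'          -> why changing 4 to 3 works

# A prefix-agnostic variant that would survive either key layout:
def layer_num(key: str) -> int:
    parts = key.split(".")
    return int(parts[parts.index("layer") + 1])
```

Something like layer_num above might be a more robust fix than hard-coding the index, but I have only verified the 4-to-3 change.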
Moreover, I am a bit worried about the _IncompatibleKeys(missing_keys=...) messages that seem to list all modules in the model. Where do they come from? Is this expected behaviour?
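From what I can tell, such messages come from PyTorch's load_state_dict(..., strict=False), which returns an _IncompatibleKeys named tuple instead of raising. Buffers like relative_position_index are rebuilt deterministically at module init, so my understanding is that their absence from a checkpoint is harmless. A toy example of the mechanism (the module and key names are made up):

```python
import torch
import torch.nn as nn

# Toy module mirroring the situation: relative_position_index is a registered
# buffer that is rebuilt in __init__, so checkpoints often omit it.
class TinyAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        self.register_buffer("relative_position_index", torch.arange(4))

model = TinyAttention()
# A "checkpoint" that contains the weights but not the buffer:
state = {"proj.weight": torch.zeros(4, 4), "proj.bias": torch.zeros(4)}

# strict=False reports the mismatch instead of raising an error.
result = model.load_state_dict(state, strict=False)
print(result)
# _IncompatibleKeys(missing_keys=['relative_position_index'], unexpected_keys=[])
```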
Hello, and thank you for your great work. I have a problem trying to reproduce the evaluation on ActivityNet-QA. I have downloaded the fine-tuned checkpoints and am trying to run eval_vqa.sh, but I get an error during model creation. As far as I understand the logs, the error is due to a state_dict mismatch. Here are the logs:
```
2022-09-01T17:03:12 | tasks.shared_utils: Creating model
2022-09-01T17:03:17 | models.model_retrieval_base: Loading vit pre-trained weights from huggingface microsoft/beit-base-patch16-224-pt22k-ft22k.
2022-09-01T17:03:19 | models.model_retrieval_base: Init new model with new image size 224, and load weights.
2022-09-01T17:03:22 | models.model_retrieval_base: _IncompatibleKeys(missing_keys=['encoder.layer.0.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.1.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.2.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.3.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.4.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.5.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.6.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.7.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.8.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.9.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.10.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.11.attention.attention.relative_position_bias.relative_position_index'], unexpected_keys=[])
2022-09-01T17:03:22 | models.model_retrieval_base: Build text_encoder bert-base-uncased
2022-09-01T17:03:23 | models.model_retrieval_base: Build text_encoder bert-base-uncased, done!
2022-09-01T17:03:23 | models.model_vqa: Build text_decoder bert-base-uncased
2022-09-01T17:03:24 | models.model_vqa: Build text_decoder bert-base-uncased, done!
2022-09-01T17:03:25 | utils.optimizer: optimizer -- lr=1e-05 wd=0.02 len(p)=208
2022-09-01T17:03:25 | utils.optimizer: optimizer -- lr=1e-05 wd=0 len(p)=329
2022-09-01T17:03:25 | tasks.shared_utils: Loading checkpoint from anet_qa/ft_anet_qa_singularity_17m.pth
2022-09-01T17:03:25 | models.utils: Load temporal_embeddings, lengths: 64-->1
Traceback (most recent call last):
  File "tasks/vqa.py", line 295, in <module>
    main(cfg)
  File "tasks/vqa.py", line 188, in main
    find_unused_parameters=True
  File "/home/jovyan/voronov/video_qa/singularity/tasks/shared_utils.py", line 85, in setup_model
    layer_num = int(encoder_keys[4])
ValueError: invalid literal for int() with base 10: 'attention'
```
So the problem arises during the creation of the decoder. Can you help me figure out what might be causing this issue? I have installed all pip packages into a clean environment according to your environment.yaml file.