I am trying to train on MSRVTT-QA, but I got the error below.
--command
bash scripts/train_vqa.sh msrvtt msrvtt 1 local pretrained_path=/singularity/ckpts_and_logs/ft_msrvtt_qa_singularity_17m.pth
Did I miss something?
-----logs
2022-11-28T08:23:48 | main: config: {'dataset_name': 'msrvtt', 'data_root': '${oc.env:SL_DATA_DIR}/videos', 'anno_root_downstream': '${oc.env:SL_DATA_DIR}/anno_downstream', 'train_file': [['${anno_root_downstream}/msrvtt_qa_train.json', '${data_root}/msrvtt_2fps_224', 'video']], 'test_types': ['val'], 'test_file': {'val': ['${anno_root_downstream}/msrvtt_qa_val.json', '${data_root}/msrvtt_2fps_224', 'video'], 'test': ['${anno_root_downstream}/msrvtt_qa_test.json', '${data_root}/msrvtt_2fps_224', 'video']}, 'stop_key': 'val', 'answer_list': '${anno_root_downstream}/msrvtt_qa_answer_list.json', 'text_encoder': 'bert-base-uncased', 'text_decoder': 'bert-base-uncased', 'bert_config': 'configs/config_bert.json', 'vit_type': 'beit', 'vit_zoo': {'beit': 'microsoft/beit-base-patch16-224-pt22k-ft22k'}, 'vit_name_or_pretrained_path': '${vit_zoo[${vit_type}]}', 'temporal_vision_encoder': {'enable': False, 'num_layers': 2, 'update_pooler_embed': False}, 'add_temporal_embed': False, 'image_res': 224, 'embed_dim': 256, 'video_input': {'num_frames': 1, 'reader': 'decord', 'sample_type': 'rand', 'num_frames_test': 4, 'sample_type_test': 'middle'}, 'max_q_len': 25, 'max_a_len': 5, 'batch_size': {'image': 128, 'video': 32}, 'batch_size_test': {'image': 64, 'video': 64}, 'k_test': 128, 'temp': 0.07, 'eos': '[SEP]', 'optimizer': {'opt': 'adamW', 'lr': 1e-05, 'opt_betas': [0.9, 0.999], 'weight_decay': 0.02, 'max_grad_norm': -1, 'different_lr': {'enable': False, 'module_names': [], 'lr': 0.001}}, 'scheduler': {'sched': 'cosine', 'epochs': 10, 'min_lr_multi': 0.1, 'warmup_epochs': 0.5}, 'output_dir': '/singularity/ckpts_and_logs/qa_msrvtt/msrvtt', 'pretrained_path': '/singularity/ckpts_and_logs/ft_msrvtt_qa_singularity_17m.pth', 'resume': False, 'evaluate': False, 'eval_frame_ensemble': 'concat', 'device': 'cuda', 'seed': 42, 'log_freq': 100, 'dist_url': 'env://', 'distributed': True, 'fp16': True, 'debug': False, 'num_workers': 16, 'wandb': {'enable': True, 'entity': None, 'project': 'sb_qa_msrvtt'}, 'rank': 0, 'world_size': 1, 'gpu': 0, 'dist_backend': 'nccl', 'result_dir': '/singularity/ckpts_and_logs/qa_msrvtt/msrvtt'}
2022-11-28T08:23:48 | main: train_file: [['${anno_root_downstream}/msrvtt_qa_train.json', '${data_root}/msrvtt_2fps_224', 'video']]
2022-11-28T08:23:48 | main: Creating vqa QA datasets
Loading /singularity/data/anno_downstream/msrvtt_qa_train.json: 100%|█| 158581/158581 [
Loading /singularity/data/anno_downstream/msrvtt_qa_val.json: 100%|█| 12278/12278 [00:0
Loading /singularity/data/anno_downstream/msrvtt_qa_test.json: 100%|█| 72821/72821 [00:
2022-11-28T08:23:49 | tasks.shared_utils: Creating model
2022-11-28T08:23:56 | models.model_retrieval_base: Loading vit pre-trained weights from huggingface microsoft/beit-base-patch16-224-pt22k-ft22k.
WARNING 2022-11-28T08:24:01 | py.warnings: /miniconda3/envs/sl/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1639180594101/work/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2022-11-28T08:24:02 | models.model_retrieval_base: Init new model with new image size 224, and load weights.
2022-11-28T08:24:05 | models.model_retrieval_base: _IncompatibleKeys(missing_keys=['encoder.layer.0.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.1.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.2.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.3.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.4.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.5.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.6.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.7.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.8.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.9.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.10.attention.attention.relative_position_bias.relative_position_index', 'encoder.layer.11.attention.attention.relative_position_bias.relative_position_index'], unexpected_keys=[])
2022-11-28T08:24:05 | models.model_retrieval_base: Build text_encoder bert-base-uncased
2022-11-28T08:24:10 | models.model_retrieval_base: Build text_encoder bert-base-uncased, done!
2022-11-28T08:24:10 | models.model_vqa: Build text_decoder bert-base-uncased
2022-11-28T08:24:14 | models.model_vqa: Build text_decoder bert-base-uncased, done!
2022-11-28T08:24:14 | utils.optimizer: optimizer -- lr=1e-05 wd=0.02 len(p)=208
2022-11-28T08:24:14 | utils.optimizer: optimizer -- lr=1e-05 wd=0 len(p)=329
2022-11-28T08:24:14 | tasks.shared_utils: Loading checkpoint from /singularity/ckpts_and_logs/ft_msrvtt_qa_singularity_17m.pth
2022-11-28T08:24:21 | models.utils: Load temporal_embeddings, lengths: 64-->1
Traceback (most recent call last):
File "tasks/vqa.py", line 295, in
main(cfg)
File "tasks/vqa.py", line 188, in main
find_unused_parameters=True
File "/singularity/ckpts_and_logs/qa_msrvtt/msrvtt/code/singularity/tasks/shared_utils.py", line 85, in setup_model
layer_num = int(encoder_keys[4])
ValueError: invalid literal for int() with base 10: 'attention'
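
For context on the ValueError: the line `layer_num = int(encoder_keys[4])` in `shared_utils.py` splits a state_dict key on `.` and calls `int()` on a fixed token position, so any key family whose prefix has a different depth (for example the `relative_position_bias.relative_position_index` keys listed in the log) puts the token `attention` where a digit was expected. The following is only a minimal sketch of a more defensive parse, not the repo's actual code; the function name `layer_num_from_key` is hypothetical, but the key strings are taken from the log above:

```python
# Hypothetical sketch: extract the transformer layer index from a
# state_dict key regardless of how many prefix tokens precede it,
# instead of hard-coding a split position like encoder_keys[4].
import re

def layer_num_from_key(key: str):
    """Return the int that follows 'layer.' in a state_dict key, or None."""
    m = re.search(r"\blayer\.(\d+)\b", key)
    return int(m.group(1)) if m else None

# A key of the shape that triggers the crash: splitting it on '.' and
# taking a fixed index can land on the token 'attention' rather than '0'.
key = "encoder.layer.0.attention.attention.relative_position_bias.relative_position_index"
print(layer_num_from_key(key))                   # 0
print(layer_num_from_key("pooler.dense.weight"))  # None (no layer index)
```

A parse like this returns `None` for keys without a layer index, which the caller can skip instead of crashing; whether skipping those keys is the right behavior for this checkpoint is something the maintainers would need to confirm.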