Please use the pretrained models from here. The same link also contains my reproduced results from yesterday.
P-former: models/ours/pformer/checkpoint_60000.pth
Stage 1: models/ours/Caption_vatex_stage1
Stage 2: models/ours/Caption_vatex_stage2
You will find the generated captions here: models/ours/Caption_vatex_stage2/20240102015/result
I have the VATEX datasets in the following folders. I also uploaded the JSON files to the above link. Please follow the package versions in pip_freeze.txt to closely reproduce the results. We use a single RTX A6000.
Please use transformers==4.26.1
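
As a quick sanity check (a minimal snippet of my own, not from the repo), you can confirm that the pinned version is the one actually imported in your environment:

import transformers
# Should print 4.26.1 in the environment used for training/eval
print(transformers.__version__)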
Thanks for the fast response! After switching to transformers 4.26.1, the "num_beams=5" error was resolved!
I still have the missing-key problem.
The main reason might be that I cannot install your local version of "salesforce-lavis @ file:///home/ssd1/LipTome" listed in pip_freeze.txt.
Apart from salesforce-lavis, I have double-checked that all the other packages match the versions in the provided pip_freeze.txt.
You do not need it; please comment out that line (the "salesforce-lavis @ file:///home/ssd1/LipTome" entry was generated when I installed the official LAVIS in another working directory).
Thanks. Could you provide the version of salesforce-lavis? I can install versions: 1.0.0, 1.0.1rc1, 1.0.1rc2, 1.0.2rc1, 1.0.2
Here's what I suggest you do, in the BLIText-video directory:
# install lavis based on official LAVIS guideline
conda create -n lavis python=3.8
conda activate lavis
pip install -e .
# fix package version issues
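# (per the note above: first comment out the "salesforce-lavis @ file:///home/ssd1/LipTome" line in pip_freeze.txt)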
pip install -r pip_freeze.txt
The missing keys may not be an issue. Here's my output from running bash run_scripts/blip2/train/train_caption_vatex_stage1.sh just now.
(lavis-OpCounter) yiren@dartmouth-110B:~/V2T-Pformer$ bash run_scripts/blip2/train/train_caption_vatex_stage1.sh
| distributed init (rank 0, world 1): env://
2024-01-03 03:49:45,941 [INFO]
===== Running Parameters =====
2024-01-03 03:49:45,942 [INFO] {
    "accum_grad_iters": 1,
    "amp": true,
    "batch_size_eval": 64,
    "batch_size_train": 128,
    "device": "cuda",
    "dist_backend": "nccl",
    "dist_url": "env://",
    "distributed": true,
    "evaluate": false,
    "gpu": 0,
    "init_lr": 0.0001,
    "lr_sched": "linear_warmup_cosine_lr",
    "max_epoch": 10,
    "max_len": 30,
    "min_len": 8,
    "min_lr": 1e-05,
    "num_beams": 5,
    "num_workers": 4,
    "output_dir": "output/BLIP-T/Caption_vatex_stage1",
    "rank": 0,
    "report_metric": false,
    "resume_ckpt_path": null,
    "seed": 42,
    "task": "captioning",
    "train_splits": [
        "train"
    ],
    "valid_splits": [
        "val"
    ],
    "warmup_lr": 1e-06,
    "warmup_steps": 1000,
    "weight_decay": 0.05,
    "world_size": 1
}
2024-01-03 03:49:45,942 [INFO]
====== Dataset Attributes ======
2024-01-03 03:49:45,942 [INFO]
======== my_vatex_caption =======
2024-01-03 03:49:45,942 [INFO] {
    "build_info": {
        "annotations": {
            "train": {
                "storage": "vatex/annotations/cap_train.json",
                "url": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vatex/cap_train.json"
            },
            "val": {
                "storage": "vatex/annotations/cap_val.json",
                "url": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vatex/cap_val.json"
            }
        },
        "videos": {
            "storage": "vatex/images"
        }
    },
    "data_type": "videos",
    "text_processor": {
        "eval": {
            "name": "blip_caption"
        },
        "train": {
            "name": "blip_caption",
            "prompt": "a photo of "
        }
    }
}
2024-01-03 03:49:45,942 [INFO]
====== Model Attributes ======
2024-01-03 03:49:45,943 [INFO] {
    "arch": "video_feature_opt_stage1",
    "drop_path_rate": 0,
    "finetuned": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_caption_opt2.7b.pth",
    "freeze_vit": true,
    "image_size": 364,
    "load_finetuned": false,
    "load_pretrained": true,
    "model_type": "caption_coco_opt2.7b",
    "num_query_token": 32,
    "opt_model": "facebook/opt-2.7b",
    "pretrained": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_opt2.7b.pth",
    "pretrained_stage0": "/home/yiren/LAVIS/lavis/output/BLIP-T/Pretrain_stage0/vq/40m-noisy/checkpoint_60000.pth",
    "prompt": "a photo of",
    "use_grad_checkpoint": false,
    "vit_precision": "fp32"
}
Using downloaded and verified file: /home/yiren/lavis_datasets/vatex/annotations/cap_train.json
Using downloaded and verified file: /home/yiren/lavis_datasets/vatex/annotations/cap_val.json
2024-01-03 03:49:45,943 [INFO] Building datasets...
2024-01-03 03:50:36,989 [INFO] Missing keys ['VL_adaptor.embeddings.position_ids', 'VL_adaptor.embeddings.word_embeddings.weight', 'VL_adaptor.embeddings.position_embeddings.weight', 'VL_adaptor.embeddings.token_type_embeddings.weight', 'VL_adaptor.embeddings.LayerNorm.weight', 'VL_adaptor.embeddings.LayerNorm.bias', 'VL_adaptor.encoder.layer.0.attention.self.query.weight', 'VL_adaptor.encoder.layer.0.attention.self.query.bias', 'VL_adaptor.encoder.layer.0.attention.self.key.weight', 'VL_adaptor.encoder.layer.0.attention.self.key.bias', 'VL_adaptor.encoder.layer.0.attention.self.value.weight', 'VL_adaptor.encoder.layer.0.attention.self.value.bias', 'VL_adaptor.encoder.layer.0.attention.output.dense.weight', 'VL_adaptor.encoder.layer.0.attention.output.dense.bias', 'VL_adaptor.encoder.layer.0.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.0.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.0.intermediate.dense.weight', 'VL_adaptor.encoder.layer.0.intermediate.dense.bias', 'VL_adaptor.encoder.layer.0.output.dense.weight', 'VL_adaptor.encoder.layer.0.output.dense.bias', 'VL_adaptor.encoder.layer.0.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.0.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.1.attention.self.query.weight', 'VL_adaptor.encoder.layer.1.attention.self.query.bias', 'VL_adaptor.encoder.layer.1.attention.self.key.weight', 'VL_adaptor.encoder.layer.1.attention.self.key.bias', 'VL_adaptor.encoder.layer.1.attention.self.value.weight', 'VL_adaptor.encoder.layer.1.attention.self.value.bias', 'VL_adaptor.encoder.layer.1.attention.output.dense.weight', 'VL_adaptor.encoder.layer.1.attention.output.dense.bias', 'VL_adaptor.encoder.layer.1.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.1.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.1.intermediate.dense.weight', 'VL_adaptor.encoder.layer.1.intermediate.dense.bias', 'VL_adaptor.encoder.layer.1.output.dense.weight', 'VL_adaptor.encoder.layer.1.output.dense.bias', 'VL_adaptor.encoder.layer.1.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.1.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.2.attention.self.query.weight', 'VL_adaptor.encoder.layer.2.attention.self.query.bias', 'VL_adaptor.encoder.layer.2.attention.self.key.weight', 'VL_adaptor.encoder.layer.2.attention.self.key.bias', 'VL_adaptor.encoder.layer.2.attention.self.value.weight', 'VL_adaptor.encoder.layer.2.attention.self.value.bias', 'VL_adaptor.encoder.layer.2.attention.output.dense.weight', 'VL_adaptor.encoder.layer.2.attention.output.dense.bias', 'VL_adaptor.encoder.layer.2.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.2.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.2.intermediate.dense.weight', 'VL_adaptor.encoder.layer.2.intermediate.dense.bias', 'VL_adaptor.encoder.layer.2.output.dense.weight', 'VL_adaptor.encoder.layer.2.output.dense.bias', 'VL_adaptor.encoder.layer.2.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.2.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.3.attention.self.query.weight', 'VL_adaptor.encoder.layer.3.attention.self.query.bias', 'VL_adaptor.encoder.layer.3.attention.self.key.weight', 'VL_adaptor.encoder.layer.3.attention.self.key.bias', 'VL_adaptor.encoder.layer.3.attention.self.value.weight', 'VL_adaptor.encoder.layer.3.attention.self.value.bias', 'VL_adaptor.encoder.layer.3.attention.output.dense.weight', 'VL_adaptor.encoder.layer.3.attention.output.dense.bias', 'VL_adaptor.encoder.layer.3.attention.output.LayerNorm.weight', 
'VL_adaptor.encoder.layer.3.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.3.intermediate.dense.weight', 'VL_adaptor.encoder.layer.3.intermediate.dense.bias', 'VL_adaptor.encoder.layer.3.output.dense.weight', 'VL_adaptor.encoder.layer.3.output.dense.bias', 'VL_adaptor.encoder.layer.3.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.3.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.4.attention.self.query.weight', 'VL_adaptor.encoder.layer.4.attention.self.query.bias', 'VL_adaptor.encoder.layer.4.attention.self.key.weight', 'VL_adaptor.encoder.layer.4.attention.self.key.bias', 'VL_adaptor.encoder.layer.4.attention.self.value.weight', 'VL_adaptor.encoder.layer.4.attention.self.value.bias', 'VL_adaptor.encoder.layer.4.attention.output.dense.weight', 'VL_adaptor.encoder.layer.4.attention.output.dense.bias', 'VL_adaptor.encoder.layer.4.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.4.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.4.intermediate.dense.weight', 'VL_adaptor.encoder.layer.4.intermediate.dense.bias', 'VL_adaptor.encoder.layer.4.output.dense.weight', 'VL_adaptor.encoder.layer.4.output.dense.bias', 'VL_adaptor.encoder.layer.4.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.4.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.5.attention.self.query.weight', 'VL_adaptor.encoder.layer.5.attention.self.query.bias', 'VL_adaptor.encoder.layer.5.attention.self.key.weight', 'VL_adaptor.encoder.layer.5.attention.self.key.bias', 'VL_adaptor.encoder.layer.5.attention.self.value.weight', 'VL_adaptor.encoder.layer.5.attention.self.value.bias', 'VL_adaptor.encoder.layer.5.attention.output.dense.weight', 'VL_adaptor.encoder.layer.5.attention.output.dense.bias', 'VL_adaptor.encoder.layer.5.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.5.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.5.intermediate.dense.weight', 'VL_adaptor.encoder.layer.5.intermediate.dense.bias', 'VL_adaptor.encoder.layer.5.output.dense.weight', 'VL_adaptor.encoder.layer.5.output.dense.bias', 'VL_adaptor.encoder.layer.5.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.5.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.6.attention.self.query.weight', 'VL_adaptor.encoder.layer.6.attention.self.query.bias', 'VL_adaptor.encoder.layer.6.attention.self.key.weight', 'VL_adaptor.encoder.layer.6.attention.self.key.bias', 'VL_adaptor.encoder.layer.6.attention.self.value.weight', 'VL_adaptor.encoder.layer.6.attention.self.value.bias', 'VL_adaptor.encoder.layer.6.attention.output.dense.weight', 'VL_adaptor.encoder.layer.6.attention.output.dense.bias', 'VL_adaptor.encoder.layer.6.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.6.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.6.intermediate.dense.weight', 'VL_adaptor.encoder.layer.6.intermediate.dense.bias', 'VL_adaptor.encoder.layer.6.output.dense.weight', 'VL_adaptor.encoder.layer.6.output.dense.bias', 'VL_adaptor.encoder.layer.6.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.6.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.7.attention.self.query.weight', 'VL_adaptor.encoder.layer.7.attention.self.query.bias', 'VL_adaptor.encoder.layer.7.attention.self.key.weight', 'VL_adaptor.encoder.layer.7.attention.self.key.bias', 'VL_adaptor.encoder.layer.7.attention.self.value.weight', 'VL_adaptor.encoder.layer.7.attention.self.value.bias', 'VL_adaptor.encoder.layer.7.attention.output.dense.weight', 'VL_adaptor.encoder.layer.7.attention.output.dense.bias', 
'VL_adaptor.encoder.layer.7.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.7.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.7.intermediate.dense.weight', 'VL_adaptor.encoder.layer.7.intermediate.dense.bias', 'VL_adaptor.encoder.layer.7.output.dense.weight', 'VL_adaptor.encoder.layer.7.output.dense.bias', 'VL_adaptor.encoder.layer.7.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.7.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.8.attention.self.query.weight', 'VL_adaptor.encoder.layer.8.attention.self.query.bias', 'VL_adaptor.encoder.layer.8.attention.self.key.weight', 'VL_adaptor.encoder.layer.8.attention.self.key.bias', 'VL_adaptor.encoder.layer.8.attention.self.value.weight', 'VL_adaptor.encoder.layer.8.attention.self.value.bias', 'VL_adaptor.encoder.layer.8.attention.output.dense.weight', 'VL_adaptor.encoder.layer.8.attention.output.dense.bias', 'VL_adaptor.encoder.layer.8.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.8.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.8.intermediate.dense.weight', 'VL_adaptor.encoder.layer.8.intermediate.dense.bias', 'VL_adaptor.encoder.layer.8.output.dense.weight', 'VL_adaptor.encoder.layer.8.output.dense.bias', 'VL_adaptor.encoder.layer.8.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.8.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.9.attention.self.query.weight', 'VL_adaptor.encoder.layer.9.attention.self.query.bias', 'VL_adaptor.encoder.layer.9.attention.self.key.weight', 'VL_adaptor.encoder.layer.9.attention.self.key.bias', 'VL_adaptor.encoder.layer.9.attention.self.value.weight', 'VL_adaptor.encoder.layer.9.attention.self.value.bias', 'VL_adaptor.encoder.layer.9.attention.output.dense.weight', 'VL_adaptor.encoder.layer.9.attention.output.dense.bias', 'VL_adaptor.encoder.layer.9.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.9.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.9.intermediate.dense.weight', 'VL_adaptor.encoder.layer.9.intermediate.dense.bias', 'VL_adaptor.encoder.layer.9.output.dense.weight', 'VL_adaptor.encoder.layer.9.output.dense.bias', 'VL_adaptor.encoder.layer.9.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.9.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.10.attention.self.query.weight', 'VL_adaptor.encoder.layer.10.attention.self.query.bias', 'VL_adaptor.encoder.layer.10.attention.self.key.weight', 'VL_adaptor.encoder.layer.10.attention.self.key.bias', 'VL_adaptor.encoder.layer.10.attention.self.value.weight', 'VL_adaptor.encoder.layer.10.attention.self.value.bias', 'VL_adaptor.encoder.layer.10.attention.output.dense.weight', 'VL_adaptor.encoder.layer.10.attention.output.dense.bias', 'VL_adaptor.encoder.layer.10.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.10.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.10.intermediate.dense.weight', 'VL_adaptor.encoder.layer.10.intermediate.dense.bias', 'VL_adaptor.encoder.layer.10.output.dense.weight', 'VL_adaptor.encoder.layer.10.output.dense.bias', 'VL_adaptor.encoder.layer.10.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.10.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.11.attention.self.query.weight', 'VL_adaptor.encoder.layer.11.attention.self.query.bias', 'VL_adaptor.encoder.layer.11.attention.self.key.weight', 'VL_adaptor.encoder.layer.11.attention.self.key.bias', 'VL_adaptor.encoder.layer.11.attention.self.value.weight', 'VL_adaptor.encoder.layer.11.attention.self.value.bias', 'VL_adaptor.encoder.layer.11.attention.output.dense.weight', 
'VL_adaptor.encoder.layer.11.attention.output.dense.bias', 'VL_adaptor.encoder.layer.11.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.11.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.11.intermediate.dense.weight', 'VL_adaptor.encoder.layer.11.intermediate.dense.bias', 'VL_adaptor.encoder.layer.11.output.dense.weight', 'VL_adaptor.encoder.layer.11.output.dense.bias', 'VL_adaptor.encoder.layer.11.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.11.output.LayerNorm.bias', 'VL_adaptor.feat_proj.weight', 'VL_adaptor.feat_proj.bias', 'opt_model.model.decoder.embed_tokens.weight', 'opt_model.model.decoder.embed_positions.weight', 'opt_model.model.decoder.final_layer_norm.weight', 'opt_model.model.decoder.final_layer_norm.bias', 'opt_model.model.decoder.layers.0.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.0.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.0.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.0.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.0.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.0.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.0.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.0.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.0.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.0.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.0.fc1.weight', 'opt_model.model.decoder.layers.0.fc1.bias', 'opt_model.model.decoder.layers.0.fc2.weight', 'opt_model.model.decoder.layers.0.fc2.bias', 'opt_model.model.decoder.layers.0.final_layer_norm.weight', 'opt_model.model.decoder.layers.0.final_layer_norm.bias', 'opt_model.model.decoder.layers.1.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.1.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.1.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.1.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.1.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.1.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.1.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.1.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.1.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.1.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.1.fc1.weight', 'opt_model.model.decoder.layers.1.fc1.bias', 'opt_model.model.decoder.layers.1.fc2.weight', 'opt_model.model.decoder.layers.1.fc2.bias', 'opt_model.model.decoder.layers.1.final_layer_norm.weight', 'opt_model.model.decoder.layers.1.final_layer_norm.bias', 'opt_model.model.decoder.layers.2.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.2.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.2.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.2.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.2.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.2.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.2.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.2.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.2.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.2.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.2.fc1.weight', 'opt_model.model.decoder.layers.2.fc1.bias', 'opt_model.model.decoder.layers.2.fc2.weight', 'opt_model.model.decoder.layers.2.fc2.bias', 'opt_model.model.decoder.layers.2.final_layer_norm.weight', 'opt_model.model.decoder.layers.2.final_layer_norm.bias', 'opt_model.model.decoder.layers.3.self_attn.k_proj.weight', 
'opt_model.model.decoder.layers.3.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.3.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.3.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.3.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.3.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.3.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.3.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.3.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.3.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.3.fc1.weight', 'opt_model.model.decoder.layers.3.fc1.bias', 'opt_model.model.decoder.layers.3.fc2.weight', 'opt_model.model.decoder.layers.3.fc2.bias', 'opt_model.model.decoder.layers.3.final_layer_norm.weight', 'opt_model.model.decoder.layers.3.final_layer_norm.bias', 'opt_model.model.decoder.layers.4.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.4.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.4.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.4.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.4.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.4.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.4.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.4.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.4.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.4.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.4.fc1.weight', 'opt_model.model.decoder.layers.4.fc1.bias', 'opt_model.model.decoder.layers.4.fc2.weight', 'opt_model.model.decoder.layers.4.fc2.bias', 'opt_model.model.decoder.layers.4.final_layer_norm.weight', 'opt_model.model.decoder.layers.4.final_layer_norm.bias', 'opt_model.model.decoder.layers.5.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.5.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.5.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.5.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.5.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.5.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.5.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.5.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.5.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.5.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.5.fc1.weight', 'opt_model.model.decoder.layers.5.fc1.bias', 'opt_model.model.decoder.layers.5.fc2.weight', 'opt_model.model.decoder.layers.5.fc2.bias', 'opt_model.model.decoder.layers.5.final_layer_norm.weight', 'opt_model.model.decoder.layers.5.final_layer_norm.bias', 'opt_model.model.decoder.layers.6.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.6.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.6.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.6.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.6.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.6.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.6.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.6.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.6.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.6.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.6.fc1.weight', 'opt_model.model.decoder.layers.6.fc1.bias', 'opt_model.model.decoder.layers.6.fc2.weight', 'opt_model.model.decoder.layers.6.fc2.bias', 'opt_model.model.decoder.layers.6.final_layer_norm.weight', 
'opt_model.model.decoder.layers.6.final_layer_norm.bias', 'opt_model.model.decoder.layers.7.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.7.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.7.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.7.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.7.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.7.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.7.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.7.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.7.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.7.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.7.fc1.weight', 'opt_model.model.decoder.layers.7.fc1.bias', 'opt_model.model.decoder.layers.7.fc2.weight', 'opt_model.model.decoder.layers.7.fc2.bias', 'opt_model.model.decoder.layers.7.final_layer_norm.weight', 'opt_model.model.decoder.layers.7.final_layer_norm.bias', 'opt_model.model.decoder.layers.8.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.8.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.8.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.8.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.8.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.8.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.8.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.8.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.8.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.8.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.8.fc1.weight', 'opt_model.model.decoder.layers.8.fc1.bias', 'opt_model.model.decoder.layers.8.fc2.weight', 'opt_model.model.decoder.layers.8.fc2.bias', 'opt_model.model.decoder.layers.8.final_layer_norm.weight', 'opt_model.model.decoder.layers.8.final_layer_norm.bias', 'opt_model.model.decoder.layers.9.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.9.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.9.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.9.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.9.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.9.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.9.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.9.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.9.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.9.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.9.fc1.weight', 'opt_model.model.decoder.layers.9.fc1.bias', 'opt_model.model.decoder.layers.9.fc2.weight', 'opt_model.model.decoder.layers.9.fc2.bias', 'opt_model.model.decoder.layers.9.final_layer_norm.weight', 'opt_model.model.decoder.layers.9.final_layer_norm.bias', 'opt_model.model.decoder.layers.10.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.10.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.10.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.10.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.10.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.10.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.10.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.10.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.10.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.10.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.10.fc1.weight', 'opt_model.model.decoder.layers.10.fc1.bias', 'opt_model.model.decoder.layers.10.fc2.weight', 
'opt_model.model.decoder.layers.10.fc2.bias', 'opt_model.model.decoder.layers.10.final_layer_norm.weight', 'opt_model.model.decoder.layers.10.final_layer_norm.bias', 'opt_model.model.decoder.layers.11.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.11.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.11.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.11.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.11.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.11.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.11.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.11.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.11.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.11.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.11.fc1.weight', 'opt_model.model.decoder.layers.11.fc1.bias', 'opt_model.model.decoder.layers.11.fc2.weight', 'opt_model.model.decoder.layers.11.fc2.bias', 'opt_model.model.decoder.layers.11.final_layer_norm.weight', 'opt_model.model.decoder.layers.11.final_layer_norm.bias', 'opt_model.model.decoder.layers.12.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.12.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.12.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.12.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.12.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.12.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.12.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.12.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.12.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.12.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.12.fc1.weight', 'opt_model.model.decoder.layers.12.fc1.bias', 'opt_model.model.decoder.layers.12.fc2.weight', 'opt_model.model.decoder.layers.12.fc2.bias', 'opt_model.model.decoder.layers.12.final_layer_norm.weight', 'opt_model.model.decoder.layers.12.final_layer_norm.bias', 'opt_model.model.decoder.layers.13.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.13.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.13.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.13.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.13.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.13.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.13.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.13.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.13.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.13.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.13.fc1.weight', 'opt_model.model.decoder.layers.13.fc1.bias', 'opt_model.model.decoder.layers.13.fc2.weight', 'opt_model.model.decoder.layers.13.fc2.bias', 'opt_model.model.decoder.layers.13.final_layer_norm.weight', 'opt_model.model.decoder.layers.13.final_layer_norm.bias', 'opt_model.model.decoder.layers.14.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.14.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.14.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.14.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.14.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.14.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.14.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.14.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.14.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.14.self_attn_layer_norm.bias', 
'opt_model.model.decoder.layers.14.fc1.weight', 'opt_model.model.decoder.layers.14.fc1.bias', 'opt_model.model.decoder.layers.14.fc2.weight', 'opt_model.model.decoder.layers.14.fc2.bias', 'opt_model.model.decoder.layers.14.final_layer_norm.weight', 'opt_model.model.decoder.layers.14.final_layer_norm.bias', 'opt_model.model.decoder.layers.15.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.15.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.15.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.15.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.15.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.15.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.15.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.15.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.15.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.15.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.15.fc1.weight', 'opt_model.model.decoder.layers.15.fc1.bias', 'opt_model.model.decoder.layers.15.fc2.weight', 'opt_model.model.decoder.layers.15.fc2.bias', 'opt_model.model.decoder.layers.15.final_layer_norm.weight', 'opt_model.model.decoder.layers.15.final_layer_norm.bias', 'opt_model.model.decoder.layers.16.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.16.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.16.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.16.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.16.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.16.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.16.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.16.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.16.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.16.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.16.fc1.weight', 'opt_model.model.decoder.layers.16.fc1.bias', 'opt_model.model.decoder.layers.16.fc2.weight', 'opt_model.model.decoder.layers.16.fc2.bias', 'opt_model.model.decoder.layers.16.final_layer_norm.weight', 'opt_model.model.decoder.layers.16.final_layer_norm.bias', 'opt_model.model.decoder.layers.17.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.17.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.17.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.17.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.17.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.17.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.17.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.17.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.17.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.17.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.17.fc1.weight', 'opt_model.model.decoder.layers.17.fc1.bias', 'opt_model.model.decoder.layers.17.fc2.weight', 'opt_model.model.decoder.layers.17.fc2.bias', 'opt_model.model.decoder.layers.17.final_layer_norm.weight', 'opt_model.model.decoder.layers.17.final_layer_norm.bias', 'opt_model.model.decoder.layers.18.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.18.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.18.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.18.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.18.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.18.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.18.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.18.self_attn.out_proj.bias', 
'opt_model.model.decoder.layers.18.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.18.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.18.fc1.weight', 'opt_model.model.decoder.layers.18.fc1.bias', 'opt_model.model.decoder.layers.18.fc2.weight', 'opt_model.model.decoder.layers.18.fc2.bias', 'opt_model.model.decoder.layers.18.final_layer_norm.weight', 'opt_model.model.decoder.layers.18.final_layer_norm.bias', 'opt_model.model.decoder.layers.19.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.19.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.19.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.19.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.19.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.19.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.19.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.19.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.19.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.19.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.19.fc1.weight', 'opt_model.model.decoder.layers.19.fc1.bias', 'opt_model.model.decoder.layers.19.fc2.weight', 'opt_model.model.decoder.layers.19.fc2.bias', 'opt_model.model.decoder.layers.19.final_layer_norm.weight', 'opt_model.model.decoder.layers.19.final_layer_norm.bias', 'opt_model.model.decoder.layers.20.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.20.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.20.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.20.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.20.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.20.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.20.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.20.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.20.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.20.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.20.fc1.weight', 'opt_model.model.decoder.layers.20.fc1.bias', 'opt_model.model.decoder.layers.20.fc2.weight', 'opt_model.model.decoder.layers.20.fc2.bias', 'opt_model.model.decoder.layers.20.final_layer_norm.weight', 'opt_model.model.decoder.layers.20.final_layer_norm.bias', 'opt_model.model.decoder.layers.21.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.21.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.21.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.21.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.21.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.21.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.21.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.21.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.21.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.21.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.21.fc1.weight', 'opt_model.model.decoder.layers.21.fc1.bias', 'opt_model.model.decoder.layers.21.fc2.weight', 'opt_model.model.decoder.layers.21.fc2.bias', 'opt_model.model.decoder.layers.21.final_layer_norm.weight', 'opt_model.model.decoder.layers.21.final_layer_norm.bias', 'opt_model.model.decoder.layers.22.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.22.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.22.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.22.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.22.self_attn.q_proj.weight', 
'opt_model.model.decoder.layers.22.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.22.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.22.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.22.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.22.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.22.fc1.weight', 'opt_model.model.decoder.layers.22.fc1.bias', 'opt_model.model.decoder.layers.22.fc2.weight', 'opt_model.model.decoder.layers.22.fc2.bias', 'opt_model.model.decoder.layers.22.final_layer_norm.weight', 'opt_model.model.decoder.layers.22.final_layer_norm.bias', 'opt_model.model.decoder.layers.23.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.23.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.23.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.23.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.23.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.23.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.23.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.23.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.23.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.23.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.23.fc1.weight', 'opt_model.model.decoder.layers.23.fc1.bias', 'opt_model.model.decoder.layers.23.fc2.weight', 'opt_model.model.decoder.layers.23.fc2.bias', 'opt_model.model.decoder.layers.23.final_layer_norm.weight', 'opt_model.model.decoder.layers.23.final_layer_norm.bias', 'opt_model.model.decoder.layers.24.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.24.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.24.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.24.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.24.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.24.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.24.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.24.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.24.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.24.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.24.fc1.weight', 'opt_model.model.decoder.layers.24.fc1.bias', 'opt_model.model.decoder.layers.24.fc2.weight', 'opt_model.model.decoder.layers.24.fc2.bias', 'opt_model.model.decoder.layers.24.final_layer_norm.weight', 'opt_model.model.decoder.layers.24.final_layer_norm.bias', 'opt_model.model.decoder.layers.25.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.25.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.25.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.25.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.25.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.25.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.25.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.25.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.25.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.25.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.25.fc1.weight', 'opt_model.model.decoder.layers.25.fc1.bias', 'opt_model.model.decoder.layers.25.fc2.weight', 'opt_model.model.decoder.layers.25.fc2.bias', 'opt_model.model.decoder.layers.25.final_layer_norm.weight', 'opt_model.model.decoder.layers.25.final_layer_norm.bias', 'opt_model.model.decoder.layers.26.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.26.self_attn.k_proj.bias', 
'opt_model.model.decoder.layers.26.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.26.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.26.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.26.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.26.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.26.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.26.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.26.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.26.fc1.weight', 'opt_model.model.decoder.layers.26.fc1.bias', 'opt_model.model.decoder.layers.26.fc2.weight', 'opt_model.model.decoder.layers.26.fc2.bias', 'opt_model.model.decoder.layers.26.final_layer_norm.weight', 'opt_model.model.decoder.layers.26.final_layer_norm.bias', 'opt_model.model.decoder.layers.27.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.27.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.27.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.27.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.27.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.27.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.27.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.27.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.27.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.27.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.27.fc1.weight', 'opt_model.model.decoder.layers.27.fc1.bias', 'opt_model.model.decoder.layers.27.fc2.weight', 'opt_model.model.decoder.layers.27.fc2.bias', 'opt_model.model.decoder.layers.27.final_layer_norm.weight', 'opt_model.model.decoder.layers.27.final_layer_norm.bias', 'opt_model.model.decoder.layers.28.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.28.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.28.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.28.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.28.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.28.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.28.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.28.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.28.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.28.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.28.fc1.weight', 'opt_model.model.decoder.layers.28.fc1.bias', 'opt_model.model.decoder.layers.28.fc2.weight', 'opt_model.model.decoder.layers.28.fc2.bias', 'opt_model.model.decoder.layers.28.final_layer_norm.weight', 'opt_model.model.decoder.layers.28.final_layer_norm.bias', 'opt_model.model.decoder.layers.29.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.29.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.29.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.29.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.29.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.29.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.29.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.29.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.29.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.29.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.29.fc1.weight', 'opt_model.model.decoder.layers.29.fc1.bias', 'opt_model.model.decoder.layers.29.fc2.weight', 'opt_model.model.decoder.layers.29.fc2.bias', 'opt_model.model.decoder.layers.29.final_layer_norm.weight', 
'opt_model.model.decoder.layers.29.final_layer_norm.bias', 'opt_model.model.decoder.layers.30.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.30.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.30.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.30.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.30.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.30.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.30.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.30.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.30.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.30.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.30.fc1.weight', 'opt_model.model.decoder.layers.30.fc1.bias', 'opt_model.model.decoder.layers.30.fc2.weight', 'opt_model.model.decoder.layers.30.fc2.bias', 'opt_model.model.decoder.layers.30.final_layer_norm.weight', 'opt_model.model.decoder.layers.30.final_layer_norm.bias', 'opt_model.model.decoder.layers.31.self_attn.k_proj.weight', 'opt_model.model.decoder.layers.31.self_attn.k_proj.bias', 'opt_model.model.decoder.layers.31.self_attn.v_proj.weight', 'opt_model.model.decoder.layers.31.self_attn.v_proj.bias', 'opt_model.model.decoder.layers.31.self_attn.q_proj.weight', 'opt_model.model.decoder.layers.31.self_attn.q_proj.bias', 'opt_model.model.decoder.layers.31.self_attn.out_proj.weight', 'opt_model.model.decoder.layers.31.self_attn.out_proj.bias', 'opt_model.model.decoder.layers.31.self_attn_layer_norm.weight', 'opt_model.model.decoder.layers.31.self_attn_layer_norm.bias', 'opt_model.model.decoder.layers.31.fc1.weight', 'opt_model.model.decoder.layers.31.fc1.bias', 'opt_model.model.decoder.layers.31.fc2.weight', 'opt_model.model.decoder.layers.31.fc2.bias', 'opt_model.model.decoder.layers.31.final_layer_norm.weight', 'opt_model.model.decoder.layers.31.final_layer_norm.bias', 'opt_model.lm_head.weight', 'Darkformer.embeddings.position_ids', 'Darkformer.embeddings.word_embeddings.weight', 'Darkformer.embeddings.position_embeddings.weight', 'Darkformer.embeddings.token_type_embeddings.weight', 'Darkformer.embeddings.LayerNorm.weight', 'Darkformer.embeddings.LayerNorm.bias', 'Darkformer.encoder.layer.0.attention.self.query.weight', 'Darkformer.encoder.layer.0.attention.self.query.bias', 'Darkformer.encoder.layer.0.attention.self.key.weight', 'Darkformer.encoder.layer.0.attention.self.key.bias', 'Darkformer.encoder.layer.0.attention.self.value.weight', 'Darkformer.encoder.layer.0.attention.self.value.bias', 'Darkformer.encoder.layer.0.attention.output.dense.weight', 'Darkformer.encoder.layer.0.attention.output.dense.bias', 'Darkformer.encoder.layer.0.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.0.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.0.intermediate.dense.weight', 'Darkformer.encoder.layer.0.intermediate.dense.bias', 'Darkformer.encoder.layer.0.output.dense.weight', 'Darkformer.encoder.layer.0.output.dense.bias', 'Darkformer.encoder.layer.0.output.LayerNorm.weight', 'Darkformer.encoder.layer.0.output.LayerNorm.bias', 'Darkformer.encoder.layer.1.attention.self.query.weight', 'Darkformer.encoder.layer.1.attention.self.query.bias', 'Darkformer.encoder.layer.1.attention.self.key.weight', 'Darkformer.encoder.layer.1.attention.self.key.bias', 'Darkformer.encoder.layer.1.attention.self.value.weight', 'Darkformer.encoder.layer.1.attention.self.value.bias', 'Darkformer.encoder.layer.1.attention.output.dense.weight', 'Darkformer.encoder.layer.1.attention.output.dense.bias', 
'Darkformer.encoder.layer.1.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.1.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.1.intermediate.dense.weight', 'Darkformer.encoder.layer.1.intermediate.dense.bias', 'Darkformer.encoder.layer.1.output.dense.weight', 'Darkformer.encoder.layer.1.output.dense.bias', 'Darkformer.encoder.layer.1.output.LayerNorm.weight', 'Darkformer.encoder.layer.1.output.LayerNorm.bias', 'Darkformer.encoder.layer.2.attention.self.query.weight', 'Darkformer.encoder.layer.2.attention.self.query.bias', 'Darkformer.encoder.layer.2.attention.self.key.weight', 'Darkformer.encoder.layer.2.attention.self.key.bias', 'Darkformer.encoder.layer.2.attention.self.value.weight', 'Darkformer.encoder.layer.2.attention.self.value.bias', 'Darkformer.encoder.layer.2.attention.output.dense.weight', 'Darkformer.encoder.layer.2.attention.output.dense.bias', 'Darkformer.encoder.layer.2.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.2.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.2.intermediate.dense.weight', 'Darkformer.encoder.layer.2.intermediate.dense.bias', 'Darkformer.encoder.layer.2.output.dense.weight', 'Darkformer.encoder.layer.2.output.dense.bias', 'Darkformer.encoder.layer.2.output.LayerNorm.weight', 'Darkformer.encoder.layer.2.output.LayerNorm.bias', 'Darkformer.encoder.layer.3.attention.self.query.weight', 'Darkformer.encoder.layer.3.attention.self.query.bias', 'Darkformer.encoder.layer.3.attention.self.key.weight', 'Darkformer.encoder.layer.3.attention.self.key.bias', 'Darkformer.encoder.layer.3.attention.self.value.weight', 'Darkformer.encoder.layer.3.attention.self.value.bias', 'Darkformer.encoder.layer.3.attention.output.dense.weight', 'Darkformer.encoder.layer.3.attention.output.dense.bias', 'Darkformer.encoder.layer.3.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.3.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.3.intermediate.dense.weight', 'Darkformer.encoder.layer.3.intermediate.dense.bias', 'Darkformer.encoder.layer.3.output.dense.weight', 'Darkformer.encoder.layer.3.output.dense.bias', 'Darkformer.encoder.layer.3.output.LayerNorm.weight', 'Darkformer.encoder.layer.3.output.LayerNorm.bias', 'Darkformer.encoder.layer.4.attention.self.query.weight', 'Darkformer.encoder.layer.4.attention.self.query.bias', 'Darkformer.encoder.layer.4.attention.self.key.weight', 'Darkformer.encoder.layer.4.attention.self.key.bias', 'Darkformer.encoder.layer.4.attention.self.value.weight', 'Darkformer.encoder.layer.4.attention.self.value.bias', 'Darkformer.encoder.layer.4.attention.output.dense.weight', 'Darkformer.encoder.layer.4.attention.output.dense.bias', 'Darkformer.encoder.layer.4.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.4.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.4.intermediate.dense.weight', 'Darkformer.encoder.layer.4.intermediate.dense.bias', 'Darkformer.encoder.layer.4.output.dense.weight', 'Darkformer.encoder.layer.4.output.dense.bias', 'Darkformer.encoder.layer.4.output.LayerNorm.weight', 'Darkformer.encoder.layer.4.output.LayerNorm.bias', 'Darkformer.encoder.layer.5.attention.self.query.weight', 'Darkformer.encoder.layer.5.attention.self.query.bias', 'Darkformer.encoder.layer.5.attention.self.key.weight', 'Darkformer.encoder.layer.5.attention.self.key.bias', 'Darkformer.encoder.layer.5.attention.self.value.weight', 'Darkformer.encoder.layer.5.attention.self.value.bias', 'Darkformer.encoder.layer.5.attention.output.dense.weight', 
'Darkformer.encoder.layer.5.attention.output.dense.bias', 'Darkformer.encoder.layer.5.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.5.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.5.intermediate.dense.weight', 'Darkformer.encoder.layer.5.intermediate.dense.bias', 'Darkformer.encoder.layer.5.output.dense.weight', 'Darkformer.encoder.layer.5.output.dense.bias', 'Darkformer.encoder.layer.5.output.LayerNorm.weight', 'Darkformer.encoder.layer.5.output.LayerNorm.bias', 'Darkformer.encoder.layer.6.attention.self.query.weight', 'Darkformer.encoder.layer.6.attention.self.query.bias', 'Darkformer.encoder.layer.6.attention.self.key.weight', 'Darkformer.encoder.layer.6.attention.self.key.bias', 'Darkformer.encoder.layer.6.attention.self.value.weight', 'Darkformer.encoder.layer.6.attention.self.value.bias', 'Darkformer.encoder.layer.6.attention.output.dense.weight', 'Darkformer.encoder.layer.6.attention.output.dense.bias', 'Darkformer.encoder.layer.6.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.6.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.6.intermediate.dense.weight', 'Darkformer.encoder.layer.6.intermediate.dense.bias', 'Darkformer.encoder.layer.6.output.dense.weight', 'Darkformer.encoder.layer.6.output.dense.bias', 'Darkformer.encoder.layer.6.output.LayerNorm.weight', 'Darkformer.encoder.layer.6.output.LayerNorm.bias', 'Darkformer.encoder.layer.7.attention.self.query.weight', 'Darkformer.encoder.layer.7.attention.self.query.bias', 'Darkformer.encoder.layer.7.attention.self.key.weight', 'Darkformer.encoder.layer.7.attention.self.key.bias', 'Darkformer.encoder.layer.7.attention.self.value.weight', 'Darkformer.encoder.layer.7.attention.self.value.bias', 'Darkformer.encoder.layer.7.attention.output.dense.weight', 'Darkformer.encoder.layer.7.attention.output.dense.bias', 'Darkformer.encoder.layer.7.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.7.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.7.intermediate.dense.weight', 'Darkformer.encoder.layer.7.intermediate.dense.bias', 'Darkformer.encoder.layer.7.output.dense.weight', 'Darkformer.encoder.layer.7.output.dense.bias', 'Darkformer.encoder.layer.7.output.LayerNorm.weight', 'Darkformer.encoder.layer.7.output.LayerNorm.bias', 'Darkformer.encoder.layer.8.attention.self.query.weight', 'Darkformer.encoder.layer.8.attention.self.query.bias', 'Darkformer.encoder.layer.8.attention.self.key.weight', 'Darkformer.encoder.layer.8.attention.self.key.bias', 'Darkformer.encoder.layer.8.attention.self.value.weight', 'Darkformer.encoder.layer.8.attention.self.value.bias', 'Darkformer.encoder.layer.8.attention.output.dense.weight', 'Darkformer.encoder.layer.8.attention.output.dense.bias', 'Darkformer.encoder.layer.8.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.8.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.8.intermediate.dense.weight', 'Darkformer.encoder.layer.8.intermediate.dense.bias', 'Darkformer.encoder.layer.8.output.dense.weight', 'Darkformer.encoder.layer.8.output.dense.bias', 'Darkformer.encoder.layer.8.output.LayerNorm.weight', 'Darkformer.encoder.layer.8.output.LayerNorm.bias', 'Darkformer.encoder.layer.9.attention.self.query.weight', 'Darkformer.encoder.layer.9.attention.self.query.bias', 'Darkformer.encoder.layer.9.attention.self.key.weight', 'Darkformer.encoder.layer.9.attention.self.key.bias', 'Darkformer.encoder.layer.9.attention.self.value.weight', 'Darkformer.encoder.layer.9.attention.self.value.bias', 
'Darkformer.encoder.layer.9.attention.output.dense.weight', 'Darkformer.encoder.layer.9.attention.output.dense.bias', 'Darkformer.encoder.layer.9.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.9.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.9.intermediate.dense.weight', 'Darkformer.encoder.layer.9.intermediate.dense.bias', 'Darkformer.encoder.layer.9.output.dense.weight', 'Darkformer.encoder.layer.9.output.dense.bias', 'Darkformer.encoder.layer.9.output.LayerNorm.weight', 'Darkformer.encoder.layer.9.output.LayerNorm.bias', 'Darkformer.encoder.layer.10.attention.self.query.weight', 'Darkformer.encoder.layer.10.attention.self.query.bias', 'Darkformer.encoder.layer.10.attention.self.key.weight', 'Darkformer.encoder.layer.10.attention.self.key.bias', 'Darkformer.encoder.layer.10.attention.self.value.weight', 'Darkformer.encoder.layer.10.attention.self.value.bias', 'Darkformer.encoder.layer.10.attention.output.dense.weight', 'Darkformer.encoder.layer.10.attention.output.dense.bias', 'Darkformer.encoder.layer.10.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.10.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.10.intermediate.dense.weight', 'Darkformer.encoder.layer.10.intermediate.dense.bias', 'Darkformer.encoder.layer.10.output.dense.weight', 'Darkformer.encoder.layer.10.output.dense.bias', 'Darkformer.encoder.layer.10.output.LayerNorm.weight', 'Darkformer.encoder.layer.10.output.LayerNorm.bias', 'Darkformer.encoder.layer.11.attention.self.query.weight', 'Darkformer.encoder.layer.11.attention.self.query.bias', 'Darkformer.encoder.layer.11.attention.self.key.weight', 'Darkformer.encoder.layer.11.attention.self.key.bias', 'Darkformer.encoder.layer.11.attention.self.value.weight', 'Darkformer.encoder.layer.11.attention.self.value.bias', 'Darkformer.encoder.layer.11.attention.output.dense.weight', 'Darkformer.encoder.layer.11.attention.output.dense.bias', 'Darkformer.encoder.layer.11.attention.output.LayerNorm.weight', 'Darkformer.encoder.layer.11.attention.output.LayerNorm.bias', 'Darkformer.encoder.layer.11.intermediate.dense.weight', 'Darkformer.encoder.layer.11.intermediate.dense.bias', 'Darkformer.encoder.layer.11.output.dense.weight', 'Darkformer.encoder.layer.11.output.dense.bias', 'Darkformer.encoder.layer.11.output.LayerNorm.weight', 'Darkformer.encoder.layer.11.output.LayerNorm.bias', 'Darkformer.cls_proj.weight', 'Darkformer.cls_proj.bias', 'Darkformer.pooler.0.weight', 'Darkformer.pooler.0.bias', 'Darkformer.opt_proj.weight', 'Darkformer.opt_proj.bias']
2024-01-03 03:50:36,989 [INFO] load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_opt2.7b.pth
2024-01-03 03:50:39,875 [INFO] Missing keys ['VL_adaptor.embeddings.position_ids', 'VL_adaptor.embeddings.word_embeddings.weight', 'VL_adaptor.embeddings.position_embeddings.weight', 'VL_adaptor.embeddings.token_type_embeddings.weight', 'VL_adaptor.embeddings.LayerNorm.weight', 'VL_adaptor.embeddings.LayerNorm.bias', 'VL_adaptor.encoder.layer.0.attention.self.query.weight', 'VL_adaptor.encoder.layer.0.attention.self.query.bias', 'VL_adaptor.encoder.layer.0.attention.self.key.weight', 'VL_adaptor.encoder.layer.0.attention.self.key.bias', 'VL_adaptor.encoder.layer.0.attention.self.value.weight', 'VL_adaptor.encoder.layer.0.attention.self.value.bias', 'VL_adaptor.encoder.layer.0.attention.output.dense.weight', 'VL_adaptor.encoder.layer.0.attention.output.dense.bias', 'VL_adaptor.encoder.layer.0.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.0.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.0.intermediate.dense.weight', 'VL_adaptor.encoder.layer.0.intermediate.dense.bias', 'VL_adaptor.encoder.layer.0.output.dense.weight', 'VL_adaptor.encoder.layer.0.output.dense.bias', 'VL_adaptor.encoder.layer.0.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.0.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.1.attention.self.query.weight', 'VL_adaptor.encoder.layer.1.attention.self.query.bias', 'VL_adaptor.encoder.layer.1.attention.self.key.weight', 'VL_adaptor.encoder.layer.1.attention.self.key.bias', 'VL_adaptor.encoder.layer.1.attention.self.value.weight', 'VL_adaptor.encoder.layer.1.attention.self.value.bias', 'VL_adaptor.encoder.layer.1.attention.output.dense.weight', 'VL_adaptor.encoder.layer.1.attention.output.dense.bias', 'VL_adaptor.encoder.layer.1.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.1.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.1.intermediate.dense.weight', 'VL_adaptor.encoder.layer.1.intermediate.dense.bias', 'VL_adaptor.encoder.layer.1.output.dense.weight', 'VL_adaptor.encoder.layer.1.output.dense.bias', 'VL_adaptor.encoder.layer.1.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.1.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.2.attention.self.query.weight', 'VL_adaptor.encoder.layer.2.attention.self.query.bias', 'VL_adaptor.encoder.layer.2.attention.self.key.weight', 'VL_adaptor.encoder.layer.2.attention.self.key.bias', 'VL_adaptor.encoder.layer.2.attention.self.value.weight', 'VL_adaptor.encoder.layer.2.attention.self.value.bias', 'VL_adaptor.encoder.layer.2.attention.output.dense.weight', 'VL_adaptor.encoder.layer.2.attention.output.dense.bias', 'VL_adaptor.encoder.layer.2.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.2.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.2.intermediate.dense.weight', 'VL_adaptor.encoder.layer.2.intermediate.dense.bias', 'VL_adaptor.encoder.layer.2.output.dense.weight', 'VL_adaptor.encoder.layer.2.output.dense.bias', 'VL_adaptor.encoder.layer.2.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.2.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.3.attention.self.query.weight', 'VL_adaptor.encoder.layer.3.attention.self.query.bias', 'VL_adaptor.encoder.layer.3.attention.self.key.weight', 'VL_adaptor.encoder.layer.3.attention.self.key.bias', 'VL_adaptor.encoder.layer.3.attention.self.value.weight', 'VL_adaptor.encoder.layer.3.attention.self.value.bias', 'VL_adaptor.encoder.layer.3.attention.output.dense.weight', 'VL_adaptor.encoder.layer.3.attention.output.dense.bias', 'VL_adaptor.encoder.layer.3.attention.output.LayerNorm.weight', 
'VL_adaptor.encoder.layer.3.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.3.intermediate.dense.weight', 'VL_adaptor.encoder.layer.3.intermediate.dense.bias', 'VL_adaptor.encoder.layer.3.output.dense.weight', 'VL_adaptor.encoder.layer.3.output.dense.bias', 'VL_adaptor.encoder.layer.3.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.3.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.4.attention.self.query.weight', 'VL_adaptor.encoder.layer.4.attention.self.query.bias', 'VL_adaptor.encoder.layer.4.attention.self.key.weight', 'VL_adaptor.encoder.layer.4.attention.self.key.bias', 'VL_adaptor.encoder.layer.4.attention.self.value.weight', 'VL_adaptor.encoder.layer.4.attention.self.value.bias', 'VL_adaptor.encoder.layer.4.attention.output.dense.weight', 'VL_adaptor.encoder.layer.4.attention.output.dense.bias', 'VL_adaptor.encoder.layer.4.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.4.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.4.intermediate.dense.weight', 'VL_adaptor.encoder.layer.4.intermediate.dense.bias', 'VL_adaptor.encoder.layer.4.output.dense.weight', 'VL_adaptor.encoder.layer.4.output.dense.bias', 'VL_adaptor.encoder.layer.4.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.4.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.5.attention.self.query.weight', 'VL_adaptor.encoder.layer.5.attention.self.query.bias', 'VL_adaptor.encoder.layer.5.attention.self.key.weight', 'VL_adaptor.encoder.layer.5.attention.self.key.bias', 'VL_adaptor.encoder.layer.5.attention.self.value.weight', 'VL_adaptor.encoder.layer.5.attention.self.value.bias', 'VL_adaptor.encoder.layer.5.attention.output.dense.weight', 'VL_adaptor.encoder.layer.5.attention.output.dense.bias', 'VL_adaptor.encoder.layer.5.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.5.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.5.intermediate.dense.weight', 'VL_adaptor.encoder.layer.5.intermediate.dense.bias', 'VL_adaptor.encoder.layer.5.output.dense.weight', 'VL_adaptor.encoder.layer.5.output.dense.bias', 'VL_adaptor.encoder.layer.5.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.5.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.6.attention.self.query.weight', 'VL_adaptor.encoder.layer.6.attention.self.query.bias', 'VL_adaptor.encoder.layer.6.attention.self.key.weight', 'VL_adaptor.encoder.layer.6.attention.self.key.bias', 'VL_adaptor.encoder.layer.6.attention.self.value.weight', 'VL_adaptor.encoder.layer.6.attention.self.value.bias', 'VL_adaptor.encoder.layer.6.attention.output.dense.weight', 'VL_adaptor.encoder.layer.6.attention.output.dense.bias', 'VL_adaptor.encoder.layer.6.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.6.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.6.intermediate.dense.weight', 'VL_adaptor.encoder.layer.6.intermediate.dense.bias', 'VL_adaptor.encoder.layer.6.output.dense.weight', 'VL_adaptor.encoder.layer.6.output.dense.bias', 'VL_adaptor.encoder.layer.6.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.6.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.7.attention.self.query.weight', 'VL_adaptor.encoder.layer.7.attention.self.query.bias', 'VL_adaptor.encoder.layer.7.attention.self.key.weight', 'VL_adaptor.encoder.layer.7.attention.self.key.bias', 'VL_adaptor.encoder.layer.7.attention.self.value.weight', 'VL_adaptor.encoder.layer.7.attention.self.value.bias', 'VL_adaptor.encoder.layer.7.attention.output.dense.weight', 'VL_adaptor.encoder.layer.7.attention.output.dense.bias', 
'VL_adaptor.encoder.layer.7.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.7.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.7.intermediate.dense.weight', 'VL_adaptor.encoder.layer.7.intermediate.dense.bias', 'VL_adaptor.encoder.layer.7.output.dense.weight', 'VL_adaptor.encoder.layer.7.output.dense.bias', 'VL_adaptor.encoder.layer.7.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.7.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.8.attention.self.query.weight', 'VL_adaptor.encoder.layer.8.attention.self.query.bias', 'VL_adaptor.encoder.layer.8.attention.self.key.weight', 'VL_adaptor.encoder.layer.8.attention.self.key.bias', 'VL_adaptor.encoder.layer.8.attention.self.value.weight', 'VL_adaptor.encoder.layer.8.attention.self.value.bias', 'VL_adaptor.encoder.layer.8.attention.output.dense.weight', 'VL_adaptor.encoder.layer.8.attention.output.dense.bias', 'VL_adaptor.encoder.layer.8.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.8.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.8.intermediate.dense.weight', 'VL_adaptor.encoder.layer.8.intermediate.dense.bias', 'VL_adaptor.encoder.layer.8.output.dense.weight', 'VL_adaptor.encoder.layer.8.output.dense.bias', 'VL_adaptor.encoder.layer.8.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.8.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.9.attention.self.query.weight', 'VL_adaptor.encoder.layer.9.attention.self.query.bias', 'VL_adaptor.encoder.layer.9.attention.self.key.weight', 'VL_adaptor.encoder.layer.9.attention.self.key.bias', 'VL_adaptor.encoder.layer.9.attention.self.value.weight', 'VL_adaptor.encoder.layer.9.attention.self.value.bias', 'VL_adaptor.encoder.layer.9.attention.output.dense.weight', 'VL_adaptor.encoder.layer.9.attention.output.dense.bias', 'VL_adaptor.encoder.layer.9.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.9.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.9.intermediate.dense.weight', 'VL_adaptor.encoder.layer.9.intermediate.dense.bias', 'VL_adaptor.encoder.layer.9.output.dense.weight', 'VL_adaptor.encoder.layer.9.output.dense.bias', 'VL_adaptor.encoder.layer.9.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.9.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.10.attention.self.query.weight', 'VL_adaptor.encoder.layer.10.attention.self.query.bias', 'VL_adaptor.encoder.layer.10.attention.self.key.weight', 'VL_adaptor.encoder.layer.10.attention.self.key.bias', 'VL_adaptor.encoder.layer.10.attention.self.value.weight', 'VL_adaptor.encoder.layer.10.attention.self.value.bias', 'VL_adaptor.encoder.layer.10.attention.output.dense.weight', 'VL_adaptor.encoder.layer.10.attention.output.dense.bias', 'VL_adaptor.encoder.layer.10.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.10.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.10.intermediate.dense.weight', 'VL_adaptor.encoder.layer.10.intermediate.dense.bias', 'VL_adaptor.encoder.layer.10.output.dense.weight', 'VL_adaptor.encoder.layer.10.output.dense.bias', 'VL_adaptor.encoder.layer.10.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.10.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.11.attention.self.query.weight', 'VL_adaptor.encoder.layer.11.attention.self.query.bias', 'VL_adaptor.encoder.layer.11.attention.self.key.weight', 'VL_adaptor.encoder.layer.11.attention.self.key.bias', 'VL_adaptor.encoder.layer.11.attention.self.value.weight', 'VL_adaptor.encoder.layer.11.attention.self.value.bias', 'VL_adaptor.encoder.layer.11.attention.output.dense.weight', 
'VL_adaptor.encoder.layer.11.attention.output.dense.bias', 'VL_adaptor.encoder.layer.11.attention.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.11.attention.output.LayerNorm.bias', 'VL_adaptor.encoder.layer.11.intermediate.dense.weight', 'VL_adaptor.encoder.layer.11.intermediate.dense.bias', 'VL_adaptor.encoder.layer.11.output.dense.weight', 'VL_adaptor.encoder.layer.11.output.dense.bias', 'VL_adaptor.encoder.layer.11.output.LayerNorm.weight', 'VL_adaptor.encoder.layer.11.output.LayerNorm.bias', 'VL_adaptor.feat_proj.weight', 'VL_adaptor.feat_proj.bias', 'opt_proj.weight', 'opt_proj.bias']
2024-01-03 03:50:39,876 [INFO] load checkpoint from /home/yiren/LAVIS/lavis/output/BLIP-T/Pretrain_stage0/vq/40m-noisy/checkpoint_60000.pth
2024-01-03 03:50:39,964 [INFO] Start training
2024-01-03 03:50:40,812 [INFO] dataset_ratios not specified, datasets will be concatenated (map-style datasets) or chained (webdataset.DataPipeline).
2024-01-03 03:50:40,812 [INFO] Loaded 259910 records for train split from the dataset.
2024-01-03 03:50:40,812 [INFO] Loaded 3000 records for val split from the dataset.
2024-01-03 03:50:40,822 [INFO] number of trainable parameters: 87810304
2024-01-03 03:50:40,822 [INFO] Start training epoch 0, 2030 iters per inner epoch.
/home/yiren/anaconda3/envs/lavis-OpCounter/lib/python3.8/site-packages/transformers/modeling_utils.py:810: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
Train: data epoch: [0] [ 0/2030] eta: 0:39:59 lr: 0.000001 loss: 21.7157 time: 1.1820 data: 0.0000 max mem: 8448
2024-01-03 03:50:42,008 [INFO] Reducer buckets have been rebuilt in this iteration.
Train: data epoch: [0] [ 50/2030] eta: 0:04:48 lr: 0.000006 loss: 5.4797 time: 0.1261 data: 0.0000 max mem: 9128
Train: data epoch: [0] [ 100/2030] eta: 0:04:23 lr: 0.000011 loss: 3.6765 time: 0.1255 data: 0.0000 max mem: 9128
Train: data epoch: [0] [ 150/2030] eta: 0:04:10 lr: 0.000016 loss: 2.6226 time: 0.1256 data: 0.0000 max mem: 9128
Train: data epoch: [0] [ 200/2030] eta: 0:04:00 lr: 0.000021 loss: 2.3513 time: 0.1262 data: 0.0000 max mem: 9128
You should see the loss at each iteration match the numbers shown here exactly.
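As a side note, reproducing these numbers exactly presupposes the same seed (42 in the config above), the pinned package versions, and comparable hardware. A minimal sketch of the usual seeding boilerplate (illustrative only; this is not necessarily the repo's exact code, and set_seed is a hypothetical helper):

import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    # Fix all RNG sources so repeated runs produce identical losses.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)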
If I recall correctly, the missing-keys warning was added by the authors (and removed in a later version) to notify users which weights were loaded. In this case we load the LLM but not the P-former, so such warnings are expected. Specifically:
- if we load the P-former, it may throw warnings for missing keys in the LLM
- if we load the LLM, it throws warnings for missing keys in the P-former (or Q-former, etc.); see the sketch below
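To illustrate where these warnings come from, here is a minimal, self-contained PyTorch sketch (the module names are toy stand-ins, not the repo's actual classes): load_state_dict(strict=False) tolerates a checkpoint that covers only part of the model and returns the keys it could not find, which LAVIS then logs.

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.llm_proj = nn.Linear(8, 8)    # covered by the checkpoint (like the LLM side)
        self.VL_adaptor = nn.Linear(8, 8)  # NOT in the checkpoint (like the P-former side)

model = ToyModel()

# A checkpoint covering only llm_proj, analogous to blip2_pretrained_opt2.7b.pth
# covering the LLM but not the VL_adaptor.
ckpt = {"llm_proj.weight": torch.zeros(8, 8), "llm_proj.bias": torch.zeros(8)}

result = model.load_state_dict(ckpt, strict=False)
print(result.missing_keys)  # ['VL_adaptor.weight', 'VL_adaptor.bias']

The second checkpoint (the stage-0 P-former) then fills in the modules the first one missed, so the warning by itself is harmless.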
Great, thanks for your patience. I will ignore this warning.
I suggest adding some related discussion to the README.md.
Hi, Dr. Jian: Thanks for this video repo. I tried to reproduce the reported results but still have two problems:
In "lavis/projects/blip2/train/caption_vatex_stage1.yaml", I set the parameter "pretrained_stage0" to a pre-trained P-former from:
But a missing-keys warning occurs when executing lines 104-105 of base_model.py:
I am not sure whether this is caused by the missing keys or by setting num_beams=1.
Thanks in advance! Best, Ning