yiren-jian / BLIText-video

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training: Video Captioning
BSD 3-Clause "New" or "Revised" License

For Missing Keys in Loading P-former and The Setting of num_beams #1

Closed: NingMa-AI closed this issue 10 months ago

NingMa-AI commented 10 months ago

Hi, Dr. Jian: Thanks for this video repo. I tried to reproduce the reported results but still have two problems:

  1. In "lavis/projects/blip2/train/caption_vatex_stage1.yaml", I gave the parameter "pretrained_stage0" a pre-trained Pformer from:
    1704258484248 But a missing key warning happens when performing 104-105 line of base_model.py: 1704258708015

    1. Using parameter "num_beams=5" report a dimension mismatching error at the evaluation stage. But when I set it to 1, the prediction is trivial:

1704259302852 I am not sure whether this is caused by missing key or setting num_beam=1.

Thanks in advance! Best, Ning
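
For context: the "Missing keys" message comes from a non-strict state-dict load. A minimal sketch of the pattern, assuming the usual LAVIS convention of storing weights under a "model" key (the exact code in base_model.py may differ):

import logging

import torch
import torch.nn as nn

def load_checkpoint(model: nn.Module, ckpt_path: str):
    # Non-strict load: parameters present in the model but absent from the
    # checkpoint are only reported as missing_keys, not treated as errors.
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)  # LAVIS wraps weights under "model"
    msg = model.load_state_dict(state_dict, strict=False)
    logging.info("Missing keys %s", msg.missing_keys)  # the warning in question
    return msg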

yiren-jian commented 10 months ago

Please use the pretrained models from here

There are also my reproduced results from yesterday.

- P-former: models/ours/pformer/checkpoint_60000.pth
- Stage 1: models/ours/Caption_vatex_stage1
- Stage 2: models/ours/Caption_vatex_stage2

You will find the generated captions in models/ours/Caption_vatex_stage2/20240102015/result.

yiren-jian commented 10 months ago
[Screenshot: vatex dataset folder layout]

I have the vatex datasets in the folders shown in the screenshot above. I also uploaded the json files to the above link. Please follow the package versions in pip_freeze.txt to closely reproduce the results. We use a single RTX-A6000.

Please use transformers==4.26.1
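
If it helps, a quick runtime guard (hypothetical, not part of the repo) can catch a drifted transformers version before a long run starts:

import transformers

# Hypothetical guard: fail fast if the environment does not match the
# version the results were reproduced with.
assert transformers.__version__ == "4.26.1", (
    f"expected transformers==4.26.1, got {transformers.__version__}"
)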

NingMa-AI commented 10 months ago

Thanks for the fast response! After switching to transformers version 4.26.1, the "num_beams=5" error was resolved!
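
For anyone hitting the same error: beam search in transformers decodes batch_size * num_beams hypotheses in parallel, so any manually prepared prefix embeddings must be expanded to match, and versions differ in where that expansion happens. A minimal illustration of the shape requirement (the sizes below are assumptions, not the repo's actual values):

import torch

batch_size, num_beams, num_query, hidden = 2, 5, 32, 2560
prefix = torch.randn(batch_size, num_query, hidden)  # e.g. query tokens fed to OPT

# A prefix left at batch_size rows disagrees with the decoder's
# batch_size * num_beams rows at the first decoding step.
expanded = prefix.repeat_interleave(num_beams, dim=0)
assert expanded.shape == (batch_size * num_beams, num_query, hidden)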

I still have the missing key problem.

The main reason might be that I cannot install your local version of "salesforce-lavis @ file:///home/ssd1/LipTome" from pip_freeze.txt.

NingMa-AI commented 10 months ago

Besides the above salesforce-lavis, I have double-checked that all the packages I use have the same versions as in the provided pip_freeze.txt.

yiren-jian commented 10 months ago

You do not need it; please comment out that line ("salesforce-lavis @ file:///home/ssd1/LipTome" was generated when I installed the official lavis in another working directory).

You can do:

  1. Install lavis following the official guideline.
  2. Comment out the line with the local lavis version in pip_freeze.txt.
  3. Run pip install -r pip_freeze.txt (a small version checker is sketched below).
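
After step 3, a small helper (hypothetical, not part of the repo) can report any remaining version drift against pip_freeze.txt:

from importlib.metadata import PackageNotFoundError, version

def check_freeze(path: str = "pip_freeze.txt") -> None:
    # Compare installed package versions against the frozen requirements.
    for line in open(path):
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skips blanks, comments, and local "pkg @ file://..." entries
        name, want = line.split("==", 1)
        try:
            have = version(name)
        except PackageNotFoundError:
            have = "not installed"
        if have != want:
            print(f"{name}: installed {have}, frozen {want}")

check_freeze()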
NingMa-AI commented 10 months ago

Thanks. Could you provide the version of salesforce-lavis? I can install these versions: 1.0.0, 1.0.1rc1, 1.0.1rc2, 1.0.2rc1, 1.0.2

yiren-jian commented 10 months ago

Here's what I suggest you do, in the BLIText-video directory:

# install lavis based on official LAVIS guideline
conda create -n lavis python=3.8
conda activate lavis
pip install -e .

# fix package version issues
pip install -r pip_freeze.txt

The missing keys may not be an issue. Here's my output from running bash run_scripts/blip2/train/train_caption_vatex_stage1.sh just now:

(lavis-OpCounter) yiren@dartmouth-110B:~/V2T-Pformer$ bash run_scripts/blip2/train/train_caption_vatex_stage1.sh 
| distributed init (rank 0, world 1): env://
2024-01-03 03:49:45,941 [INFO] 
=====  Running Parameters    =====
2024-01-03 03:49:45,942 [INFO] {
    "accum_grad_iters": 1,
    "amp": true,
    "batch_size_eval": 64,
    "batch_size_train": 128,
    "device": "cuda",
    "dist_backend": "nccl",
    "dist_url": "env://",
    "distributed": true,
    "evaluate": false,
    "gpu": 0,
    "init_lr": 0.0001,
    "lr_sched": "linear_warmup_cosine_lr",
    "max_epoch": 10,
    "max_len": 30,
    "min_len": 8,
    "min_lr": 1e-05,
    "num_beams": 5,
    "num_workers": 4,
    "output_dir": "output/BLIP-T/Caption_vatex_stage1",
    "rank": 0,
    "report_metric": false,
    "resume_ckpt_path": null,
    "seed": 42,
    "task": "captioning",
    "train_splits": [
        "train"
    ],
    "valid_splits": [
        "val"
    ],
    "warmup_lr": 1e-06,
    "warmup_steps": 1000,
    "weight_decay": 0.05,
    "world_size": 1
}
2024-01-03 03:49:45,942 [INFO] 
======  Dataset Attributes  ======
2024-01-03 03:49:45,942 [INFO] 
======== my_vatex_caption =======
2024-01-03 03:49:45,942 [INFO] {
    "build_info": {
        "annotations": {
            "train": {
                "storage": "vatex/annotations/cap_train.json",
                "url": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vatex/cap_train.json"
            },
            "val": {
                "storage": "vatex/annotations/cap_val.json",
                "url": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vatex/cap_val.json"
            }
        },
        "videos": {
            "storage": "vatex/images"
        }
    },
    "data_type": "videos",
    "text_processor": {
        "eval": {
            "name": "blip_caption"
        },
        "train": {
            "name": "blip_caption",
            "prompt": "a photo of "
        }
    }
}
2024-01-03 03:49:45,942 [INFO] 
======  Model Attributes  ======
2024-01-03 03:49:45,943 [INFO] {
    "arch": "video_feature_opt_stage1",
    "drop_path_rate": 0,
    "finetuned": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_caption_opt2.7b.pth",
    "freeze_vit": true,
    "image_size": 364,
    "load_finetuned": false,
    "load_pretrained": true,
    "model_type": "caption_coco_opt2.7b",
    "num_query_token": 32,
    "opt_model": "facebook/opt-2.7b",
    "pretrained": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_opt2.7b.pth",
    "pretrained_stage0": "/home/yiren/LAVIS/lavis/output/BLIP-T/Pretrain_stage0/vq/40m-noisy/checkpoint_60000.pth",
    "prompt": "a photo of",
    "use_grad_checkpoint": false,
    "vit_precision": "fp32"
}
Using downloaded and verified file: /home/yiren/lavis_datasets/vatex/annotations/cap_train.json
Using downloaded and verified file: /home/yiren/lavis_datasets/vatex/annotations/cap_val.json
2024-01-03 03:49:45,943 [INFO] Building datasets...
2024-01-03 03:50:36,989 [INFO] Missing keys ['VL_adaptor.embeddings.position_ids', 'VL_adaptor.embeddings.word_embeddings.weight', 'VL_adaptor.embeddings.position_embeddings.weight', ..., 'Darkformer.pooler.0.weight', 'Darkformer.pooler.0.bias', 'Darkformer.opt_proj.weight', 'Darkformer.opt_proj.bias']
[list abridged: it names every VL_adaptor.* parameter (embeddings, encoder layers 0-11, feat_proj), every opt_model.* parameter (decoder embeddings, layers 0-31, final_layer_norm, lm_head), and every Darkformer.* parameter (embeddings, encoder layers 0-11, cls_proj, pooler, opt_proj)]
2024-01-03 03:50:36,989 [INFO] load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_opt2.7b.pth
2024-01-03 03:50:39,875 [INFO] Missing keys ['VL_adaptor.embeddings.position_ids', 'VL_adaptor.embeddings.word_embeddings.weight', ..., 'VL_adaptor.feat_proj.weight', 'VL_adaptor.feat_proj.bias', 'opt_proj.weight', 'opt_proj.bias']
2024-01-03 03:50:39,876 [INFO] load checkpoint from /home/yiren/LAVIS/lavis/output/BLIP-T/Pretrain_stage0/vq/40m-noisy/checkpoint_60000.pth
2024-01-03 03:50:39,964 [INFO] Start training
2024-01-03 03:50:40,812 [INFO] dataset_ratios not specified, datasets will be concatenated (map-style datasets) or chained (webdataset.DataPipeline).
2024-01-03 03:50:40,812 [INFO] Loaded 259910 records for train split from the dataset.
2024-01-03 03:50:40,812 [INFO] Loaded 3000 records for val split from the dataset.
2024-01-03 03:50:40,822 [INFO] number of trainable parameters: 87810304
2024-01-03 03:50:40,822 [INFO] Start training epoch 0, 2030 iters per inner epoch.
/home/yiren/anaconda3/envs/lavis-OpCounter/lib/python3.8/site-packages/transformers/modeling_utils.py:810: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Train: data epoch: [0]  [   0/2030]  eta: 0:39:59  lr: 0.000001  loss: 21.7157  time: 1.1820  data: 0.0000  max mem: 8448
2024-01-03 03:50:42,008 [INFO] Reducer buckets have been rebuilt in this iteration.
Train: data epoch: [0]  [  50/2030]  eta: 0:04:48  lr: 0.000006  loss: 5.4797  time: 0.1261  data: 0.0000  max mem: 9128
Train: data epoch: [0]  [ 100/2030]  eta: 0:04:23  lr: 0.000011  loss: 3.6765  time: 0.1255  data: 0.0000  max mem: 9128
Train: data epoch: [0]  [ 150/2030]  eta: 0:04:10  lr: 0.000016  loss: 2.6226  time: 0.1256  data: 0.0000  max mem: 9128
Train: data epoch: [0]  [ 200/2030]  eta: 0:04:00  lr: 0.000021  loss: 2.3513  time: 0.1262  data: 0.0000  max mem: 9128

You should see the loss at each iteration match these numbers exactly.
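If you want to check your run against these numbers programmatically rather than by eye, a small script like the one below works; the log filename and the tolerance are assumptions for illustration, not part of the repo:

```python
import re

# Reference per-iteration losses copied from the run above.
reference = [21.7157, 5.4797, 3.6765, 2.6226, 2.3513]

# "train.log" and the 1e-4 tolerance are assumptions for illustration.
with open("train.log") as f:
    observed = [float(x) for x in re.findall(r"loss: ([0-9.]+)", f.read())]

for step, (ref, got) in enumerate(zip(reference, observed)):
    assert abs(ref - got) < 1e-4, f"step {step}: expected {ref}, got {got}"
print("Per-iteration losses match the reference run.")
```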

yiren-jian commented 10 months ago

The missing keys warning was, if I recall, added (and later removed by the authors in a newer version) to notify users of which weights are loaded. In our case, we load the LLM but not the P-former, so it throws such warnings.

If I recall correctly (see the sketch after this list):

  1. if we load the P-former, it may warn about missing keys in the LLM;
  2. if we load the LLM, it warns about missing keys in the P-former (or Q-former, etc.).
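For completeness, here is a minimal sketch of the non-strict loading pattern that produces this warning, assuming the repo follows the usual LAVIS convention; the function name and the "model" checkpoint key are assumptions, not the repo's exact code:

```python
import logging

import torch


def load_checkpoint_non_strict(model, ckpt_path):
    """Minimal sketch of a LAVIS-style non-strict checkpoint load."""
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # LAVIS checkpoints typically nest the weights under a "model" key.
    state_dict = checkpoint.get("model", checkpoint)
    # strict=False: parameters present in the model but absent from this
    # checkpoint are only *reported* as missing; they keep whatever values
    # they already have (random init, or weights loaded earlier).
    msg = model.load_state_dict(state_dict, strict=False)
    logging.info("Missing keys %s", msg.missing_keys)
    return model
```

So when only one component's checkpoint is loaded, every parameter belonging to the other components shows up in `missing_keys`, but nothing is actually lost.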
NingMa-AI commented 10 months ago

Great, thanks for your patience. I will ignore this warning.

NingMa-AI commented 10 months ago

I suggest adding some discussion of this to the Readme.md.