kohjingyu / fromage

🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
https://jykoh.com/fromage
Apache License 2.0
474 stars 35 forks

I got 'KeyError: 'best_score'' while trying to fine-tune #29

Closed: kxxseola closed this 1 year ago

kxxseola commented 1 year ago

Hi, I'm trying to fine-tune your FROMAGe on my dataset. I followed your kind explanation in README.md as follows, on an NVIDIA A100-SXM4-40GB (GCP).

python -u main.py \
        --multiprocessing-distributed \
        --epochs=100 \
        --resume='.../fromage/fromage_model/fromage_vis4/pretrained_ckpt.pth.tar' \
        --max-len=96 \
        --world-size 1 \
        --rank 0 \
        --dataset=cc3m  \
        --val-dataset=cc3m \
        --dataset_dir='.../fromage/datasets' \
        --opt-version='facebook/opt-6.7b' \
        --visual-model='openai/clip-vit-large-patch14' \
        --exp_name='fromage_exp' \
        --image-dir='.../fromage/datasets/images/'  \
        --log-base-dir='.../fromage/runs/' \
        --learning-rate=0.0012 \
        --batch-size=48 \
        --print-freq=10 \
        --precision='fp32'

But I got the error below.

=> loading checkpoint '.../fromage/fromage_model/fromage_vis4/pretrained_ckpt.pth.tar'
Traceback (most recent call last):
  File ".../fromage/main.py", line 642, in <module>
    main(sys.argv[1:])
  File ".../fromage/main.py", line 197, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File ".../anaconda3/envs/fromage/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File ".../anaconda3/envs/fromage/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File ".../anaconda3/envs/fromage/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File ".../anaconda3/envs/fromage/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File ".../fromage/main.py", line 322, in main_worker
    best_score = checkpoint['best_score']
KeyError: 'best_score'

Can you help me, please🥺?

P.S. I replaced the file path before '/fromage' with '...' because it contains my personal information.

kohjingyu commented 1 year ago

Thanks for reporting this! I think it's due to the "best_score" not being saved in the pretrained checkpoint. 1e0264a8860f7abadfee8858470073d8dfc0d6c8 should fix this by initializing it to 0 if it doesn't exist. The actual score of the pretrained model shouldn't really matter if you're finetuning, since it will likely be worse at iteration 0 than after a few rounds of finetuning anyway.

Can you let me know if it works?
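
For context, a minimal sketch of the kind of guard that commit describes (this is an illustrative snippet, not the actual main.py code; the helper name is hypothetical):

import torch

def load_best_score(resume_path):
    # Default to 0 when the pretrained checkpoint was saved without 'best_score'.
    checkpoint = torch.load(resume_path, map_location='cpu')
    return checkpoint.get('best_score', 0)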

kxxseola commented 1 year ago

Thank you for your quick response! I could only test it just now because I couldn't get an A100 instance, and I'm facing this error.

Traceback (most recent call last):
  File ".../fromage/main.py", line 642, in <module>
    main(sys.argv[1:])
  File ".../fromage/main.py", line 197, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File ".../anaconda3/envs/fromage/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File ".../anaconda3/envs/fromage/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File ".../anaconda3/envs/fromage/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File ".../anaconda3/envs/fromage/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File ".../fromage/main.py", line 325, in main_worker
    best_score = best_score.to(args.gpu)
AttributeError: 'int' object has no attribute 'to'

kohjingyu commented 1 year ago

Ah, this is what happens when you don't test on a GPU... This line should be obsolete, because best_score doesn't need to be a tensor. 3d9bb8a49c947d8db6820484c888d8c90e7dfc97 should fix this, I think.
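
A minimal sketch of what that change amounts to (best_score and args.gpu follow the traceback; this is illustrative, not the exact code in main.py):

import torch

# best_score is now a plain Python int (e.g. the 0 default), so the old
# best_score = best_score.to(args.gpu) line can simply be dropped; a
# defensive version would only move it when it is actually a tensor:
if torch.is_tensor(best_score):
    best_score = best_score.to(args.gpu)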

kxxseola commented 1 year ago

Then I got a CUDA error related to the A100, so I tried the following:

$ export CUDA_LAUNCH_BLOCKING=1
$ pip3 uninstall torch torchvision torchaudio
$ pip3 install torch torchvision torchaudio

And it says there are missing keys.

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File ".../anaconda3/envs/fromage_2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File ".../fromage/main.py", line 327, in main_worker
    model.load_state_dict(checkpoint['state_dict'])
  File ".../anaconda3/envs/fromage_2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
        Missing key(s) in state_dict: "module.model.logit_scale", "module.model.lm.model.decoder.embed_tokens.weight", "module.model.lm.model.decoder.embed_positions.weight", "module.model.lm.model.decoder.final_layer_norm.weight", "module.model.lm.model.decoder.final_layer_norm.bias", "module.model.lm.model.decoder.layers.0.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.0.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.0.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.0.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.0.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.0.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.0.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.0.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.0.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.0.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.0.fc1.weight", "module.model.lm.model.decoder.layers.0.fc1.bias", "module.model.lm.model.decoder.layers.0.fc2.weight", "module.model.lm.model.decoder.layers.0.fc2.bias", "module.model.lm.model.decoder.layers.0.final_layer_norm.weight", "module.model.lm.model.decoder.layers.0.final_layer_norm.bias", "module.model.lm.model.decoder.layers.1.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.1.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.1.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.1.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.1.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.1.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.1.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.1.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.1.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.1.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.1.fc1.weight", "module.model.lm.model.decoder.layers.1.fc1.bias", "module.model.lm.model.decoder.layers.1.fc2.weight", "module.model.lm.model.decoder.layers.1.fc2.bias", "module.model.lm.model.decoder.layers.1.final_layer_norm.weight", "module.model.lm.model.decoder.layers.1.final_layer_norm.bias", "module.model.lm.model.decoder.layers.2.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.2.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.2.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.2.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.2.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.2.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.2.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.2.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.2.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.2.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.2.fc1.weight", "module.model.lm.model.decoder.layers.2.fc1.bias", "module.model.lm.model.decoder.layers.2.fc2.weight", "module.model.lm.model.decoder.layers.2.fc2.bias", "module.model.lm.model.decoder.layers.2.final_layer_norm.weight", "module.model.lm.model.decoder.layers.2.final_layer_norm.bias", "module.model.lm.model.decoder.layers.3.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.3.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.3.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.3.self_attn.v_proj.bias", 
"module.model.lm.model.decoder.layers.3.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.3.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.3.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.3.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.3.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.3.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.3.fc1.weight", "module.model.lm.model.decoder.layers.3.fc1.bias", "module.model.lm.model.decoder.layers.3.fc2.weight", "module.model.lm.model.decoder.layers.3.fc2.bias", "module.model.lm.model.decoder.layers.3.final_layer_norm.weight", "module.model.lm.model.decoder.layers.3.final_layer_norm.bias", "module.model.lm.model.decoder.layers.4.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.4.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.4.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.4.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.4.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.4.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.4.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.4.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.4.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.4.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.4.fc1.weight", "module.model.lm.model.decoder.layers.4.fc1.bias", "module.model.lm.model.decoder.layers.4.fc2.weight", "module.model.lm.model.decoder.layers.4.fc2.bias", "module.model.lm.model.decoder.layers.4.final_layer_norm.weight", "module.model.lm.model.decoder.layers.4.final_layer_norm.bias", "module.model.lm.model.decoder.layers.5.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.5.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.5.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.5.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.5.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.5.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.5.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.5.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.5.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.5.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.5.fc1.weight", "module.model.lm.model.decoder.layers.5.fc1.bias", "module.model.lm.model.decoder.layers.5.fc2.weight", "module.model.lm.model.decoder.layers.5.fc2.bias", "module.model.lm.model.decoder.layers.5.final_layer_norm.weight", "module.model.lm.model.decoder.layers.5.final_layer_norm.bias", "module.model.lm.model.decoder.layers.6.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.6.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.6.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.6.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.6.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.6.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.6.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.6.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.6.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.6.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.6.fc1.weight", "module.model.lm.model.decoder.layers.6.fc1.bias", 
"module.model.lm.model.decoder.layers.6.fc2.weight", "module.model.lm.model.decoder.layers.6.fc2.bias", "module.model.lm.model.decoder.layers.6.final_layer_norm.weight", "module.model.lm.model.decoder.layers.6.final_layer_norm.bias", "module.model.lm.model.decoder.layers.7.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.7.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.7.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.7.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.7.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.7.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.7.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.7.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.7.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.7.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.7.fc1.weight", "module.model.lm.model.decoder.layers.7.fc1.bias", "module.model.lm.model.decoder.layers.7.fc2.weight", "module.model.lm.model.decoder.layers.7.fc2.bias", "module.model.lm.model.decoder.layers.7.final_layer_norm.weight", "module.model.lm.model.decoder.layers.7.final_layer_norm.bias", "module.model.lm.model.decoder.layers.8.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.8.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.8.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.8.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.8.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.8.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.8.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.8.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.8.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.8.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.8.fc1.weight", "module.model.lm.model.decoder.layers.8.fc1.bias", "module.model.lm.model.decoder.layers.8.fc2.weight", "module.model.lm.model.decoder.layers.8.fc2.bias", "module.model.lm.model.decoder.layers.8.final_layer_norm.weight", "module.model.lm.model.decoder.layers.8.final_layer_norm.bias", "module.model.lm.model.decoder.layers.9.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.9.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.9.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.9.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.9.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.9.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.9.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.9.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.9.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.9.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.9.fc1.weight", "module.model.lm.model.decoder.layers.9.fc1.bias", "module.model.lm.model.decoder.layers.9.fc2.weight", "module.model.lm.model.decoder.layers.9.fc2.bias", "module.model.lm.model.decoder.layers.9.final_layer_norm.weight", "module.model.lm.model.decoder.layers.9.final_layer_norm.bias", "module.model.lm.model.decoder.layers.10.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.10.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.10.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.10.self_attn.v_proj.bias", 
"module.model.lm.model.decoder.layers.10.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.10.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.10.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.10.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.10.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.10.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.10.fc1.weight", "module.model.lm.model.decoder.layers.10.fc1.bias", "module.model.lm.model.decoder.layers.10.fc2.weight", "module.model.lm.model.decoder.layers.10.fc2.bias", "module.model.lm.model.decoder.layers.10.final_layer_norm.weight", "module.model.lm.model.decoder.layers.10.final_layer_norm.bias", "module.model.lm.model.decoder.layers.11.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.11.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.11.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.11.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.11.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.11.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.11.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.11.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.11.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.11.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.11.fc1.weight", "module.model.lm.model.decoder.layers.11.fc1.bias", "module.model.lm.model.decoder.layers.11.fc2.weight", "module.model.lm.model.decoder.layers.11.fc2.bias", "module.model.lm.model.decoder.layers.11.final_layer_norm.weight", "module.model.lm.model.decoder.layers.11.final_layer_norm.bias", "module.model.lm.model.decoder.layers.12.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.12.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.12.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.12.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.12.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.12.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.12.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.12.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.12.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.12.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.12.fc1.weight", "module.model.lm.model.decoder.layers.12.fc1.bias", "module.model.lm.model.decoder.layers.12.fc2.weight", "module.model.lm.model.decoder.layers.12.fc2.bias", "module.model.lm.model.decoder.layers.12.final_layer_norm.weight", "module.model.lm.model.decoder.layers.12.final_layer_norm.bias", "module.model.lm.model.decoder.layers.13.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.13.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.13.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.13.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.13.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.13.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.13.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.13.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.13.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.13.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.13.fc1.weight", 
"module.model.lm.model.decoder.layers.13.fc1.bias", "module.model.lm.model.decoder.layers.13.fc2.weight", "module.model.lm.model.decoder.layers.13.fc2.bias", "module.model.lm.model.decoder.layers.13.final_layer_norm.weight", "module.model.lm.model.decoder.layers.13.final_layer_norm.bias", "module.model.lm.model.decoder.layers.14.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.14.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.14.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.14.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.14.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.14.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.14.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.14.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.14.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.14.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.14.fc1.weight", "module.model.lm.model.decoder.layers.14.fc1.bias", "module.model.lm.model.decoder.layers.14.fc2.weight", "module.model.lm.model.decoder.layers.14.fc2.bias", "module.model.lm.model.decoder.layers.14.final_layer_norm.weight", "module.model.lm.model.decoder.layers.14.final_layer_norm.bias", "module.model.lm.model.decoder.layers.15.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.15.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.15.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.15.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.15.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.15.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.15.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.15.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.15.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.15.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.15.fc1.weight", "module.model.lm.model.decoder.layers.15.fc1.bias", "module.model.lm.model.decoder.layers.15.fc2.weight", "module.model.lm.model.decoder.layers.15.fc2.bias", "module.model.lm.model.decoder.layers.15.final_layer_norm.weight", "module.model.lm.model.decoder.layers.15.final_layer_norm.bias", "module.model.lm.model.decoder.layers.16.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.16.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.16.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.16.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.16.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.16.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.16.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.16.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.16.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.16.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.16.fc1.weight", "module.model.lm.model.decoder.layers.16.fc1.bias", "module.model.lm.model.decoder.layers.16.fc2.weight", "module.model.lm.model.decoder.layers.16.fc2.bias", "module.model.lm.model.decoder.layers.16.final_layer_norm.weight", "module.model.lm.model.decoder.layers.16.final_layer_norm.bias", "module.model.lm.model.decoder.layers.17.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.17.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.17.self_attn.v_proj.weight", 
"module.model.lm.model.decoder.layers.17.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.17.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.17.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.17.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.17.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.17.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.17.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.17.fc1.weight", "module.model.lm.model.decoder.layers.17.fc1.bias", "module.model.lm.model.decoder.layers.17.fc2.weight", "module.model.lm.model.decoder.layers.17.fc2.bias", "module.model.lm.model.decoder.layers.17.final_layer_norm.weight", "module.model.lm.model.decoder.layers.17.final_layer_norm.bias", "module.model.lm.model.decoder.layers.18.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.18.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.18.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.18.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.18.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.18.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.18.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.18.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.18.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.18.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.18.fc1.weight", "module.model.lm.model.decoder.layers.18.fc1.bias", "module.model.lm.model.decoder.layers.18.fc2.weight", "module.model.lm.model.decoder.layers.18.fc2.bias", "module.model.lm.model.decoder.layers.18.final_layer_norm.weight", "module.model.lm.model.decoder.layers.18.final_layer_norm.bias", "module.model.lm.model.decoder.layers.19.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.19.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.19.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.19.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.19.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.19.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.19.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.19.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.19.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.19.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.19.fc1.weight", "module.model.lm.model.decoder.layers.19.fc1.bias", "module.model.lm.model.decoder.layers.19.fc2.weight", "module.model.lm.model.decoder.layers.19.fc2.bias", "module.model.lm.model.decoder.layers.19.final_layer_norm.weight", "module.model.lm.model.decoder.layers.19.final_layer_norm.bias", "module.model.lm.model.decoder.layers.20.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.20.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.20.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.20.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.20.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.20.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.20.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.20.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.20.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.20.self_attn_layer_norm.bias", 
"module.model.lm.model.decoder.layers.20.fc1.weight", "module.model.lm.model.decoder.layers.20.fc1.bias", "module.model.lm.model.decoder.layers.20.fc2.weight", "module.model.lm.model.decoder.layers.20.fc2.bias", "module.model.lm.model.decoder.layers.20.final_layer_norm.weight", "module.model.lm.model.decoder.layers.20.final_layer_norm.bias", "module.model.lm.model.decoder.layers.21.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.21.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.21.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.21.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.21.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.21.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.21.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.21.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.21.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.21.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.21.fc1.weight", "module.model.lm.model.decoder.layers.21.fc1.bias", "module.model.lm.model.decoder.layers.21.fc2.weight", "module.model.lm.model.decoder.layers.21.fc2.bias", "module.model.lm.model.decoder.layers.21.final_layer_norm.weight", "module.model.lm.model.decoder.layers.21.final_layer_norm.bias", "module.model.lm.model.decoder.layers.22.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.22.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.22.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.22.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.22.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.22.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.22.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.22.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.22.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.22.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.22.fc1.weight", "module.model.lm.model.decoder.layers.22.fc1.bias", "module.model.lm.model.decoder.layers.22.fc2.weight", "module.model.lm.model.decoder.layers.22.fc2.bias", "module.model.lm.model.decoder.layers.22.final_layer_norm.weight", "module.model.lm.model.decoder.layers.22.final_layer_norm.bias", "module.model.lm.model.decoder.layers.23.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.23.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.23.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.23.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.23.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.23.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.23.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.23.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.23.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.23.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.23.fc1.weight", "module.model.lm.model.decoder.layers.23.fc1.bias", "module.model.lm.model.decoder.layers.23.fc2.weight", "module.model.lm.model.decoder.layers.23.fc2.bias", "module.model.lm.model.decoder.layers.23.final_layer_norm.weight", "module.model.lm.model.decoder.layers.23.final_layer_norm.bias", "module.model.lm.model.decoder.layers.24.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.24.self_attn.k_proj.bias", 
"module.model.lm.model.decoder.layers.24.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.24.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.24.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.24.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.24.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.24.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.24.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.24.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.24.fc1.weight", "module.model.lm.model.decoder.layers.24.fc1.bias", "module.model.lm.model.decoder.layers.24.fc2.weight", "module.model.lm.model.decoder.layers.24.fc2.bias", "module.model.lm.model.decoder.layers.24.final_layer_norm.weight", "module.model.lm.model.decoder.layers.24.final_layer_norm.bias", "module.model.lm.model.decoder.layers.25.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.25.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.25.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.25.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.25.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.25.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.25.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.25.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.25.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.25.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.25.fc1.weight", "module.model.lm.model.decoder.layers.25.fc1.bias", "module.model.lm.model.decoder.layers.25.fc2.weight", "module.model.lm.model.decoder.layers.25.fc2.bias", "module.model.lm.model.decoder.layers.25.final_layer_norm.weight", "module.model.lm.model.decoder.layers.25.final_layer_norm.bias", "module.model.lm.model.decoder.layers.26.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.26.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.26.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.26.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.26.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.26.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.26.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.26.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.26.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.26.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.26.fc1.weight", "module.model.lm.model.decoder.layers.26.fc1.bias", "module.model.lm.model.decoder.layers.26.fc2.weight", "module.model.lm.model.decoder.layers.26.fc2.bias", "module.model.lm.model.decoder.layers.26.final_layer_norm.weight", "module.model.lm.model.decoder.layers.26.final_layer_norm.bias", "module.model.lm.model.decoder.layers.27.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.27.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.27.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.27.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.27.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.27.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.27.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.27.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.27.self_attn_layer_norm.weight", 
"module.model.lm.model.decoder.layers.27.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.27.fc1.weight", "module.model.lm.model.decoder.layers.27.fc1.bias", "module.model.lm.model.decoder.layers.27.fc2.weight", "module.model.lm.model.decoder.layers.27.fc2.bias", "module.model.lm.model.decoder.layers.27.final_layer_norm.weight", "module.model.lm.model.decoder.layers.27.final_layer_norm.bias", "module.model.lm.model.decoder.layers.28.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.28.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.28.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.28.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.28.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.28.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.28.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.28.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.28.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.28.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.28.fc1.weight", "module.model.lm.model.decoder.layers.28.fc1.bias", "module.model.lm.model.decoder.layers.28.fc2.weight", "module.model.lm.model.decoder.layers.28.fc2.bias", "module.model.lm.model.decoder.layers.28.final_layer_norm.weight", "module.model.lm.model.decoder.layers.28.final_layer_norm.bias", "module.model.lm.model.decoder.layers.29.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.29.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.29.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.29.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.29.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.29.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.29.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.29.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.29.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.29.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.29.fc1.weight", "module.model.lm.model.decoder.layers.29.fc1.bias", "module.model.lm.model.decoder.layers.29.fc2.weight", "module.model.lm.model.decoder.layers.29.fc2.bias", "module.model.lm.model.decoder.layers.29.final_layer_norm.weight", "module.model.lm.model.decoder.layers.29.final_layer_norm.bias", "module.model.lm.model.decoder.layers.30.self_attn.k_proj.weight", "module.model.lm.model.decoder.layers.30.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.30.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.30.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.30.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.30.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.30.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.30.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.30.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.30.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.30.fc1.weight", "module.model.lm.model.decoder.layers.30.fc1.bias", "module.model.lm.model.decoder.layers.30.fc2.weight", "module.model.lm.model.decoder.layers.30.fc2.bias", "module.model.lm.model.decoder.layers.30.final_layer_norm.weight", "module.model.lm.model.decoder.layers.30.final_layer_norm.bias", "module.model.lm.model.decoder.layers.31.self_attn.k_proj.weight", 
"module.model.lm.model.decoder.layers.31.self_attn.k_proj.bias", "module.model.lm.model.decoder.layers.31.self_attn.v_proj.weight", "module.model.lm.model.decoder.layers.31.self_attn.v_proj.bias", "module.model.lm.model.decoder.layers.31.self_attn.q_proj.weight", "module.model.lm.model.decoder.layers.31.self_attn.q_proj.bias", "module.model.lm.model.decoder.layers.31.self_attn.out_proj.weight", "module.model.lm.model.decoder.layers.31.self_attn.out_proj.bias", "module.model.lm.model.decoder.layers.31.self_attn_layer_norm.weight", "module.model.lm.model.decoder.layers.31.self_attn_layer_norm.bias", "module.model.lm.model.decoder.layers.31.fc1.weight", "module.model.lm.model.decoder.layers.31.fc1.bias", "module.model.lm.model.decoder.layers.31.fc2.weight", "module.model.lm.model.decoder.layers.31.fc2.bias", "module.model.lm.model.decoder.layers.31.final_layer_norm.weight", "module.model.lm.model.decoder.layers.31.final_layer_norm.bias", "module.model.lm.lm_head.weight", "module.model.input_embeddings.weight", "module.model.visual_model.vision_model.embeddings.class_embedding", "module.model.visual_model.vision_model.embeddings.position_ids", "module.model.visual_model.vision_model.embeddings.patch_embedding.weight", "module.model.visual_model.vision_model.embeddings.position_embedding.weight", "module.model.visual_model.vision_model.pre_layrnorm.weight", "module.model.visual_model.vision_model.pre_layrnorm.bias", "module.model.visual_model.vision_model.encoder.layers.0.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.0.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.0.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.0.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.0.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.0.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.0.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.0.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.0.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.0.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.0.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.0.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.0.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.0.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.0.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.0.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.1.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.1.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.1.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.1.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.1.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.1.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.1.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.1.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.1.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.1.layer_norm1.bias", 
"module.model.visual_model.vision_model.encoder.layers.1.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.1.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.1.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.1.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.1.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.1.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.2.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.2.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.2.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.2.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.2.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.2.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.2.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.2.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.2.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.2.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.2.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.2.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.2.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.2.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.2.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.2.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.3.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.3.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.3.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.3.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.3.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.3.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.3.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.3.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.3.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.3.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.3.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.3.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.3.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.3.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.3.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.3.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.4.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.4.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.4.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.4.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.4.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.4.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.4.self_attn.out_proj.weight", 
"module.model.visual_model.vision_model.encoder.layers.4.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.4.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.4.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.4.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.4.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.4.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.4.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.4.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.4.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.5.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.5.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.5.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.5.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.5.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.5.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.5.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.5.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.5.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.5.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.5.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.5.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.5.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.5.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.5.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.5.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.6.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.6.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.6.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.6.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.6.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.6.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.6.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.6.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.6.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.6.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.6.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.6.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.6.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.6.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.6.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.6.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.7.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.7.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.7.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.7.self_attn.v_proj.bias", 
"module.model.visual_model.vision_model.encoder.layers.7.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.7.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.7.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.7.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.7.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.7.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.7.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.7.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.7.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.7.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.7.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.7.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.8.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.8.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.8.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.8.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.8.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.8.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.8.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.8.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.8.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.8.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.8.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.8.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.8.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.8.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.8.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.8.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.9.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.9.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.9.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.9.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.9.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.9.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.9.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.9.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.9.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.9.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.9.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.9.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.9.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.9.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.9.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.9.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.10.self_attn.k_proj.weight", 
"module.model.visual_model.vision_model.encoder.layers.10.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.10.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.10.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.10.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.10.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.10.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.10.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.10.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.10.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.10.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.10.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.10.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.10.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.10.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.10.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.11.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.11.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.11.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.11.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.11.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.11.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.11.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.11.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.11.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.11.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.11.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.11.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.11.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.11.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.11.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.11.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.12.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.12.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.12.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.12.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.12.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.12.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.12.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.12.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.12.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.12.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.12.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.12.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.12.mlp.fc2.weight", 
"module.model.visual_model.vision_model.encoder.layers.12.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.12.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.12.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.13.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.13.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.13.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.13.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.13.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.13.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.13.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.13.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.13.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.13.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.13.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.13.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.13.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.13.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.13.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.13.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.14.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.14.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.14.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.14.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.14.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.14.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.14.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.14.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.14.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.14.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.14.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.14.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.14.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.14.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.14.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.14.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.15.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.15.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.15.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.15.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.15.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.15.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.15.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.15.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.15.layer_norm1.weight", 
"module.model.visual_model.vision_model.encoder.layers.15.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.15.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.15.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.15.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.15.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.15.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.15.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.16.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.16.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.16.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.16.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.16.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.16.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.16.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.16.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.16.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.16.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.16.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.16.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.16.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.16.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.16.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.16.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.17.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.17.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.17.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.17.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.17.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.17.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.17.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.17.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.17.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.17.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.17.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.17.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.17.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.17.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.17.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.17.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.18.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.18.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.18.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.18.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.18.self_attn.q_proj.weight", 
"module.model.visual_model.vision_model.encoder.layers.18.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.18.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.18.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.18.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.18.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.18.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.18.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.18.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.18.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.18.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.18.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.19.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.19.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.19.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.19.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.19.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.19.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.19.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.19.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.19.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.19.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.19.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.19.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.19.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.19.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.19.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.19.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.20.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.20.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.20.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.20.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.20.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.20.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.20.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.20.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.20.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.20.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.20.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.20.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.20.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.20.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.20.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.20.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.21.self_attn.k_proj.weight", 
"module.model.visual_model.vision_model.encoder.layers.21.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.21.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.21.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.21.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.21.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.21.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.21.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.21.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.21.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.21.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.21.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.21.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.21.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.21.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.21.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.22.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.22.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.22.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.22.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.22.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.22.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.22.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.22.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.22.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.22.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.22.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.22.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.22.mlp.fc2.weight", "module.model.visual_model.vision_model.encoder.layers.22.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.22.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.22.layer_norm2.bias", "module.model.visual_model.vision_model.encoder.layers.23.self_attn.k_proj.weight", "module.model.visual_model.vision_model.encoder.layers.23.self_attn.k_proj.bias", "module.model.visual_model.vision_model.encoder.layers.23.self_attn.v_proj.weight", "module.model.visual_model.vision_model.encoder.layers.23.self_attn.v_proj.bias", "module.model.visual_model.vision_model.encoder.layers.23.self_attn.q_proj.weight", "module.model.visual_model.vision_model.encoder.layers.23.self_attn.q_proj.bias", "module.model.visual_model.vision_model.encoder.layers.23.self_attn.out_proj.weight", "module.model.visual_model.vision_model.encoder.layers.23.self_attn.out_proj.bias", "module.model.visual_model.vision_model.encoder.layers.23.layer_norm1.weight", "module.model.visual_model.vision_model.encoder.layers.23.layer_norm1.bias", "module.model.visual_model.vision_model.encoder.layers.23.mlp.fc1.weight", "module.model.visual_model.vision_model.encoder.layers.23.mlp.fc1.bias", "module.model.visual_model.vision_model.encoder.layers.23.mlp.fc2.weight", 
"module.model.visual_model.vision_model.encoder.layers.23.mlp.fc2.bias", "module.model.visual_model.vision_model.encoder.layers.23.layer_norm2.weight", "module.model.visual_model.vision_model.encoder.layers.23.layer_norm2.bias", "module.model.visual_model.vision_model.post_layernorm.weight", "module.model.visual_model.vision_model.post_layernorm.bias", "module.model.text_hidden_fcs.0.0.weight", "module.model.text_hidden_fcs.0.0.bias", "module.model.visual_embeddings.weight", "module.model.visual_embeddings.bias", "module.model.visual_fc.weight", "module.model.visual_fc.bias". 
        Unexpected key(s) in state_dict: "model.logit_scale", "model.text_hidden_fcs.0.0.bias", "model.text_hidden_fcs.0.0.weight", "model.visual_embeddings.bias", "model.visual_embeddings.weight", "model.visual_fc.bias", "model.visual_fc.weight", "ret_input_embeddings.weight". 
kxxseola commented 1 year ago
import json
import torch

# checkpoint1: the released pruned pretrained checkpoint; checkpoint2: a full
# checkpoint saved from a short (epoch=1) training run, so it already contains
# every key the DDP-wrapped model expects. (Paths here are placeholders.)
checkpoint1 = torch.load('.../pretrained_ckpt.pth.tar', map_location='cpu')
checkpoint2 = torch.load('.../epoch1_ckpt.pth.tar', map_location='cpu')

# Map each key in the pruned checkpoint to the name the DDP-wrapped model expects.
weights = {'model.logit_scale': 'module.model.logit_scale',
           'model.text_hidden_fcs.0.0.bias': 'module.model.text_hidden_fcs.0.0.bias',
           'model.text_hidden_fcs.0.0.weight': 'module.model.text_hidden_fcs.0.0.weight',
           'model.visual_embeddings.bias': 'module.model.visual_embeddings.bias',
           'model.visual_embeddings.weight': 'module.model.visual_embeddings.weight',
           'model.visual_fc.bias': 'module.model.visual_fc.bias',
           'model.visual_fc.weight': 'module.model.visual_fc.weight',
           'ret_input_embeddings.weight': 'module.model.input_embeddings.weight'}

# Read the retrieval token index saved with the training run.
with open('.../fromage/runs/fromage_exp/model_args.json', 'r') as f:
    model_kwargs = json.load(f)
    ret_token_idx = model_kwargs['retrieval_token_idx']

for k, v in weights.items():
    if k == 'ret_input_embeddings.weight':
        # The pruned checkpoint stores only the [RET] embedding row, so copy it
        # into the matching row of the full input embedding matrix.
        checkpoint2['state_dict'][v][ret_token_idx:ret_token_idx + 1, :] = checkpoint1['state_dict'][k]
    else:
        checkpoint2['state_dict'][v] = checkpoint1['state_dict'][k]
    print(k)

torch.save(checkpoint2, '.../fromage/fromage_model/test_model/ckpt.pth.tar')

'checkpoint1' is your pretrained checkpoint (the pruned model) and 'checkpoint2' is a checkpoint saved from a short training run (epoch=1). With this remapping the error no longer appears, but I want to make sure this is the right way to do it. Could you check this?
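
As a quick sanity check on the merged file before passing it to `--resume` (a rough sketch; the path is a placeholder), the remapped keys can be inspected directly:

```
import torch

merged = torch.load('.../fromage/fromage_model/test_model/ckpt.pth.tar', map_location='cpu')
sd = merged['state_dict']

# In this setup every key should carry the DDP "module." prefix, and the weights
# copied over from the pruned checkpoint should now be present under their new names.
assert all(k.startswith('module.') for k in sd), 'found keys without the module. prefix'
for name in ['module.model.logit_scale',
             'module.model.text_hidden_fcs.0.0.weight',
             'module.model.visual_fc.weight',
             'module.model.input_embeddings.weight']:
    assert name in sd, f'missing {name}'
print('merged checkpoint looks consistent')
```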

kohjingyu commented 1 year ago

This looks correct to me; it's essentially inverting the pruning process.
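
For reference, the pruning step is essentially the reverse mapping: keep only the trained weights plus the single [RET] row of the input embeddings, and drop the DDP "module." prefix. A rough sketch (paths are placeholders, and the actual pruning script may differ in details):

```
import json
import torch

full = torch.load('.../full_ckpt.pth.tar', map_location='cpu')

with open('.../model_args.json', 'r') as f:
    ret_token_idx = json.load(f)['retrieval_token_idx']

# Keys to keep, without the "module." prefix used by the DDP-wrapped model.
keep = ['model.logit_scale',
        'model.text_hidden_fcs.0.0.weight', 'model.text_hidden_fcs.0.0.bias',
        'model.visual_embeddings.weight', 'model.visual_embeddings.bias',
        'model.visual_fc.weight', 'model.visual_fc.bias']
pruned = {'state_dict': {k: full['state_dict']['module.' + k] for k in keep}}

# Store only the [RET] token's embedding row instead of the full matrix.
emb = full['state_dict']['module.model.input_embeddings.weight']
pruned['state_dict']['ret_input_embeddings.weight'] = emb[ret_token_idx:ret_token_idx + 1, :]

torch.save(pruned, '.../pruned_ckpt.pth.tar')
```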

kxxseola commented 1 year ago

Thank you for your kind and quick reply!