I am trying to use OpenFlamingo with other sizes of the LLaMA models and would like to know whether this is possible with some modifications. Amazing project, by the way; I am very happy to see an open version :)
Expected Behavior
I was hoping to use llama-30b as the language encoder, but it appears that LLaMA sizes other than 7B will not work with the OpenFlamingo-9B checkpoint.
Current Behavior
import torch
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="/notebooks/llama-30b-hf/",
    tokenizer_path="/notebooks/llama-30b-hf/",
    cross_attn_every_n_layers=4,
)
model.load_state_dict(torch.load('/notebooks/OpenFlamingo-9B/checkpoint.pt'), strict=False)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-10-597522fb2c51> in <module>
1 # checkpoint_path = hf_hub_download("openflamingo/OpenFlamingo-9B", "checkpoint.pt")
----> 2 model.load_state_dict(torch.load('/notebooks/OpenFlamingo-9B/checkpoint.pt'), strict=False)
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
1669 output = hook(self, name, value)
1670 if output is not None:
-> 1671 value = output
1672 buffers[name] = value
1673 else:
RuntimeError: Error(s) in loading state_dict for Flamingo:
size mismatch for lang_encoder.model.embed_tokens.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32003, 6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
...
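Every mismatch follows the same pattern: the checkpoint tensors have the 7B hidden size (4096) where the 30B model expects 6656. A quick way to enumerate all of the mismatches up front, instead of reading the truncated traceback, is a shape comparison like the sketch below (assuming model was built as in the snippet above):

import torch

# Minimal diagnostic sketch: compare checkpoint tensor shapes against the
# current model's state dict before attempting to load anything.
ckpt = torch.load('/notebooks/OpenFlamingo-9B/checkpoint.pt', map_location='cpu')
model_state = model.state_dict()
for name, tensor in ckpt.items():
    if name in model_state and tensor.shape != model_state[name].shape:
        print(f'{name}: checkpoint {tuple(tensor.shape)} != model {tuple(model_state[name].shape)}')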
Steps to Reproduce
The code above reproduces the error and can be run given sufficient compute resources.
Environment
Python environment built from requirements.txt. The code is run in a Gradient notebook on an IPU-POD16.
Detailed Description
This setup currently works with LLaMA-7B. Is there a way to make it work with larger models now, or will we need to wait for a larger version of OpenFlamingo with a new model checkpoint?
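My (possibly wrong) understanding is that the released checkpoint only contains the weights trained on top of the frozen LLaMA-7B, i.e. the gated cross-attention layers, the perceiver resampler, and the resized embeddings, and the cross-attention weights are tied to 7B's hidden size of 4096 while llama-30b uses 6656. So short of a retrained checkpoint, the only partial reuse I can see is loading just the shape-compatible tensors and retraining the rest; an untested sketch:

import torch

# Untested workaround sketch: keep only the tensors whose shapes already
# match the 30B model (e.g. the perceiver resampler, which does not depend
# on the language model's hidden size) and skip the 7B-sized cross-attention
# weights, which would then need to be retrained from scratch.
ckpt = torch.load('/notebooks/OpenFlamingo-9B/checkpoint.pt', map_location='cpu')
model_state = model.state_dict()
compatible = {name: tensor for name, tensor in ckpt.items()
              if name in model_state and tensor.shape == model_state[name].shape}
print(f'loading {len(compatible)} of {len(ckpt)} tensors; the rest would need retraining')
model.load_state_dict(compatible, strict=False)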