I am trying to use OpenFlamingo with other sizes of the LLaMA models and would like to know whether this is possible with some modifications. Amazing project, by the way; I am very happy to see an open version :)
Expected Behavior
I was hoping to use llama-30b as the language encoder, but it appears that LLaMA sizes other than 7B will not work with the OpenFlamingo-9B checkpoint.
Current Behavior
import torch
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="/notebooks/llama-30b-hf/",
    tokenizer_path="/notebooks/llama-30b-hf/",
    cross_attn_every_n_layers=4,
)
model.load_state_dict(torch.load('/notebooks/OpenFlamingo-9B/checkpoint.pt'), strict=False)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-10-597522fb2c51> in <module>
1 # checkpoint_path = hf_hub_download("openflamingo/OpenFlamingo-9B", "checkpoint.pt")
----> 2 model.load_state_dict(torch.load('/notebooks/OpenFlamingo-9B/checkpoint.pt'), strict=False)
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
1669 output = hook(self, name, value)
1670 if output is not None:
-> 1671 value = output
1672 buffers[name] = value
1673 else:
RuntimeError: Error(s) in loading state_dict for Flamingo:
size mismatch for lang_encoder.model.embed_tokens.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32003, 6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.3.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.7.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.11.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.15.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.19.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.23.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.27.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.model.layers.31.gated_cross_attn_layer.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.3.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.7.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.11.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.15.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.19.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.23.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.27.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.attn.norm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.attn.norm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.attn.to_q.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.attn.to_out.weight: copying a param with shape torch.Size([4096, 512]) from checkpoint, the shape in current model is torch.Size([6656, 512]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.ff.0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.ff.0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.ff.1.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([26624, 6656]).
size mismatch for lang_encoder.gated_cross_attn_layers.31.ff.3.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([6656, 26624]).
...
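Every mismatch follows the same pattern: the checkpoint tensors have the 7B hidden size (4096) where the 30B model expects 6656. A quick way to enumerate all of the mismatches up front, instead of reading the truncated traceback, is a shape comparison like the sketch below (assuming model was built as in the snippet above):

import torch

# Minimal diagnostic sketch: compare checkpoint tensor shapes against the
# current model's state dict before attempting to load anything.
ckpt = torch.load('/notebooks/OpenFlamingo-9B/checkpoint.pt', map_location='cpu')
model_state = model.state_dict()
for name, tensor in ckpt.items():
    if name in model_state and tensor.shape != model_state[name].shape:
        print(f'{name}: checkpoint {tuple(tensor.shape)} != model {tuple(model_state[name].shape)}')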
Steps to Reproduce
The code above reproduces the error and can be run given sufficient compute resources.
Environment
Python environment built from requirements.txt. The code is run in a Gradient notebook on an IPU-POD16.
Detailed Description
This setup currently works with LLaMA-7B. Is there a way to make it work with larger models now, or will we need to wait for a larger version of OpenFlamingo with a new model checkpoint?
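My (possibly wrong) understanding is that the released checkpoint only contains the weights trained on top of the frozen LLaMA-7B, i.e. the gated cross-attention layers, the perceiver resampler, and the resized embeddings, and the cross-attention weights are tied to 7B's hidden size of 4096 while llama-30b uses 6656. So short of a retrained checkpoint, the only partial reuse I can see is loading just the shape-compatible tensors and retraining the rest; an untested sketch:

import torch

# Untested workaround sketch: keep only the tensors whose shapes already
# match the 30B model (e.g. the perceiver resampler, which does not depend
# on the language model's hidden size) and skip the 7B-sized cross-attention
# weights, which would then need to be retrained from scratch.
ckpt = torch.load('/notebooks/OpenFlamingo-9B/checkpoint.pt', map_location='cpu')
model_state = model.state_dict()
compatible = {name: tensor for name, tensor in ckpt.items()
              if name in model_state and tensor.shape == model_state[name].shape}
print(f'loading {len(compatible)} of {len(ckpt)} tensors; the rest would need retraining')
model.load_state_dict(compatible, strict=False)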