microsoft / FocalNet

[NeurIPS 2022] Official code for "Focal Modulation Networks"

focalnet_large_fl4_o365_finetuned_on_coco.pth size mismatch #24

Closed: Shiro-LK closed this issue 1 year ago

Shiro-LK commented 1 year ago

Hello,

Thank you for sharing your experiment.

I am trying to train an object detection model based on FocalNet-Large from this checkpoint: https://github.com/FocalNet/FocalNet-DINO#training

However, some size mismatches occur when loading it. This happened with the checkpoint pretrained on Objects365 and then fine-tuned on COCO. I am using the config file "DINO_4scale_focalnet_large_fl4.py" instead of "DINO_4scale_focalnet_fl4.py", as I could not find the latter in the repo. I was wondering whether the config file uploaded to the repo is the correct one?

Here is the error message:

RuntimeError: Error(s) in loading state_dict for DINO:
        size mismatch for transformer.level_embed: copying a param with shape torch.Size([5, 256]) from checkpoint, the shape in current model is torch.Size([4, 256]).
        size mismatch for transformer.encoder.layers.0.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.0.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.0.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.0.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.1.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.1.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.1.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.1.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.2.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.2.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.2.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.2.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.3.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.3.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.3.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.3.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.4.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.4.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.4.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.4.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.5.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.5.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.5.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.5.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.0.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.0.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.0.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.0.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.1.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.1.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.1.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.1.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.2.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.2.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.2.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.2.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.3.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.3.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.3.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.3.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.4.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.4.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.4.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.4.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.5.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.5.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.5.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.5.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for input_proj.0.0.weight: copying a param with shape torch.Size([256, 192, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 384, 1, 1]).
        size mismatch for input_proj.1.0.weight: copying a param with shape torch.Size([256, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 768, 1, 1]).
        size mismatch for input_proj.2.0.weight: copying a param with shape torch.Size([256, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1536, 1, 1]).
        size mismatch for input_proj.3.0.weight: copying a param with shape torch.Size([256, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1536, 3, 3]).
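
For what it's worth, the mismatched shapes all point at the number of feature levels. In deformable attention, sampling_offsets and attention_weights are Linear layers whose output sizes are n_heads * n_levels * n_points * 2 and n_heads * n_levels * n_points respectively, and level_embed has one row per level. With the usual Deformable-DETR defaults of n_heads=8 and n_points=4 (assumed here), the checkpoint's 320/160/[5, 256] shapes imply 5 levels, while the 4-scale model builds 256/128/[4, 256]. A minimal sketch of that arithmetic, using the layer names from the trace above:

```python
# Sketch: reproduce the mismatched shapes from the deformable-attention layout.
# n_heads=8, n_points=4, d_model=256 are the Deformable-DETR/DINO defaults (assumed).
n_heads, n_points, d_model = 8, 4, 256

for n_levels in (4, 5):  # 4-scale model vs. 5-scale checkpoint
    sampling_offsets_out = n_heads * n_levels * n_points * 2  # an (x, y) offset per sampling point
    attention_weights_out = n_heads * n_levels * n_points     # one weight per sampling point
    print(f"{n_levels} levels: sampling_offsets [{sampling_offsets_out}, {d_model}], "
          f"attention_weights [{attention_weights_out}, {d_model}], "
          f"level_embed [{n_levels}, {d_model}]")

# 4 levels: sampling_offsets [256, 256], attention_weights [128, 256], level_embed [4, 256]
# 5 levels: sampling_offsets [320, 256], attention_weights [160, 256], level_embed [5, 256]
```

The input_proj mismatches tell the same story: the checkpoint projects one extra, higher-resolution backbone stage (the 192-channel stage of FocalNet-Large) that the 4-scale model never builds.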
jwyang commented 1 year ago

Hi @Shiro-LK, I think it is because you are using the 4-scale DINO config. I have changed it to 5-scale. Could you please try this new config:

https://github.com/FocalNet/FocalNet-DINO/blob/main/config/DINO/DINO_5scale_focalnet_large_fl4.py
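
If it helps, the number of levels a checkpoint expects can also be read off level_embed before picking a config. A quick sketch (the checkpoint filename is from this issue; the "model" nesting is the usual DINO checkpoint layout and is assumed here):

```python
import torch

# Peek at the checkpoint to confirm how many feature levels it was trained with.
ckpt = torch.load("focalnet_large_fl4_o365_finetuned_on_coco.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # DINO checkpoints typically nest weights under "model"

num_levels = state["transformer.level_embed"].shape[0]
print(num_levels)  # 5 here, so the 5-scale config is the one to load it with
```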