microsoft / FocalNet

[NeurIPS 2022] Official code for "Focal Modulation Networks"
MIT License

It seems that the shape of pre-trained focalnet_base_lrf is incompatible with DINO-base. #40

Closed · bartbuaa closed 1 year ago

bartbuaa commented 1 year ago

Thanks for your wonderful work! I was drawn here from the FocalNet-DINO repository.

When I tried to reproduce your experiments, however, the shape of the pre-trained focalnet_base_lrf model appears to be incompatible with DINO-base.

The error log when loading the pre-trained focalnet-base in DINO:

```
RuntimeError: Error(s) in loading state_dict for FocalNet:
    size mismatch for patch_embed.proj.weight: copying a param with shape torch.Size([128, 3, 4, 4]) from checkpoint, the shape in current model is torch.Size([128, 3, 7, 7]).
    size mismatch for layers.0.downsample.proj.weight: copying a param with shape torch.Size([256, 128, 2, 2]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
    size mismatch for layers.1.downsample.proj.weight: copying a param with shape torch.Size([512, 256, 2, 2]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
    size mismatch for layers.2.downsample.proj.weight: copying a param with shape torch.Size([1024, 512, 2, 2]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3, 3]).
```
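In case it helps anyone hitting the same error: a common workaround is to drop size-mismatched tensors before calling `load_state_dict`. This is only a minimal sketch, not the authors' recipe, and the toy `Conv2d` modules below merely mirror the 4x4-vs-7x7 mismatch above:

```python
import torch.nn as nn

# Toy stand-ins that mirror the mismatch above: the checkpoint was saved
# with a 4x4 patch-embed kernel, while the current model expects 7x7.
pretrained = nn.Conv2d(3, 128, kernel_size=4)  # checkpoint side
current = nn.Conv2d(3, 128, kernel_size=7)     # model side

state_dict = pretrained.state_dict()
model_state = current.state_dict()

# Keep only tensors whose shapes match the current model; the mismatched
# conv weight is then skipped instead of raising a RuntimeError.
filtered = {k: v for k, v in state_dict.items()
            if k in model_state and v.shape == model_state[k].shape}
result = current.load_state_dict(filtered, strict=False)
print(result.missing_keys)  # ['weight'] -- the 4x4 kernel was skipped
```

This silently leaves the skipped layers randomly initialized, though, so it only makes sense if those layers get re-trained (or the checkpoint really was produced with the same shapes).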

For comparison, I also tried focalnet_large_lrf_384_fl4, and its log looks normal. The log when loading the pretrained focalnet-large in DINO:

```
_IncompatibleKeys(missing_keys=['norm0.weight', 'norm0.bias', 'norm1.weight', 'norm1.bias', 'norm2.weight', 'norm2.bias', 'norm3.weight', 'norm3.bias'],
                  unexpected_keys=['norm.weight', 'norm.bias', 'head.weight', 'head.bias'])
```
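(For context, this `_IncompatibleKeys` output is simply what PyTorch's `load_state_dict(..., strict=False)` reports when key names differ between checkpoint and model but all shared tensors have matching shapes. A tiny illustration with made-up toy modules, not the actual FocalNet classes:)

```python
import torch.nn as nn

class ClsBackbone(nn.Module):
    """Toy classification backbone: has a head, no extra norms."""
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(8, 8)
        self.head = nn.Linear(8, 10)

class DetBackbone(nn.Module):
    """Toy detection backbone: head removed, per-stage norm0 added."""
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(8, 8)
        self.norm0 = nn.LayerNorm(8)

result = DetBackbone().load_state_dict(ClsBackbone().state_dict(), strict=False)
print(result)
# _IncompatibleKeys(missing_keys=['norm0.weight', 'norm0.bias'],
#                   unexpected_keys=['head.weight', 'head.bias'])
```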

So I guess some details of focalnet_base_lrf were changed during pre-training. Your help would be much appreciated!