Hey, I don't think the squeeze is required, but it would be great if you could share the error you got before you added your "fix".
Thanks for the reply!
Here is the error without my "fix":
RuntimeError: Error(s) in loading state_dict for FCMAE:
size mismatch for encoder.downsample_layers.0.1.bias: copying a param with shape torch.Size([1, 80]) from checkpoint, the shape in current model is torch.Size([80]).
size mismatch for encoder.downsample_layers.1.1.bias: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for encoder.downsample_layers.2.1.bias: copying a param with shape torch.Size([1, 320]) from checkpoint, the shape in current model is torch.Size([320]).
size mismatch for encoder.initial_conv.0.bias: copying a param with shape torch.Size([1, 40]) from checkpoint, the shape in current model is torch.Size([40]).
size mismatch for encoder.stem.0.bias: copying a param with shape torch.Size([1, 40]) from checkpoint, the shape in current model is torch.Size([40]).
size mismatch for encoder.stages.0.0.dwconv.bias: copying a param with shape torch.Size([1, 40]) from checkpoint, the shape in current model is torch.Size([40]).
size mismatch for encoder.stages.0.0.grn.gamma: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 160]).
size mismatch for encoder.stages.0.0.grn.beta: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 160]).
size mismatch for encoder.stages.0.1.dwconv.bias: copying a param with shape torch.Size([1, 40]) from checkpoint, the shape in current model is torch.Size([40]).
size mismatch for encoder.stages.0.1.grn.gamma: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 160]).
size mismatch for encoder.stages.0.1.grn.beta: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 160]).
size mismatch for encoder.stages.1.0.dwconv.bias: copying a param with shape torch.Size([1, 80]) from checkpoint, the shape in current model is torch.Size([80]).
size mismatch for encoder.stages.1.0.grn.gamma: copying a param with shape torch.Size([1, 320]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 320]).
size mismatch for encoder.stages.1.0.grn.beta: copying a param with shape torch.Size([1, 320]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 320]).
size mismatch for encoder.stages.1.1.dwconv.bias: copying a param with shape torch.Size([1, 80]) from checkpoint, the shape in current model is torch.Size([80]).
size mismatch for encoder.stages.1.1.grn.gamma: copying a param with shape torch.Size([1, 320]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 320]).
size mismatch for encoder.stages.1.1.grn.beta: copying a param with shape torch.Size([1, 320]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 320]).
size mismatch for encoder.stages.2.0.dwconv.bias: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for encoder.stages.2.0.grn.gamma: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.0.grn.beta: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.1.dwconv.bias: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for encoder.stages.2.1.grn.gamma: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.1.grn.beta: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.2.dwconv.bias: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for encoder.stages.2.2.grn.gamma: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.2.grn.beta: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.3.dwconv.bias: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for encoder.stages.2.3.grn.gamma: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.3.grn.beta: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.4.dwconv.bias: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for encoder.stages.2.4.grn.gamma: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.4.grn.beta: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.5.dwconv.bias: copying a param with shape torch.Size([1, 160]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for encoder.stages.2.5.grn.gamma: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.2.5.grn.beta: copying a param with shape torch.Size([1, 640]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 640]).
size mismatch for encoder.stages.3.0.dwconv.bias: copying a param with shape torch.Size([1, 320]) from checkpoint, the shape in current model is torch.Size([320]).
size mismatch for encoder.stages.3.0.grn.gamma: copying a param with shape torch.Size([1, 1280]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 1280]).
size mismatch for encoder.stages.3.0.grn.beta: copying a param with shape torch.Size([1, 1280]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 1280]).
size mismatch for encoder.stages.3.1.dwconv.bias: copying a param with shape torch.Size([1, 320]) from checkpoint, the shape in current model is torch.Size([320]).
size mismatch for encoder.stages.3.1.grn.gamma: copying a param with shape torch.Size([1, 1280]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 1280]).
size mismatch for encoder.stages.3.1.grn.beta: copying a param with shape torch.Size([1, 1280]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 1280]).
And here is a Google Colab notebook to reproduce it; note that I commented out all of the sparse functionality in models/fcmae.py so it would run in Colab.
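Roughly, the load that triggers the error looks like this (a minimal sketch; the constructor call, checkpoint path, and checkpoint layout are placeholders rather than the exact MMEarth API):

```python
import torch
from models.fcmae import FCMAE  # module per this thread; constructor args below are assumptions

model = FCMAE()  # assumption: defaults matching the atto checkpoint
ckpt = torch.load("path/to/pt-all_mod_atto_1M_128_uncertainty_112-16", map_location="cpu")
state_dict = ckpt["model"] if "model" in ckpt else ckpt  # assumption: checkpoint layout
model.load_state_dict(state_dict, strict=False)  # raises the size-mismatch RuntimeError above
```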
Thanks again for the support :)
I was able to fix the issue by using your remap_checkpoint_keys function to make the weights compatible with the non-sparse version of ConvNeXt V2.
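For anyone else hitting this, the fix amounts to running the checkpoint through remap_checkpoint_keys before calling load_state_dict. A minimal sketch, assuming the helper's import path and the checkpoint layout (both may differ in the actual repo):

```python
import torch
from models.fcmae import FCMAE           # module per this thread
from utils import remap_checkpoint_keys  # assumption: the helper's actual location may differ

model = FCMAE()  # assumption: defaults matching the atto checkpoint
ckpt = torch.load("path/to/pt-all_mod_atto_1M_128_uncertainty_112-16", map_location="cpu")
state_dict = ckpt["model"] if "model" in ckpt else ckpt  # assumption: checkpoint layout
state_dict = remap_checkpoint_keys(state_dict)           # map sparse-pretraining keys/shapes to the dense model
msg = model.load_state_dict(state_dict, strict=False)
print(msg.missing_keys)                                  # inspect whatever keys are still missing
```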
Hi, great work with MMEarth!
I am trying to load an MMEarth pretrained state dictionary but am running into some issues, including when using the provided example. I'm using the weights stored in pt-all_mod_atto_1M_128_uncertainty_112-16. The first thing that throws an error (even with strict=False) is a set of shape mismatches between the state dictionary and the initialized model. I think they can be fixed by reshaping the offending checkpoint tensors (a rough guess at that reshape is sketched below). With these reshaped weights, load_state_dict now reports many missing keys. Any help would be much appreciated! And I apologize if I'm doing something dumb :)
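The reshape referred to above was not preserved in this thread; a hedged guess at what it looked like, based on the shapes in the error trace (bias tensors squeezed from [1, C] to [C], GRN gamma/beta expanded from [1, C] to [1, 1, 1, C]; the helper name is hypothetical):

```python
def reshape_for_dense(state_dict):
    """Hypothetical reconstruction of the manual reshape (original snippet not shown above)."""
    out = {}
    for k, v in state_dict.items():
        if k.endswith("bias") and v.ndim == 2:   # e.g. torch.Size([1, 80]) -> torch.Size([80])
            out[k] = v.reshape(-1)
        elif "grn" in k and v.ndim == 2:         # e.g. torch.Size([1, 160]) -> torch.Size([1, 1, 1, 160])
            out[k] = v.reshape(1, 1, 1, -1)
        else:
            out[k] = v
    return out
```

Note that the maintainer's reply above suggests this manual squeeze shouldn't be necessary; the remap_checkpoint_keys route mentioned earlier is what ultimately resolved the issue.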
Thanks again, Anthony