keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

Cannot load the pretrained convnext_small model #60

Closed lyangfan closed 10 months ago

lyangfan commented 10 months ago

I trained a convnext_small model using my own dataset. When I try to load the weights:

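import torch, timm
# Build timm's ConvNeXt-S and load the checkpoint on CPU.
convnext_s = timm.create_model('convnext_small')
state = torch.load('convnext_small_1kpretrained_timm_style.pth', 'cpu')
# strict=False reports mismatched keys instead of raising an error.
convnext_s.load_state_dict(state.get('module', state), strict=False)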

I got this output:

_IncompatibleKeys(missing_keys=['stem.0.weight', 'stem.0.bias', 'stem.1.weight', 'stem.1.bias', 'stages.0.blocks.0.gamma', 'stages.0.blocks.0.conv_dw.weight', 'stages.0.blocks.0.conv_dw.bias', 'stages.0.blocks.0.norm.weight', 'stages.0.blocks.0.norm.bias', 'stages.0.blocks.0.mlp.fc1.weight', 'stages.0.blocks.0.mlp.fc1.bias', 'stages.0.blocks.0.mlp.fc2.weight', 'stages.0.blocks.0.mlp.fc2.bias', 'stages.0.blocks.1.gamma', 'stages.0.blocks.1.conv_dw.weight', 'stages.0.blocks.1.conv_dw.bias', 'stages.0.blocks.1.norm.weight', 'stages.0.blocks.1.norm.bias', 'stages.0.blocks.1.mlp.fc1.weight', 'stages.0.blocks.1.mlp.fc1.bias', 'stages.0.blocks.1.mlp.fc2.weight', 'stages.0.blocks.1.mlp.fc2.bias', 'stages.0.blocks.2.gamma', 'stages.0.blocks.2.conv_dw.weight', 'stages.0.blocks.2.conv_dw.bias', 'stages.0.blocks.2.norm.weight', 'stages.0.blocks.2.norm.bias', 'stages.0.blocks.2.mlp.fc1.weight', 'stages.0.blocks.2.mlp.fc1.bias', 'stages.0.blocks.2.mlp.fc2.weight', 'stages.0.blocks.2.mlp.fc2.bias', 'stages.1.downsample.0.weight', 'stages.1.downsample.0.bias', 'stages.1.downsample.1.weight', 'stages.1.downsample.1.bias', 'stages.1.blocks.0.gamma', 'stages.1.blocks.0.conv_dw.weight', 'stages.1.blocks.0.conv_dw.bias', 'stages.1.blocks.0.norm.weight', 'stages.1.blocks.0.norm.bias', 'stages.1.blocks.0.mlp.fc1.weight', 'stages.1.blocks.0.mlp.fc1.bias', 'stages.1.blocks.0.mlp.fc2.weight', 'stages.1.blocks.0.mlp.fc2.bias', 'stages.1.blocks.1.gamma', 'stages.1.blocks.1.conv_dw.weight', 'stages.1.blocks.1.conv_dw.bias', 'stages.1.blocks.1.norm.weight', 'stages.1.blocks.1.norm.bias', 'stages.1.blocks.1.mlp.fc1.weight', 'stages.1.blocks.1.mlp.fc1.bias', 'stages.1.blocks.1.mlp.fc2.weight', 'stages.1.blocks.1.mlp.fc2.bias', 'stages.1.blocks.2.gamma', 'stages.1.blocks.2.conv_dw.weight', 'stages.1.blocks.2.conv_dw.bias', 'stages.1.blocks.2.norm.weight', 'stages.1.blocks.2.norm.bias', 'stages.1.blocks.2.mlp.fc1.weight', 'stages.1.blocks.2.mlp.fc1.bias', 'stages.1.blocks.2.mlp.fc2.weight', 
'stages.1.blocks.2.mlp.fc2.bias', 'stages.2.downsample.0.weight', 'stages.2.downsample.0.bias', 'stages.2.downsample.1.weight', 'stages.2.downsample.1.bias', 'stages.2.blocks.0.gamma', 'stages.2.blocks.0.conv_dw.weight', 'stages.2.blocks.0.conv_dw.bias', 'stages.2.blocks.0.norm.weight', 'stages.2.blocks.0.norm.bias', 'stages.2.blocks.0.mlp.fc1.weight', 'stages.2.blocks.0.mlp.fc1.bias', 'stages.2.blocks.0.mlp.fc2.weight', 'stages.2.blocks.0.mlp.fc2.bias', 'stages.2.blocks.1.gamma', 'stages.2.blocks.1.conv_dw.weight', 'stages.2.blocks.1.conv_dw.bias', 'stages.2.blocks.1.norm.weight', 'stages.2.blocks.1.norm.bias', 'stages.2.blocks.1.mlp.fc1.weight', 'stages.2.blocks.1.mlp.fc1.bias', 'stages.2.blocks.1.mlp.fc2.weight', 'stages.2.blocks.1.mlp.fc2.bias', 'stages.2.blocks.2.gamma', 'stages.2.blocks.2.conv_dw.weight', 'stages.2.blocks.2.conv_dw.bias', 'stages.2.blocks.2.norm.weight', 'stages.2.blocks.2.norm.bias', 'stages.2.blocks.2.mlp.fc1.weight', 'stages.2.blocks.2.mlp.fc1.bias', 'stages.2.blocks.2.mlp.fc2.weight', 'stages.2.blocks.2.mlp.fc2.bias', 'stages.2.blocks.3.gamma', 'stages.2.blocks.3.conv_dw.weight', 'stages.2.blocks.3.conv_dw.bias', 'stages.2.blocks.3.norm.weight', 'stages.2.blocks.3.norm.bias', 'stages.2.blocks.3.mlp.fc1.weight', 'stages.2.blocks.3.mlp.fc1.bias', 'stages.2.blocks.3.mlp.fc2.weight', 'stages.2.blocks.3.mlp.fc2.bias', 'stages.2.blocks.4.gamma', 'stages.2.blocks.4.conv_dw.weight', 'stages.2.blocks.4.conv_dw.bias', 'stages.2.blocks.4.norm.weight', 'stages.2.blocks.4.norm.bias', 'stages.2.blocks.4.mlp.fc1.weight', 'stages.2.blocks.4.mlp.fc1.bias', 'stages.2.blocks.4.mlp.fc2.weight', 'stages.2.blocks.4.mlp.fc2.bias', 'stages.2.blocks.5.gamma', 'stages.2.blocks.5.conv_dw.weight', 'stages.2.blocks.5.conv_dw.bias', 'stages.2.blocks.5.norm.weight', 'stages.2.blocks.5.norm.bias', 'stages.2.blocks.5.mlp.fc1.weight', 'stages.2.blocks.5.mlp.fc1.bias', 'stages.2.blocks.5.mlp.fc2.weight', 'stages.2.blocks.5.mlp.fc2.bias', 'stages.2.blocks.6.gamma', 
'stages.2.blocks.6.conv_dw.weight', 'stages.2.blocks.6.conv_dw.bias', 'stages.2.blocks.6.norm.weight', 'stages.2.blocks.6.norm.bias', 'stages.2.blocks.6.mlp.fc1.weight', 'stages.2.blocks.6.mlp.fc1.bias', 'stages.2.blocks.6.mlp.fc2.weight', 'stages.2.blocks.6.mlp.fc2.bias', 'stages.2.blocks.7.gamma', 'stages.2.blocks.7.conv_dw.weight', 'stages.2.blocks.7.conv_dw.bias', 'stages.2.blocks.7.norm.weight', 'stages.2.blocks.7.norm.bias', 'stages.2.blocks.7.mlp.fc1.weight', 'stages.2.blocks.7.mlp.fc1.bias', 'stages.2.blocks.7.mlp.fc2.weight', 'stages.2.blocks.7.mlp.fc2.bias', 'stages.2.blocks.8.gamma', 'stages.2.blocks.8.conv_dw.weight', 'stages.2.blocks.8.conv_dw.bias', 'stages.2.blocks.8.norm.weight', 'stages.2.blocks.8.norm.bias', 'stages.2.blocks.8.mlp.fc1.weight', 'stages.2.blocks.8.mlp.fc1.bias', 'stages.2.blocks.8.mlp.fc2.weight', 'stages.2.blocks.8.mlp.fc2.bias', 'stages.2.blocks.9.gamma', 'stages.2.blocks.9.conv_dw.weight', 'stages.2.blocks.9.conv_dw.bias', 'stages.2.blocks.9.norm.weight', 'stages.2.blocks.9.norm.bias', 'stages.2.blocks.9.mlp.fc1.weight', 'stages.2.blocks.9.mlp.fc1.bias', 'stages.2.blocks.9.mlp.fc2.weight', 'stages.2.blocks.9.mlp.fc2.bias', 'stages.2.blocks.10.gamma', 'stages.2.blocks.10.conv_dw.weight', 'stages.2.blocks.10.conv_dw.bias', 'stages.2.blocks.10.norm.weight', 'stages.2.blocks.10.norm.bias', 'stages.2.blocks.10.mlp.fc1.weight', 'stages.2.blocks.10.mlp.fc1.bias', 'stages.2.blocks.10.mlp.fc2.weight', 'stages.2.blocks.10.mlp.fc2.bias', 'stages.2.blocks.11.gamma', 'stages.2.blocks.11.conv_dw.weight', 'stages.2.blocks.11.conv_dw.bias', 'stages.2.blocks.11.norm.weight', 'stages.2.blocks.11.norm.bias', 'stages.2.blocks.11.mlp.fc1.weight', 'stages.2.blocks.11.mlp.fc1.bias', 'stages.2.blocks.11.mlp.fc2.weight', 'stages.2.blocks.11.mlp.fc2.bias', 'stages.2.blocks.12.gamma', 'stages.2.blocks.12.conv_dw.weight', 'stages.2.blocks.12.conv_dw.bias', 'stages.2.blocks.12.norm.weight', 'stages.2.blocks.12.norm.bias', 'stages.2.blocks.12.mlp.fc1.weight', 
'stages.2.blocks.12.mlp.fc1.bias', 'stages.2.blocks.12.mlp.fc2.weight', 'stages.2.blocks.12.mlp.fc2.bias', 'stages.2.blocks.13.gamma', 'stages.2.blocks.13.conv_dw.weight', 'stages.2.blocks.13.conv_dw.bias', 'stages.2.blocks.13.norm.weight', 'stages.2.blocks.13.norm.bias', 'stages.2.blocks.13.mlp.fc1.weight', 'stages.2.blocks.13.mlp.fc1.bias', 'stages.2.blocks.13.mlp.fc2.weight', 'stages.2.blocks.13.mlp.fc2.bias', 'stages.2.blocks.14.gamma', 'stages.2.blocks.14.conv_dw.weight', 'stages.2.blocks.14.conv_dw.bias', 'stages.2.blocks.14.norm.weight', 'stages.2.blocks.14.norm.bias', 'stages.2.blocks.14.mlp.fc1.weight', 'stages.2.blocks.14.mlp.fc1.bias', 'stages.2.blocks.14.mlp.fc2.weight', 'stages.2.blocks.14.mlp.fc2.bias', 'stages.2.blocks.15.gamma', 'stages.2.blocks.15.conv_dw.weight', 'stages.2.blocks.15.conv_dw.bias', 'stages.2.blocks.15.norm.weight', 'stages.2.blocks.15.norm.bias', 'stages.2.blocks.15.mlp.fc1.weight', 'stages.2.blocks.15.mlp.fc1.bias', 'stages.2.blocks.15.mlp.fc2.weight', 'stages.2.blocks.15.mlp.fc2.bias', 'stages.2.blocks.16.gamma', 'stages.2.blocks.16.conv_dw.weight', 'stages.2.blocks.16.conv_dw.bias', 'stages.2.blocks.16.norm.weight', 'stages.2.blocks.16.norm.bias', 'stages.2.blocks.16.mlp.fc1.weight', 'stages.2.blocks.16.mlp.fc1.bias', 'stages.2.blocks.16.mlp.fc2.weight', 'stages.2.blocks.16.mlp.fc2.bias', 'stages.2.blocks.17.gamma', 'stages.2.blocks.17.conv_dw.weight', 'stages.2.blocks.17.conv_dw.bias', 'stages.2.blocks.17.norm.weight', 'stages.2.blocks.17.norm.bias', 'stages.2.blocks.17.mlp.fc1.weight', 'stages.2.blocks.17.mlp.fc1.bias', 'stages.2.blocks.17.mlp.fc2.weight', 'stages.2.blocks.17.mlp.fc2.bias', 'stages.2.blocks.18.gamma', 'stages.2.blocks.18.conv_dw.weight', 'stages.2.blocks.18.conv_dw.bias', 'stages.2.blocks.18.norm.weight', 'stages.2.blocks.18.norm.bias', 'stages.2.blocks.18.mlp.fc1.weight', 'stages.2.blocks.18.mlp.fc1.bias', 'stages.2.blocks.18.mlp.fc2.weight', 'stages.2.blocks.18.mlp.fc2.bias', 'stages.2.blocks.19.gamma', 
'stages.2.blocks.19.conv_dw.weight', 'stages.2.blocks.19.conv_dw.bias', 'stages.2.blocks.19.norm.weight', 'stages.2.blocks.19.norm.bias', 'stages.2.blocks.19.mlp.fc1.weight', 'stages.2.blocks.19.mlp.fc1.bias', 'stages.2.blocks.19.mlp.fc2.weight', 'stages.2.blocks.19.mlp.fc2.bias', 'stages.2.blocks.20.gamma', 'stages.2.blocks.20.conv_dw.weight', 'stages.2.blocks.20.conv_dw.bias', 'stages.2.blocks.20.norm.weight', 'stages.2.blocks.20.norm.bias', 'stages.2.blocks.20.mlp.fc1.weight', 'stages.2.blocks.20.mlp.fc1.bias', 'stages.2.blocks.20.mlp.fc2.weight', 'stages.2.blocks.20.mlp.fc2.bias', 'stages.2.blocks.21.gamma', 'stages.2.blocks.21.conv_dw.weight', 'stages.2.blocks.21.conv_dw.bias', 'stages.2.blocks.21.norm.weight', 'stages.2.blocks.21.norm.bias', 'stages.2.blocks.21.mlp.fc1.weight', 'stages.2.blocks.21.mlp.fc1.bias', 'stages.2.blocks.21.mlp.fc2.weight', 'stages.2.blocks.21.mlp.fc2.bias', 'stages.2.blocks.22.gamma', 'stages.2.blocks.22.conv_dw.weight', 'stages.2.blocks.22.conv_dw.bias', 'stages.2.blocks.22.norm.weight', 'stages.2.blocks.22.norm.bias', 'stages.2.blocks.22.mlp.fc1.weight', 'stages.2.blocks.22.mlp.fc1.bias', 'stages.2.blocks.22.mlp.fc2.weight', 'stages.2.blocks.22.mlp.fc2.bias', 'stages.2.blocks.23.gamma', 'stages.2.blocks.23.conv_dw.weight', 'stages.2.blocks.23.conv_dw.bias', 'stages.2.blocks.23.norm.weight', 'stages.2.blocks.23.norm.bias', 'stages.2.blocks.23.mlp.fc1.weight', 'stages.2.blocks.23.mlp.fc1.bias', 'stages.2.blocks.23.mlp.fc2.weight', 'stages.2.blocks.23.mlp.fc2.bias', 'stages.2.blocks.24.gamma', 'stages.2.blocks.24.conv_dw.weight', 'stages.2.blocks.24.conv_dw.bias', 'stages.2.blocks.24.norm.weight', 'stages.2.blocks.24.norm.bias', 'stages.2.blocks.24.mlp.fc1.weight', 'stages.2.blocks.24.mlp.fc1.bias', 'stages.2.blocks.24.mlp.fc2.weight', 'stages.2.blocks.24.mlp.fc2.bias', 'stages.2.blocks.25.gamma', 'stages.2.blocks.25.conv_dw.weight', 'stages.2.blocks.25.conv_dw.bias', 'stages.2.blocks.25.norm.weight', 'stages.2.blocks.25.norm.bias', 
'stages.2.blocks.25.mlp.fc1.weight', 'stages.2.blocks.25.mlp.fc1.bias', 'stages.2.blocks.25.mlp.fc2.weight', 'stages.2.blocks.25.mlp.fc2.bias', 'stages.2.blocks.26.gamma', 'stages.2.blocks.26.conv_dw.weight', 'stages.2.blocks.26.conv_dw.bias', 'stages.2.blocks.26.norm.weight', 'stages.2.blocks.26.norm.bias', 'stages.2.blocks.26.mlp.fc1.weight', 'stages.2.blocks.26.mlp.fc1.bias', 'stages.2.blocks.26.mlp.fc2.weight', 'stages.2.blocks.26.mlp.fc2.bias', 'stages.3.downsample.0.weight', 'stages.3.downsample.0.bias', 'stages.3.downsample.1.weight', 'stages.3.downsample.1.bias', 'stages.3.blocks.0.gamma', 'stages.3.blocks.0.conv_dw.weight', 'stages.3.blocks.0.conv_dw.bias', 'stages.3.blocks.0.norm.weight', 'stages.3.blocks.0.norm.bias', 'stages.3.blocks.0.mlp.fc1.weight', 'stages.3.blocks.0.mlp.fc1.bias', 'stages.3.blocks.0.mlp.fc2.weight', 'stages.3.blocks.0.mlp.fc2.bias', 'stages.3.blocks.1.gamma', 'stages.3.blocks.1.conv_dw.weight', 'stages.3.blocks.1.conv_dw.bias', 'stages.3.blocks.1.norm.weight', 'stages.3.blocks.1.norm.bias', 'stages.3.blocks.1.mlp.fc1.weight', 'stages.3.blocks.1.mlp.fc1.bias', 'stages.3.blocks.1.mlp.fc2.weight', 'stages.3.blocks.1.mlp.fc2.bias', 'stages.3.blocks.2.gamma', 'stages.3.blocks.2.conv_dw.weight', 'stages.3.blocks.2.conv_dw.bias', 'stages.3.blocks.2.norm.weight', 'stages.3.blocks.2.norm.bias', 'stages.3.blocks.2.mlp.fc1.weight', 'stages.3.blocks.2.mlp.fc1.bias', 'stages.3.blocks.2.mlp.fc2.weight', 'stages.3.blocks.2.mlp.fc2.bias', 'head.norm.weight', 'head.norm.bias', 'head.fc.weight', 'head.fc.bias'], unexpected_keys=['downsample_layers.0.0.weight', 'downsample_layers.0.0.bias', 'downsample_layers.0.1.weight', 'downsample_layers.0.1.bias', 'downsample_layers.1.0.weight', 'downsample_layers.1.0.bias', 'downsample_layers.1.1.weight', 'downsample_layers.1.1.bias', 'downsample_layers.2.0.weight', 'downsample_layers.2.0.bias', 'downsample_layers.2.1.weight', 'downsample_layers.2.1.bias', 'downsample_layers.3.0.weight', 
'downsample_layers.3.0.bias', 'downsample_layers.3.1.weight', 'downsample_layers.3.1.bias', 'stages.0.0.gamma', 'stages.0.0.dwconv.weight', 'stages.0.0.dwconv.bias', 'stages.0.0.norm.weight', 'stages.0.0.norm.bias', 'stages.0.0.pwconv1.weight', 'stages.0.0.pwconv1.bias', 'stages.0.0.pwconv2.weight', 'stages.0.0.pwconv2.bias', 'stages.0.1.gamma', 'stages.0.1.dwconv.weight', 'stages.0.1.dwconv.bias', 'stages.0.1.norm.weight', 'stages.0.1.norm.bias', 'stages.0.1.pwconv1.weight', 'stages.0.1.pwconv1.bias', 'stages.0.1.pwconv2.weight', 'stages.0.1.pwconv2.bias', 'stages.0.2.gamma', 'stages.0.2.dwconv.weight', 'stages.0.2.dwconv.bias', 'stages.0.2.norm.weight', 'stages.0.2.norm.bias', 'stages.0.2.pwconv1.weight', 'stages.0.2.pwconv1.bias', 'stages.0.2.pwconv2.weight', 'stages.0.2.pwconv2.bias', 'stages.1.0.gamma', 'stages.1.0.dwconv.weight', 'stages.1.0.dwconv.bias', 'stages.1.0.norm.weight', 'stages.1.0.norm.bias', 'stages.1.0.pwconv1.weight', 'stages.1.0.pwconv1.bias', 'stages.1.0.pwconv2.weight', 'stages.1.0.pwconv2.bias', 'stages.1.1.gamma', 'stages.1.1.dwconv.weight', 'stages.1.1.dwconv.bias', 'stages.1.1.norm.weight', 'stages.1.1.norm.bias', 'stages.1.1.pwconv1.weight', 'stages.1.1.pwconv1.bias', 'stages.1.1.pwconv2.weight', 'stages.1.1.pwconv2.bias', 'stages.1.2.gamma', 'stages.1.2.dwconv.weight', 'stages.1.2.dwconv.bias', 'stages.1.2.norm.weight', 'stages.1.2.norm.bias', 'stages.1.2.pwconv1.weight', 'stages.1.2.pwconv1.bias', 'stages.1.2.pwconv2.weight', 'stages.1.2.pwconv2.bias', 'stages.2.0.gamma', 'stages.2.0.dwconv.weight', 'stages.2.0.dwconv.bias', 'stages.2.0.norm.weight', 'stages.2.0.norm.bias', 'stages.2.0.pwconv1.weight', 'stages.2.0.pwconv1.bias', 'stages.2.0.pwconv2.weight', 'stages.2.0.pwconv2.bias', 'stages.2.1.gamma', 'stages.2.1.dwconv.weight', 'stages.2.1.dwconv.bias', 'stages.2.1.norm.weight', 'stages.2.1.norm.bias', 'stages.2.1.pwconv1.weight', 'stages.2.1.pwconv1.bias', 'stages.2.1.pwconv2.weight', 'stages.2.1.pwconv2.bias', 'stages.2.2.gamma', 
'stages.2.2.dwconv.weight', 'stages.2.2.dwconv.bias', 'stages.2.2.norm.weight', 'stages.2.2.norm.bias', 'stages.2.2.pwconv1.weight', 'stages.2.2.pwconv1.bias', 'stages.2.2.pwconv2.weight', 'stages.2.2.pwconv2.bias', 'stages.2.3.gamma', 'stages.2.3.dwconv.weight', 'stages.2.3.dwconv.bias', 'stages.2.3.norm.weight', 'stages.2.3.norm.bias', 'stages.2.3.pwconv1.weight', 'stages.2.3.pwconv1.bias', 'stages.2.3.pwconv2.weight', 'stages.2.3.pwconv2.bias', 'stages.2.4.gamma', 'stages.2.4.dwconv.weight', 'stages.2.4.dwconv.bias', 'stages.2.4.norm.weight', 'stages.2.4.norm.bias', 'stages.2.4.pwconv1.weight', 'stages.2.4.pwconv1.bias', 'stages.2.4.pwconv2.weight', 'stages.2.4.pwconv2.bias', 'stages.2.5.gamma', 'stages.2.5.dwconv.weight', 'stages.2.5.dwconv.bias', 'stages.2.5.norm.weight', 'stages.2.5.norm.bias', 'stages.2.5.pwconv1.weight', 'stages.2.5.pwconv1.bias', 'stages.2.5.pwconv2.weight', 'stages.2.5.pwconv2.bias', 'stages.2.6.gamma', 'stages.2.6.dwconv.weight', 'stages.2.6.dwconv.bias', 'stages.2.6.norm.weight', 'stages.2.6.norm.bias', 'stages.2.6.pwconv1.weight', 'stages.2.6.pwconv1.bias', 'stages.2.6.pwconv2.weight', 'stages.2.6.pwconv2.bias', 'stages.2.7.gamma', 'stages.2.7.dwconv.weight', 'stages.2.7.dwconv.bias', 'stages.2.7.norm.weight', 'stages.2.7.norm.bias', 'stages.2.7.pwconv1.weight', 'stages.2.7.pwconv1.bias', 'stages.2.7.pwconv2.weight', 'stages.2.7.pwconv2.bias', 'stages.2.8.gamma', 'stages.2.8.dwconv.weight', 'stages.2.8.dwconv.bias', 'stages.2.8.norm.weight', 'stages.2.8.norm.bias', 'stages.2.8.pwconv1.weight', 'stages.2.8.pwconv1.bias', 'stages.2.8.pwconv2.weight', 'stages.2.8.pwconv2.bias', 'stages.2.9.gamma', 'stages.2.9.dwconv.weight', 'stages.2.9.dwconv.bias', 'stages.2.9.norm.weight', 'stages.2.9.norm.bias', 'stages.2.9.pwconv1.weight', 'stages.2.9.pwconv1.bias', 'stages.2.9.pwconv2.weight', 'stages.2.9.pwconv2.bias', 'stages.2.10.gamma', 'stages.2.10.dwconv.weight', 'stages.2.10.dwconv.bias', 'stages.2.10.norm.weight', 'stages.2.10.norm.bias', 
'stages.2.10.pwconv1.weight', 'stages.2.10.pwconv1.bias', 'stages.2.10.pwconv2.weight', 'stages.2.10.pwconv2.bias', 'stages.2.11.gamma', 'stages.2.11.dwconv.weight', 'stages.2.11.dwconv.bias', 'stages.2.11.norm.weight', 'stages.2.11.norm.bias', 'stages.2.11.pwconv1.weight', 'stages.2.11.pwconv1.bias', 'stages.2.11.pwconv2.weight', 'stages.2.11.pwconv2.bias', 'stages.2.12.gamma', 'stages.2.12.dwconv.weight', 'stages.2.12.dwconv.bias', 'stages.2.12.norm.weight', 'stages.2.12.norm.bias', 'stages.2.12.pwconv1.weight', 'stages.2.12.pwconv1.bias', 'stages.2.12.pwconv2.weight', 'stages.2.12.pwconv2.bias', 'stages.2.13.gamma', 'stages.2.13.dwconv.weight', 'stages.2.13.dwconv.bias', 'stages.2.13.norm.weight', 'stages.2.13.norm.bias', 'stages.2.13.pwconv1.weight', 'stages.2.13.pwconv1.bias', 'stages.2.13.pwconv2.weight', 'stages.2.13.pwconv2.bias', 'stages.2.14.gamma', 'stages.2.14.dwconv.weight', 'stages.2.14.dwconv.bias', 'stages.2.14.norm.weight', 'stages.2.14.norm.bias', 'stages.2.14.pwconv1.weight', 'stages.2.14.pwconv1.bias', 'stages.2.14.pwconv2.weight', 'stages.2.14.pwconv2.bias', 'stages.2.15.gamma', 'stages.2.15.dwconv.weight', 'stages.2.15.dwconv.bias', 'stages.2.15.norm.weight', 'stages.2.15.norm.bias', 'stages.2.15.pwconv1.weight', 'stages.2.15.pwconv1.bias', 'stages.2.15.pwconv2.weight', 'stages.2.15.pwconv2.bias', 'stages.2.16.gamma', 'stages.2.16.dwconv.weight', 'stages.2.16.dwconv.bias', 'stages.2.16.norm.weight', 'stages.2.16.norm.bias', 'stages.2.16.pwconv1.weight', 'stages.2.16.pwconv1.bias', 'stages.2.16.pwconv2.weight', 'stages.2.16.pwconv2.bias', 'stages.2.17.gamma', 'stages.2.17.dwconv.weight', 'stages.2.17.dwconv.bias', 'stages.2.17.norm.weight', 'stages.2.17.norm.bias', 'stages.2.17.pwconv1.weight', 'stages.2.17.pwconv1.bias', 'stages.2.17.pwconv2.weight', 'stages.2.17.pwconv2.bias', 'stages.2.18.gamma', 'stages.2.18.dwconv.weight', 'stages.2.18.dwconv.bias', 'stages.2.18.norm.weight', 'stages.2.18.norm.bias', 'stages.2.18.pwconv1.weight', 
'stages.2.18.pwconv1.bias', 'stages.2.18.pwconv2.weight', 'stages.2.18.pwconv2.bias', 'stages.2.19.gamma', 'stages.2.19.dwconv.weight', 'stages.2.19.dwconv.bias', 'stages.2.19.norm.weight', 'stages.2.19.norm.bias', 'stages.2.19.pwconv1.weight', 'stages.2.19.pwconv1.bias', 'stages.2.19.pwconv2.weight', 'stages.2.19.pwconv2.bias', 'stages.2.20.gamma', 'stages.2.20.dwconv.weight', 'stages.2.20.dwconv.bias', 'stages.2.20.norm.weight', 'stages.2.20.norm.bias', 'stages.2.20.pwconv1.weight', 'stages.2.20.pwconv1.bias', 'stages.2.20.pwconv2.weight', 'stages.2.20.pwconv2.bias', 'stages.2.21.gamma', 'stages.2.21.dwconv.weight', 'stages.2.21.dwconv.bias', 'stages.2.21.norm.weight', 'stages.2.21.norm.bias', 'stages.2.21.pwconv1.weight', 'stages.2.21.pwconv1.bias', 'stages.2.21.pwconv2.weight', 'stages.2.21.pwconv2.bias', 'stages.2.22.gamma', 'stages.2.22.dwconv.weight', 'stages.2.22.dwconv.bias', 'stages.2.22.norm.weight', 'stages.2.22.norm.bias', 'stages.2.22.pwconv1.weight', 'stages.2.22.pwconv1.bias', 'stages.2.22.pwconv2.weight', 'stages.2.22.pwconv2.bias', 'stages.2.23.gamma', 'stages.2.23.dwconv.weight', 'stages.2.23.dwconv.bias', 'stages.2.23.norm.weight', 'stages.2.23.norm.bias', 'stages.2.23.pwconv1.weight', 'stages.2.23.pwconv1.bias', 'stages.2.23.pwconv2.weight', 'stages.2.23.pwconv2.bias', 'stages.2.24.gamma', 'stages.2.24.dwconv.weight', 'stages.2.24.dwconv.bias', 'stages.2.24.norm.weight', 'stages.2.24.norm.bias', 'stages.2.24.pwconv1.weight', 'stages.2.24.pwconv1.bias', 'stages.2.24.pwconv2.weight', 'stages.2.24.pwconv2.bias', 'stages.2.25.gamma', 'stages.2.25.dwconv.weight', 'stages.2.25.dwconv.bias', 'stages.2.25.norm.weight', 'stages.2.25.norm.bias', 'stages.2.25.pwconv1.weight', 'stages.2.25.pwconv1.bias', 'stages.2.25.pwconv2.weight', 'stages.2.25.pwconv2.bias', 'stages.2.26.gamma', 'stages.2.26.dwconv.weight', 'stages.2.26.dwconv.bias', 'stages.2.26.norm.weight', 'stages.2.26.norm.bias', 'stages.2.26.pwconv1.weight', 'stages.2.26.pwconv1.bias', 
'stages.2.26.pwconv2.weight', 'stages.2.26.pwconv2.bias', 'stages.3.0.gamma', 'stages.3.0.dwconv.weight', 'stages.3.0.dwconv.bias', 'stages.3.0.norm.weight', 'stages.3.0.norm.bias', 'stages.3.0.pwconv1.weight', 'stages.3.0.pwconv1.bias', 'stages.3.0.pwconv2.weight', 'stages.3.0.pwconv2.bias', 'stages.3.1.gamma', 'stages.3.1.dwconv.weight', 'stages.3.1.dwconv.bias', 'stages.3.1.norm.weight', 'stages.3.1.norm.bias', 'stages.3.1.pwconv1.weight', 'stages.3.1.pwconv1.bias', 'stages.3.1.pwconv2.weight', 'stages.3.1.pwconv2.bias', 'stages.3.2.gamma', 'stages.3.2.dwconv.weight', 'stages.3.2.dwconv.bias', 'stages.3.2.norm.weight', 'stages.3.2.norm.bias', 'stages.3.2.pwconv1.weight', 'stages.3.2.pwconv1.bias', 'stages.3.2.pwconv2.weight', 'stages.3.2.pwconv2.bias'])

It seems that the keys are not aligned. My timm version is 0.5.4.

Looking forward to your reply.
Liu Yangfan, 2023.9.28

keyu-tian commented 10 months ago

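The reason: our codebase uses ConvNeXt's official definition (downstream_imagenet/models/convnext_official.py) rather than timm.ConvNeXt, so the parameter names in the checkpoint differ from the ones timm expects (for example, dwconv vs. conv_dw).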
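
The mismatch can be read off the output above: the checkpoint uses names such as downsample_layers.*, stages.S.B.dwconv and pwconv1/pwconv2, while timm 0.5.4 expects stem.*, stages.S.blocks.B.conv_dw and mlp.fc1/mlp.fc2. Purely as an illustration, here is a minimal sketch of that name correspondence. The function names official_to_timm_key and remap_state_dict are hypothetical, the mapping is inferred only from the key lists in this thread, and it assumes that tensors behind matching names have identical shapes, which was not verified here. The head.* keys have no counterpart in the checkpoint and would still be reported as missing.

```python
import re


def official_to_timm_key(key: str) -> str:
    """Rename one official-ConvNeXt parameter name to the timm 0.5.4 style.

    Assumption: the mapping is inferred from the missing/unexpected key
    lists in this thread, not taken from either library's source.
    """
    # downsample_layers.0.* is the stem in timm's naming.
    key = re.sub(r'^downsample_layers\.0\.', 'stem.', key)
    # stages.S.B.* -> stages.S.blocks.B.*
    key = re.sub(r'^stages\.(\d+)\.(\d+)\.', r'stages.\1.blocks.\2.', key)
    # downsample_layers.S.I.* (S >= 1) -> stages.S.downsample.I.*
    key = re.sub(r'^downsample_layers\.(\d+)\.(\d+)\.',
                 r'stages.\1.downsample.\2.', key)
    # Layer names inside each block.
    key = key.replace('.dwconv.', '.conv_dw.')
    key = key.replace('.pwconv1.', '.mlp.fc1.')
    key = key.replace('.pwconv2.', '.mlp.fc2.')
    return key


def remap_state_dict(state: dict) -> dict:
    """Return a copy of `state` with every key renamed; values are untouched."""
    return {official_to_timm_key(k): v for k, v in state.items()}
```

This only shows where the two naming schemes diverge. Renaming keys is a swapped-in workaround, not something proposed in this thread.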

Solution: import this convnext_official.py file before calling timm.create_model('convnext_small'). The definitions and registrations in convnext_official.py will then override timm's.
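
A minimal sketch of that import order, assuming the SparK repository is checked out locally and that convnext_official.py can be loaded on its own. The helper names import_definition_file and load_with_official_definition and their arguments are hypothetical. Only timm.create_model, torch.load and load_state_dict come from the snippet earlier in this thread.

```python
import importlib.util
import sys


def import_definition_file(path, module_name="convnext_official"):
    """Execute a Python file as a module so that its import-time side
    effects (such as model registration) take place."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


def load_with_official_definition(definition_file, checkpoint_path,
                                  arch="convnext_small"):
    # Imported lazily so the helper above is usable without torch/timm.
    import torch
    import timm

    # Step 1: import the official definition so its registration is in place.
    import_definition_file(definition_file)
    # Step 2: only then build the model and load the weights.
    model = timm.create_model(arch)
    state = torch.load(checkpoint_path, map_location="cpu")
    result = model.load_state_dict(state.get("module", state), strict=False)
    return model, result
```

Called with the path to downstream_imagenet/models/convnext_official.py and the checkpoint file, the second return value lists any keys that still do not match.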

lyangfan commented 10 months ago

Thanks for your reply!