huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
https://huggingface.co/docs/timm
Apache License 2.0

[BUG] Can't load pretrained mobilenetv3_small with catavgmax pooling #1561

Closed by alicanb 2 years ago

alicanb commented 2 years ago

Describe the bug: Trying to load mobilenetv3_small with catavgmax pooling on timm 0.6.11 gives the following error:

RuntimeError: Error(s) in loading state_dict for MobileNetV3:
        size mismatch for conv_head.weight: copying a param with shape torch.Size([1024, 576, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1152, 1, 1]).

To Reproduce

  1. Run:

    import timm
    timm.create_model(model_name='mobilenetv3_small_100', pretrained=True,
                      global_pool='catavgmax', num_classes=2)

Expected behavior: the model should load.


rwightman commented 2 years ago

@alicanb that one is not so easy to support, https://github.com/rwightman/pytorch-image-models/blob/main/timm/models/mobilenetv3.py#L187

For mobilenetv3, there is an extra layer after the pooling that doesn't exist for most other nets. Using catavgmax doubles the number of features, so it requires resetting and reconfiguring that layer as well, not just the final classifier. I do not currently have a clean mechanism to support this generically (i.e. without per-model customization), although I had some designs regarding more flexible head adaptation that might cover this (but that's not going to happen right away).
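The size mismatch follows directly from what catavgmax does. A minimal standalone illustration (plain PyTorch, not timm's own pooling module), using the 576-channel feature size from the error message above:

```python
import torch
import torch.nn.functional as F

def cat_avg_max_pool2d(x):
    # 'catavgmax'-style pooling: concatenate global average- and
    # max-pooled features along the channel dim, doubling the channels.
    return torch.cat([F.adaptive_avg_pool2d(x, 1),
                      F.adaptive_max_pool2d(x, 1)], dim=1)

# MobileNetV3-Small produces 576-channel feature maps before conv_head,
# matching the checkpoint shape in the error above.
feats = torch.randn(2, 576, 7, 7)
print(cat_avg_max_pool2d(feats).shape)  # torch.Size([2, 1152, 1, 1])
```

Since conv_head sits after the pooling in MobileNetV3, its expected input goes from 576 to 1152 channels, which is exactly the shape mismatch in the traceback.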

For now, there are two alternatives that can work (though neither is ideal)

alicanb commented 2 years ago

sounds reasonable 👍