open-mmlab / mmengine

OpenMMLab Foundational Library for Training Deep Learning Models
https://mmengine.readthedocs.io/
Apache License 2.0

[Bug] Not working `copying a param` when in_channels != 3 #776

Open okotaku opened 1 year ago

okotaku commented 1 year ago

Prerequisite

Environment

mmengine==0.3.2
mmcls==1.0.0rc3

Reproduces the problem - code sample

import mmengine
import torch
from mmcls.models import build_classifier

pretrained = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth'
cfg = dict(
    type='ImageClassifier',
    backbone=dict(
        type='ResNet',
        in_channels=4,
        depth=50,
        num_stages=4,
        out_indices=(3, ),
        style='pytorch',
        init_cfg=dict(
            type='Pretrained', checkpoint=pretrained, prefix='backbone')),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=1000,
        in_channels=2048,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
        topk=(1, 5),
    ))

cfg = mmengine.Config(dict(model=cfg))

model1 = build_classifier(cfg.model)
model2 = build_classifier(cfg.model)
model1.init_weights()
model2.init_weights()

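# Input layer: differs between the two models, so the pretrained weight was not loaded
print(torch.all(model1.backbone.conv1.weight == model2.backbone.conv1.weight))
# >> tensor(False)

# Intermediate layer: identical in both models, so the pretrained weight was loaded
print(torch.all(model1.backbone.layer1[0].conv1.weight == model2.backbone.layer1[0].conv1.weight))
# >> tensor(True)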

Reproduces the problem - command or script

None

Reproduces the problem - error message

Loads checkpoint by http backend from path: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth
11/29 17:18:17 - mmengine - WARNING - The model and loaded state dict do not match exactly

size mismatch for conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 4, 7, 7]).
11/29 17:18:17 - mmengine - INFO - 
backbone.conv1.weight - torch.Size([64, 4, 7, 7]): 
The value is the same before and after calling `init_weights` of ImageClassifier  

11/29 17:18:17 - mmengine - INFO - 
backbone.bn1.weight - torch.Size([64]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth 
...

Additional information

The warning says "size mismatch for conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 4, 7, 7]).", which suggests that the 3-channel weight is being copied into the 4-channel parameter. However, the log also says "The value is the same before and after calling `init_weights` of ImageClassifier", so the loaded weight is not actually applied to `backbone.conv1.weight`.

I expect conv1.weight copying a param correctly to 4 channel weights.
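
The observation above can be reproduced in miniature with plain PyTorch. This is only a sketch and not mmengine's loader: `make_model` and `load_matching` are hypothetical helpers, and the assumption (based on the log) is that a checkpoint tensor whose shape differs from the model's parameter is skipped rather than copied. Two independently initialized 4-channel models then agree on every loaded layer but keep their own random input-conv weights.

```python
import torch
from torch import nn


def make_model(in_channels: int) -> nn.Sequential:
    # Tiny stand-in for a backbone: an input conv followed by a second conv.
    return nn.Sequential(
        nn.Conv2d(in_channels, 8, kernel_size=3, bias=False),
        nn.Conv2d(8, 8, kernel_size=3, bias=False),
    )


def load_matching(model: nn.Module, state_dict: dict) -> list:
    """Hypothetical helper: copy only shape-matching tensors, return skipped keys."""
    own = model.state_dict()  # tensors share storage with the model's parameters
    skipped = []
    with torch.no_grad():
        for name, tensor in state_dict.items():
            if name in own and own[name].shape == tensor.shape:
                own[name].copy_(tensor)
            else:
                skipped.append(name)  # e.g. a size mismatch on the input conv
    return skipped


# Stand-in for the 3-channel pretrained checkpoint.
checkpoint = make_model(in_channels=3).state_dict()

model1 = make_model(in_channels=4)
model2 = make_model(in_channels=4)
skipped1 = load_matching(model1, checkpoint)
skipped2 = load_matching(model2, checkpoint)

print(skipped1)                                          # ['0.weight']
print(torch.equal(model1[0].weight, model2[0].weight))  # input conv keeps its random init
print(torch.equal(model1[1].weight, model2[1].weight))  # True: second conv was loaded
```

In this sketch the input conv plays the role of `backbone.conv1` and the second conv plays the role of `backbone.layer1[0].conv1` from the reproduction script.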

zhouzaida commented 1 year ago

Hi @okotaku, thanks for your feedback. What is your expected behavior? I can't quite understand the meaning of "I expect conv1.weight copying a param correctly to 4 channel weights."

okotaku commented 1 year ago

@zhouzaida Here is the example.

https://github.com/rwightman/pytorch-image-models/blob/ce4d3485b690837ba4e1cb4e0e6c4ed415e36cea/timm/models/helpers.py#L197
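
For readers who do not follow the link, the requested behavior appears to be adapting the pretrained input-conv weight to the configured number of input channels instead of skipping it. Below is a minimal sketch of that idea. `adapt_first_conv` is a hypothetical name, and the repeat-then-rescale rule is an assumption loosely modeled on the timm helper, not an existing mmengine API.

```python
import math

import torch


def adapt_first_conv(weight: torch.Tensor, in_channels: int) -> torch.Tensor:
    """Hypothetical helper: resize an (out, src, kH, kW) conv weight to `in_channels` inputs."""
    src_channels = weight.shape[1]
    if in_channels == src_channels:
        return weight.clone()
    if in_channels == 1:
        # Collapse the per-channel filters into a single-channel filter.
        return weight.sum(dim=1, keepdim=True)
    # Tile the source channels until there are enough, then truncate.
    repeat = math.ceil(in_channels / src_channels)
    adapted = weight.repeat(1, repeat, 1, 1)[:, :in_channels]
    # Rescale by src/in so the overall response magnitude stays comparable.
    return adapted * (src_channels / in_channels)


pretrained = torch.randn(64, 3, 7, 7)  # stand-in for the checkpoint's conv1.weight
adapted = adapt_first_conv(pretrained, 4)
print(adapted.shape)  # torch.Size([64, 4, 7, 7])
```

Under these assumptions, the adapted tensor could be written back into the checkpoint's state dict under the `conv1.weight` key before loading, so that the shape check passes and the input layer starts from pretrained values.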