open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Transferring pytorch resnet weights trained using another repo #5841

Closed: pcicales closed this issue 3 years ago

pcicales commented 3 years ago

Is there documentation on how to do this properly? I am trying to transfer weights of a resnet50 model trained using a different paradigm to my instance segmentation model resnet50 backbone. Any help is appreciated!

PeterVennerstrom commented 3 years ago

If the model definition uses a different naming scheme, or the architecture itself differs, those differences would need to be reconciled before your pretrained weights can be loaded into MMDetection's ResNet-50 implementation. It's also possible that your other training paradigm already matches MMDetection's ResNet-50; I believe it uses the Torchvision pretrained weights and naming scheme.

If they can't be loaded with the config...

init_cfg=dict(type='Pretrained', checkpoint='path/to/weights.pth')

...another approach would be to add your backbone code to the model registry. It would likely need some small adjustments to return the correct tensor output at each stage for the neck/head used in your instance segmentation model, along the lines of the sketch below.
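
For the registry route, here is a minimal sketch. The class name MyResNet50 is hypothetical, and torchvision's resnet50 merely stands in for your own module; only the BACKBONES registry call is MMDetection API.

import torch.nn as nn
import torchvision
from mmcv.runner import BaseModule
from mmdet.models.builder import BACKBONES


@BACKBONES.register_module()
class MyResNet50(BaseModule):
    """Emits one feature map per stage, as the FPN neck expects."""

    def __init__(self, init_cfg=None):
        super().__init__(init_cfg)
        net = torchvision.models.resnet50(pretrained=False)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stages = nn.ModuleList(
            [net.layer1, net.layer2, net.layer3, net.layer4])

    def forward(self, x):
        x = self.stem(x)
        outs = []
        for stage in self.stages:
            x = stage(x)
            outs.append(x)  # channels: 256, 512, 1024, 2048
        return tuple(outs)

The config would then reference it with backbone=dict(type='MyResNet50', init_cfg=dict(type='Pretrained', checkpoint='path/to/weights.pth')).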

pcicales commented 3 years ago

@PeterVennerstrom thanks so much for your reply. I implemented what you recommended and got no errors (somewhat expected, since the model is from a Facebook repo and should follow the usual naming conventions). Here is the log output; I noticed there was no indication that the model was loaded successfully. Did I do it correctly in the config file? Additionally, what is the difference between load_from and init_cfg? Is init_cfg specific to the backbone?

2021-08-10 14:29:42,702 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA Quadro RTX 8000
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.0
OpenCV: 4.5.2
MMCV: 1.3.5
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.12.0+
------------------------------------------------------------

2021-08-10 14:29:44,303 - mmdet - INFO - Distributed training: True
2021-08-10 14:29:45,710 - mmdet - INFO - Config:
run_dir = 'redact'
model = dict(
    type='MaskRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=25,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=25,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            mask_size=28,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100,
            mask_thr_binary=0.5)))
dataset_type = 'CocoDataset'
data_root = 'redact'
CLASSES = ('person', 'personal_mobility', 'stroller', 'wheelchair', 'cart',
           'barrier', 'debris', 'pushable_pullable', 'trafficcone', 'car',
           'suv', 'van', 'bus', 'trolley', 'tram', 'train', 'truck', 'trailer',
           'motorcycle', 'bicycle', 'construction', 'traffic_sign',
           'traffic_light', 'bicycle_rack', 'animal')
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1600, 900),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=1,
    train=dict(
        type='CocoDataset',
        ann_file=
        'redact',
        img_prefix='redact',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ],
        classes=('person', 'personal_mobility', 'stroller', 'wheelchair',
                 'cart', 'barrier', 'debris', 'pushable_pullable',
                 'trafficcone', 'car', 'suv', 'van', 'bus', 'trolley', 'tram',
                 'train', 'truck', 'trailer', 'motorcycle', 'bicycle',
                 'construction', 'traffic_sign', 'traffic_light',
                 'bicycle_rack', 'animal')),
    val=dict(
        type='CocoDataset',
        ann_file=
        'redact',
        img_prefix='redact',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1600, 900),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('person', 'personal_mobility', 'stroller', 'wheelchair',
                 'cart', 'barrier', 'debris', 'pushable_pullable',
                 'trafficcone', 'car', 'suv', 'van', 'bus', 'trolley', 'tram',
                 'train', 'truck', 'trailer', 'motorcycle', 'bicycle',
                 'construction', 'traffic_sign', 'traffic_light',
                 'bicycle_rack', 'animal')),
    test=dict(
        type='CocoDataset',
        ann_file=
        'redact',
        img_prefix='redact',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1600, 900),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('person', 'personal_mobility', 'stroller', 'wheelchair',
                 'cart', 'barrier', 'debris', 'pushable_pullable',
                 'trafficcone', 'car', 'suv', 'van', 'bus', 'trolley', 'tram',
                 'train', 'truck', 'trailer', 'motorcycle', 'bicycle',
                 'construction', 'traffic_sign', 'traffic_light',
                 'bicycle_rack', 'animal')))
evaluation = dict(metric=['bbox', 'segm'])
init_cfg = dict(
    type='Pretrained',
    checkpoint=
    'redact'
)
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=300,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=20, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = 'redact'
load_from = None
resume_from = None
workflow = [('train', 1), ('val', 1)]
gpu_ids = range(0, 4)

2021-08-10 14:30:37,385 - mmdet - INFO - Start running, redact
2021-08-10 14:30:37,386 - mmdet - INFO - workflow: [('train', 1), ('val', 1)], max: 12 epochs
2021-08-10 14:33:18,733 - mmdet - INFO - Epoch [1][20/944] lr: 1.285e-03, eta: 1 day, 1:17:44, time: 8.053, data_time: 2.858, memory: 34294, loss_rpn_cls: 0.6900, loss_rpn_bbox: 0.0879, loss_cls: 1.3905, acc: 72.8334, loss_bbox: 0.0192, loss_mask: 0.7013, loss: 2.8888
PeterVennerstrom commented 3 years ago

backbone=dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
    norm_cfg=dict(type='BN', requires_grad=True),
    norm_eval=True,
    style='pytorch',
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'))

The init_cfg pointing at your weights should go inside the backbone dictionary of the config, not at the top level. A top-level init_cfg isn't read by anything, which is why your log shows no sign of the checkpoint being loaded.

load_from loads weights for the entire model: backbone, neck and head.
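
In config terms, a minimal sketch (the paths are placeholders):

# init_cfg inside the backbone dict initializes only the backbone
# from a backbone-style checkpoint.
model = dict(
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(
            type='Pretrained', checkpoint='path/to/backbone_weights.pth')))

# load_from, by contrast, restores a complete detector checkpoint
# (backbone, neck and heads) before training starts.
load_from = 'path/to/full_detector_checkpoint.pth'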

There are examples of model conversions from other repos in tools/model_converters.
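
In the same spirit, a converter usually just renames state_dict keys. A minimal sketch; the 'module.' prefix and the file names are assumptions about your source checkpoint, not part of any official tool:

import torch

src = torch.load('source_weights.pth', map_location='cpu')
state_dict = src.get('state_dict', src)  # some repos nest under 'state_dict'

converted = {}
for key, value in state_dict.items():
    new_key = key
    if new_key.startswith('module.'):  # strip a DataParallel prefix
        new_key = new_key[len('module.'):]
    converted[new_key] = value

torch.save({'state_dict': converted}, 'converted_weights.pth')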

Also, the underlying code that handles weight loading for the MMDetection backbone implementations lives in MMCV; the MMDetection BaseDetector inherits from MMCV's BaseModule.
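
If you want to convince yourself the backbone weights really loaded, here is a quick sanity check. This is my own sketch, not an official utility; it assumes placeholder paths and a backbone-style checkpoint whose keys match MMDetection's ResNet (e.g. 'conv1.weight').

import torch
from mmcv import Config
from mmdet.models import build_detector

cfg = Config.fromfile('path/to/your_config.py')
model = build_detector(cfg.model)
model.init_weights()  # triggers the Pretrained initializer from init_cfg

ckpt = torch.load('path/to/weights.pth', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)
print(torch.equal(model.backbone.conv1.weight, state_dict['conv1.weight']))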

pcicales commented 3 years ago

@PeterVennerstrom ah, now I see. I think I was confused by the pretrained key that comes before the backbone dict in the default config files. I guess that is just an older way to load a pretrained backbone.

PeterVennerstrom commented 3 years ago

Yes, pretrained is deprecated, but it still works and calls the same underlying code.
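
For reference, a minimal sketch of the two equivalent spellings:

# Deprecated top-level key, still functional:
model = dict(
    type='MaskRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(type='ResNet', depth=50))

# Current equivalent, with init_cfg inside the backbone dict:
model = dict(
    type='MaskRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(
            type='Pretrained', checkpoint='torchvision://resnet50')))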

pcicales commented 3 years ago

@PeterVennerstrom thank you, closing this issue now.