open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

backbone errors #332

Closed luogen1996 closed 4 years ago

luogen1996 commented 5 years ago

When I change the backbone to 'resnet34', I get this error: RuntimeError: Given groups=1, weight of size [256, 256, 1, 1], expected input[1, 64, 256, 344] to have 256 channels, but got 64 channels instead

ZhihuaGao commented 5 years ago

You should change the cfg in_channels to match your resnet34 backbone.
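
For ResNet-34 the four stage outputs have 64, 128, 256 and 512 channels (ResNet-50/101 produce 256, 512, 1024 and 2048), so a minimal sketch of the matching FPN entry in the config would be:

# FPN neck fed by a ResNet-34 backbone; the stock configs assume ResNet-50 widths
neck = dict(
    type='FPN',
    in_channels=[64, 128, 256, 512],  # ResNet-50 configs use [256, 512, 1024, 2048]
    out_channels=256,
    num_outs=5)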

luogen1996 commented 5 years ago


It works. Thank you so much!

CF2220160244 commented 5 years ago

@luogen1996 I only changed this: in_channels=[64, 128, 256, 512]. Can you help me? When I train on my data, the loss becomes very large:

s_cls: 0.0313, s1.acc: 98.8535, s1.loss_reg: 0.0153, s2.loss_cls: 0.0106, s2.acc: 99.3574, s2.loss_reg: 0.0022, loss: 0.2922
2019-02-19 16:37:08,917 - INFO - Epoch [1][250/2500] lr: 0.01331, eta: 12:29:56, time: 0.753, data_time: 0.003, loss_rpn_cls: 0.0520, loss_rpn_reg: 0.0148, s0.loss_cls: 0.1502, s0.acc: 96.9824, s0.loss_reg: 0.0659, s1.loss_cls: 0.0407, s1.acc: 98.4824, s1.loss_reg: 0.0233, s2.loss_cls: 0.0125, s2.acc: 99.2676, s2.loss_reg: 0.0028, loss: 0.3622
2019-02-19 16:37:46,870 - INFO - Epoch [1][300/2500] lr: 0.01464, eta: 12:30:18, time: 0.759, data_time: 0.004, loss_rpn_cls: 0.0447, loss_rpn_reg: 0.0168, s0.loss_cls: 0.1202, s0.acc: 97.1973, s0.loss_reg: 0.0520, s1.loss_cls: 0.0414, s1.acc: 98.0254, s1.loss_reg: 0.0322, s2.loss_cls: 0.0128, s2.acc: 99.0586, s2.loss_reg: 0.0061, loss: 0.3262
2019-02-19 16:38:25,328 - INFO - Epoch [1][350/2500] lr: 0.01597, eta: 12:31:49, time: 0.769, data_time: 0.003, loss_rpn_cls: 0.0563, loss_rpn_reg: 0.0154, s0.loss_cls: 0.1260, s0.acc: 97.2832, s0.loss_reg: 0.0501, s1.loss_cls: 0.0488, s1.acc: 97.7637, s1.loss_reg: 0.0402, s2.loss_cls: 0.0174, s2.acc: 98.5859, s2.loss_reg: 0.0105, loss: 0.3648
2019-02-19 16:39:02,281 - INFO - Epoch [1][400/2500] lr: 0.01731, eta: 12:29:03, time: 0.739, data_time: 0.003, loss_rpn_cls: 23.7278, loss_rpn_reg: 28.0340, s0.loss_cls: 417.9379, s0.acc: 96.8261, s0.loss_reg: 190.3232, s1.loss_cls: 22.1440, s1.acc: 97.6402, s1.loss_reg: 2.6521, s2.loss_cls: 12.1844, s2.acc: 99.2803, s2.loss_reg: 0.4299, loss: 697.4334
2019-02-19 16:39:33,124 - INFO - Epoch [1][450/2500] lr: 0.01864, eta: 12:13:17, time: 0.617, data_time: 0.003, loss_rpn_cls: 25152594.3785, loss_rpn_reg: 3851770.1321, s0.loss_cls: 8637201.6297, s0.acc: 73.8330, s0.loss_reg: 4083896.6927, s1.loss_cls: 1117100.5227, s1.acc: 84.8878, s1.loss_reg: 5527708.0596, s2.loss_cls: 302149.7116, s2.acc: 88.0995, s2.loss_reg: 46457.3539, loss: 48718880.8694
2019-02-19 16:40:02,494 - INFO - Epoch [1][500/2500] lr: 0.01997, eta: 11:57:39, time: 0.587, data_time: 0.003, loss_rpn_cls: 9212548204839.1953, loss_rpn_reg: 2969416261400.9214, s0.loss_cls: 69064999831716.8828, s0.acc: 50.1057, s0.loss_reg: 85041669332823.6562, s1.loss_cls: 10721497235618.1406, s1.acc: 45.0701, s1.loss_reg: 42140222317317.6250, s2.loss_cls: 1663234368924.2700, s2.acc: 53.7911, s2.loss_reg: 10114430675285.6719, loss: 230928031131482.3125

ZhihuaGao commented 5 years ago

Please show your config.

CF2220160244 commented 5 years ago

@AresGao

# model settings
model = dict(
    type='CascadeRCNN',
    num_stages=3,
    pretrained='modelzoo://resnet34',
    # pretrained='/home/chenfei/Downloads/xception-c0a72b38.pth.tar',
    backbone=dict(
        type='ResNet',
        depth=34,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 256, 512],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        use_sigmoid_cls=True),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=[
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=256,
            roi_feat_size=7,
            num_classes=14,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.1, 0.1, 0.2, 0.2],
            reg_class_agnostic=True),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=256,
            roi_feat_size=7,
            num_classes=14,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.05, 0.05, 0.1, 0.1],
            reg_class_agnostic=True),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=256,
            roi_feat_size=7,
            num_classes=14,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.033, 0.033, 0.067, 0.067],
            reg_class_agnostic=True)
    ])

# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3,
            min_pos_iou=0.3, ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler', num=256, pos_fraction=0.5,
            neg_pos_ub=-1, add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        smoothl1_beta=1 / 9.0,
        debug=False),
    rcnn=[
        dict(
            assigner=dict(
                type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5,
                min_pos_iou=0.5, ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler', num=512, pos_fraction=0.25,
                neg_pos_ub=-1, add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner', pos_iou_thr=0.6, neg_iou_thr=0.6,
                min_pos_iou=0.6, ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler', num=512, pos_fraction=0.25,
                neg_pos_ub=-1, add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.7,
                min_pos_iou=0.7, ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler', num=512, pos_fraction=0.25,
                neg_pos_ub=-1, add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ],
    stage_loss_weights=[1, 0.5, 0.25])
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False, nms_pre=2000, nms_post=2000,
        max_num=2000, nms_thr=0.7, min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100),
    keep_all_stages=False)

# dataset settings
# dataset_type = 'CustomDataset'
dataset_type = 'SignDataset'
data_root = '/home/chenfei/taiwan/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type, ann_file=data_root + '/sign_train.pkl', img_prefix=data_root,
        img_scale=(1280, 720), img_norm_cfg=img_norm_cfg, size_divisor=32, flip_ratio=0.5,
        with_mask=False, with_crowd=True, with_label=True),
    val=dict(
        type=dataset_type, ann_file=data_root + '/sign_val.pkl', img_prefix=data_root,
        img_scale=(1280, 720), img_norm_cfg=img_norm_cfg, size_divisor=32, flip_ratio=0,
        with_mask=False, with_crowd=True, with_label=True),
    test=dict(
        type=dataset_type, ann_file=data_root + '/sign_val.pkl', img_prefix=data_root,
        img_scale=(1280, 720), img_norm_cfg=img_norm_cfg, size_divisor=32, flip_ratio=0,
        with_mask=False, with_label=False, test_mode=True))

# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)

# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable

# runtime settings
total_epochs = 24
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/cascade_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]

luogen1996 commented 5 years ago

I have the same problem, even when I use resnet50 as the backbone. You can try setting warmup_iters larger; it works for me.
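
A minimal sketch of that change against the stock 1x schedule; the value 1000 is only an example, not a tuned number:

# learning policy: lengthen the linear warmup so early losses do not blow up
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,  # e.g. raised from the default 500; tune for your data
    warmup_ratio=1.0 / 3,
    step=[8, 11])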

ZhihuaGao commented 5 years ago

How many GPUs do you use for training? If you train with just one GPU, your learning rate should be lower...
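
For context, the stock configs assume 8 GPUs with 2 images each (total batch size 16) for lr=0.02, so under the linear scaling rule a single-GPU run with imgs_per_gpu=2 would use roughly lr=0.0025. A hedged sketch of that adjustment:

# optimizer: scale the learning rate with the total batch size (here: 1 GPU x 2 imgs)
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)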

CF2220160244 commented 5 years ago

You are right, I haven't changed the learning rate. @AresGao And can you give me some advice on using an Xception backbone? The mmdetection lib does not register an Xception backbone. I tried it and got this error: 'Xception is not in the backbone registry'
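
For that registry error: a new backbone has to be registered and exported before type='Xception' can be resolved from the config. A minimal sketch, assuming the mmdetection 1.x registry API; the class body is only a placeholder, not a working Xception:

# mmdet/models/backbones/xception.py  (hypothetical new file)
import torch.nn as nn

from ..registry import BACKBONES  # mmdetection 1.x-style registry


@BACKBONES.register_module
class Xception(nn.Module):
    """Placeholder backbone; the real entry/middle/exit flows still have to be written."""

    def __init__(self, out_indices=(0, 1, 2, 3)):
        super(Xception, self).__init__()
        self.out_indices = out_indices

    def init_weights(self, pretrained=None):
        # Load the xception .pth.tar checkpoint here when a path is given.
        pass

    def forward(self, x):
        # Must return a tuple of multi-scale feature maps for the FPN neck.
        raise NotImplementedError

The class also has to be exported from mmdet/models/backbones/__init__.py so the registry sees it, and the FPN in_channels must match whatever channel widths the Xception stages produce.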

ghm666 commented 4 years ago

Thank you very much, it helped me.