pengzhao-life commented 2 years ago

Hi, I ran into the same problem as the issue linked below: when using Lovasz loss with loss_type='binary', the following error occurs: IndexError: The shape of the mask [....] at index 0 does not match the shape of the indexed tensor [....] at index 0. I noticed that #1036 has been closed, and I wonder how it was resolved. Thanks!

https://github.com/open-mmlab/mmsegmentation/issues/1036

xiexinch commented 2 years ago

Hi @pengzhao-life, could you provide your model config? This error occurs when the model is not configured for binary classification.
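As a rough illustration of why the head configuration matters, the sketch below counts elements the way a binary Lovasz loss might see them. It assumes (this is not checked against the mmsegmentation source here) that the binary path flattens logits and labels to 1-D and then indexes the logits with a boolean mask derived from the labels. The helper name `check_binary_mask_shapes`, the shapes, and the message text are hypothetical and only mimic the reported error.

```python
from math import prod


def check_binary_mask_shapes(logit_shape, label_shape):
    """Mimic boolean-mask indexing on flattened tensors.

    Assumption: logits of shape (N, C, H, W) and labels of shape (N, H, W)
    are both flattened to 1-D, and the label-derived mask indexes the logits.
    """
    n_logits = prod(logit_shape)
    n_labels = prod(label_shape)
    if n_logits != n_labels:
        # Illustrative message only, modelled on the error quoted above.
        raise IndexError(
            f"The shape of the mask [{n_labels}] at index 0 does not match "
            f"the shape of the indexed tensor [{n_logits}] at index 0")
    return n_logits


# One output channel: element counts agree, so masking works.
ok = check_binary_mask_shapes((8, 1, 384, 384), (8, 384, 384))

# Two output channels: the flattened logits are twice as long as the mask.
try:
    check_binary_mask_shapes((8, 2, 384, 384), (8, 384, 384))
    mismatch = None
except IndexError as err:
    mismatch = str(err)
```

Under that assumption, any head with more than one output channel produces a flattened logit vector longer than the label mask, which matches the shape of the reported IndexError.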

pengzhao-life commented 2 years ago

Hi, thank you so much for the response! My settings are as follows:

This is a binary semantic segmentation task: each pixel is classified as either an object pixel or a background pixel.
Swin-B is the backbone, and UPerNet is the segmentation head.
The config file is as follows:
from mmcv import Config
from mmseg.apis import set_random_seed

cfg = Config.fromfile('../configs/swin/upernet_swin_base_patch4_window12_512x512_160k_ade20k_pretrain_384x384_22K.py')

# Since we use only one GPU, BN is used instead of SyncBN
cfg.norm_cfg = dict(type='BN', requires_grad=True)
# cfg.model.backbone.norm_cfg = cfg.norm_cfg
cfg.model.decode_head.norm_cfg = cfg.norm_cfg
cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg

# Modify num classes of the model in decode/auxiliary head
cfg.model.decode_head.num_classes = 2
cfg.model.auxiliary_head.num_classes = 2

# Modify dataset type and path
# (data_root, img_dir and ann_dir are defined elsewhere in the script, not shown)
cfg.dataset_type = 'StanfordBackgroundDataset'
cfg.data_root = data_root

cfg.data.samples_per_gpu = 8
cfg.data.workers_per_gpu = 8

crop_size = (384, 384)  # (256, 256)
image_scale = (512, 512)  # (320, 240)

cfg.img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
cfg.crop_size = crop_size
cfg.train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=image_scale, ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=cfg.crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **cfg.img_norm_cfg),
    dict(type='Pad', size=cfg.crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]

cfg.test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=image_scale,
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **cfg.img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ]),
]

cfg.data.train.type = cfg.dataset_type
cfg.data.train.data_root = cfg.data_root
cfg.data.train.img_dir = img_dir
cfg.data.train.ann_dir = ann_dir
cfg.data.train.pipeline = cfg.train_pipeline
cfg.data.train.split = 'splits/train.txt'

cfg.data.val.type = cfg.dataset_type
cfg.data.val.data_root = cfg.data_root
cfg.data.val.img_dir = img_dir
cfg.data.val.ann_dir = ann_dir
cfg.data.val.pipeline = cfg.test_pipeline
cfg.data.val.split = 'splits/val.txt'

cfg.data.test.type = cfg.dataset_type
cfg.data.test.data_root = cfg.data_root
cfg.data.test.img_dir = img_dir
cfg.data.test.ann_dir = ann_dir
cfg.data.test.pipeline = cfg.test_pipeline
cfg.data.test.split = 'splits/val.txt'

# Initialize from the pre-trained UPerNet (Swin-B) ADE20K checkpoint
cfg.load_from = '../../../../checkpoints/upernet_swin_base_patch4_window12_512x512_160k_ade20k_pretrain_384x384_22K_20210531_125459-429057bf.pth'

# Set up working dir to save files and logs.
cfg.work_dir = '/tmp/peng-tmp'

cfg.runner.max_iters = 10000
cfg.log_config.interval = 10
cfg.evaluation.interval = 200
cfg.checkpoint_config.interval = 200

# Set seed to facilitate reproducing the result
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)

# Let's have a look at the final config used for training
print(f'Config:\n{cfg.pretty_text}')

In 'upernet_swin.py', the loss function is modified as below:
norm_cfg = dict(type='SyncBN', requires_grad=True)
backbone_norm_cfg = dict(type='LN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='SwinTransformer',
        pretrain_img_size=224,
        embed_dims=96,
        patch_size=4,
        window_size=7,
        mlp_ratio=4,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        strides=(4, 2, 2, 2),
        out_indices=(0, 1, 2, 3),
        qkv_bias=True,
        qk_scale=None,
        patch_norm=True,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.3,
        use_abs_pos_embed=False,
        act_cfg=dict(type='GELU'),
        norm_cfg=backbone_norm_cfg),
    decode_head=dict(
        type='UPerHead',
        in_channels=[96, 192, 384, 768],
        in_index=[0, 1, 2, 3],
        pool_scales=(1, 2, 3, 6),
        channels=512,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='LovaszLoss', loss_type='binary', reduction='none',
            loss_weight=1.0, loss_name='loss_lovasz')),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=384,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='LovaszLoss', loss_type='binary', reduction='none',
            loss_weight=0.4, loss_name='loss_lovasz')),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
Train and evaluation:
import os.path as osp

import mmcv
from mmseg.apis import train_segmentor
from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor

# Build the dataset
datasets = [build_dataset(cfg.data.train)]

# Build the segmentor
model = build_segmentor(cfg.model)

# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_segmentor(model, datasets, cfg, distributed=False, validate=True,
                meta=dict())

Thanks, Peng

MengzhangLI commented 2 years ago

For binary segmentation, num_classes should be set to 1 rather than 2. Section 4.1.1 of PRML (Bishop, Pattern Recognition and Machine Learning) covers the two-class problem.
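One way to see why a single output channel can suffice for two classes: the foreground probability from a two-class softmax depends only on the difference of the two logits, and equals a sigmoid applied to that difference. The sketch below checks this numerically in plain Python; the function names are hypothetical and nothing here is mmsegmentation code.

```python
import math


def sigmoid(a):
    """Logistic sigmoid of a single logit."""
    return 1.0 / (1.0 + math.exp(-a))


def softmax2(a_fg, a_bg):
    """Two-class softmax; returns the foreground probability."""
    m = max(a_fg, a_bg)  # subtract the max for numerical stability
    e_fg = math.exp(a_fg - m)
    e_bg = math.exp(a_bg - m)
    return e_fg / (e_fg + e_bg)


# For every pair of logits, softmax over two outputs matches a sigmoid
# over the single difference a_fg - a_bg.
pairs = [(2.0, -1.0), (0.3, 0.3), (-4.0, 1.5)]
max_gap = max(abs(softmax2(fg, bg) - sigmoid(fg - bg)) for fg, bg in pairs)
```

So a head with one channel can represent the same decision as a head with two, as long as the loss and the inference step both treat that channel as a sigmoid logit.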

You can try modifying your config along the lines of configs/_base_/models/fcn_unet_s5-d16.py, for example:

# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='UNet',
        in_channels=3,
        base_channels=64,
        num_stages=5,
        strides=(1, 1, 1, 1, 1),
        enc_num_convs=(2, 2, 2, 2, 2),
        dec_num_convs=(2, 2, 2, 2),
        downsamples=(True, True, True, True),
        enc_dilations=(1, 1, 1, 1, 1),
        dec_dilations=(1, 1, 1, 1),
        with_cp=False,
        conv_cfg=None,
        norm_cfg=norm_cfg,
        act_cfg=dict(type='ReLU'),
        upsample_cfg=dict(type='InterpConv'),
        norm_eval=False),
    decode_head=dict(
        type='FCNHead',
        in_channels=64,
        in_index=4,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=1,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='LovaszLoss', loss_type='binary', reduction='none', loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,
        in_index=3,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=1,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='LovaszLoss', loss_type='binary', reduction='none', loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='slide', crop_size=256, stride=170))

pengzhao-life commented 2 years ago

Hi,

Thanks for the response! I changed num_classes to 1 in configs/_base_/models/fcn_unet_s5-d16.py. I also had to reduce 'CLASSES' to a single element, as shown below; otherwise it throws a mismatch error. Training now runs, but the predicted segmentation always covers the whole image. My annotated image is attached: white pixels are the region of interest, and black is the background. Can you tell what went wrong? Thanks!

Peng

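import os.path as osp

from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset


@DATASETS.register_module()
class StanfordBackgroundDataset(CustomDataset):
    CLASSES = ('b',)  # some class name; trailing comma makes this a one-element tuple
    PALETTE = [[255, 0, 0]]  # some color

    def __init__(self, split, **kwargs):
        super().__init__(img_suffix='.png', seg_map_suffix='.png',
                         split=split, **kwargs)
        assert osp.exists(self.img_dir) and self.split is not None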
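One possible explanation for a prediction that always covers the whole image, offered as an assumption about the installed mmsegmentation version rather than a confirmed diagnosis: if inference applies softmax and then argmax over the channel dimension, a single-channel output always yields class index 0, whatever the logit value. The sketch below uses plain Python with hypothetical function names to contrast that with thresholding a sigmoid.

```python
import math


def softmax(logits):
    """Softmax over the channel values of one pixel."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_argmax(pixel_logits):
    """Class index with the highest softmax probability."""
    probs = softmax(pixel_logits)
    return max(range(len(probs)), key=probs.__getitem__)


def predict_threshold(pixel_logit, threshold=0.5):
    """Binary decision from a single sigmoid logit."""
    prob = 1.0 / (1.0 + math.exp(-pixel_logit))
    return 1 if prob > threshold else 0


# Four pixels, each with a single output channel.
single_channel_pixels = [-3.2, -0.1, 0.4, 5.0]
argmax_preds = [predict_argmax([v]) for v in single_channel_pixels]
threshold_preds = [predict_threshold(v) for v in single_channel_pixels]
```

Here `argmax_preds` contains only zeros, because a softmax over one channel is always 1.0, while `threshold_preds` separates negative from positive logits. If the installed version behaves like the argmax path, every pixel is assigned the only listed class.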

MengzhangLI commented 2 years ago

Sorry for the late reply. Could you please try using dict(type='LoadAnnotations', reduce_zero_label=True) in train_pipeline? Looking forward to your reply.
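To make the suggestion concrete, here is a minimal emulation of what `reduce_zero_label=True` is understood to do to a label map. This is an assumption about the loading step, written in plain Python on a flat list, not the library's implementation: label 0 is mapped to the ignore index 255, every other label is shifted down by one, and pixels already equal to 255 stay ignored. The function below is a hypothetical stand-in.

```python
def reduce_zero_label(labels, ignore_index=255):
    """Emulate the remapping on a flat list of integer labels.

    Assumed behaviour: label 0 becomes ignore_index, every other label is
    shifted down by one, and pixels already equal to ignore_index stay ignored.
    """
    remapped = []
    for value in labels:
        if value == 0 or value == ignore_index:
            remapped.append(ignore_index)
        else:
            remapped.append(value - 1)
    return remapped


# Annotation stored as 0 (background) and 1 (object).
mask_01 = reduce_zero_label([0, 1, 1, 0])

# Annotation stored as 0 (black) and 255 (white).
mask_0_255 = reduce_zero_label([0, 255, 255, 0])
```

With a 0/1 mask the object pixels become class 0 and the background is ignored. With a 0/255 mask every pixel ends up at the ignore index under this emulation, so it is worth checking the actual pixel values in the annotation PNGs before relying on this option.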

open-mmlab / mmsegmentation

I got same as reported in Lovasz loss #1036 #1681
