open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

How to make the UNet? #289

Closed · lzcstar closed this issue 3 years ago

lzcstar commented 3 years ago

Hi, mmsegmentation is great and very useful. But when I tried to switch to UNet, some questions confused me. I'd like to ask you a few questions. Thanks.

  1. If the backbone type is 'UNet', what should the decode_head be? I see that the UNet source code already contains both an encoder and a decoder.
  2. If I want to change part of the network, for example to use a ResNet, is there a convenient way to do it? Please forgive my poor English. Thanks a lot.
YLyeliang commented 3 years ago
  1. You should write a UNet head that inherits from BaseDecodeHead, with some modifications.

  2. I think the architecture of mmseg was not designed for UNet-like segmentation networks, because mmseg defines a segmentation network as backbone - neck (optional) - decode_head. The backbone extracts multi-scale features, using different dilations and strides to keep the feature map at 8x downsampling as the input of the decode_head, which predicts the final mask. A UNet-like network, by contrast, downsamples step by step in the encoder and upsamples step by step in the decoder. In the current architecture design there is no convenient way to build a UNet-like network. However, you can treat the neck as the decode phase and the backbone as the true backbone (without the decode phase). The processing then becomes backbone (encode) - neck (skip connections & decode) - decode_head, and it is very convenient to change the architecture of the network.
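
To make that suggestion concrete, here is a minimal, hypothetical sketch of such a skip-connection neck. The class name UNetDecodeNeck and the channel numbers are illustrative, not part of mmseg; only the NECKS registry from mmseg.models.builder is assumed.

import torch.nn as nn
import torch.nn.functional as F
from mmseg.models.builder import NECKS

@NECKS.register_module()
class UNetDecodeNeck(nn.Module):
    """Top-down decoding with skip connections over encoder features."""

    def __init__(self, in_channels=(64, 128, 256, 512, 1024), out_channels=64):
        super().__init__()
        # 1x1 convs project every encoder stage to a common channel width
        self.reduce = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)

    def forward(self, inputs):
        # inputs: encoder features ordered from high to low resolution
        feats = [conv(x) for conv, x in zip(self.reduce, inputs)]
        x = feats[-1]  # start from the coarsest feature map
        for skip in reversed(feats[:-1]):
            # upsample and fuse with the skip connection one stage up
            x = F.interpolate(x, size=skip.shape[2:], mode='bilinear',
                              align_corners=False)
            x = x + skip
        return [x]  # the decode_head then predicts from this fused map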

lzcstar commented 3 years ago

@YLyeliang Thanks for your reply. After reading it, my understanding is that the decode_head of mmseg just predicts from the 8x-downsampled feature of the backbone, without an upsampling path to decode? Am I right?

If I only select the UNet backbone to train, what model type should I choose? I see there are only 'EncoderDecoder' and 'CascadeEncoderDecoder', both of which require a decode_head. Or will it not work with the mmseg architecture at all, and I must write the true backbone (without the decode phase) plus a skip-connection neck? Thanks a lot!

Junjun2016 commented 3 years ago

> @YLyeliang Thanks for your reply. After reading it, my understanding is that the decode_head of mmseg just predicts from the 8x-downsampled feature of the backbone, without an upsampling path to decode? Am I right?
>
> If I only select the UNet backbone to train, what model type should I choose? I see there are only 'EncoderDecoder' and 'CascadeEncoderDecoder', both of which require a decode_head. Or will it not work with the mmseg architecture at all, and I must write the true backbone (without the decode phase) plus a skip-connection neck? Thanks a lot!

e.g. UNet (backbone) + FCN (decode_head):

norm_cfg = dict(type='SyncBN', requires_grad=True)  # defined so the snippet is self-contained
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='UNet',
        in_channels=3,
        base_channels=64,
        num_stages=5,
        strides=(1, 1, 1, 1, 1),
        enc_num_convs=(2, 2, 2, 2, 2),
        dec_num_convs=(2, 2, 2, 2),
        downsamples=(True, True, True, True),
        enc_dilations=(1, 1, 1, 1, 1),
        dec_dilations=(1, 1, 1, 1),
        with_cp=False,
        conv_cfg=None,
        norm_cfg=norm_cfg,
        act_cfg=dict(type='ReLU'),
        upsample_cfg=dict(type='InterpConv'),
        norm_eval=False),
    decode_head=dict(
        type='FCNHead',
        in_channels=64,  # channels of the feature selected by in_index
        in_index=4,  # last UNet output: 64 channels at full resolution
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,  # second-to-last UNet output: 128 channels at 1/2 resolution
        in_index=3,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)))
Junjun2016 commented 3 years ago

Hi @lzcstar, you can also replace the FCN decode_head with another decode_head (e.g. ASPPHead, PSPHead).
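
For instance, a hedged sketch of swapping only the head; the values below mirror the later PSPNet-UNet configs in mmseg, so verify them against your installed version:

    decode_head=dict(
        type='PSPHead',
        in_channels=64,  # must equal the channels of the in_index feature
        in_index=4,  # full-resolution UNet decoder output
        channels=16,
        pool_scales=(1, 2, 3, 6),
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),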

lzcstar commented 3 years ago

@Junjun2016 Thanks. I followed your answer, but I hit two problems. 1. inputs = inputs[self.in_index] raises IndexError: list index out of range; your decode_head's in_index is 4, so I changed it to -1. 2. When running: RuntimeError: Given groups=1, weight of size [64, 64, 3, 3], expected input[16, 1024, 18, 18] to have 64 channels, but got 1024 channels instead. Can you work it out? I did exactly as you did.

Junjun2016 commented 3 years ago

> @Junjun2016 Thanks. I followed your answer, but I hit two problems. 1. inputs = inputs[self.in_index] raises IndexError: list index out of range; your decode_head's in_index is 4, so I changed it to -1. 2. When running: RuntimeError: Given groups=1, weight of size [64, 64, 3, 3], expected input[16, 1024, 18, 18] to have 64 channels, but got 1024 channels instead. Can you work it out? I did exactly as you did.

It works. Full config:

model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='UNet',
        in_channels=3,
        base_channels=64,
        num_stages=5,
        strides=(1, 1, 1, 1, 1),
        enc_num_convs=(2, 2, 2, 2, 2),
        dec_num_convs=(2, 2, 2, 2),
        downsamples=(True, True, True, True),
        enc_dilations=(1, 1, 1, 1, 1),
        dec_dilations=(1, 1, 1, 1),
        with_cp=False,
        conv_cfg=None,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        act_cfg=dict(type='ReLU'),
        upsample_cfg=dict(type='InterpConv'),
        norm_eval=False),
    decode_head=dict(
        type='FCNHead',
        in_channels=64,
        in_index=4,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,
        in_index=3,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)))

train_cfg = dict()
test_cfg = dict(mode='slide', crop_size=(512, 512), stride=(341, 341))

data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/training',
        ann_dir='annotations/training',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', reduce_zero_label=True),
            dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
            dict(type='RandomFlip', prob=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=160000)
checkpoint_config = dict(by_epoch=False, interval=16000, max_keep_ckpts=3)
evaluation = dict(interval=16000, metric='mIoU')
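
To reproduce, a config like the above can be saved as a .py file and launched with the standard mmseg entry points (the config path here is illustrative):

python tools/train.py configs/unet/unet_fcn_512x512_160k_ade20k.py
# distributed, matching the 8-GPU environment in the log below:
./tools/dist_train.sh configs/unet/unet_fcn_512x512_160k_ade20k.py 8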

log:

2020-12-06 18:08:17,330 - mmseg - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.6 (default, Jan  8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: TITAN Xp
CUDA_HOME: /mnt/lustre/share/polaris/dep/cuda-9.0-cudnn7.6.5
NVCC: Cuda compilation tools, release 9.0, V9.0.176
GCC: gcc (GCC) 5.4.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:
  - GCC 5.4
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 912ce228837d1ce28e1a61806118835de03f5751)
  - OpenMP 201307 (a.k.a. OpenMP 4.0)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 9.0
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70
  - CuDNN 7.6.5
  - Magma 2.5.0
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.6.0
OpenCV: 4.2.0
MMCV: 1.1.5
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 9.0
MMSegmentation: 0.8.0+993be25
------------------------------------------------------------

2020-12-06 18:08:17,335 - mmseg - INFO - Distributed training: True
2020-12-06 18:08:17,706 - mmseg - INFO - Config:
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='UNet',
        in_channels=3,
        base_channels=64,
        num_stages=5,
        strides=(1, 1, 1, 1, 1),
        enc_num_convs=(2, 2, 2, 2, 2),
        dec_num_convs=(2, 2, 2, 2),
        downsamples=(True, True, True, True),
        enc_dilations=(1, 1, 1, 1, 1),
        dec_dilations=(1, 1, 1, 1),
        with_cp=True,
        conv_cfg=None,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        act_cfg=dict(type='ReLU'),
        upsample_cfg=dict(type='InterpConv'),
        norm_eval=False),
    decode_head=dict(
        type='FCNHead',
        in_channels=64,
        in_index=4,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,
        in_index=3,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)))
train_cfg = dict()
test_cfg = dict(mode='slide', crop_size=(512, 512), stride=(341, 341))
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/training',
        ann_dir='annotations/training',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', reduce_zero_label=True),
            dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
            dict(type='RandomFlip', prob=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=160000)
checkpoint_config = dict(by_epoch=False, interval=16000, max_keep_ckpts=3)
evaluation = dict(interval=16000, metric='mIoU')
work_dir = 'apcnet/configs/./work_dirs/unet-512x512-160k-ade20k'
gpu_ids = range(0, 1)

2020-12-06 18:08:17,707 - mmseg - INFO - Set random seed to 0, deterministic: False
2020-12-06 18:08:18,756 - mmseg - INFO - EncoderDecoder(
  (backbone): UNet(
    (encoder): ModuleList(
      (0): Sequential(
        (0): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (1): Sequential(
        (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (1): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (2): Sequential(
        (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (1): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (3): Sequential(
        (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (1): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (4): Sequential(
        (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (1): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
    )
    (decoder): ModuleList(
      (0): UpConvBlock(
        (conv_block): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (upsample): InterpConv(
          (interp_upsample): Sequential(
            (0): Upsample(scale_factor=2.0, mode=bilinear)
            (1): ConvModule(
              (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (1): UpConvBlock(
        (conv_block): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (upsample): InterpConv(
          (interp_upsample): Sequential(
            (0): Upsample(scale_factor=2.0, mode=bilinear)
            (1): ConvModule(
              (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (2): UpConvBlock(
        (conv_block): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (upsample): InterpConv(
          (interp_upsample): Sequential(
            (0): Upsample(scale_factor=2.0, mode=bilinear)
            (1): ConvModule(
              (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (3): UpConvBlock(
        (conv_block): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(1024, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (upsample): InterpConv(
          (interp_upsample): Sequential(
            (0): Upsample(scale_factor=2.0, mode=bilinear)
            (1): ConvModule(
              (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
    )
  )
  (decode_head): FCNHead(
    input_transform=None, ignore_index=255, align_corners=False
    (loss_decode): CrossEntropyLoss()
    (conv_seg): Conv2d(64, 150, kernel_size=(1, 1), stride=(1, 1))
    (dropout): Dropout2d(p=0.1, inplace=False)
    (convs): Sequential(
      (0): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
  )
  (auxiliary_head): FCNHead(
    input_transform=None, ignore_index=255, align_corners=False
    (loss_decode): CrossEntropyLoss()
    (conv_seg): Conv2d(64, 150, kernel_size=(1, 1), stride=(1, 1))
    (dropout): Dropout2d(p=0.1, inplace=False)
    (convs): Sequential(
      (0): ConvModule(
        (conv): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
  )
)
2020-12-06 18:08:19,306 - mmseg - INFO - Loaded 20210 images
2020-12-06 18:08:24,536 - mmseg - INFO - Loaded 2000 images
2020-12-06 18:08:24,537 - mmseg - INFO - Start running, host: hejunjun@SH-IDC2-172-20-20-64, work_dir: /mnt/lustre/hejunjun/OpenMMLab/DecoupleSegNet/mmsegmentation/apcnet/configs/work_dirs/unet-512x512-160k-ade20k
2020-12-06 18:08:24,537 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
2020-12-06 18:09:47,419 - mmseg - INFO - Iter [50/160000]   lr: 9.997e-03, eta: 2 days, 13:47:58, time: 1.391, data_time: 0.006, memory: 8548, decode.loss_seg: 3.4276, decode.acc_seg: 15.5703, aux.loss_seg: 1.4907, aux.acc_seg: 13.5636, loss: 4.9183
2020-12-06 18:10:50,344 - mmseg - INFO - Iter [100/160000]  lr: 9.994e-03, eta: 2 days, 10:50:26, time: 1.259, data_time: 0.027, memory: 8548, decode.loss_seg: 2.8677, decode.acc_seg: 20.3184, aux.loss_seg: 1.2721, aux.acc_seg: 18.1589, loss: 4.1397
2020-12-06 18:11:53,316 - mmseg - INFO - Iter [150/160000]  lr: 9.992e-03, eta: 2 days, 9:51:20, time: 1.259, data_time: 0.006, memory: 8548, decode.loss_seg: 2.7569, decode.acc_seg: 21.3303, aux.loss_seg: 1.1914, aux.acc_seg: 19.7417, loss: 3.9483
2020-12-06 18:12:56,055 - mmseg - INFO - Iter [200/160000]  lr: 9.989e-03, eta: 2 days, 9:18:09, time: 1.255, data_time: 0.006, memory: 8548, decode.loss_seg: 2.6558, decode.acc_seg: 23.1496, aux.loss_seg: 1.1254, aux.acc_seg: 22.9028, loss: 3.7812
2020-12-06 18:13:58,873 - mmseg - INFO - Iter [250/160000]  lr: 9.986e-03, eta: 2 days, 8:58:41, time: 1.256, data_time: 0.006, memory: 8548, decode.loss_seg: 2.5775, decode.acc_seg: 24.2433, aux.loss_seg: 1.0851, aux.acc_seg: 23.6550, loss: 3.6626
2020-12-06 18:15:02,085 - mmseg - INFO - Iter [300/160000]  lr: 9.983e-03, eta: 2 days, 8:48:50, time: 1.264, data_time: 0.006, memory: 8548, decode.loss_seg: 2.5133, decode.acc_seg: 24.8185, aux.loss_seg: 1.0561, aux.acc_seg: 23.5870, loss: 3.5694
2020-12-06 18:16:05,196 - mmseg - INFO - Iter [350/160000]  lr: 9.981e-03, eta: 2 days, 8:40:43, time: 1.262, data_time: 0.007, memory: 8548, decode.loss_seg: 2.5072, decode.acc_seg: 24.7626, aux.loss_seg: 1.0462, aux.acc_seg: 23.8259, loss: 3.5534
2020-12-06 18:17:08,388 - mmseg - INFO - Iter [400/160000]  lr: 9.978e-03, eta: 2 days, 8:34:55, time: 1.264, data_time: 0.006, memory: 8548, decode.loss_seg: 2.4782, decode.acc_seg: 26.0928, aux.loss_seg: 1.0248, aux.acc_seg: 25.6890, loss: 3.5029
2020-12-06 18:18:11,584 - mmseg - INFO - Iter [450/160000]  lr: 9.975e-03, eta: 2 days, 8:30:12, time: 1.264, data_time: 0.006, memory: 8548, decode.loss_seg: 2.4079, decode.acc_seg: 25.6402, aux.loss_seg: 0.9936, aux.acc_seg: 25.5241, loss: 3.4015
2020-12-06 18:19:15,011 - mmseg - INFO - Iter [500/160000]  lr: 9.972e-03, eta: 2 days, 8:27:27, time: 1.269, data_time: 0.006, memory: 8548, decode.loss_seg: 2.4201, decode.acc_seg: 27.0654, aux.loss_seg: 0.9920, aux.acc_seg: 27.0333, loss: 3.4121
2020-12-06 18:20:18,271 - mmseg - INFO - Iter [550/160000]  lr: 9.969e-03, eta: 2 days, 8:24:10, time: 1.265, data_time: 0.007, memory: 8548, decode.loss_seg: 2.3706, decode.acc_seg: 27.1034, aux.loss_seg: 0.9722, aux.acc_seg: 27.3096, loss: 3.3428
lzcstar commented 3 years ago

@Junjun2016 Thanks, bro. I have another question: how do you choose the channels of FCNHead or DepthwiseSeparableASPPHead? With your channel settings the UNet works, but when I switch to DepthwiseSeparableASPPHead I still get a problem (RuntimeError: Given groups=1, weight of size [48, 256, 1, 1], expected input[6, 1024, 18, 18] to have 256 channels, but got 1024 channels instead). Can you teach me how to design or calculate in_channels and channels? Many thanks.

Junjun2016 commented 3 years ago

> @Junjun2016 Thanks, bro. I have another question: how do you choose the channels of FCNHead or DepthwiseSeparableASPPHead? With your channel settings the UNet works, but when I switch to DepthwiseSeparableASPPHead I still get a problem (RuntimeError: Given groups=1, weight of size [48, 256, 1, 1], expected input[6, 1024, 18, 18] to have 256 channels, but got 1024 channels instead). Can you teach me how to design or calculate in_channels and channels? Many thanks.

Hi, if you use the UNet backbone, you may use a decode_head without skip connections (FCN, PSP, ASPP).
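
For reference, here is what each in_index selects from this UNet (base_channels=64, num_stages=5), as can be read off the model dump above:

in_index 0 -> 1024 channels, 1/16 resolution (bottleneck)
in_index 1 -> 512 channels, 1/8 resolution
in_index 2 -> 256 channels, 1/4 resolution
in_index 3 -> 128 channels, 1/2 resolution
in_index 4 -> 64 channels, full resolution

A head's in_channels must equal the channel count of the entry it selects. DepthwiseSeparableASPPHead additionally passes a low-level feature (the first backbone output) through a bottleneck sized by c1_in_channels; with this UNet that first output has 1024 channels, which is why the c1 weight of size [48, 256, 1, 1] above fails when it receives 1024 channels instead of the expected 256.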

Junjun2016 commented 3 years ago

UNet configs for the 4 retinal vessel segmentation datasets (DRIVE, STARE, CHASE_DB1, HRF).

Junjun2016 commented 3 years ago

> UNet configs for the 4 retinal vessel segmentation datasets (DRIVE, STARE, CHASE_DB1, HRF).

The decode_head is FCN, and we will add benchmarks with PSP and ASPP decode_heads ASAP.

rubeea commented 3 years ago

> UNet configs for the 4 retinal vessel segmentation datasets (DRIVE, STARE, CHASE_DB1, HRF).
>
> The decode_head is FCN, and we will add benchmarks with PSP and ASPP decode_heads ASAP.

How can one decide the crop_size and stride to use when training the U-Net on a custom dataset?

rubeea commented 3 years ago

> @YLyeliang Thanks for your reply. After reading it, my understanding is that the decode_head of mmseg just predicts from the 8x-downsampled feature of the backbone, without an upsampling path to decode? Am I right? If I only select the UNet backbone to train, what model type should I choose? I see there are only 'EncoderDecoder' and 'CascadeEncoderDecoder', both of which require a decode_head. Or will it not work with the mmseg architecture at all, and I must write the true backbone (without the decode phase) plus a skip-connection neck? Thanks a lot!
>
> e.g. UNet (backbone) + FCN (decode_head): (the quoted config is identical to the one posted above)

Is this equivalent to the official U-Net paper? Thanks in advance.

Junjun2016 commented 3 years ago

> e.g. UNet (backbone) + FCN (decode_head): (the quoted config is identical to the one posted above)
>
> Is this equivalent to the official U-Net paper? Thanks in advance.

Empirical. The larger the image, the larger the crop size.
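
As a concrete illustration (a hedged note, not an official rule): with test_cfg mode='slide', windows of crop_size are sampled every stride pixels and the overlapping predictions are averaged, so a stride smaller than the crop size gives overlap. The 512/341 pair above overlaps neighbouring crops by 171 pixels; the same ratio at a 256 crop would be roughly:

test_cfg = dict(mode='slide', crop_size=(256, 256), stride=(170, 170))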

rubeea commented 3 years ago

@Junjun2016 I am using a crop size of 256x256 with a stride of 170 on images of size 540x360, and it gives satisfactory results. Should I reduce the crop size further? Moreover, can you tell me how to set the optimal learning rate for training the model? One option is to implement an LR finder such as: https://github.com/davidtvs/pytorch-lr-finder/blob/master/examples/lrfinder_cifar10.ipynb. How can we achieve this in the mmsegmentation framework?

Junjun2016 commented 3 years ago

For a fair comparison, we use the same learning rate, but you can tune the learning rate and crop size according to your task. There is no general rule.

rubeea commented 3 years ago

> For a fair comparison, we use the same learning rate, but you can tune the learning rate and crop size according to your task. There is no general rule.

Hey @Junjun2016, thanks for the reply. One more question: does mmsegmentation allow us to use a hyperparameter search framework such as Optuna, or an LR finder, to find the optimal learning rate? It is very difficult to search for the LR manually. Also, can you explain the difference between the decode loss and the auxiliary loss? And can we use different losses for the two heads?

Junjun2016 commented 3 years ago

> Hey @Junjun2016, thanks for the reply. One more question: does mmsegmentation allow us to use a hyperparameter search framework such as Optuna, or an LR finder, to find the optimal learning rate? It is very difficult to search for the LR manually. Also, can you explain the difference between the decode loss and the auxiliary loss? And can we use different losses for the two heads?

That's a really good question. This is in our future plans, but we are currently understaffed. If you are interested, you are welcome to contribute. As for the decode loss and the auxiliary loss: usually we use the same loss, but the two can be different. The auxiliary head helps optimize the learning process, and we drop it at inference time.
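
For example, since each head carries its own loss_decode config, the two losses can differ. A hedged sketch, reusing the head settings from the config above and assuming DiceLoss is registered in your mmseg version (otherwise LovaszLoss or a re-weighted CrossEntropyLoss can be substituted the same way):

    decode_head=dict(
        type='FCNHead', in_channels=64, in_index=4, channels=64,
        num_classes=150,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead', in_channels=128, in_index=3, channels=64,
        num_classes=150,
        loss_decode=dict(type='DiceLoss', loss_weight=0.4)))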

rubeea commented 3 years ago

> That's a really good question. This is in our future plans, but we are currently understaffed. If you are interested, you are welcome to contribute. As for the decode loss and the auxiliary loss: usually we use the same loss, but the two can be different. The auxiliary head helps optimize the learning process, and we drop it at inference time.

Thanks for the explanation. Sure, I'll be happy to contribute. I am working on implementing an LR finder for the mmsegmentation framework.

Junjun2016 commented 3 years ago

Hi @rubeea, thanks in advance.

rubeea commented 3 years ago

Hi @Junjun2016, one last question. As far as I understand, the decode and auxiliary heads are used in parallel in the model config, right? Or are they cascaded? Thanks in advance.

Junjun2016 commented 3 years ago

> Hi @Junjun2016, one last question. As far as I understand, the decode and auxiliary heads are used in parallel in the model config, right? Or are they cascaded? Thanks in advance.

In parallel. They supervise different stages.
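
This is also visible in the training log above: the two heads are trained jointly, and the reported total is the sum of the per-head losses, e.g. at iter 50: decode.loss_seg 3.4276 + aux.loss_seg 1.4907 = loss 4.9183.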

rubeea commented 3 years ago

> Hi @Junjun2016, one last question. As far as I understand, the decode and auxiliary heads are used in parallel in the model config, right? Or are they cascaded? Thanks in advance.
>
> In parallel. They supervise different stages.

Can you kindly provide a graphical illustration of the mmsegmentation U-Net with FCN decode and auxiliary heads, to help us understand the architecture better? Moreover, can you specify which input feature the "in_index" argument in the base decode_head.py selects? Thanks in advance.
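
On the in_index question, a paraphrased sketch of what BaseDecodeHead does (check decode_head.py in your mmseg version for the exact code): the backbone returns a list of feature maps, and with input_transform=None, in_index simply picks one of them.

def _transform_inputs(self, inputs):
    # inputs: the list of tensors returned by the backbone; for the UNet
    # dump above: [1024ch @ 1/16, 512ch @ 1/8, 256ch @ 1/4,
    #              128ch @ 1/2, 64ch @ 1/1]
    if self.input_transform is None:
        inputs = inputs[self.in_index]  # e.g. in_index=4 -> the 64ch map
    return inputs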

ChenJiangxi commented 2 years ago

> Can you kindly provide a graphical illustration of the mmsegmentation U-Net with FCN decode and auxiliary heads, to help us understand the architecture better? Moreover, can you specify which input feature the "in_index" argument in the base decode_head.py selects? Thanks in advance.

I have the same problem

rubeea commented 2 years ago

> I have the same problem

@ChenJiangxi Hi, I was able to figure out the working mechanism. You can find an illustration on my GitHub page: https://github.com/rubeea/focal_phi_loss_mmsegmentation