open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

How to make the UNet? #289

Closed · lzcstar closed this issue 3 years ago

lzcstar commented 3 years ago

Hi, mmsegmentation is great and very useful. But when I tried to switch to UNet, some questions confused me. I'd like to ask you a few questions. Thanks.

  1. If the backbone type is 'UNet', what should the decode_head be? I see that the UNet source code already contains both an encoder and a decoder.
  2. If I want to change part of the network, for example to use a ResNet, is there a convenient way to do it? Please forgive my poor English. Thanks a lot.
YLyeliang commented 3 years ago
  1. You should write a UNet head that inherits from BaseDecodeHead, with some modifications.

  2. I think the architecture of mmseg was not designed for UNet-like segmentation networks, because mmseg defines a segmentation network as backbone - neck (optional) - decode_head. The backbone extracts multi-scale features, using different dilations and strides to keep the feature map at 8x downsampling as the input of the decode_head, which predicts the final mask. A UNet-like network, by contrast, downsamples step by step in the encoder and upsamples step by step in the decoder. In the current architecture design there is no convenient way to build a UNet-like network. However, you can treat the neck as the decode phase and the backbone as the true backbone (without the decode phase). The processing then becomes backbone (encode) - neck (skip connections & decode) - decode_head, and it is very convenient to change the architecture of the network.
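
To make that suggestion concrete, here is a minimal, hypothetical sketch of such a skip-connection neck. The class name UNetDecodeNeck and the channel numbers are illustrative, not part of mmseg; only the NECKS registry from mmseg.models.builder is assumed.

import torch.nn as nn
import torch.nn.functional as F
from mmseg.models.builder import NECKS

@NECKS.register_module()
class UNetDecodeNeck(nn.Module):
    """Top-down decoding with skip connections over encoder features."""

    def __init__(self, in_channels=(64, 128, 256, 512, 1024), out_channels=64):
        super().__init__()
        # 1x1 convs project every encoder stage to a common channel width
        self.reduce = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)

    def forward(self, inputs):
        # inputs: encoder features ordered from high to low resolution
        feats = [conv(x) for conv, x in zip(self.reduce, inputs)]
        x = feats[-1]  # start from the coarsest feature map
        for skip in reversed(feats[:-1]):
            # upsample and fuse with the skip connection one stage up
            x = F.interpolate(x, size=skip.shape[2:], mode='bilinear',
                              align_corners=False)
            x = x + skip
        return [x]  # the decode_head then predicts from this fused map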

lzcstar commented 3 years ago

@YLyeliang Thanks for your reply. After reading it, my understanding is that the decode_head of mmseg just predicts from the 8x-downsampled feature of the backbone, without an upsampling path to decode? Am I right?

If I only select the UNet backbone to train, what model type should I choose? I see there are only 'EncoderDecoder' and 'CascadeEncoderDecoder', both of which require a decode_head. Or will it not work with the mmseg architecture at all, and I must write the true backbone (without the decode phase) plus a skip-connection neck? Thanks a lot!

Junjun2016 commented 3 years ago

> @YLyeliang Thanks for your reply. After reading it, my understanding is that the decode_head of mmseg just predicts from the 8x-downsampled feature of the backbone, without an upsampling path to decode? Am I right?
>
> If I only select the UNet backbone to train, what model type should I choose? I see there are only 'EncoderDecoder' and 'CascadeEncoderDecoder', both of which require a decode_head. Or will it not work with the mmseg architecture at all, and I must write the true backbone (without the decode phase) plus a skip-connection neck? Thanks a lot!

e.g. UNet (backbone) + FCN (decode_head):

norm_cfg = dict(type='SyncBN', requires_grad=True)  # defined so the snippet is self-contained
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='UNet',
        in_channels=3,
        base_channels=64,
        num_stages=5,
        strides=(1, 1, 1, 1, 1),
        enc_num_convs=(2, 2, 2, 2, 2),
        dec_num_convs=(2, 2, 2, 2),
        downsamples=(True, True, True, True),
        enc_dilations=(1, 1, 1, 1, 1),
        dec_dilations=(1, 1, 1, 1),
        with_cp=False,
        conv_cfg=None,
        norm_cfg=norm_cfg,
        act_cfg=dict(type='ReLU'),
        upsample_cfg=dict(type='InterpConv'),
        norm_eval=False),
    decode_head=dict(
        type='FCNHead',
        in_channels=64,  # channels of the feature selected by in_index
        in_index=4,  # last UNet output: 64 channels at full resolution
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,  # second-to-last UNet output: 128 channels at 1/2 resolution
        in_index=3,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)))
Junjun2016 commented 3 years ago

Hi @lzcstar, you can also replace the FCN decode_head with another decode_head (e.g. ASPPHead, PSPHead).
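
For instance, a hedged sketch of swapping only the head; the values below mirror the later PSPNet-UNet configs in mmseg, so verify them against your installed version:

    decode_head=dict(
        type='PSPHead',
        in_channels=64,  # must equal the channels of the in_index feature
        in_index=4,  # full-resolution UNet decoder output
        channels=16,
        pool_scales=(1, 2, 3, 6),
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),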

lzcstar commented 3 years ago

@Junjun2016 Thanks. I followed your answer, but I hit two problems. 1. inputs = inputs[self.in_index] raises IndexError: list index out of range; your decode_head's in_index is 4, so I changed it to -1. 2. When running: RuntimeError: Given groups=1, weight of size [64, 64, 3, 3], expected input[16, 1024, 18, 18] to have 64 channels, but got 1024 channels instead. Can you work it out? I did exactly as you did.

Junjun2016 commented 3 years ago

> @Junjun2016 Thanks. I followed your answer, but I hit two problems. 1. inputs = inputs[self.in_index] raises IndexError: list index out of range; your decode_head's in_index is 4, so I changed it to -1. 2. When running: RuntimeError: Given groups=1, weight of size [64, 64, 3, 3], expected input[16, 1024, 18, 18] to have 64 channels, but got 1024 channels instead. Can you work it out? I did exactly as you did.

It works. Full config:

model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='UNet',
        in_channels=3,
        base_channels=64,
        num_stages=5,
        strides=(1, 1, 1, 1, 1),
        enc_num_convs=(2, 2, 2, 2, 2),
        dec_num_convs=(2, 2, 2, 2),
        downsamples=(True, True, True, True),
        enc_dilations=(1, 1, 1, 1, 1),
        dec_dilations=(1, 1, 1, 1),
        with_cp=False,
        conv_cfg=None,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        act_cfg=dict(type='ReLU'),
        upsample_cfg=dict(type='InterpConv'),
        norm_eval=False),
    decode_head=dict(
        type='FCNHead',
        in_channels=64,
        in_index=4,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,
        in_index=3,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)))

train_cfg = dict()
test_cfg = dict(mode='slide', crop_size=(512, 512), stride=(341, 341))

data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/training',
        ann_dir='annotations/training',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', reduce_zero_label=True),
            dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
            dict(type='RandomFlip', prob=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=160000)
checkpoint_config = dict(by_epoch=False, interval=16000, max_keep_ckpts=3)
evaluation = dict(interval=16000, metric='mIoU')
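
To reproduce, a config like the above can be saved as a .py file and launched with the standard mmseg entry points (the config path here is illustrative):

python tools/train.py configs/unet/unet_fcn_512x512_160k_ade20k.py
# distributed, matching the 8-GPU environment in the log below:
./tools/dist_train.sh configs/unet/unet_fcn_512x512_160k_ade20k.py 8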

log:

2020-12-06 18:08:17,330 - mmseg - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.6 (default, Jan  8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: TITAN Xp
CUDA_HOME: /mnt/lustre/share/polaris/dep/cuda-9.0-cudnn7.6.5
NVCC: Cuda compilation tools, release 9.0, V9.0.176
GCC: gcc (GCC) 5.4.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:
  - GCC 5.4
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 912ce228837d1ce28e1a61806118835de03f5751)
  - OpenMP 201307 (a.k.a. OpenMP 4.0)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 9.0
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70
  - CuDNN 7.6.5
  - Magma 2.5.0
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.6.0
OpenCV: 4.2.0
MMCV: 1.1.5
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 9.0
MMSegmentation: 0.8.0+993be25
------------------------------------------------------------

2020-12-06 18:08:17,335 - mmseg - INFO - Distributed training: True
2020-12-06 18:08:17,706 - mmseg - INFO - Config:
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='UNet',
        in_channels=3,
        base_channels=64,
        num_stages=5,
        strides=(1, 1, 1, 1, 1),
        enc_num_convs=(2, 2, 2, 2, 2),
        dec_num_convs=(2, 2, 2, 2),
        downsamples=(True, True, True, True),
        enc_dilations=(1, 1, 1, 1, 1),
        dec_dilations=(1, 1, 1, 1),
        with_cp=True,
        conv_cfg=None,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        act_cfg=dict(type='ReLU'),
        upsample_cfg=dict(type='InterpConv'),
        norm_eval=False),
    decode_head=dict(
        type='FCNHead',
        in_channels=64,
        in_index=4,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,
        in_index=3,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)))
train_cfg = dict()
test_cfg = dict(mode='slide', crop_size=(512, 512), stride=(341, 341))
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/training',
        ann_dir='annotations/training',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', reduce_zero_label=True),
            dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
            dict(type='RandomFlip', prob=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='ADE20KDataset',
        data_root='data1/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=160000)
checkpoint_config = dict(by_epoch=False, interval=16000, max_keep_ckpts=3)
evaluation = dict(interval=16000, metric='mIoU')
work_dir = 'apcnet/configs/./work_dirs/unet-512x512-160k-ade20k'
gpu_ids = range(0, 1)

2020-12-06 18:08:17,707 - mmseg - INFO - Set random seed to 0, deterministic: False
2020-12-06 18:08:18,756 - mmseg - INFO - EncoderDecoder(
  (backbone): UNet(
    (encoder): ModuleList(
      (0): Sequential(
        (0): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (1): Sequential(
        (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (1): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (2): Sequential(
        (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (1): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (3): Sequential(
        (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (1): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (4): Sequential(
        (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (1): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
    )
    (decoder): ModuleList(
      (0): UpConvBlock(
        (conv_block): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (upsample): InterpConv(
          (interp_upsample): Sequential(
            (0): Upsample(scale_factor=2.0, mode=bilinear)
            (1): ConvModule(
              (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (1): UpConvBlock(
        (conv_block): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (upsample): InterpConv(
          (interp_upsample): Sequential(
            (0): Upsample(scale_factor=2.0, mode=bilinear)
            (1): ConvModule(
              (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (2): UpConvBlock(
        (conv_block): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (upsample): InterpConv(
          (interp_upsample): Sequential(
            (0): Upsample(scale_factor=2.0, mode=bilinear)
            (1): ConvModule(
              (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
      (3): UpConvBlock(
        (conv_block): BasicConvBlock(
          (convs): Sequential(
            (0): ConvModule(
              (conv): Conv2d(1024, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (1): ConvModule(
              (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (upsample): InterpConv(
          (interp_upsample): Sequential(
            (0): Upsample(scale_factor=2.0, mode=bilinear)
            (1): ConvModule(
              (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): SyncBatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
      )
    )
  )
  (decode_head): FCNHead(
    input_transform=None, ignore_index=255, align_corners=False
    (loss_decode): CrossEntropyLoss()
    (conv_seg): Conv2d(64, 150, kernel_size=(1, 1), stride=(1, 1))
    (dropout): Dropout2d(p=0.1, inplace=False)
    (convs): Sequential(
      (0): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
  )
  (auxiliary_head): FCNHead(
    input_transform=None, ignore_index=255, align_corners=False
    (loss_decode): CrossEntropyLoss()
    (conv_seg): Conv2d(64, 150, kernel_size=(1, 1), stride=(1, 1))
    (dropout): Dropout2d(p=0.1, inplace=False)
    (convs): Sequential(
      (0): ConvModule(
        (conv): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
  )
)
2020-12-06 18:08:19,306 - mmseg - INFO - Loaded 20210 images
2020-12-06 18:08:24,536 - mmseg - INFO - Loaded 2000 images
2020-12-06 18:08:24,537 - mmseg - INFO - Start running, host: hejunjun@SH-IDC2-172-20-20-64, work_dir: /mnt/lustre/hejunjun/OpenMMLab/DecoupleSegNet/mmsegmentation/apcnet/configs/work_dirs/unet-512x512-160k-ade20k
2020-12-06 18:08:24,537 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
2020-12-06 18:09:47,419 - mmseg - INFO - Iter [50/160000]   lr: 9.997e-03, eta: 2 days, 13:47:58, time: 1.391, data_time: 0.006, memory: 8548, decode.loss_seg: 3.4276, decode.acc_seg: 15.5703, aux.loss_seg: 1.4907, aux.acc_seg: 13.5636, loss: 4.9183
2020-12-06 18:10:50,344 - mmseg - INFO - Iter [100/160000]  lr: 9.994e-03, eta: 2 days, 10:50:26, time: 1.259, data_time: 0.027, memory: 8548, decode.loss_seg: 2.8677, decode.acc_seg: 20.3184, aux.loss_seg: 1.2721, aux.acc_seg: 18.1589, loss: 4.1397
2020-12-06 18:11:53,316 - mmseg - INFO - Iter [150/160000]  lr: 9.992e-03, eta: 2 days, 9:51:20, time: 1.259, data_time: 0.006, memory: 8548, decode.loss_seg: 2.7569, decode.acc_seg: 21.3303, aux.loss_seg: 1.1914, aux.acc_seg: 19.7417, loss: 3.9483
2020-12-06 18:12:56,055 - mmseg - INFO - Iter [200/160000]  lr: 9.989e-03, eta: 2 days, 9:18:09, time: 1.255, data_time: 0.006, memory: 8548, decode.loss_seg: 2.6558, decode.acc_seg: 23.1496, aux.loss_seg: 1.1254, aux.acc_seg: 22.9028, loss: 3.7812
2020-12-06 18:13:58,873 - mmseg - INFO - Iter [250/160000]  lr: 9.986e-03, eta: 2 days, 8:58:41, time: 1.256, data_time: 0.006, memory: 8548, decode.loss_seg: 2.5775, decode.acc_seg: 24.2433, aux.loss_seg: 1.0851, aux.acc_seg: 23.6550, loss: 3.6626
2020-12-06 18:15:02,085 - mmseg - INFO - Iter [300/160000]  lr: 9.983e-03, eta: 2 days, 8:48:50, time: 1.264, data_time: 0.006, memory: 8548, decode.loss_seg: 2.5133, decode.acc_seg: 24.8185, aux.loss_seg: 1.0561, aux.acc_seg: 23.5870, loss: 3.5694
2020-12-06 18:16:05,196 - mmseg - INFO - Iter [350/160000]  lr: 9.981e-03, eta: 2 days, 8:40:43, time: 1.262, data_time: 0.007, memory: 8548, decode.loss_seg: 2.5072, decode.acc_seg: 24.7626, aux.loss_seg: 1.0462, aux.acc_seg: 23.8259, loss: 3.5534
2020-12-06 18:17:08,388 - mmseg - INFO - Iter [400/160000]  lr: 9.978e-03, eta: 2 days, 8:34:55, time: 1.264, data_time: 0.006, memory: 8548, decode.loss_seg: 2.4782, decode.acc_seg: 26.0928, aux.loss_seg: 1.0248, aux.acc_seg: 25.6890, loss: 3.5029
2020-12-06 18:18:11,584 - mmseg - INFO - Iter [450/160000]  lr: 9.975e-03, eta: 2 days, 8:30:12, time: 1.264, data_time: 0.006, memory: 8548, decode.loss_seg: 2.4079, decode.acc_seg: 25.6402, aux.loss_seg: 0.9936, aux.acc_seg: 25.5241, loss: 3.4015
2020-12-06 18:19:15,011 - mmseg - INFO - Iter [500/160000]  lr: 9.972e-03, eta: 2 days, 8:27:27, time: 1.269, data_time: 0.006, memory: 8548, decode.loss_seg: 2.4201, decode.acc_seg: 27.0654, aux.loss_seg: 0.9920, aux.acc_seg: 27.0333, loss: 3.4121
2020-12-06 18:20:18,271 - mmseg - INFO - Iter [550/160000]  lr: 9.969e-03, eta: 2 days, 8:24:10, time: 1.265, data_time: 0.007, memory: 8548, decode.loss_seg: 2.3706, decode.acc_seg: 27.1034, aux.loss_seg: 0.9722, aux.acc_seg: 27.3096, loss: 3.3428
lzcstar commented 3 years ago

@Junjun2016 Thanks, bro. I have another question: how do you choose the channels of FCNHead or DepthwiseSeparableASPPHead? With your channel settings the UNet works, but when I switch to DepthwiseSeparableASPPHead I still get a problem (RuntimeError: Given groups=1, weight of size [48, 256, 1, 1], expected input[6, 1024, 18, 18] to have 256 channels, but got 1024 channels instead). Can you teach me how to design or calculate in_channels and channels? Many thanks.

Junjun2016 commented 3 years ago

> @Junjun2016 Thanks, bro. I have another question: how do you choose the channels of FCNHead or DepthwiseSeparableASPPHead? With your channel settings the UNet works, but when I switch to DepthwiseSeparableASPPHead I still get a problem (RuntimeError: Given groups=1, weight of size [48, 256, 1, 1], expected input[6, 1024, 18, 18] to have 256 channels, but got 1024 channels instead). Can you teach me how to design or calculate in_channels and channels? Many thanks.

Hi, if you use the UNet backbone, you may use a decode_head without skip connections (FCN, PSP, ASPP).
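
For reference, here is what each in_index selects from this UNet (base_channels=64, num_stages=5), as can be read off the model dump above:

in_index 0 -> 1024 channels, 1/16 resolution (bottleneck)
in_index 1 -> 512 channels, 1/8 resolution
in_index 2 -> 256 channels, 1/4 resolution
in_index 3 -> 128 channels, 1/2 resolution
in_index 4 -> 64 channels, full resolution

A head's in_channels must equal the channel count of the entry it selects. DepthwiseSeparableASPPHead additionally passes a low-level feature (the first backbone output) through a bottleneck sized by c1_in_channels; with this UNet that first output has 1024 channels, which is why the c1 weight of size [48, 256, 1, 1] above fails when it receives 1024 channels instead of the expected 256.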

Junjun2016 commented 3 years ago

UNet configs for the 4 retinal vessel segmentation datasets (DRIVE, STARE, CHASE_DB1, HRF).

Junjun2016 commented 3 years ago

> UNet configs for the 4 retinal vessel segmentation datasets (DRIVE, STARE, CHASE_DB1, HRF).

The decode_head is FCN, and we will add benchmarks with PSP and ASPP decode_heads ASAP.

rubeea commented 3 years ago

> UNet configs for the 4 retinal vessel segmentation datasets (DRIVE, STARE, CHASE_DB1, HRF).
>
> The decode_head is FCN, and we will add benchmarks with PSP and ASPP decode_heads ASAP.

How can one decide the crop_size and stride to use when training the U-Net on a custom dataset?

rubeea commented 3 years ago

> @YLyeliang Thanks for your reply. After reading it, my understanding is that the decode_head of mmseg just predicts from the 8x-downsampled feature of the backbone, without an upsampling path to decode? Am I right? If I only select the UNet backbone to train, what model type should I choose? I see there are only 'EncoderDecoder' and 'CascadeEncoderDecoder', both of which require a decode_head. Or will it not work with the mmseg architecture at all, and I must write the true backbone (without the decode phase) plus a skip-connection neck? Thanks a lot!
>
> e.g. UNet (backbone) + FCN (decode_head): (the quoted config is identical to the one posted above)

Is this equivalent to the official U-Net paper? Thanks in advance.

Junjun2016 commented 3 years ago

> e.g. UNet (backbone) + FCN (decode_head): (the quoted config is identical to the one posted above)
>
> Is this equivalent to the official U-Net paper? Thanks in advance.

Empirical. The larger the image, the larger the crop size.
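
As a concrete illustration (a hedged note, not an official rule): with test_cfg mode='slide', windows of crop_size are sampled every stride pixels and the overlapping predictions are averaged, so a stride smaller than the crop size gives overlap. The 512/341 pair above overlaps neighbouring crops by 171 pixels; the same ratio at a 256 crop would be roughly:

test_cfg = dict(mode='slide', crop_size=(256, 256), stride=(170, 170))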

rubeea commented 3 years ago

@Junjun2016 I am using a crop size of 256x256 with a stride of 170 on images of size 540x360, and it gives satisfactory results. Should I reduce the crop size further? Moreover, can you tell me how to set the optimal learning rate for training the model? One option is to implement an LR finder such as: https://github.com/davidtvs/pytorch-lr-finder/blob/master/examples/lrfinder_cifar10.ipynb. How can we achieve this in the mmsegmentation framework?

Junjun2016 commented 3 years ago

For a fair comparison, we use the same learning rate, but you can tune the learning rate and crop size according to your task. There is no general rule.

rubeea commented 3 years ago

> For a fair comparison, we use the same learning rate, but you can tune the learning rate and crop size according to your task. There is no general rule.

Hey @Junjun2016, thanks for the reply. One more question: does mmsegmentation allow us to use a hyperparameter search framework such as Optuna, or an LR finder, to find the optimal learning rate? It is very difficult to search for the LR manually. Also, can you explain the difference between the decode loss and the auxiliary loss? And can we use different losses for the two heads?

Junjun2016 commented 3 years ago

> Hey @Junjun2016, thanks for the reply. One more question: does mmsegmentation allow us to use a hyperparameter search framework such as Optuna, or an LR finder, to find the optimal learning rate? It is very difficult to search for the LR manually. Also, can you explain the difference between the decode loss and the auxiliary loss? And can we use different losses for the two heads?

That's a really good question. This is in our future plans, but we are currently understaffed. If you are interested, you are welcome to contribute. As for the decode loss and the auxiliary loss: usually we use the same loss, but the two can be different. The auxiliary head helps optimize the learning process, and we drop it at inference time.
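
For example, since each head carries its own loss_decode config, the two losses can differ. A hedged sketch, reusing the head settings from the config above and assuming DiceLoss is registered in your mmseg version (otherwise LovaszLoss or a re-weighted CrossEntropyLoss can be substituted the same way):

    decode_head=dict(
        type='FCNHead', in_channels=64, in_index=4, channels=64,
        num_classes=150,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead', in_channels=128, in_index=3, channels=64,
        num_classes=150,
        loss_decode=dict(type='DiceLoss', loss_weight=0.4)))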

rubeea commented 3 years ago

> That's a really good question. This is in our future plans, but we are currently understaffed. If you are interested, you are welcome to contribute. As for the decode loss and the auxiliary loss: usually we use the same loss, but the two can be different. The auxiliary head helps optimize the learning process, and we drop it at inference time.

Thanks for the explanation. Sure, I'll be happy to contribute. I am working on implementing an LR finder for the mmsegmentation framework.

Junjun2016 commented 3 years ago

Hi @rubeea, thanks in advance.

rubeea commented 3 years ago

Hi @Junjun2016, one last question. As far as I understand, the decode and auxiliary heads are used in parallel in the model config, right? Or are they cascaded? Thanks in advance.

Junjun2016 commented 3 years ago

> Hi @Junjun2016, one last question. As far as I understand, the decode and auxiliary heads are used in parallel in the model config, right? Or are they cascaded? Thanks in advance.

In parallel. They supervise different stages.
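
This is also visible in the training log above: the two heads are trained jointly, and the reported total is the sum of the per-head losses, e.g. at iter 50: decode.loss_seg 3.4276 + aux.loss_seg 1.4907 = loss 4.9183.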

rubeea commented 3 years ago

> Hi @Junjun2016, one last question. As far as I understand, the decode and auxiliary heads are used in parallel in the model config, right? Or are they cascaded? Thanks in advance.
>
> In parallel. They supervise different stages.

Can you kindly provide a graphical illustration of the mmsegmentation U-Net with FCN decode and auxiliary heads, to help us understand the architecture better? Moreover, can you specify which input feature the "in_index" argument in the base decode_head.py selects? Thanks in advance.
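
On the in_index question, a paraphrased sketch of what BaseDecodeHead does (check decode_head.py in your mmseg version for the exact code): the backbone returns a list of feature maps, and with input_transform=None, in_index simply picks one of them.

def _transform_inputs(self, inputs):
    # inputs: the list of tensors returned by the backbone; for the UNet
    # dump above: [1024ch @ 1/16, 512ch @ 1/8, 256ch @ 1/4,
    #              128ch @ 1/2, 64ch @ 1/1]
    if self.input_transform is None:
        inputs = inputs[self.in_index]  # e.g. in_index=4 -> the 64ch map
    return inputs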

ChenJiangxi commented 2 years ago

> Can you kindly provide a graphical illustration of the mmsegmentation U-Net with FCN decode and auxiliary heads, to help us understand the architecture better? Moreover, can you specify which input feature the "in_index" argument in the base decode_head.py selects? Thanks in advance.

I have the same problem

rubeea commented 2 years ago

> I have the same problem

@ChenJiangxi Hi, I was able to figure out the working mechanism. You can find an illustration on my GitHub page: https://github.com/rubeea/focal_phi_loss_mmsegmentation