open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

Deeplabv3plus inference time #1531

Open wwjwy opened 2 years ago

wwjwy commented 2 years ago

Hi, I have a question and hope to get some help. Why is the inference speed of DeepLabV3+ much lower than that of other models with similar parameter counts and computational complexity (GFLOPs)?

MeowZheng commented 2 years ago

Could you give us a clearer comparison? For example, what are the 'other models'?

wwjwy commented 2 years ago

| Model | Input shape | FLOPs | Params | Speed |
| --- | --- | --- | --- | --- |
| modified_twins_upernet | (3, 512, 512) | 237.35 GFLOPs | 71.05 M | 14.5 fps |
| convnext_deeplabv3plus | (3, 512, 512) | 58.12 GFLOPs | 64.71 M | 15.36 fps |

The FLOPs and parameter count of convnext_deeplabv3plus are much smaller, yet its inference speed is barely faster. Why is that?
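In general, GFLOPs only count arithmetic; actual latency also depends on memory access patterns, feature-map resolution, and kernel launch overhead, so the fairest comparison is to time both models directly. Below is a minimal sketch of one way such fps numbers can be measured (it assumes a CUDA GPU; the single conv layer is only a stand-in, substitute the segmentor built from your config):

```python
import time

import torch

# Stand-in module; replace with the real segmentor. CUDA kernels launch
# asynchronously, so the timed region must be bracketed with
# torch.cuda.synchronize() to get meaningful numbers.
model = torch.nn.Conv2d(3, 5, kernel_size=3, padding=1).cuda().eval()
inputs = torch.rand(1, 3, 512, 512, device='cuda')

iters = 100
with torch.no_grad():
    for _ in range(10):           # warm-up: cuDNN autotuning, lazy allocations
        model(inputs)
    torch.cuda.synchronize()      # drain pending kernels before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        model(inputs)
    torch.cuda.synchronize()      # wait for the GPU before stopping the clock
    elapsed = time.perf_counter() - start

print(f'{iters / elapsed:.2f} fps')
```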

timothylimyl commented 2 years ago

@wwjwy can you share your config file that swaps the backbone to ConvNext?

wwjwy commented 2 years ago

> @wwjwy can you share your config file that swaps the backbone to ConvNext?

```python
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='mmcls.ConvNeXt',
        arch='small',
        out_indices=[0, 1, 2, 3],
        drop_path_rate=0.3,
        layer_scale_init_value=1.0,
        gap_before_final_norm=False,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-small_3rdparty_32xb128-noema_in1k_20220301-303e75e3.pth',
            prefix='backbone.')),
    decode_head=dict(
        type='DepthwiseSeparableASPPHead',
        in_channels=768,
        in_index=3,
        channels=512,
        dilations=(1, 12, 24, 36),
        c1_in_channels=96,
        c1_channels=48,
        dropout_ratio=0.1,
        num_classes=5,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=384,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=5,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
```
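For reference, FLOPs/params figures like those in the table above can be reproduced with mmcv's complexity counter, which is essentially what mmseg's `tools/get_flops.py` does. A sketch, where the config filename is a placeholder for wherever the dict above is saved:

```python
from mmcv import Config
from mmcv.cnn import get_model_complexity_info
from mmseg.models import build_segmentor

# placeholder path: point this at the config file containing the dict above
# (mmcls must be installed for the mmcls.ConvNeXt backbone to resolve)
cfg = Config.fromfile('convnext_deeplabv3plus.py')
model = build_segmentor(cfg.model)
model.eval()
# the complexity counter needs a plain tensor-in/tensor-out forward
model.forward = model.forward_dummy
flops, params = get_model_complexity_info(model, (3, 512, 512))
print(f'Flops: {flops}\nParams: {params}')
```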

timothylimyl commented 2 years ago

@wwjwy I think there is some fundamental problem with the way ConvNeXt is providing the feature map output.

```python
import torch
from mmcls.models import ConvNeXt

model = ConvNeXt(arch='small', out_indices=(0, 1, 2, 3))
model.eval()
inputs = torch.rand(1, 3, 1024, 1024)
level_outputs = model(inputs)

for level_out in level_outputs:
    print(tuple(level_out.shape))
```

Output:

```
(1, 96)
(1, 192)
(1, 384)
(1, 768)
```

As you can see, the spatial dimensions (height and width) of the feature maps are gone; only the channel dimension remains. I am not sure why this is the case with mmcls's ConvNeXt.

Edit: I saw that you already set `gap_before_final_norm=False`, which fixes this issue, but note this comment from the authors:

```python
# The output of LayerNorm2d may be discontiguous, which
# may cause some problem in the downstream tasks
```
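With that flag set, the spatial dimensions come back. A quick sketch confirming this and probing the contiguity caveat quoted above; for arch='small' at a 1024x1024 input the shapes should be (1, 96, 256, 256), (1, 192, 128, 128), (1, 384, 64, 64) and (1, 768, 32, 32):

```python
import torch
from mmcls.models import ConvNeXt

model = ConvNeXt(arch='small', out_indices=(0, 1, 2, 3),
                 gap_before_final_norm=False)
model.eval()
inputs = torch.rand(1, 3, 1024, 1024)
with torch.no_grad():
    level_outputs = model(inputs)

for level_out in level_outputs:
    # print each stage's output shape and whether the tensor is contiguous
    print(tuple(level_out.shape), level_out.is_contiguous())
```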