microsoft / ProDA

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)
https://arxiv.org/abs/2101.10979
MIT License

Inconsistency between paper and code #2

Closed akshaykulkarni07 closed 3 years ago

akshaykulkarni07 commented 3 years ago

Hi, congratulations on your great work and acceptance in CVPR '21. Thanks for releasing the code and model weights.

In the paper, you mention using DeepLabv2 with a ResNet-101 backbone. However, your code actually uses a modified ASPP module (ClassifierModule2 in models/deeplabv2.py), whereas ClassifierModule corresponds to the original DeepLabv2. Similar issues were raised here and here, which mention that this type of ASPP module is used in DeepLabv3+, which performs much better than DeepLabv2 (both issues were raised in Jan. 2020). Could you please confirm this point? And if you have also performed experiments with the original DeepLabv2 model, could you report those results for a fair comparison with prior art?

panzhang0212 commented 3 years ago

Hi, this modified ASPP is borrowed from an IJCAI 2020 paper and an IJCV paper, and it is not the same as the modified ASPP in DeepLabv3 or DeepLabv3+. According to the ablation study in the DeepLabv3 paper, the most effective modifications to ASPP are image-level feature pooling and the multi-grid method, and neither of these is included in our code. So we think the modified ASPP in our code will not change the result.

Why do we not use the original ASPP in DeepLabv2? Because we need to calculate the prototypes, which requires a feature vector (dim: 1xC) to represent each pixel in the image; that is to say, we need an FC layer or a 1x1 conv layer as the classifier. So we borrow this block from the IJCAI 2020 and IJCV papers.
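To make the prototype requirement concrete, here is a minimal sketch (not the authors' code; all names are illustrative) of why a per-pixel 1xC feature is needed: each class prototype is just the mean feature over pixels pseudo-labeled with that class, which presupposes that the head exposes a C-dim vector per pixel before the 1x1 conv classifier.

```python
# Illustrative sketch, assuming a (B, C, H, W) feature map and
# integer pseudo-labels of shape (B, H, W). Not from the ProDA repo.
import torch


def compute_prototypes(feat, pseudo_label, num_classes):
    """Mean feature over all pixels assigned to each class."""
    B, C, H, W = feat.shape
    feat = feat.permute(0, 2, 3, 1).reshape(-1, C)  # (B*H*W, C) per-pixel vectors
    labels = pseudo_label.reshape(-1)               # (B*H*W,)
    protos = torch.zeros(num_classes, C)
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            protos[k] = feat[mask].mean(dim=0)      # 1xC prototype for class k
    return protos


protos = compute_prototypes(torch.randn(2, 256, 8, 8),
                            torch.randint(0, 19, (2, 8, 8)), 19)
print(protos.shape)  # torch.Size([19, 256])
```

This is why the head must end in an FC or 1x1 conv classifier: any spatial mixing after the feature map would break the one-vector-per-pixel correspondence the prototypes rely on.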

panzhang0212 commented 3 years ago

We can also look at the ablation study in our paper: conventional self-training trained with our modified ASPP achieves performance similar to the original DeepLabv2 ASPP (45.9 mIoU reported in CRST).

super233 commented 3 years ago

Why not change the Conv2d layers in ASPP to out_channels=256 and add an extra Conv2d with in_channels=256, out_channels=num_classes to achieve this? I think this is the easiest way to get a feature vector for each pixel in an image. Have you tried this and found it ineffective? Please forgive me if this is a stupid question. 😀

import torch.nn as nn


class MultiOutASPP(nn.Module):
    """ASPP head that returns both the fused feature map and the class logits."""

    def __init__(self, inplanes, dilation_series=[6, 12, 18, 24], padding_series=[6, 12, 18, 24], outplanes=19):
        super(MultiOutASPP, self).__init__()
        self.conv2d_list = nn.ModuleList()
        for dilation, padding in zip(dilation_series, padding_series):
            self.conv2d_list.append(
                nn.Conv2d(inplanes, 256, kernel_size=3, stride=1, padding=padding, dilation=dilation, bias=True))

        # 1x1 conv classifier: maps each 256-dim per-pixel feature to class logits.
        self.classifier = nn.Conv2d(256, outplanes, kernel_size=1, padding=0, dilation=1, bias=True)

        for m in self.conv2d_list:
            m.weight.data.normal_(0, 0.01)
        self.classifier.weight.data.normal_(0, 0.01)

    def forward(self, x):
        # Sum the outputs of the parallel dilated branches.
        feat = self.conv2d_list[0](x)
        for i in range(len(self.conv2d_list) - 1):
            feat = feat + self.conv2d_list[i + 1](x)

        out = self.classifier(feat)

        return {'feat': feat, 'out': out}
panzhang0212 commented 3 years ago

If we change the Conv2d layers in ASPP to out_channels=256, the capacity of the ASPP will be smaller than the standard ASPP (out_channels=1024). For a fair comparison with Seg_Uncertainty, we borrow this block from it.
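The capacity gap is easy to quantify. A rough back-of-the-envelope sketch (assuming a 2048-channel ResNet-101 backbone output and a 3x3 kernel, which are assumptions, not values confirmed in this thread) compares one ASPP branch at each width:

```python
# Parameter count of a single Conv2d: in_ch * out_ch * k * k weights, plus biases.
def conv_params(in_ch, out_ch, k=3, bias=True):
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


small = conv_params(2048, 256)   # proposed out_channels=256 branch
large = conv_params(2048, 1024)  # standard out_channels=1024 branch
print(large / small)  # roughly 4x more parameters per branch
```

So shrinking each branch to 256 channels cuts its parameters by about 4x, which is the capacity difference the reply refers to.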

jiangzhengkai commented 3 years ago

I think it would be better for you to report results with DeepLabv2 (this should not be difficult). DeepLabv2 is the mainstream choice for segmentation, as in the SDCA and FADA papers.

super233 commented 2 years ago

Anything new? Has anyone reproduced ProDA with minimal changes to the ASPP?