open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

Error implementation of deeplabv3plus #1465

Open jianlong-yuan opened 2 years ago

jianlong-yuan commented 2 years ago

We found that your implementation differs from the official one. For example, the number of channels in the decoder is different. Why don't you use the same architecture?

MengzhangLI commented 2 years ago

Since this implementation is quite old, please list the specific differences from the original paper so that we can try to answer your question.

jianlong-yuan commented 2 years ago
  1. The decode_head uses channels=512; the official DeepLabV3+ actually uses 256.
  2. DeepLabV3+ doesn't have aux layers.

MengzhangLI commented 2 years ago

Thanks for your feedback.

I've checked the DeepLabV3+ in mmseg; it was pushed along with the initial commit: https://github.com/open-mmlab/mmsegmentation/commit/b2724da80bf2dc203b3db69497541ad52ce74ba0.

If it really differs from the original paper, we will add a note to the DeepLabV3+ README to let users know as soon as possible.

Best,

jianlong-yuan commented 2 years ago

any news?

MengzhangLI commented 2 years ago

Sorry for the late reply. DeepLabV3+ was re-implemented a long time ago, and at the time we simply followed some popular, widely used third-party DeepLabV3+ repos. As a result, we did not strictly follow the settings of the original paper.

jianlong-yuan commented 2 years ago

Which popular/famous third-party repos? Could you link them so we can take a look?

timothylimyl commented 2 years ago

@MengzhangLI

I agree with @jianlong-yuan: if following the original, the decode_head channels should be 256, as per the paper (page 6, Proposed Decoder):

"We apply another 1 × 1 convolution on the low-level features to reduce the number of channels, since the corresponding low-level features usually contain a large number of channels (e.g., 256 or 512) which may outweigh the importance of the rich encoder features (only 256 channels in our model) and make the training harder"

Original code ref: 1 2. I only took a quick look and the code is very messy, but I think these are the references to the ASPP filters.

However, I think this is just a minor hyperparameter tweak. As for the aux layers, I think they can be commented out if you do not want to use them during training (I have not personally tried this).
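
For anyone who wants to try the paper-style settings, here is a rough sketch of the two changes discussed in this thread. It edits a plain Python dict instead of loading a real mmseg config. Only channels=512 comes from this thread; the head type names and the overall dict layout are my assumptions about what the config looks like, and paper_style is a hypothetical helper name. I have not trained with this.

```python
import copy

# Assumed excerpt of the mmseg DeepLabV3+ model config. Only the fields
# discussed in this thread are shown; the type names are assumptions.
base_model = dict(
    type='EncoderDecoder',
    decode_head=dict(
        type='DepthwiseSeparableASPPHead',
        channels=512,  # value reported in this issue
    ),
    auxiliary_head=dict(type='FCNHead'),
)


def paper_style(model_cfg):
    """Hypothetical helper: return a copy with the two paper-style changes."""
    cfg = copy.deepcopy(model_cfg)  # leave the original config untouched
    cfg['decode_head']['channels'] = 256  # decoder width quoted from the paper
    cfg['auxiliary_head'] = None  # assumption: None disables the aux head
    return cfg


paper_model = paper_style(base_model)
print(paper_model['decode_head']['channels'], paper_model['auxiliary_head'])
```

In an actual config file the same two overrides would presumably go into the model dict of a child config, but whether checkpoints trained with channels=512 still load after this change is a separate question (most likely not for the decode head).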