vpulab / Semantic-Aware-Scene-Recognition

Code repository for paper https://www.sciencedirect.com/science/article/pii/S0031320320300613 @ Pattern Recognition 2020
MIT License

Why did the semantic score map become 152 channels #28

Open 2762481675 opened 2 years ago

2762481675 commented 2 years ago

Hello! First of all, thank you very much for sharing the source code of your paper; it has been very helpful in my research. As far as I know, the ADE20K dataset has 150 object classes, but the source code sets the number of channels to 152. Can you explain why?

Once again, I would like to express my sincere respect for your work.

alexlopezcifuentes commented 2 years ago

Hi!

Thanks for taking the time to use the code and asking a question!

The ADE20K semantic segmentation labels have, if I remember correctly, 151 classes, not 150. There is also an "unknown" class encoded in the labels.

The number of channels is set to 152 because, when we convert a semantic segmentation image with 151 classes to a one-hot encoded tensor, the size of that tensor should be 151 + 1, as set in the function make_one_hot:

https://github.com/vpulab/Semantic-Aware-Scene-Recognition/blob/2b78a3c5279207af74fb594d3381045e4767473c/Libs/Utils/utils.py#L258

Hope this solves your question!

Alex.

2762481675 commented 2 years ago

Hello! Thank you for your prompt reply despite your busy schedule. I have debugged the UPerNet source code, and the resulting score map does indeed have 150 channels. Of course, there may have been some changes, so I will not worry about it.

I looked at the make_one_hot function, but I still don't understand why there is a +1 operation. Because of it, the score map that is fed to the semantic branch has 152 channels, and I think the 152nd channel is all zeros. Does that make sense?
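The all-zeros suspicion can be checked on a toy example. The sketch below is not the repository's make_one_hot, which works on torch tensors; it uses plain Python lists, the helper names one_hot_channels and empty_channels are hypothetical, and the label range 0..150 is an assumption made only for illustration. Under that assumption the channel at index 151 is never set; if the label maps actually contain the value 151, that channel would be populated instead.

```python
def one_hot_channels(labels, num_channels):
    """One-hot encode an H x W map of integer labels.

    Returns num_channels binary H x W maps; channel c is 1 where labels == c.
    (Hypothetical helper, not the repository's make_one_hot.)
    """
    return [
        [[1 if value == c else 0 for value in row] for row in labels]
        for c in range(num_channels)
    ]


def empty_channels(one_hot):
    """Indices of channels that contain no 1 anywhere (hypothetical helper)."""
    return [
        c for c, channel in enumerate(one_hot)
        if not any(any(row) for row in channel)
    ]


# Toy 2 x 3 label map. Assumption: labels take values 0..150 (151 classes).
labels = [[0, 3, 150],
          [7, 7, 0]]

# Same sizing as discussed in the thread: 151 + 1 channels.
one_hot = one_hot_channels(labels, num_channels=151 + 1)

print(len(one_hot))                     # number of channels: 152
print(151 in empty_channels(one_hot))   # True here: no pixel has label 151
```

Running the same kind of check on real label maps (for example, looking at the maximum label value across the dataset) would show whether the extra channel is ever used.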

2762481675 commented 2 years ago

This is a screenshot of me debugging the source code of UPerNet

image