lhoyer / HRDA

[ECCV22] Official Implementation of HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation

something about the scale attention #4

Closed Renp1ngs closed 2 years ago

Renp1ngs commented 2 years ago

Hi, I have some questions about scale attention.

  1. About the scale attention decoder: there seems to be a difference between the paper and the released code. The paper describes a SegFormer decoder, but the code seems to use a DAFormer decoder. Will this cause any difference in performance?
  2. In addition, can the scale attention be understood as adding an extra segmentation head that processes the context crop and produces the result for the detail crop corresponding to the context crop? Also, in the second paragraph on page 8 ("The scale attention decode..."), is there something wrong with the scale attention formula? It should be f^A(f^E(x_c)).
Renp1ngs commented 2 years ago
[screenshot of the paper passage containing the typo]

and a tiny bug

lhoyer commented 2 years ago
  1. About the scale attention decoder: there seems to be a difference between the paper and the released code. The paper describes a SegFormer decoder, but the code seems to use a DAFormer decoder. Will this cause any difference in performance?

The code uses a SegFormer decoder as the scale attention decoder, as described in the paper: https://github.com/lhoyer/HRDA/blob/a57d967e62f9280e67c45ad17a6fedaded827e7c/mmseg/models/decode_heads/hrda_head.py#L44
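For readers skimming the linked code, here is a minimal sketch of the idea: the scale attention head reuses a lightweight SegFormer-style decode head but predicts a single attention channel instead of per-class logits. The `decoder_cfg` keys and the `build_head` factory below are illustrative placeholders, not the repo's actual API.

```python
import copy

def build_scale_attention_head(decoder_cfg, build_head):
    """Hypothetical helper: clone the main decoder config and reduce its
    output to one channel, so the head predicts one attention logit per
    pixel rather than per-class scores. `decoder_cfg` and `build_head`
    stand in for an mmseg-style config dict and head factory."""
    att_cfg = copy.deepcopy(decoder_cfg)
    att_cfg['num_classes'] = 1  # a single scale-attention channel
    return build_head(att_cfg)
```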

  2. In addition, can the scale attention be understood as adding an extra segmentation head that processes the context crop and produces the result for the detail crop corresponding to the context crop?

The attention head predicts a weight map from the features of the context crop, which is used to compute a weighted sum of the segmentation predictions of the context and detail crops within the region of the detail crop. The alignment of corresponding pixels between the context and detail predictions is based on the known crop regions and is not learned by the scale attention.
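A minimal PyTorch sketch of that fusion, assuming the attention is predicted at context-crop resolution and that `detail_box` gives the known location of the detail crop inside the upsampled context crop; the tensor names, the sigmoid on the attention logits, and the crop convention are illustrative, not the repo's exact API:

```python
import torch
import torch.nn.functional as F

def fuse_context_and_detail(context_logits, detail_logits, att_logits,
                            detail_box, hr_scale=2):
    """Weighted sum of context and detail predictions in the detail region.

    context_logits: (N, C, Hc, Wc) segmentation logits for the context crop.
    detail_logits:  (N, C, Hd, Wd) segmentation logits for the detail crop.
    att_logits:     (N, 1, Hc, Wc) scale attention predicted from the
                    context-crop features.
    detail_box:     (top, left, Hd, Wd) location of the detail crop within
                    the upsampled context crop (known, not learned).
    """
    # Bring the context prediction and the attention to detail resolution.
    context_up = F.interpolate(context_logits, scale_factor=hr_scale,
                               mode='bilinear', align_corners=False)
    att_up = torch.sigmoid(F.interpolate(att_logits, scale_factor=hr_scale,
                                         mode='bilinear', align_corners=False))
    # Align corresponding pixels via the known crop region.
    t, l, h, w = detail_box
    context_crop = context_up[:, :, t:t + h, l:l + w]
    att_crop = att_up[:, :, t:t + h, l:l + w]
    # a weights the HR detail prediction, (1 - a) the LR context prediction.
    return att_crop * detail_logits + (1.0 - att_crop) * context_crop
```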

In the second paragraph on page 8 ("The scale attention decode..."), is there something wrong with the scale attention formula? It should be f^A(f^E(x_c)).

Yes, you are right. Thank you for spotting this typo. It will be fixed in a future version of the paper.
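For anyone cross-referencing the paper, the corrected expression would read as below. Writing the attention with a sigmoid σ over the decoder output is an assumption about the paper's notation; the point confirmed in this thread is that f^A is applied to the encoder features of the context crop x_c:

```latex
a = \sigma\left( f^{A}\left( f^{E}(x_c) \right) \right)
```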

Renp1ngs commented 2 years ago

Thank you for your reply.

2. In addition, can the scale attention be understood as adding an extra segmentation head that processes the context crop and produces the result for the detail crop corresponding to the context crop?

My phrasing there may have been unclear. What I mean is: judging from the code, can the added scale attention decoder be understood as an extra segmentation head that processes the feature map, with its output then multiplied with the output of the original segmentation head?

lhoyer commented 2 years ago

What I mean is: judging from the code, can the added scale attention decoder be understood as an extra segmentation head that processes the feature map, with its output then multiplied with the output of the original segmentation head?

Yes, implementation-wise it can be seen that way.

Renp1ngs commented 2 years ago

Thank you for your reply. I'm looking forward to more of your work.