lhoyer / HRDA

[ECCV22] Official Implementation of HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation
Other
233 stars 31 forks source link

The theory underpinning advantage #33

Closed wjpoet closed 1 year ago

wjpoet commented 1 year ago

Hi, Ihoyer. What are the advantages of such an H/L resolution cropping choice over FPN? thank you

lhoyer commented 1 year ago

Hi @wjpoet,

Thank you for your interest in our work!

The main problem with FPN operating on full high-resolution images is that it would not fit into the GPU memory. Please note that the used DAFormer network already uses a feature pyramid similar to FPN. Therefore, we propose to train with small high-resolution crops and large low-resolution crops and learn how to fuse the predictions from both scales. Besides GPU memory, the multi-resolution fusion has the advantage that the adaptation can work on the better suited resolution. For example, high-resolution texture details of large objects are often domain-specific and LR inputs hide them away why they are better suited than HR inputs in this case. The scale attention learns to focus only on HR if it is necessary due to small objects or fine segmentation details. For more details, please refer to page 2+3, Sec. 4.2, Fig. 5, and Fig. 6.

Best, Lukas

wjpoet commented 1 year ago

Your answer is very helpful to me.Thank you!

---Original--- From: "Lukas @.> Date: Fri, Mar 31, 2023 16:02 PM To: @.>; Cc: @.**@.>; Subject: Re: [lhoyer/HRDA] The theory underpinning advantage (Issue #33)

Hi @wjpoet,

Thank you for your interest in our work!

The main problem with FPN operating on full high-resolution images is that it would not fit into the GPU memory. Please note that the used DAFormer network already uses a feature pyramid similar to FPN. Therefore, we propose to train with small high-resolution crops and large low-resolution crops and learn how to fuse the predictions from both scales. Besides GPU memory, the multi-resolution fusion has the advantage that the adaptation can work on the better suited resolution. For example, high-resolution texture details of large objects are often domain-specific and LR inputs hide them away why they are better suited than HR inputs in this case. The scale attention learns to focus only on HR if it is necessary due to small objects or fine segmentation details. For more details, please refer to page 2+3, Sec. 4.2, Fig. 5, and Fig. 6.

Best, Lukas

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>