Hello,
The paper is very interesting to me, since SwinIR suffers from high memory consumption and slow convergence. I have a few questions about the proposed framework.
Firstly, two consecutive GMSAs share the attention maps, yet the shifted window partitions a different set of neighboring pixels into each window, which should yield different attention patterns. How is this handled? Is an interleaved sharing mechanism adopted?
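To make the question concrete, here is a minimal sketch of the situation I have in mind (single-head, no projections; `window_partition` and `gmsa` are my own simplifications, not the paper's code):

```python
import torch
import torch.nn.functional as F

def window_partition(x, ws):
    # x: (B, H, W, C) -> (num_windows*B, ws*ws, C)
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def gmsa(x, ws, shift, attn=None):
    # Simplified single-head window attention; reuses `attn` when given (the shared case).
    B, H, W, C = x.shape
    if shift:
        x = torch.roll(x, shifts=(-ws // 2, -ws // 2), dims=(1, 2))
    win = window_partition(x, ws)  # (nW*B, ws*ws, C)
    if attn is None:
        q = k = win  # projections omitted for brevity
        attn = F.softmax(q @ k.transpose(-2, -1) / C ** 0.5, dim=-1)
    return attn @ win, attn

x = torch.randn(1, 8, 8, 16)
_, attn1 = gmsa(x, ws=4, shift=False)            # first GMSA computes attention
out2, _ = gmsa(x, ws=4, shift=True, attn=attn1)  # second reuses it on SHIFTED windows
# The reused map was computed for the unshifted partition, so the pixel pairs it
# weights are not the pairs that the shifted partition groups together.
```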
Secondly, Table 3 shows that the shift mechanism reduces both FLOPs and latency. How does this mechanism reduce the computational footprint? Is it solely due to removing the masking and relative positional encoding used in SwinIR?
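Below is my rough understanding of the overhead in question, sketched for a single window; the mask and bias handling only loosely follows the Swin recipe, and all names here are mine:

```python
import torch
import torch.nn.functional as F

def attn_swin_style(q, k, v, mask, rel_bias):
    # SwinIR-style shifted window: extra bias add + mask add on every attention map
    a = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    a = a + rel_bias  # relative positional encoding
    a = a + mask      # masks out cross-boundary pairs created by the shift
    return F.softmax(a, dim=-1) @ v

def attn_plain_shift(q, k, v):
    # Plain attention on (circularly) shifted features, with no mask or bias
    a = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(a, dim=-1) @ v

ws, d = 8, 32
q = k = v = torch.randn(4, ws * ws, d)
mask = torch.zeros(4, ws * ws, ws * ws)      # placeholder; real masks block cross-boundary pairs
rel_bias = torch.randn(1, ws * ws, ws * ws)  # placeholder; real bias comes from a learned table
out_a = attn_swin_style(q, k, v, mask, rel_bias)
out_b = attn_plain_shift(q, k, v)
```

If the savings in Table 3 come only from dropping these two adds (and the mask/bias construction), that would be good to know.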
Finally, could you present convergence curves for ELAN compared with SwinIR and other CNN-based models? This would make the comparison more comprehensive and better demonstrate the advantages of ELAN.
Thanks a lot.
BTW, the neat model architecture is definitely appealing.