megvii-research / TLC

Test-time Local Converter

About MACs when using TLC #15

Open shnj1101 opened 1 year ago

shnj1101 commented 1 year ago

Hi, thank you very much for your exciting research. Section 4.3 "Extensibility and Complexity" of your paper shows that adding TLC to a UNet containing SE blocks significantly improves performance with only a slight increase in computational complexity. I have a question on this point.

In SE without TLC, the feature map size after global pooling is $(C, 1, 1)$. It then changes to $(C/r, 1, 1)$ and back to $(C, 1, 1)$ through the two fully connected layers, so the MACs are $2C^2/r$. With TLC, the feature map is re-weighted by element-wise attention, so the feature map size after pooling is $(C, H, W)$. Therefore, the computational complexity of the fully connected layers becomes $HW \cdot 2C^2/r$.
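To make the numbers concrete, here is a rough sketch of the arithmetic I have in mind (the channel count, reduction ratio, and image size below are just example values I picked, not numbers from the paper):

```python
def se_fc_macs(C, r, H=1, W=1):
    """MACs of the two FC layers (C -> C/r -> C), applied once per spatial
    position of the pooled feature: one position without TLC, H x W
    positions with TLC (naive implementation)."""
    return H * W * (2 * C * C // r)

C, r = 64, 16                            # example channel count / reduction ratio
print(se_fc_macs(C, r))                  # plain SE:      2*C^2/r
print(se_fc_macs(C, r, H=512, W=512))    # TLC (naive): HW * 2*C^2/r
```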

Is this correct? If so, I think the computation has increased. Do I misunderstand something?

Thank you in advance for your response.

achusky commented 1 year ago

Thanks for your interest.

Yes, the extra computation cost mainly comes from fully connected (FC) layers in SE. However, the detailed MACs are determined by the implementation.

In detail, when using TLC, the global pooling in the SE module is converted to a local pooling operation with stride 1 and a $K_h \times K_w$ window. Given an $H \times W$ feature as input, the size of the output (pooled) feature is $H' \times W'$, where $H' = H - K_h + 1$ and $W' = W - K_w + 1$, because we do not pad the input.
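As a rough illustration of this conversion (the tensor and window sizes below are arbitrary example values, not the ones used in the repository):

```python
import torch
import torch.nn.functional as F

C, H, W = 64, 128, 128       # example feature size
Kh, Kw = 96, 96              # example local window size
x = torch.randn(1, C, H, W)

# Plain SE: global pooling, output (1, C, 1, 1).
global_pooled = F.adaptive_avg_pool2d(x, 1)

# TLC: local pooling with stride 1 and no padding,
# output (1, C, H - Kh + 1, W - Kw + 1).
local_pooled = F.avg_pool2d(x, kernel_size=(Kh, Kw), stride=1)

print(global_pooled.shape)   # torch.Size([1, 64, 1, 1])
print(local_pooled.shape)    # torch.Size([1, 64, 33, 33])
```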

For the naive implementation, the pooled feature is first padded to $H \times W$ and then goes through the FC layers. In this case, the MACs are $HW \cdot 2C^2/r$.
For a faster implementation, the pooled feature can go through the FC layers first and then be padded to $H \times W$. In this case, the MACs are $(H - K_h + 1)(W - K_w + 1) \cdot 2C^2/r$. The two implementations are equivalent when using replication padding, because the FC layers act independently at each spatial position.
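Here is a minimal sketch of the two orderings, assuming the excitation branch is built from $1 \times 1$ convolutions and using one-sided replication padding (the sizes and the padding layout are illustrative; the actual implementation in this repository may differ in details):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C, r = 64, 16                          # example channels / reduction ratio
H, W, Kh, Kw = 64, 64, 16, 16          # example feature and window sizes

# Stand-in for the SE excitation branch: two pointwise (1x1) FC layers.
fc = nn.Sequential(
    nn.Conv2d(C, C // r, 1), nn.ReLU(inplace=True),
    nn.Conv2d(C // r, C, 1), nn.Sigmoid(),
)

x = torch.randn(1, C, H, W)
pooled = F.avg_pool2d(x, (Kh, Kw), stride=1)   # (1, C, H-Kh+1, W-Kw+1)
pad = (0, Kw - 1, 0, Kh - 1)                   # pad back to H x W

# Naive: pad first, then FC  -> FC cost is HW * 2C^2/r MACs.
attn_naive = fc(F.pad(pooled, pad, mode="replicate"))

# Faster: FC first, then pad -> FC cost is (H-Kh+1)(W-Kw+1) * 2C^2/r MACs.
attn_fast = F.pad(fc(pooled), pad, mode="replicate")

# Equal because the pointwise FC layers commute with replication padding.
print(torch.allclose(attn_naive, attn_fast, atol=1e-6))
```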

I hope this helps.

shnj1101 commented 1 year ago

Thank you for your response.

I understand the following: the influence of the image size cannot be eliminated entirely, but with the faster implementation the extra cost becomes much smaller for practical image sizes (e.g., $512 \times 512$, as used in the paper).

Many papers citing TLC do not mention this point, so your explanation was very helpful.

Thank you again for your time.