tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

EfficientNet SE number of filters #616

Open dkatsios opened 4 years ago

dkatsios commented 4 years ago

Why is the squeezed number of filters in the SE block based on the block's input filter count and not on the tensor's channel count? According to the SE-Nets paper (https://arxiv.org/pdf/1709.01507.pdf), Fig. 3, it should be C/r, where C is the number of input channels. For example, in the 2nd block of EfficientNet-B0 the input_filters value is 16, but inside the SE block the (expanded) tensor has 96 channels. The squeezed tensor is 1, 1, 4 (16/4) instead of 1, 1, 24 (96/4), so it goes 96 -> 4 -> 96 instead of 96 -> 24 -> 96 (which seems more intuitive).
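For concreteness, here is a minimal sketch of the two reductions being compared (illustrative variable names, not the repo's actual code), assuming B0's 2nd block with input_filters=16, expand_ratio=6, and se_ratio=0.25:

```python
# Illustrative sketch only; values assume EfficientNet-B0's 2nd block.
input_filters = 16
expand_ratio = 6
se_ratio = 0.25

expanded_filters = input_filters * expand_ratio  # 96 channels inside the block

# Convention in this repo (and MnasNet): squeeze relative to the block's input filters.
squeezed_from_input = max(1, int(input_filters * se_ratio))       # 16 * 0.25 = 4

# Convention in the SE-Nets paper (Fig. 3): squeeze relative to the expanded tensor.
squeezed_from_expanded = max(1, int(expanded_filters * se_ratio))  # 96 * 0.25 = 24

print(squeezed_from_input, squeezed_from_expanded)  # 4 24, i.e. 96->4->96 vs 96->24->96
```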

Renthal commented 4 years ago

From here it appears that the size of the squeezed tensor is indeed input_channels/SE_ratio, but per the original paper's implementation it should be expanded_size/SE_ratio. The SE size appears to be consistent with what it would be if the SE block were applied to the residual layer, as proposed in MobileNetV3. However, from here that does not seem to be the case, though I am not entirely sure. I am also a bit puzzled by this.

mingxingtan commented 4 years ago

Hi @dkatsios and @Renthal,

Yes, we intentionally compute the SE reduction with respect to the input filters, following the same design as MnasNet. Our goal is to minimize the runtime overhead caused by SE.

MobileNetV3 computes the SE reduction with respect to the expanded_size, as explained in its Section 5.3. Notably, MobileNetV3 uses hard-swish, and we had to write a highly optimized, dedicated low-level kernel implementation of hard-swish to make it fast on mobile CPUs. With all of these extra system-level (TF & TFLite) optimizations, MobileNetV3 can afford an effectively larger SE ratio.
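To illustrate where the choice shows up, here is a rough sketch of an SE block (not the repo's exact implementation; the function and argument names are illustrative) with the two conventions for choosing the squeezed width shown in the usage comments:

```python
import tensorflow as tf

def se_block(x, num_reduced_filters):
    """Squeeze-and-excitation applied to an expanded tensor x of shape (N, H, W, C)."""
    se = tf.reduce_mean(x, axis=[1, 2], keepdims=True)                            # squeeze: (N, 1, 1, C)
    se = tf.keras.layers.Conv2D(num_reduced_filters, 1, activation="relu")(se)    # reduce
    se = tf.keras.layers.Conv2D(x.shape[-1], 1, activation="sigmoid")(se)         # excite back to C
    return x * se                                                                  # channel-wise rescale

# This repo / MnasNet convention: reduce relative to the block's input filters,
# which keeps both 1x1 convs small and minimizes SE's runtime overhead:
#   out = se_block(expanded, max(1, int(input_filters * se_ratio)))
#
# MobileNetV3 convention: reduce relative to the expanded size (a larger effective SE),
# made affordable by the extra TF/TFLite-level optimizations mentioned above:
#   out = se_block(expanded, max(1, int(expanded_filters * se_ratio)))
```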