thomasverelst / dynconv

Code for Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference (CVPR2020)
https://arxiv.org/abs/1912.03203

A question about soft-mask calculation #1

Open peter943 opened 4 years ago

peter943 commented 4 years ago

Wonderful job! I have been studying your paper and code these past few days, and they are very enlightening.

I have a question about the code that calculates the soft mask via soft = self.maskconv(x). I'm not quite sure why this (conv + fc) network was chosen to calculate the soft mask. Thank you for your kind help.

thomasverelst commented 4 years ago

Hi, thanks for your interest in our work! I suppose you are referring to the classification code. In classification, we use the same mask unit as the work we compare with (SACT). The mask unit is shown in Figure 6 of their paper ( https://arxiv.org/abs/1612.02297 ). It combines a convolution with global average pooling to capture image-level context. We refer to this as the "squeeze unit", as it resembles the squeeze operation of SqueezeNet.
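To make the idea concrete, here is a minimal PyTorch sketch of such a squeeze-style mask unit. The class and layer names are illustrative, not the repository's actual code: a spatial convolution produces a per-pixel logit, and a global-average-pooling + FC branch adds image-level context that is broadcast over all positions.

```python
import torch
import torch.nn as nn

class SqueezeMaskUnit(nn.Module):
    """Illustrative squeeze-style mask unit (names are hypothetical):
    conv gives local per-pixel logits, pooling + fc adds global context."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
        self.fc = nn.Linear(channels, 1)

    def forward(self, x):
        local = self.conv(x)                           # (B, 1, H, W)
        context = self.fc(self.pool(x).flatten(1))     # (B, 1)
        # broadcast the image-level context over spatial positions
        return local + context[:, :, None, None]       # soft-mask logits

x = torch.randn(2, 64, 16, 16)
soft = SqueezeMaskUnit(64)(x)
print(tuple(soft.shape))  # (2, 1, 16, 16)
```

The extra branch costs almost no FLOPS, but the pooling introduces a synchronization point that can slow down inference in practice.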

However, on pose estimation we noticed the squeeze unit caused a significant performance hit (even though its FLOP count is negligible). Table 2 in our paper ( https://arxiv.org/pdf/1912.03203.pdf ) compares the accuracy and inference speed of a simple 1x1 convolution and the squeeze unit. The 1x1 convolution gives slightly lower accuracy but faster inference.

Therefore:

- Classification experiments -> squeeze unit, for accuracy reasons
- Pose estimation -> 1x1 convolution, for inference speed reasons
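For comparison, the lighter alternative used for pose estimation can be sketched as a single pointwise convolution (again an illustrative snippet, not the repository's code): one 1x1 conv maps the feature map to one soft-mask logit per pixel, with no global pooling branch.

```python
import torch
import torch.nn as nn

# Illustrative 1x1-convolution mask unit: a pointwise conv produces
# one soft-mask logit per spatial position, keeping inference fast.
maskconv = nn.Conv2d(64, 1, kernel_size=1)

x = torch.randn(2, 64, 16, 16)
soft = maskconv(x)          # (B, 1, H, W) soft-mask logits
print(tuple(soft.shape))    # (2, 1, 16, 16)
```

Because a 1x1 convolution is just a per-pixel linear layer, each mask logit depends only on the features at that position, trading the squeeze unit's global context for speed.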

peter943 commented 4 years ago

@thomasverelst Thank you for your patient reply. It helps me a lot.