Any latency information to be reported?

JUGGHM commented 1 year ago

Thank you for your impressive work, Tsai! I am wondering whether there are any latency comparisons against other ConvNet/transformer models. Since the network is built from efficient 3x3 convolutions and linear operators, I would expect it to have better throughput.

nightsnack commented 1 year ago

Hi JUGGHM, the answer is no. Although a 3x3 conv can be faster than a 7x7 one, multi-level fusion brings in much more memory cost. RevCol is also deeper and narrower than ConvNeXt/Swin/ViT, which can make it slow.

Here is a similar question from a reviewer, followed by our reply.

"Real throughput/latency needs to be measured to more accurately validate the model budget, not just FLOPs or params. The introduced connections seem to introduce larger latency on real hardware which is not so related to FLOPs numbers."

Indeed, we are aware that the current RevCol variants have higher latency than other models with similar #params and #FLOPs. However, we did this work mainly for research purposes rather than to provide an efficient substitute for ConvNeXt. We did analyze the latency by breaking it down into its components, with the following results.

Among all the factors contributing to latency, fragmented memory access accounts for a large part. Specifically, RevCol-L consists of 88 building blocks, with (8, 16, 48, 16) blocks in each level, whereas ConvNeXt-L consists of only 36 blocks. To make a fair comparison, we construct an 88-block ConvNeXt with similar FLOPs, ConvNeXt-L (deep), and measure its latency. In Tab. 1, we can see that, compared with ConvNeXt-L (deep), RevCol-L introduces 23% computation overhead. Next, we further analyze the fusion module. After removing the up-sample and down-sample connections, the overhead of RevCol-L is negligible.

Note that the many-block design and the up/down-sample connections are not necessary for reversibility or for disentangling information, as shown in Appendix B (e.g., in our RevCol-ViT, which is isotropic, feature fusion can be implemented as a simple summation). We think these costs can be overcome by hardware or compiler optimization (e.g., online operator fusion such as JIT). Besides, if we could find a wide and shallow building block for each level, latency would not be a problem. This is a direction for further research.
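For readers who want to reproduce this kind of comparison, here is a minimal sketch of a wall-clock latency harness. It is not the script used for the numbers in this thread. The names `measure_latency_ms` and `dummy_forward` are hypothetical, and the warm-up and iteration counts are arbitrary assumptions. It only uses the Python standard library, so the timed callable is a stand-in for a model forward pass.

```python
import statistics
import time
from typing import Callable, List


def measure_latency_ms(fn: Callable[[], object], warmup: int = 10, iters: int = 50) -> float:
    """Return the median wall-clock latency of fn() in milliseconds."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    # Warm-up runs are discarded so that one-time costs (allocation,
    # caching, lazy initialisation) do not skew the measurement.
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(samples)


def dummy_forward() -> int:
    # Hypothetical stand-in for a model forward pass on a fixed input batch.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"median latency: {measure_latency_ms(dummy_forward):.3f} ms")
```

When timing a real model on a GPU, the device would also have to be synchronized before each clock read (for example with `torch.cuda.synchronize()` in PyTorch), since kernel launches are asynchronous. That part is left out here to keep the sketch dependency-free.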

Model	#Blocks	Latency (ms)	Δ
ConvNeXt-L	3,3,27,3	78.3
ConvNeXt-L (deep)	8,16,48,16	100.5	0%
RevCol-L	8,16,48,16	119.9	19.89%
RevCol-L - upsample	8,16,48,16	111.8	11.79%
RevCol-L - upsample - downsample	8,16,48,16	103.8	3.79%
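
As a rough illustration of how a relative overhead can be derived from such measurements, the sketch below divides each latency in the table above by the ConvNeXt-L (deep) baseline. The helper names `relative_overhead_pct` and `overhead_table` are hypothetical. Because it works from the rounded latencies shown in the table, its output need not match the last column exactly, and the thread does not state how that column was derived.

```python
from typing import Dict

# Latencies in milliseconds, copied from the table above.
LATENCY_MS: Dict[str, float] = {
    "ConvNeXt-L": 78.3,
    "ConvNeXt-L (deep)": 100.5,
    "RevCol-L": 119.9,
    "RevCol-L - upsample": 111.8,
    "RevCol-L - upsample - downsample": 103.8,
}


def relative_overhead_pct(latency_ms: float, baseline_ms: float) -> float:
    """Latency increase over the baseline, as a percentage of the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (latency_ms / baseline_ms - 1.0) * 100.0


def overhead_table(latencies: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Relative overhead of every entry with respect to the named baseline."""
    base = latencies[baseline]
    return {name: relative_overhead_pct(ms, base) for name, ms in latencies.items()}


if __name__ == "__main__":
    for name, pct in overhead_table(LATENCY_MS, "ConvNeXt-L (deep)").items():
        print(f"{name}: {pct:+.2f}%")
```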

JUGGHM commented 1 year ago

Thank you for your detailed reply!

megvii-research / RevCol
