raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License

Any hints on Batch=1 inference? #30

Closed lixinghe1999 closed 1 year ago

lixinghe1999 commented 1 year ago

Thanks for your work! I have tested the model: when batch=32, the latency is good. However, at batch=1, DynamicViT can be slower than ViT. I suspect the reason is related to CUDA acceleration.

Could you share your understanding of this problem? Do you think it is possible to solve it?
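
For reference, my comparison is based on a timing loop along these lines (a sketch, not my exact script; the `measure_latency` helper and the 224x224 input shape are just illustrative):

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, batch_size, n_warmup=10, n_runs=50):
    # Hypothetical helper: average GPU latency (ms) of one forward pass.
    model = model.eval().cuda()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    for _ in range(n_warmup):        # warm up kernels / autotuning
        model(x)
    torch.cuda.synchronize()         # wait for all queued GPU work
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs * 1000.0
```

Comparing e.g. `measure_latency(vit, 1)` against `measure_latency(dynamic_vit, 1)` is where I see the gap.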

raoyongming commented 1 year ago

Hi @lixinghe1999, it is indeed an interesting observation. I just tried setting the batch size to 1, and DynamicViT is indeed slower than ViT in this case. The actual speed-up depends on many hardware and software factors. My guess is that the extra operations introduced by the prediction networks cause the slower inference: when batch size = 1, the GPU usually has enough resources to compute all sub-operations (mul, add, etc.) in a layer in parallel, so the actual latency is mainly determined by the number of sequentially executed operations rather than by FLOPs. Consistent with this, when bs=1 I find the speeds of DynamicViT and ViT become much closer if I use 1 prediction network instead of 3.
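
For context, DynamicViT-DeiT-S places prediction networks at three blocks by default. A rough sketch of the two configurations I compared, assuming the `VisionTransformerDiffPruning` constructor with its `pruning_loc`/`token_ratio` arguments from `models/dyvit.py` (the single-stage keep ratio below is illustrative):

```python
from models.dyvit import VisionTransformerDiffPruning  # assumed import path

# Default DeiT-S setup: 3 prediction networks at blocks 3/6/9,
# each keeping ~70% of the surviving tokens.
model_3 = VisionTransformerDiffPruning(
    patch_size=16, embed_dim=384, depth=12, num_heads=6, mlp_ratio=4,
    pruning_loc=[3, 6, 9], token_ratio=[0.7, 0.49, 0.343])

# Single prediction network: fewer sequential ops on the critical path,
# which matters more than FLOPs when batch size = 1.
model_1 = VisionTransformerDiffPruning(
    patch_size=16, embed_dim=384, depth=12, num_heads=6, mlp_ratio=4,
    pruning_loc=[6], token_ratio=[0.5])  # keep ratio chosen for illustration
```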

lixinghe1999 commented 1 year ago

Thank you for your clarification. So it seems DynamicViT's prediction network shouldn't be launched when the batch size is small relative to the number of CUDA cores/threads on the GPU (though I am not sure).
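
To make the idea concrete, a wrapper along these lines could dispatch by batch size; `vit`, `dynamic_vit`, and the threshold value are all placeholders that would need tuning per GPU:

```python
import torch

class AdaptiveViT(torch.nn.Module):
    """Sketch: route small batches to plain ViT, large ones to DynamicViT.

    `vit` and `dynamic_vit` are assumed to be two already-built models;
    `batch_threshold` is a hypothetical value to calibrate per GPU.
    """

    def __init__(self, vit, dynamic_vit, batch_threshold=4):
        super().__init__()
        self.vit = vit
        self.dynamic_vit = dynamic_vit
        self.batch_threshold = batch_threshold

    def forward(self, x):
        # Small batches under-utilize the GPU, so token sparsification
        # (and its prediction-network overhead) may not pay off.
        if x.shape[0] < self.batch_threshold:
            return self.vit(x)
        return self.dynamic_vit(x)
```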