Closed: sanbuphy closed this issue 8 months ago.
Thanks for your question. I had a quick pass through this paper.
Our model family has been named EfficientViT since May 2022. So, our work is not a follow-up work of this CVPR2023 paper.
While the CVPR2023 work used the same name as ours, the target task of their model family is different from ours. Our model family is mainly designed for efficient high-resolution dense prediction, while their model family is primarily designed for image classification. The core building block is also different. Our model family is based on a multi-scale linear attention module, while their model family is based on a variant of the original softmax attention.
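To make the building-block difference concrete, here is a minimal sketch contrasting a softmax attention head with a linear attention head. This is a generic illustration (plain NumPy, a ReLU feature map as the kernel, single head, no multi-scale aggregation), not the authors' exact module: the key point is that linear attention computes `K^T V` first, so its cost grows linearly with sequence length rather than quadratically, which is what makes it attractive for high-resolution dense prediction.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard softmax attention: the N x N score matrix makes this
    # O(N^2 * d) in the number of tokens N.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Linear attention with a ReLU feature map: computing K^T V first
    # avoids the N x N matrix, giving O(N * d^2) cost (linear in N).
    Qf, Kf = np.maximum(Q, 0), np.maximum(K, 0)
    kv = Kf.T @ V                      # (d, d_v), independent of N
    z = Qf @ Kf.sum(axis=0) + eps      # per-token normalizer, shape (N,)
    return (Qf @ kv) / z[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)   # (8, 4)
```

At high resolution N (number of image tokens) dominates, so the O(N * d^2) form scales much better than the O(N^2 * d) softmax form.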
Thank you for your detailed description. Could you show me a comparison (speed, parameters, or accuracy) between the two? I'm afraid of making a mistake in deciding which one is better.
Here is the comparison on ImageNet:
| Model | Resolution | ImageNet Top-1 Acc | ImageNet Top-5 Acc | Params | MACs | A100 Throughput | Checkpoint |
|---|---|---|---|---|---|---|---|
| CVPR2023-EfficientViT-M5 | 512x512 | 80.8 | 95.5 | 12M | 2.7G | 3713 image/s | |
| EfficientViT-L1 | 224x224 | 84.484 | 96.862 | 53M | 5.3G | 6207 image/s | link |
| EfficientViT-L2 | 224x224 | 85.050 | 97.090 | 64M | 6.9G | 4998 image/s | link |
Hope this is helpful for you.
Best, Han
Hi, how do you measure the throughput on the A100? I saw that latency on the edge device is measured with bs=1. I wonder what batch size is used on the A100: is it still 1, a fixed number, or whatever makes full use of the GPU memory?
@TsingWei By default, the batch size is 256. When the model is too large, the batch size will be reduced from 256 to 128/64/32/... until it can fit into the GPU memory.
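The fallback described above (start at 256, halve until the model fits) can be sketched as a small helper. This is an illustrative sketch, not the repository's actual benchmarking code; `fits` and `run_batch` are hypothetical callbacks standing in for a real forward pass (on GPU you would catch an out-of-memory error inside `fits`).

```python
import time

def pick_batch_size(fits, start=256, floor=1):
    """Halve the batch size from `start` until the model fits.

    `fits(bs)` should return True when a forward pass at batch size `bs`
    succeeds (e.g. wrap a CUDA forward pass and catch the OOM error).
    """
    bs = start
    while bs >= floor:
        if fits(bs):
            return bs
        bs //= 2
    raise RuntimeError("model does not fit even at the floor batch size")

def measure_throughput(run_batch, bs, n_iters=50):
    # run_batch(bs) performs one forward pass on a batch of size bs.
    start = time.perf_counter()
    for _ in range(n_iters):
        run_batch(bs)
    elapsed = time.perf_counter() - start
    return n_iters * bs / elapsed   # images per second

# Example: pretend any batch larger than 100 runs out of memory.
print(pick_batch_size(lambda bs: bs <= 100))  # 64
```

With this scheme the reported throughput stays comparable across models because the batch size is always the largest power-of-two fraction of 256 that fits.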
Thank you very much! It seems that ICCV EfficientViT is a better option. I appreciate your patience and excellent work.
Hi, I see a different EfficientViT model from CVPR 2023 at https://github.com/microsoft/Cream/tree/main/EfficientViT. Could you explain the difference? Thank you very much.