mit-han-lab / efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Apache License 2.0

What's the difference between CVPR2023 EfficientViT and ICCV2023 EfficientViT? #36

Closed sanbuphy closed 8 months ago

sanbuphy commented 8 months ago

Hi, I see a different EfficientViT model from CVPR2023 at https://github.com/microsoft/Cream/tree/main/EfficientViT. Could you explain the difference? Thank you very much.

han-cai commented 8 months ago

Thanks for your question. I took a quick pass through this paper.

Our model family has been named EfficientViT since May 2022, so our work is not a follow-up to this CVPR2023 paper.

While the CVPR2023 work used the same name as ours, the target task of their model family is different from ours. Our model family is mainly designed for efficient high-resolution dense prediction, while their model family is primarily designed for image classification. The core building block is also different. Our model family is based on a multi-scale linear attention module, while their model family is based on a variant of the original softmax attention.
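For intuition, here is a minimal sketch of the two attention flavors for a single head with tensors of shape (batch, tokens, dim). It is not the repository's actual module (which additionally aggregates multi-scale tokens), but it shows why the ReLU linear variant avoids the quadratic token cost of softmax attention:

```python
# Minimal sketch (not the repository's implementation): softmax attention vs.
# ReLU linear attention for a single head, tensors of shape (B, N, dim).
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard softmax attention: the N x N score matrix makes it O(N^2) in tokens.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def relu_linear_attention(q, k, v, eps=1e-6):
    # Kernelize with ReLU and reorder the matmuls so the cost is linear in N.
    q, k = F.relu(q), F.relu(k)
    kv = k.transpose(-2, -1) @ v                             # (B, dim, dim)
    num = q @ kv                                             # (B, N, dim)
    den = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (B, N, 1)
    return num / (den + eps)

x = torch.randn(1, 196, 64)  # e.g. 14x14 tokens with 64 channels
print(softmax_attention(x, x, x).shape, relu_linear_attention(x, x, x).shape)
```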

sanbuphy commented 8 months ago


Thank you for your detailed description. Could you also show me a comparison of the results (speed, parameters, or accuracy) between the two? I'm afraid of making a mistake in deciding which one is better.

han-cai commented 8 months ago

Here is the comparison on ImageNet:

| Model | Resolution | ImageNet Top1 Acc | ImageNet Top5 Acc | Params | MACs | A100 Throughput | Checkpoint |
|---|---|---|---|---|---|---|---|
| CVPR2023-EfficientViT-M5 | 512x512 | 80.8 | 95.5 | 12M | 2.7G | 3713 image/s | - |
| EfficientViT-L1 | 224x224 | 84.484 | 96.862 | 53M | 5.3G | 6207 image/s | link |
| EfficientViT-L2 | 224x224 | 85.050 | 97.090 | 64M | 6.9G | 4998 image/s | link |

Hope this is helpful for you.

Best, Han

TsingWei commented 8 months ago

Hi, how do you measure the throughput on the A100? I saw that the latency on the edge device is measured with bs=1. I wonder what batch size is used on the A100: is it still 1, a fixed number, or does it make full use of the GPU memory?

han-cai commented 8 months ago

@TsingWei By default, the batch size is 256. When the model is too large, the batch size will be reduced from 256 to 128/64/32/... until it can fit into the GPU memory.
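For reference, here is a hypothetical sketch of that fallback; the function name and loop are illustrative, not the repository's benchmarking script:

```python
# Hypothetical throughput benchmark with the batch-size fallback described above:
# start at 256 and halve on OOM until the forward pass fits in GPU memory.
import time
import torch

@torch.no_grad()
def measure_throughput(model, resolution=224, start_bs=256, warmup=10, iters=50):
    model = model.cuda().eval()
    bs = start_bs
    while bs >= 1:
        try:
            x = torch.randn(bs, 3, resolution, resolution, device="cuda")
            for i in range(warmup + iters):
                if i == warmup:            # start timing after warm-up
                    torch.cuda.synchronize()
                    start = time.time()
                model(x)
            torch.cuda.synchronize()
            return bs * iters / (time.time() - start)   # images per second
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            bs //= 2                        # 256 -> 128 -> 64 -> 32 -> ...
    raise RuntimeError("even batch size 1 does not fit in GPU memory")
```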

sanbuphy commented 8 months ago


Thank you very much! It seems that the ICCV2023 EfficientViT is the better option. I appreciate your patience and excellent work.