Hi @xinyuliu-jeffrey, thanks a lot! I tested the inference time and found that efficientvit_m0 takes 19 ms while vit_base takes 7 ms, so there seems to be no advantage. The test was on an RTX 4090 with batch size 1, and vit_base was created with model = timm.create_model("vit_base_patch16_384", pretrained=True). Is this normal?
@maocaixia This phenomenon is quite strange. Although we reported speed at batch size 2048, efficientvit_m0 at batch size 1 still shows a satisfactory speed on my side. Have you tried the speed_test script here? Or could you please provide more details about your testing, e.g., input resolution, testing scripts, etc.? Thanks!
```python
import os
import time

import numpy as np
import torch

os.environ['CUDA_VISIBLE_DEVICES'] = '6'

from model.build import EfficientViT_M0

model = EfficientViT_M0(pretrained='efficientvit_m0')
model = model.cuda()
model = model.eval()

image = torch.randn(1, 3, 224, 224)
image = image.cuda()

cnt = 0
times = []
with torch.no_grad():
    while cnt < 100:
        cnt += 1
        t1 = time.time()
        out = model(image)
        torch.cuda.synchronize()  # wait for the GPU to finish before reading the clock
        if cnt > 30:  # skip the first 30 iterations as warm-up
            times.append(time.time() - t1)

print('infer time: ', np.array(times).mean())
```
@xinyuliu-jeffrey Here is my code. What is the inference time on your side, and on what kind of GPU?
Hi @maocaixia, thanks for the information. I will test your script on my side and respond to you asap.
Hi @maocaixia , thanks for your question again.
I tested the speed of EfficientViT and ViT on a T4 GPU; the results are shown below:

| model \ throughput (images/s) | bs=1 | bs=16 | bs=64 | bs=256 | bs=1024 |
| --- | --- | --- | --- | --- | --- |
| vit_base_patch16_224 | 182 | 231 | 224 | 223 | 221 |
| efficientvit_m0 | 150 | 1803 | 7028 | 8375 | 8350 |
Latency (s/image) is the reciprocal of throughput, e.g., 150 images/s at bs=1 corresponds to roughly 6.7 ms/image. EfficientViT is far superior to ViT at large batch sizes, and indeed slightly slower at bs=1.
However, note that testing inference speed on a GPU with batch size 1 may not say much about the efficiency of a model. On one hand, when only one image is fed to the model, the measurement may be dominated by data transfer and launch overhead rather than actual computation, which makes it hard to judge whether the model itself is efficient. On the other hand, batch size 1 does not utilize the parallel processing power of a GPU effectively, nor does it unleash the efficiency of the model. That is why most works report throughput/latency at batch sizes that maximize GPU utilization (e.g., ViT and DeiT use ~300-3000, Swin uses 64), and EfficientViT shows remarkable speed under these popular settings.
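For concreteness, here is a minimal throughput-measurement sketch in PyTorch. It only illustrates the measurement pattern, not the repository's actual script; `measure_throughput` and its default batch size, warm-up count, and iteration count are hypothetical choices:

```python
import time

import torch


@torch.no_grad()
def measure_throughput(model, batch_size=256, resolution=224,
                       warmup_iters=10, timed_iters=50):
    """Rough throughput (images/s) of `model` on the current GPU."""
    model = model.cuda().eval()
    images = torch.randn(batch_size, 3, resolution, resolution).cuda()

    # Warm-up: let cuDNN select kernels and caches fill before timing.
    for _ in range(warmup_iters):
        model(images)
    torch.cuda.synchronize()

    start = time.time()
    for _ in range(timed_iters):
        model(images)
    # Wait for all queued GPU work to finish before stopping the clock.
    torch.cuda.synchronize()
    elapsed = time.time() - start

    return batch_size * timed_iters / elapsed
```

With larger batches, the fixed per-call overhead (kernel launches, Python dispatch, data transfer) is amortized over many images, which is why the bs=1 column above differs so sharply from the bs>=16 columns.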
Hope this clarifies, thank you!
Best, Xinyu
Hi @xinyuliu-jeffrey , thanks for your reply.
How do you test throughput with batch size 1? Do you take the single-forward-pass inference time at batch=1 and compute its inverse, i.e., inference time t at batch=1 gives throughput = 1/t?
I measured an inference time of 19 ms at batch size 1 on an RTX 4090, so the throughput would be 1000/19 ≈ 52 images/s. Is that right?
I use the model on hardware with very limited compute (only 1 TOPS), so I care more about per-image inference time.
Hi @maocaixia, for computing throughput, please kindly refer to https://github.com/microsoft/Cream/blob/main/EfficientViT/classification/speed_test.py.
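For readers following along, converting a batch-1 latency to throughput is just the arithmetic from the messages above; the helper below is a hypothetical illustration, not code from speed_test.py:

```python
def latency_ms_to_throughput(latency_ms: float, batch_size: int = 1) -> float:
    """Convert a per-forward-pass latency in milliseconds to images/s."""
    # Hypothetical helper; mirrors the 1000/19 arithmetic discussed above.
    return batch_size * 1000.0 / latency_ms


print(latency_ms_to_throughput(19.0))  # ~52.6 images/s, matching 1000/19
```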
OK, thanks a lot.
Hi @maocaixia,
The script files are updated. Thanks!