microsoft / Cream

This is a collection of our NAS and Vision Transformer work.
MIT License

There is no dist_test.sh file in Instance Segmentation test? #174

Closed maocaixia closed 1 year ago

xinyuliu-jeffrey commented 1 year ago

Hi @maocaixia ,

The script files are updated. Thanks!

maocaixia commented 1 year ago

Hi @xinyuliu-jeffrey , thanks a lot! I tested the inference time and found that efficientvit_m0 takes 19 ms while vit_base takes 7 ms, so there seems to be no advantage. The test was on an RTX 4090 with batch size 1. vit_base: `model = timm.create_model("vit_base_patch16_384", pretrained=True)`. Is this normal?

xinyuliu-jeffrey commented 1 year ago

@maocaixia This phenomenon is quite strange. Although we reported speed at batch size 2048, efficientvit_m0 with batch size 1 still shows a satisfactory speed on my side. Have you tried the speed_test here? Or could you please provide more details about your testing, e.g., input resolution, testing scripts, etc. Thanks!

maocaixia commented 1 year ago

```python
import os
import sys
import time

import numpy as np
import timm
import torch
from timm.models import create_model

os.environ['CUDA_VISIBLE_DEVICES'] = '6'

from model.build import EfficientViT_M0

model = EfficientViT_M0(pretrained='efficientvit_m0')
model = model.cuda()
model = model.eval()

image = torch.randn(1, 3, 224, 224)
image = image.cuda()

cnt = 0
times = []
while cnt < 100:
    cnt += 1
    t1 = time.time()
    out = model(image)
    if cnt > 30:  # skip the first 30 iterations as warmup
        times.append(time.time() - t1)
print('infer time: ', np.array(times).mean())
```

maocaixia commented 1 year ago

@xinyuliu-jeffrey Here is my code. What is the inference time on your side, and on what kind of GPU?

xinyuliu-jeffrey commented 1 year ago

Hi @maocaixia , thanks for the information. I will try to test your script on my side and respond to you asap.

xinyuliu-jeffrey commented 1 year ago

Hi @maocaixia , thanks for your question again.

I tested the speed of EfficientViT and ViT on a T4 GPU; the results are shown below:

| model \ throughput (images/s) | bs=1 | bs=16 | bs=64 | bs=256 | bs=1024 |
| --- | --- | --- | --- | --- | --- |
| vit_base_patch16_224 | 182 | 231 | 224 | 223 | 221 |
| efficientvit_m0 | 150 | 1803 | 7028 | 8375 | 8350 |

The per-image latency (s/image) is the reciprocal of the throughput. EfficientViT is much superior to ViT at large batch sizes, and indeed slightly slower at bs=1.
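For instance, converting the bs=1 column of the table to per-image latency (a quick check using only the table's own numbers):

```python
# Per-image latency is the reciprocal of throughput (images/s).
throughput_bs1 = {  # bs=1 column from the T4 table above
    "vit_base_patch16_224": 182,
    "efficientvit_m0": 150,
}
for name, ips in throughput_bs1.items():
    print(f"{name}: {1000 / ips:.1f} ms/image")
# vit_base_patch16_224: 5.5 ms/image
# efficientvit_m0: 6.7 ms/image
```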

However, it should be noted that testing inference speed on a GPU with batch size 1 may not be meaningful for measuring a model's efficiency. On one hand, when only one image is fed to the model, the measurement may be dominated by data transfer time rather than actual computation time, which makes it hard to judge whether the model is efficient. On the other hand, batch size 1 does not utilize the parallel processing power of a GPU effectively, and so does not unleash the model's efficiency either. That is why most works report throughput/latency at batch sizes that maximize GPU utilization (e.g., ViT and DeiT use ~300-~3000, Swin uses 64), and EfficientViT shows remarkable speed under these popular settings.
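A minimal sketch of a throughput measurement in the spirit described above (warmup iterations excluded, throughput derived from batch size and mean latency). The helper and the dummy workload here are illustrative, not the actual speed_test.py code:

```python
import time

def measure(fn, batch_size, warmup=30, iters=70):
    """Time a forward-pass callable; returns (ms/iter, images/s).

    In a real GPU run you would also call torch.cuda.synchronize()
    before reading the clock, since CUDA kernels launch asynchronously.
    """
    for _ in range(warmup):       # warmup iterations, excluded from timing
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - t0
    latency_ms = elapsed / iters * 1000
    throughput = batch_size * iters / elapsed
    return latency_ms, throughput

# usage with a dummy workload standing in for model(image)
lat, tp = measure(lambda: sum(range(1000)), batch_size=64)
print(f"{lat:.3f} ms/iter, {tp:.0f} images/s")
```

Note that throughput grows with batch size as long as the per-iteration latency grows more slowly than the batch, which is exactly the pattern in the table above.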

Hope this clarifies, thank you!

Best, Xinyu

maocaixia commented 1 year ago

Hi @xinyuliu-jeffrey , thanks for your reply.

How do you test the throughput with batch size 1? Do you measure the single-inference time at batch=1 and then take the reciprocal, i.e., infer time t at batch=1 and throughput = 1/t?

I measured an inference time of 19 ms with batch 1 on an RTX 4090, so the throughput is 1000/19 ≈ 52 images/s. Is that right?

I use the model on hardware with very limited compute (only 1 TOPS), so I focus more on the inference time.
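The arithmetic above does follow from the definition throughput = batch_size / latency (a quick sanity check on the thread's own 19 ms figure):

```python
latency_s = 0.019        # measured 19 ms at batch size 1
batch_size = 1
throughput = batch_size / latency_s
print(f"{throughput:.1f} images/s")  # ≈ 52.6 images/s
```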

xinyuliu-jeffrey commented 1 year ago

Hi @maocaixia , for throughput computing, please kindly refer to https://github.com/microsoft/Cream/blob/main/EfficientViT/classification/speed_test.py.

maocaixia commented 1 year ago

OK, thanks a lot.