Open Amarintine opened 4 years ago
On GPUs, when the FLOP count is small, the main constraint on speed is memory bandwidth. Depthwise convolution is not very fast on GPUs but is very fast on CPUs, which is why MobileNet and GhostNet mainly report speed on ARM/CPU. If you read Chinese, you can refer to https://zhuanlan.zhihu.com/p/122943688 and https://www.zhihu.com/question/339909499.
In addition, your code for timing the GPU is not correct. CUDA kernels are launched asynchronously, so `time.time()` can return before the forward pass has actually finished; you should call `torch.cuda.synchronize()` after the forward pass. Please refer to https://discuss.pytorch.org/t/measuring-gpu-tensor-operation-speed/2513
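A minimal sketch of the timing pattern described above, assuming a PyTorch environment; the tiny `nn.Sequential` model is a hypothetical stand-in, swap in GhostNet or MobileNetV2. The key points are a warm-up phase and a `torch.cuda.synchronize()` both before starting and before stopping the timer:

```python
import time
import torch
import torch.nn as nn

def benchmark(model, x, warmup=10, iters=30):
    """Average forward-pass time; synchronizes so queued GPU kernels
    actually finish before the clock is read."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm-up: excludes allocator/cudnn setup cost
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()     # drain queued kernels before starting the timer
        t0 = time.time()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()     # wait for the last forward pass to finish
    return (time.time() - t0) / iters

# Hypothetical toy model for illustration only.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
x = torch.randn(1, 3, 32, 32)
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
print(f"avg forward time: {benchmark(model, x):.6f} s")
```

Without the second `synchronize()`, the measured time mostly reflects kernel *launch* overhead rather than execution, which is exactly the bug in the timing loop below.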
Hi, I have tested the network against MobileNetV2, but GhostNet is not faster:

ghostnet: flops 147.505M, params 3.903M
mobilenetv2: flops 312.852M, params 2.225M

GhostNet has more parameters, and with the timing code below MobileNetV2 comes out faster, so I don't know what's wrong. (Note: `t` has to be initialized before the loop for the first iteration to print a valid time.)

```python
x = torch.randn(32, 3, 224, 224)
t = time.time()
for _ in range(30):
    with torch.no_grad():
        inputs = x.cuda()
        outputs = model(inputs)
    print(time.time() - t)
    t = time.time()
```

It seems that MobileNetV2 is faster than GhostNet?