wang-xinyu / tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API
MIT License
6.89k stars, 1.76k forks

why is yolov4 slower than yolov3-spp? #10

Closed LukeAI closed 4 years ago

LukeAI commented 4 years ago

Why is yolov4 slower than yolov3-spp in this repo?

Is the NMS being done on the CPU rather than the GPU?

(Thanks for this really interesting and educational repo!)

wang-xinyu commented 4 years ago

@LukeAI

yolov4 is a larger network than yolov3-spp, so it is slower.

NMS runs on the CPU.

Thanks for your interest!
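
For readers unfamiliar with the NMS step mentioned above, here is a minimal sketch of greedy non-maximum suppression running on the CPU. It is not this repo's code: the function names (`iou`, `nms`), the `(x1, y1, x2, y2)` box layout, the class-agnostic behaviour, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In the worst case this loop compares every candidate with every kept box, so its cost grows with the number of detections that survive the confidence threshold rather than with the size of the network.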

lavinan26 commented 4 years ago

@wang-xinyu I found that yolov3 at 416x416 is much faster than yolov4 at 416x416 in the C++ TensorRT implementation. Is this expected because the mish activation function is unoptimized, or should I be getting faster results for yolov4?

wang-xinyu commented 4 years ago

@lavinan26 Are you using the latest code? What specific latency did you get for yolov4 and yolov3, and which GPU are you using?
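
To make latency numbers comparable between models, it helps to measure them the same way. The sketch below is one possible harness, not something from this repo: `average_latency_ms` and `run_once` are hypothetical names, and the warm-up and iteration counts are arbitrary defaults.

```python
import time
from typing import Callable


def average_latency_ms(run_once: Callable[[], None],
                       warmup: int = 10,
                       iters: int = 100) -> float:
    """Average wall-clock time of one call to run_once, in milliseconds."""
    # Warm-up runs are excluded so one-time setup costs do not skew the mean.
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iters
```

For GPU inference, `run_once` has to block until the result is actually available (for example by synchronizing the stream); otherwise only the time to launch the work is measured.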

LukeAI commented 4 years ago

yolov3 is expected to be slightly faster than yolov4 at the same resolution, but not by much. It would be good to know your results.

wang-xinyu commented 4 years ago

@LukeAI For the yolov4 FPS test results, you can refer to https://github.com/AlexeyAB/darknet/pull/5453#issuecomment-624116856.

On my machine (GTX 1080, yolov4 at 608x608), I got 20 FPS in darknet and about 40 FPS in my TensorRT implementation.

lavinan26 commented 4 years ago

@wang-xinyu @LukeAI I am just using a virtual machine with a Tesla K80 GPU. The average latency I get for yolov4 at 416x416 with FP16 precision is 71 ms (14 FPS), and for yolov3 at the same dimensions and precision it is 58 ms (17.3 FPS).
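
The FPS figures above follow from the latencies as FPS = 1000 / latency in ms. A small sketch of that arithmetic, using the two reported latencies (the helper name `latency_ms_to_fps` is made up for this example):

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert an average per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


yolov4_ms = 71.0  # reported above for yolov4, 416x416, FP16
yolov3_ms = 58.0  # reported above for yolov3, 416x416, FP16

print(round(latency_ms_to_fps(yolov4_ms), 1))  # 14.1
print(round(latency_ms_to_fps(yolov3_ms), 1))  # 17.2
print(round(yolov4_ms / yolov3_ms, 2))         # 1.22
```

The reported 17.3 FPS matches a latency of about 57.8 ms, which rounds to the 58 ms quoted. On these numbers yolov4 takes roughly 22% longer per frame than yolov3.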

lavinan26 commented 4 years ago

@LukeAI I just rechecked the graph in the yolov4 paper and found that, for the same input dimensions, yolov4 is supposed to be a little slower. Thanks.