Do you speed it up with TensorRT? (ultralytics/yolov5 #45)

wu-ruijie commented 4 years ago

v5 is so fast! I can only imagine how fast it would be when accelerated with TensorRT. Have you done any work on this?

github-actions[bot] commented 4 years ago

Hello @wu-ruijie, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

glenn-jocher commented 4 years ago

@wang-xinyu built a great TensorRT implementation of our https://github.com/ultralytics/yolov3 repo (which supports both YOLOv3 and YOLOv4), so he may be best placed to answer this question: https://github.com/wang-xinyu/tensorrtx/tree/master/yolov3-spp

HaxThePlanet commented 4 years ago

Tensor Core support would be amazing!

sljlp commented 4 years ago

Hi! I tested YOLOv5-s on CPU by running detect.py directly, and the inference speed is only 3 FPS. Could you give me some advice? I want to reach at least 30 FPS.

glenn-jocher commented 4 years ago

@sljlp you might want to see 'Running yolov5 on CPU' #37

The default --img-size for detect.py is 640, which you can reduce significantly to get the FPS you are looking for.
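As a rough, idealized estimate (an assumption, not a measured result): if inference cost scales with the number of input pixels, shrinking a square input from 640 reduces work by the squared size ratio. The helper names `ideal_speedup` and `estimated_fps` are hypothetical, and the 3 FPS starting point is the figure reported above.

```python
def ideal_speedup(old_size: int, new_size: int) -> float:
    """Upper-bound speedup when shrinking a square input, assuming cost ~ pixel count."""
    # The pixel count of a square image grows with the square of its side length.
    return (old_size / new_size) ** 2


def estimated_fps(base_fps: float, old_size: int, new_size: int) -> float:
    """Scale a measured FPS by the idealized pixel-count speedup."""
    return base_fps * ideal_speedup(old_size, new_size)


if __name__ == "__main__":
    # 3 FPS at --img-size 640, then a few smaller sizes that are multiples of 32.
    for size in (320, 256, 192):
        print(size, round(estimated_fps(3.0, 640, size), 1))
```

Under this idealized model, 320 only takes 3 FPS to about 12 FPS. Real speedups are usually smaller because of fixed per-frame overheads, so a smaller input alone may not be enough to reach 30 FPS on CPU.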

glenn-jocher commented 4 years ago

@sljlp one caveat is --img-size must be a multiple of the largest stride, 32. So acceptable sizes are 320, 288, 256, etc.

glenn-jocher commented 4 years ago

Update: I've pushed more robust error-checking on --img-size now in 099e6f5ebd31416f33d047249382624ad5489550, so if a user accidentally requests an invalid size (which is not divisible by 32), the code will warn and automatically correct the value to the nearest valid --img-size.
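The stride rule and the auto-correction described above can be sketched as follows. This is a minimal illustration, not the code from that commit: `is_valid_img_size` and `snap_img_size` are hypothetical names, and the sketch rounds to the nearest multiple of 32 with ties rounding up, which may differ from how the repo itself rounds.

```python
import warnings


def is_valid_img_size(size: int, stride: int = 32) -> bool:
    """An input size is valid when it is a multiple of the largest stride."""
    return size % stride == 0


def snap_img_size(size: int, stride: int = 32) -> int:
    """Return `size` unchanged if valid, else warn and snap to the nearest valid size."""
    if is_valid_img_size(size, stride):
        return size
    # Nearest multiple of `stride`; exact ties (e.g. 336 between 320 and 352) round up.
    # Never go below one stride, so tiny requests still yield a usable size.
    corrected = max(stride, ((size + stride // 2) // stride) * stride)
    warnings.warn(f"--img-size {size} is not a multiple of {stride}, using {corrected}")
    return corrected
```

For example, `snap_img_size(330)` emits a warning and returns 320, while `snap_img_size(640)` returns 640 silently.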

thancaocuong commented 4 years ago

@glenn-jocher Can you provide a yolov5.weights file? I've found that to convert YOLO to TensorRT with https://github.com/wang-xinyu/tensorrtx/, we need a weights file.

glenn-jocher commented 4 years ago

@thancaocuong there is no such file.

TrojanXu commented 4 years ago

I have a Python implementation, including NMS, here: https://github.com/TrojanXu/yolov5-tensorrt
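For context on the NMS step: a YOLO-style detector outputs many overlapping candidate boxes, and non-maximum suppression keeps the highest-scoring box while discarding others that overlap it too much. Below is a minimal greedy, class-agnostic NMS in plain Python. It is not the code from that repo; the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thres: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining candidates that overlap the kept box more than the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thres]
    return keep
```

Real pipelines usually also filter by a confidence threshold and run NMS per class; this sketch omits both.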

wang-xinyu commented 4 years ago

Hi @glenn-jocher

I just implemented YOLOv5-s in my repo https://github.com/wang-xinyu/tensorrtx/tree/master/yolov5 and tested it on my machine. YOLOv5-m, YOLOv5-l, etc. will follow soon.

Model	Device	Batch Size	Mode	Input Shape (HxW)	FPS
YOLOv3-spp(darknet53)	Xeon E5-2620/GTX1080	1	FP16	608x608	38.5
YOLOv4(CSPDarknet53)	Xeon E5-2620/GTX1080	1	FP16	608x608	35.7
YOLOv5-s	Xeon E5-2620/GTX1080	1	FP16	608x608	167
YOLOv5-s	Xeon E5-2620/GTX1080	4	FP16	608x608	182
YOLOv5-s	Xeon E5-2620/GTX1080	8	FP16	608x608	186
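To read the table in per-image terms, FPS can be converted into milliseconds per image, and for the batched rows one batch takes batch_size / FPS seconds. The helper names below are illustrative, the calculation assumes FPS counts images (not batches) per second, and the inputs are the YOLOv5-s rows above.

```python
def ms_per_image(fps: float) -> float:
    """Average time per image in milliseconds for a given throughput."""
    return 1000.0 / fps


def ms_per_batch(fps: float, batch_size: int) -> float:
    """Wall time for one batch, assuming `fps` counts images per second."""
    return 1000.0 * batch_size / fps


if __name__ == "__main__":
    # (batch size, FPS) pairs for YOLOv5-s from the table above.
    for bs, fps in [(1, 167), (4, 182), (8, 186)]:
        print(f"batch {bs}: {ms_per_image(fps):.2f} ms/img, {ms_per_batch(fps, bs):.1f} ms/batch")
```

On these numbers, batching lowers the average cost from about 6.0 to about 5.4 ms per image, while a full batch of 8 takes roughly 43 ms to complete.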

wang-xinyu commented 4 years ago

Update! My TensorRT implementation has been updated to match this commit: https://github.com/ultralytics/yolov5/commit/364fcfd7dba53f46edd4f04c037a039c0a287972

The PANet has been updated.

Please find my repo https://github.com/wang-xinyu/tensorrtx

alexandrebvd commented 4 years ago


Thanks for sharing! Do you have plans to implement other yolov5 versions as well?

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

wang-xinyu commented 4 years ago

We have updated the YOLOv5 TensorRT implementation to match the v2.0 release of this repo.

We also ran a speed test on my machine:

Model	Device	Batch Size	Mode	Input Shape (HxW)	FPS
YOLOv5-s	Xeon E5-2620/GTX1080	1	FP16	608x608	142
YOLOv5-s	Xeon E5-2620/GTX1080	4	FP16	608x608	173
YOLOv5-s	Xeon E5-2620/GTX1080	8	FP16	608x608	190
YOLOv5-m	Xeon E5-2620/GTX1080	1	FP16	608x608	71
YOLOv5-l	Xeon E5-2620/GTX1080	1	FP16	608x608	40
YOLOv5-x	Xeon E5-2620/GTX1080	1	FP16	608x608	27

Please see https://github.com/wang-xinyu/tensorrtx.

@glenn-jocher could you also add a link to https://github.com/wang-xinyu/tensorrtx in your Tutorials section?

glenn-jocher commented 4 years ago

@wang-xinyu thanks, yes this is a good idea. Can you submit a PR for the README please?

EDIT: I'll add a link to the export tutorial also.

ttanzhiqiang commented 3 years ago

https://github.com/ttanzhiqiang/onnx_tensorrt_project

fire717 commented 3 years ago


Thanks for your work. I'm wondering how you measure FPS with a batch size above 1. Our video is a single stream of images arriving one after another, so how could we use a batch size greater than 1?
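One common way to report FPS for batch sizes above 1 is to time many batched forward passes and divide the total number of images by the elapsed time. The sketch below shows that arithmetic with a stand-in `infer` callable. It is an assumption about typical benchmarking, not a description of how the tensorrtx numbers were measured, and `benchmark_fps` and `fake_infer` are hypothetical names.

```python
import time
from typing import Callable, List, Sequence


def benchmark_fps(infer: Callable[[Sequence[object]], object],
                  frames: Sequence[object], batch_size: int,
                  warmup: int = 2) -> float:
    """Images per second when `frames` are fed to `infer` in chunks of `batch_size`."""
    batches: List[Sequence[object]] = [
        frames[i:i + batch_size] for i in range(0, len(frames), batch_size)
    ]
    for batch in batches[:warmup]:  # warm-up runs are excluded from timing
        infer(batch)
    start = time.perf_counter()
    for batch in batches:
        infer(batch)
    elapsed = time.perf_counter() - start
    # Throughput counts images, not batches, so larger batches can raise FPS
    # even though each individual frame waits for its whole batch.
    return len(frames) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in workload: a fixed overhead per call plus a small per-image cost.
    def fake_infer(batch):
        time.sleep(0.002 + 0.001 * len(batch))

    frames = list(range(64))
    for bs in (1, 4, 8):
        print(bs, round(benchmark_fps(fake_infer, frames, bs)))
```

Because per-call overhead is shared across a batch, batched FPS mainly describes throughput (for example several streams or offline video) rather than the latency of a single live frame.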
