yasenh / libtorch-yolov5

A LibTorch inference implementation of the yolov5
MIT License

Post-processing time is far too long #38

Closed ZOUYIyi closed 3 years ago

ZOUYIyi commented 3 years ago

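In the Python code, predicting one image with the yolov5x model takes 40 ms, which includes both the time to get pred and the NMS time. In the C++ code, getting pred takes about 20 ms, but NMS takes 50 ms. In practice, the most expensive part of NMS is the data transfer from GPU to CPU; the NMS computation itself should not take that long. There should be a way to optimize this, and I would appreciate any pointers.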

yasenh commented 3 years ago

@ZOUYIyi Could you try running the benchmark with the following command:

$ CUDA_LAUNCH_BLOCKING=1 ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/yolov5s.torchscript.pt --gpu --view-img

ZOUYIyi commented 3 years ago

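I am running this program on Windows 10, so I need to find out how to set that variable there. But my guess is that if GPU parallelism is turned off, the data transfer will look fast while the forward pass that produces pred will certainly take longer, so the total time should still be long.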

yasenh commented 3 years ago

@ZOUYIyi Yes, I think the easiest way is to use the NMS API from torchvision. I am not using it here because I want to minimize the dependencies of this repo, but feel free to try it on your own.

You can also refer to issue #3.

ZOUYIyi commented 3 years ago

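@yasenh That is my guess too: extracting boxes from pred, running NMS, and so on should all be done on the GPU in torch, and data should only be copied off the GPU at the very end, when reading out the final boxes. In the Python version, though, this step uses torchvision's nms, and I do not want to pull in too many dependencies either. So what about writing my own torch-based NMS function? Would that work?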

yasenh commented 3 years ago

@ZOUYIyi Yes, you can write the NMS on the GPU yourself.
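
For reference, the greedy NMS logic discussed in this thread can be sketched in plain C++ roughly as follows. This is only a CPU illustration using the standard library, not the code from this repo and not a GPU implementation; the `Box` struct and the `IoU` and `Nms` function names are hypothetical. A torch-based version would presumably express the same sort-then-suppress steps with tensor operations, so that the data stays on the device until the final boxes are read.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical box type for illustration: corner coordinates plus a confidence score.
struct Box {
    float x1, y1, x2, y2;
    float score;
};

// Intersection-over-union of two axis-aligned boxes.
float IoU(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: returns the indices of the kept boxes, highest score first.
std::vector<std::size_t> Nms(const std::vector<Box>& boxes, float iou_threshold) {
    // Sort indices by descending score.
    std::vector<std::size_t> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t i, std::size_t j) { return boxes[i].score > boxes[j].score; });

    std::vector<bool> suppressed(boxes.size(), false);
    std::vector<std::size_t> keep;
    for (std::size_t m = 0; m < order.size(); ++m) {
        const std::size_t i = order[m];
        if (suppressed[i]) continue;
        keep.push_back(i);
        // Suppress every lower-scored box that overlaps the kept box too much.
        for (std::size_t n = m + 1; n < order.size(); ++n) {
            const std::size_t j = order[n];
            if (!suppressed[j] && IoU(boxes[i], boxes[j]) > iou_threshold) {
                suppressed[j] = true;
            }
        }
    }
    return keep;
}
```

Calling `Nms(boxes, 0.5f)` returns indices into `boxes`, so the caller can keep whatever per-box data (class id, score) it already holds alongside. The inner loop is the part that a tensor-based version would replace with a batched IoU computation.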