Post-processing time is far too long

yasenh / libtorch-yolov5

A LibTorch inference implementation of the yolov5

MIT License

372 stars 114 forks

Post-processing time is far too long #38

Closed: ZOUYIyi closed this issue 3 years ago

ZOUYIyi commented 3 years ago

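In the Python code, predicting one image with the yolov5x model takes 40 ms, which covers both the time to get pred and the NMS time. In the C++ code, getting pred takes about 20 ms, but NMS takes 50 ms. In practice, the most time-consuming part of NMS is the data transfer from GPU to CPU; the NMS computation itself should not be this slow. There should be a way to optimize this, and I would appreciate any pointers.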

yasenh commented 3 years ago

@ZOUYIyi Could you try running the benchmark with the following command:
$ CUDA_LAUNCH_BLOCKING=1 ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/yolov5s.torchscript.pt --gpu --view-img

ZOUYIyi commented 3 years ago

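I am running this program on Windows 10, so I need to find out how to set that variable there. My guess, though, is that if GPU parallelism is turned off, the data transfer will look very short, but the forward pass that produces pred will certainly take longer, so the total time should still be long.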

yasenh commented 3 years ago

@ZOUYIyi Yes, I think the easiest way is to use the NMS API from torchvision. I am not using it here because I want to minimize the dependencies of this repo, but feel free to try it on your own.

You can also refer to issue #3

ZOUYIyi commented 3 years ago

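@yasenh That was my guess too: extracting boxes from pred, NMS, and the related operations should all be done in torch on the GPU, and data should be pulled off the GPU only at the very end, when reading out the final boxes. In the Python version, however, this step uses torchvision's nms, and I do not want to bring in too many dependencies either. So what about writing my own torch-based nms function? Would that work?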

yasenh commented 3 years ago

@ZOUYIyi Yes, you can write the NMS on the GPU yourself.
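
For readers who want to try this, here is a minimal sketch of the greedy NMS algorithm that such a function would implement. It is written in plain standard C++ so that it stands alone; it is not the repository's code and it does not run on the GPU. The `Box` struct, the `iou` helper, and the `nms` signature are hypothetical names chosen for illustration. Keeping the data on the GPU would mean expressing the same steps (sort by score, compute IoU against the kept box, mask out overlaps) with torch tensor operations, which is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical box type for illustration: corner coordinates plus a confidence score.
struct Box {
    float x1, y1, x2, y2;
    float score;
};

// Intersection over union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: returns the indices of the kept boxes, highest score first.
std::vector<std::size_t> nms(const std::vector<Box>& boxes, float iou_threshold) {
    // Visit boxes in order of descending score.
    std::vector<std::size_t> order(boxes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&boxes](std::size_t i, std::size_t j) {
                         return boxes[i].score > boxes[j].score;
                     });

    std::vector<bool> suppressed(boxes.size(), false);
    std::vector<std::size_t> keep;
    for (std::size_t a = 0; a < order.size(); ++a) {
        const std::size_t i = order[a];
        if (suppressed[i]) continue;
        keep.push_back(i);
        // Suppress every lower-scored box that overlaps the kept box too much.
        for (std::size_t b = a + 1; b < order.size(); ++b) {
            const std::size_t j = order[b];
            if (!suppressed[j] && iou(boxes[i], boxes[j]) > iou_threshold) {
                suppressed[j] = true;
            }
        }
    }
    return keep;
}
```

Calling `nms(boxes, 0.5f)` on a vector of candidate boxes returns the indices to keep. Filtering candidates by confidence before calling it keeps the quadratic inner loop small, and in a GPU version the same filtering would also reduce how much data has to be copied back to the CPU.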