Try ./yolov5 -d instead of the .py.
I just tried ./yolov5 -d and the GPU memory usage is indeed only about 1 GB. Why does trt.py double it?
You can try removing torchvision.nms in the .py script.
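For reference, a plain-NumPy NMS along these lines could stand in for the torchvision call so that torch/torchvision never have to be imported by the demo. This is my own sketch, not code from this repo:

import numpy as np

def nms_numpy(boxes, scores, iou_thres=0.45):
    # boxes: (N, 4) as x1, y1, x2, y2; scores: (N,)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current top-scoring box with the remaining ones
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only the boxes that overlap the kept box below the threshold
        order = order[1:][iou <= iou_thres]
    return np.array(keep)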
Just tried it; same as before, it still takes 4 GB of GPU memory.
GPU memory usage of this code section:
import pycuda.driver as cuda
import tensorrt as trt

def __init__(self, engine_file_path):
    # Create a CUDA context on GPU 0 and a stream for this instance
    self.ctx = cuda.Device(0).make_context()
    stream = cuda.Stream()
    TRT_LOGGER = trt.Logger(trt.Logger.INFO)
    runtime = trt.Runtime(TRT_LOGGER)
    # Read the serialized engine from disk
    with open(engine_file_path, "rb") as f:
        engine_data = f.read()
The memory is already taken up by the time this loading finishes.
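One way to pinpoint which of those steps actually allocates the memory is to snapshot free GPU memory with pycuda around each line. This is only a diagnostic sketch to drop into __init__, not code from the repo:

import pycuda.driver as cuda

def log_gpu_mem(tag):
    # mem_get_info() needs an active CUDA context, which make_context() above provides
    free, total = cuda.mem_get_info()
    print("[%s] used %.0f MB of %.0f MB" % (tag, (total - free) / 1e6, total / 1e6))

# Example: call it between the steps of __init__
# log_gpu_mem("after make_context")
# log_gpu_mem("after deserialize_cuda_engine")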
You can try tensorrt 8.0.
I tested TensorRT 8.0.1.6 and the problem is still the same. Without TensorRT, GPU memory: 1500 MB. ./yolov5 -d, GPU memory: 1100 MB. yolov5_trt.py, GPU memory: 4300 MB.
No idea; many others see almost the same memory cost in C++ and Python. You can try another machine or open a thread on NVIDIA DevTalk.
I have the exact same issue, but on a Jetson Nano 4GB, which simply doesn't have enough memory to run the yolov5_trt.py script. I didn't even notice the problem on a different machine since it has much more memory; maybe that's why no one reported this issue before?
@wang-xinyu a penny for your thoughts: might this have something to do with the .so plugin used here? I now noticed that the RetinaNet example also uses a plugin (here) but currently I don't have the option of checking whether it eats up as much memory as well.
Question: might this be caused by ctypes loading the whole shared library (.so), with a lot of redundant sub-libraries, which the C++ version does not do?
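One way to check what the library load alone costs, independent of TensorRT, is to load the plugin .so with ctypes in an otherwise empty process and watch resident memory (on a Jetson, host and GPU share the same RAM, so RSS is a reasonable proxy). A rough sketch; the .so path is an assumption, use whatever PLUGIN_LIBRARY points to in your setup:

import ctypes

def rss_mb():
    # Resident set size from /proc; Linux-only, which is fine on a Jetson
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS"):
                return int(line.split()[1]) / 1024.0

before = rss_mb()
ctypes.CDLL("build/libmyplugins.so")   # adjust to your PLUGIN_LIBRARY path
after = rss_mb()
print("CDLL cost: %.1f MB" % (after - before))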
@MTDzi You can verify your thoughts by removing yololayer plugin in createEngine() in yolov5.cpp.
@wang-xinyu I wasn't sure how to do that (I couldn't find the createEngine function; I assumed you meant build_engine), so this is my attempt. I replaced the following in yolov5.cpp:
auto yolo = addYoLoLayer(network, weightMap, det0, det1, det2);
yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
network->markOutput(*yolo->getOutput(0));
with
network->markOutput(*det0->getOutput(0));
and built the .engine file, then commented out the following line in yolov5_trt.py:

# ctypes.CDLL(PLUGIN_LIBRARY)

and ran python yolov5_trt.py.
But the memory consumption is the same: I'm looking at an increase from 1.5 GB to 3.5 GB (so that's ~2 GB worth of RAM). When running the C++ version I get an increase of ~800 MB.
If this is not what you were asking for, please give me a hint.
That's right, that removes the yololayer.
So it seems the issue is not from the plugin.
I think it is probably the TensorRT Python bindings or other Python packages causing the memory increase.
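One way to test that hypothesis would be to snapshot GPU memory after each import / initialization step, for example with pynvml (note that NVML is not available on Jetson boards, where tegrastats would have to do instead). A sketch under those assumptions:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def gpu_used_mb(tag):
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print("[%s] %.0f MB used" % (tag, info.used / 1e6))

gpu_used_mb("baseline")
import tensorrt as trt            # noqa: E402
gpu_used_mb("after import tensorrt")
import pycuda.autoinit            # noqa: E402  (creates the CUDA context)
gpu_used_mb("after pycuda context")
import torch, torchvision         # noqa: E402
torch.cuda.init()                 # force torch's CUDA runtime to initialize
gpu_used_mb("after torch CUDA init")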
You might be right; I found this thread on NVIDIA's forum. Also, this is an interesting lead where they suggest using cgroups to limit the amount of available memory, which would, I guess, force the CUDA runtime to free up some pieces of memory.
And, well, that would explain why the C++ version needs roughly 800 MB. But what I don't get is why the Python version needs so much more.
I'll give the cgroups approach a try and let you know how it goes.
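For anyone else who wants to try it, the setup I have in mind is roughly the following (cgroups v1 memory controller, needs root; the cgroup name and the 2 GB cap are placeholders of mine):

import os

CG = "/sys/fs/cgroup/memory/trt_cap"     # hypothetical cgroup name
LIMIT = 2 * 1024 ** 3                    # cap the process at 2 GB (placeholder)

os.makedirs(CG, exist_ok=True)           # creating the directory creates the cgroup
with open(os.path.join(CG, "memory.limit_in_bytes"), "w") as f:
    f.write(str(LIMIT))
# Move the current process into the cgroup, then exec the demo under the limit
with open(os.path.join(CG, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))
os.execvp("python3", ["python3", "yolov5_trt.py"])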
Follow-up: the cgroups solution didn't help; I see only slightly lower memory consumption (3.3 GB).
I met this problem too. Using multiprocessing to run inference in a separate process may solve it.
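A rough sketch of what I mean, assuming a YoLov5TRT-style wrapper with infer()/destroy() methods like the demo in this repo (adjust the names to whatever your copy of yolov5_trt.py defines): run the detector in a child process so that all of its GPU memory is returned to the system when the process exits.

import multiprocessing as mp

def run_detector(engine_path, image_paths, result_queue):
    # Import inside the child so TensorRT / pycuda memory lives only in this process
    from yolov5_trt import YoLov5TRT   # hypothetical name; use your wrapper class
    detector = YoLov5TRT(engine_path)
    for path in image_paths:
        result_queue.put(detector.infer(path))
    detector.destroy()

if __name__ == "__main__":
    queue = mp.Queue()
    proc = mp.Process(target=run_detector,
                      args=("build/yolov5s.engine", ["sample.jpg"], queue))
    proc.start()
    print(queue.get())
    proc.join()   # all GPU memory held by the child is freed here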
Env: Yolov5
When I tried to use TensorRT to accelerate yolov5, these were my steps:
@wang-xinyu (Xinyu), could you please help take a look?