Closed YoungjaeDev closed 1 year ago
Hi @youngjae-avikus , sorry for missing this ticket. I think this is expected: it seems the engine will do a serialization operation first, and that serialization is a very time-consuming process.
@zhiqwang
Thank you, and sorry for the confusion. Let me describe the exact sequence of the issue.
To be precise: when the engine is built with the export_tensorrt_engine API, the enqueue call consumes most of the time, and the memcpy mentioned above does not take long. However, if I instead feed in the ONNX exported directly from ultralytics (I do this because the newly revised architecture has a new block/layer, so it cannot be converted directly by yolov5-rt-stack), then strangely the enqueue is short but the memcpy is long.

The conclusion is that enqueue + memcpy takes about the same total time either way (few test cases, but probably!), yet the split between the two differs depending on how the engine was created. Can you tell me why?
I think this also meets expectations. NMS is essentially a filter: putting NMS into the model greatly reduces the number of output tensors. You can observe that the D2H time drops greatly, but the cost is moving that part of the computation onto the device, which may increase the enqueue time.
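To make the D2H reduction concrete, here is a rough size estimate under assumed shapes (a 640x640 YOLOv5 input with 25200 candidate boxes, and an NMS plugin such as EfficientNMS_TRT capped at 100 detections; these numbers are illustrative assumptions, not measurements from this issue):

```python
FLOAT_BYTES = 4

# Without NMS: the raw head output for a 640x640 input is
# 25200 candidate boxes x (4 box + 1 obj + 80 class) = 85 values each.
raw_boxes = 25200
values_per_box = 85
raw_output_bytes = raw_boxes * values_per_box * FLOAT_BYTES

# With an NMS plugin capped at max_det detections, the outputs are roughly:
# num_dets (1) + boxes (max_det x 4) + scores (max_det) + classes (max_det).
max_det = 100
nms_output_bytes = (1 + max_det * 4 + max_det + max_det) * FLOAT_BYTES

print(f"raw D2H:   {raw_output_bytes / 1024:.0f} KiB")
print(f"NMS D2H:   {nms_output_bytes / 1024:.1f} KiB")
print(f"reduction: ~{raw_output_bytes / nms_output_bytes:.0f}x")
```

Under these assumptions the copy shrinks from megabytes to a few kilobytes, which is why fusing NMS into the engine trades a larger enqueue for a much smaller memcpy.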
@zhiqwang
Thanks for your answer, but the two methods I'm comparing (pt input vs. ONNX input) should be equivalent, in that both ultimately produce an engine with the same NMS plugin added. Yet when the ONNX exported from ultralytics is fed directly into export_tensorrt_engine, the resulting engine's enqueue time is shorter and its memcpy time is longer.
@zhiqwang
If I build and use the engine with NMS_TRT end-to-end with the above code, D2H takes longer than enqueueV2. Do you have any other secrets in yolov5-rt-stack about building the engine?
> If I build and use the engine with NMS_TRT end-to-end with the above code, D2H takes longer than enqueueV2.
I remember that the NMS-integration part of his repository was originally inherited from the yolort repo, but I don't know whether he later added some new techniques to it.
> Do you have any other secrets in yolov5-rt-stack about building the engine?
Nope
Since the mAP is almost the same, there is no problem using it, right? I only mentioned the phenomenon because it seemed a little strange.
> Since the mAP is almost the same, there is no problem using it, right?
Yep
> I only mentioned the phenomenon because it seemed a little strange.
Indeed, but the information here is limited, and I'm sorry I cannot analyze the reasons behind it.
🐛 Describe the bug
There is no problem when I put the pt file directly into export_tensorrt_engine. However, if I put the ONNX file into the export_tensorrt_engine model_path immediately after exporting ONNX from ultralytics, the engine is still created, but the timings differ (0.1 ms vs. 6 ms on my PC).
https://github.com/zhiqwang/yolov5-rt-stack/blob/8b578eb9a7910f1dcb28188a36c8c540d15a9430/deployment/tensorrt/main.cpp#L382-L408
Versions
No specifics