IamShubhamGupto opened 7 months ago
TensorRT export logs and statistics - trt.log
| mode | observed FPS |
|---|---|
| XFeat (cuda) | 21.4 |
| XFeat (tensorrt) | 24.7 |
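For reference, the FPS numbers above come from a simple timing loop along these lines (a simplified sketch, not the exact benchmark code; the `detectAndCompute` call is a stand-in for whichever inference entry point is timed):

```python
import time

def measure_fps(run_inference, frames, warmup=10):
    """Time repeated inference calls over a list of frames and report FPS.

    Warmup iterations are excluded so lazy CUDA/TensorRT initialization
    does not skew the average.
    """
    for frame in frames[:warmup]:
        run_inference(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        run_inference(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# Example usage with a stand-in model call:
# fps = measure_fps(lambda img: model.detectAndCompute(img), frames)
```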
[Match visualizations: XFeat (cuda) vs. XFeat (tensorrt)]
TensorRT produces sparser matches for nearly identical poses, but overall it seems to perform on par with the CUDA path, if not better.
Device: Nvidia Jetson AGX Xavier - Jetpack 5.1
Docker image: dustynv/l4t-ml:r35.4.1
Hello @IamShubhamGupto and @acai66, thank you so much for your contributions!
I notice that both of you are providing an ONNX export, and it probably doesn't make sense to merge both since they serve the same purpose. Do you have any suggestions on how to proceed? In my opinion, it makes sense to choose the one that supports the earliest operator set, as it would maximize the model's compatibility with the greatest number of devices and libraries.
Sure, maybe @acai66 and I can collaborate on merging our work. In summary, this PR does not pin a specific operator set; it is up to the user to provide one. By default it is the latest supported, which is 17. Users can choose an older opset such as 13 via a command-line argument:
```
python3 export.py --onnx_opset 13
```
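Under the hood this flag maps onto the `opset_version` argument of `torch.onnx.export`, roughly like this (a minimal sketch of the idea; the stand-in model, input shape, and tensor names are placeholders, not the actual export script):

```python
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--onnx_opset", type=int, default=17,
                    help="ONNX opset version; older sets (e.g. 13) broaden compatibility")
args = parser.parse_args()

# Stand-in for XFeatModel; in practice the real network and weights are loaded here.
model = torch.nn.Conv2d(1, 8, 3).eval()
dummy = torch.randn(1, 1, 480, 640)  # example input; the real shape may differ

torch.onnx.export(
    model, dummy, "xfeat.onnx",
    opset_version=args.onnx_opset,   # defaults to the latest supported, here 17
    input_names=["image"], output_names=["feats"],
)
```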
Another key difference between our PRs is that @acai66 quantizes both the `XFeatModel` and `XFeat` modules, whereas I chose to only quantize `XFeatModel`. This feature can be merged into my branch, as the code to then export the TensorRT engine is already available here. Let me know what you guys think.
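For context, dynamic quantization of an exported ONNX model with ONNX Runtime looks roughly like this (a sketch for illustration; the file names are assumptions, and this is not necessarily the exact approach either PR implements):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Quantize an exported XFeat ONNX model to int8 weights.
# Input/output file names here are placeholders.
quantize_dynamic(
    model_input="xfeat.onnx",
    model_output="xfeat_int8.onnx",
    weight_type=QuantType.QInt8,
)
```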
Hey, just bumping this conversation up again @guipotje @acai66.
Thank you for your suggestion. I concur with your viewpoint and believe that merging our work would indeed be a beneficial course of action.
My PR is aimed at improving the convenience and compatibility of deployment, especially in environments without PyTorch. I've uploaded an example of using the ONNX Runtime Python API, which can be easily reimplemented using the ONNX Runtime C++ API.
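Roughly, the ONNX Runtime Python API usage looks like this (a simplified sketch; the file name and input layout are placeholders, not the actual example in the PR):

```python
import numpy as np
import onnxruntime as ort

# Load the exported model; CUDAExecutionProvider is used when available,
# with a CPU fallback otherwise.
session = ort.InferenceSession(
    "xfeat.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Run on a dummy grayscale frame; a real pipeline would feed camera frames.
image = np.random.rand(1, 1, 480, 640).astype(np.float32)
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: image})
```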
That's amazing, we can then find some time this week to start merging the examples. I'll give you editor access to my fork.
Fantastic, @IamShubhamGupto and @acai66! While I'm not experienced in ONNX, I'm here to help with anything related to the original code — just let me know.
Thank you very much for the invitation. My only concern is that I have little experience with TensorRT deployment. I have invited you to be a collaborator on the forked repository.
I apologize for my poor English; I am using translation software to communicate. I have made some modifications to the source code to improve the compatibility and convenience of exporting ONNX. However, I am unsure if this will affect the original PyTorch functionality and performance. I am unable to conduct sufficient related tests.
@acai66 for now I am pausing development on my branch. We will merge the ONNX export from your PR, and then I will continue TensorRT development here.
I will review and commit changes on your PR in the next couple of hours.
Hello @guipotje and team,
In this PR, I am adding support for exporting ONNX and TensorRT models. On running the TensorRT engine, we can observe an improvement in FPS. Since the evaluation code is currently unavailable, I cannot quantify the accuracy, but it feels similar to the performance of the provided `xfeat.pt` file. The following is currently supported and changed:

- `realtime_demo.py` with engine
- `realtime_demo.py` backwards compatible with `cv2` version `4.5.0`
- `README.md` for TensorRT export

We can possibly further improve FPS by simplifying the ONNX export using `onnx-simplifier`, but that can be an iterative PR. I will be attaching logs and performance observations in comments below. Let me know if there are any changes required. Thank you again for the amazing work!