sair-lab / AirVO

An Illumination-Robust Point-Line Visual Odometry (IROS 2023)
BSD 3-Clause "New" or "Revised" License

Runtime Analysis #108

Open Wu-ZW opened 1 month ago

Wu-ZW commented 1 month ago

When I run the EuRoC datasets with the following parameters: image size set to 640×480 and SuperPoint max_keypoints set to 200, the time for detecting and tracking feature points on one image is about 92 ms, more than the 65 ms reported in the paper. Are there any parameters I need to adjust?

Wu-ZW commented 1 month ago

I used the Orin NX (16 GB) edition.

xukuanHIT commented 1 month ago

@Wu-ZW Hi, we ran the runtime analysis on the NVIDIA Jetson AGX Xavier. The computing capability of the Jetson AGX Xavier is better than your Orin NX.

Wu-ZW commented 1 month ago

Thanks for your reply! Hmm, the compute capability of the Jetson Orin NX (8.7) is actually higher than that of the AGX Xavier (7.2).
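For reference, the compute capability can be queried directly on each board (a minimal PyTorch sketch, assuming a CUDA-enabled PyTorch build is installed):

```python
import torch

# Reports the CUDA compute capability of device 0,
# e.g. (8, 7) on the Orin NX and (7, 2) on the AGX Xavier.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))
```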

xukuanHIT commented 1 month ago

Sorry for my mistake. The comparison can be found here; it is true that the Jetson Orin NX is more powerful than the AGX Xavier. Can you tell me your TensorRT version? And have you tried the original Python version of SuperPoint and SuperGlue on your Jetson Orin NX?

Wu-ZW commented 1 month ago

I have run the original SuperPoint and SuperGlue code using PyTorch-CUDA, and it is inefficient: the time consumed by point extraction and point matching is unstable. The parameters are as follows: [screenshot]. The TensorRT and CUDA versions are as follows: [screenshot].

xukuanHIT commented 1 month ago

Can you compare the average runtime of C++ and Python code? On our Jetson platform, the C++ feature extraction and matching is about 6x faster than the Python version. I think if a similar improvement can be achieved on your Jetson, we can conclude that Orin NX is actually not as good as AGX Xavier.

Wu-ZW commented 1 month ago

Parameters: max_keypoints: 200, keypoint_threshold: 0.004, match_threshold: 0.1

Python (PyTorch-CUDA) code:
- SuperPoint time: 0.1009 s (~101 ms)
- SuperGlue time: 0.1048 s (~105 ms)

C++ (TensorRT) code:
- SuperPoint: about 40 ms
- SuperGlue: about 50 ms
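For reference, GPU timings like these need explicit CUDA synchronization, otherwise the asynchronous kernel launches are not fully counted. A minimal sketch of such a measurement (`superpoint` and `frame_tensor` are hypothetical stand-ins for the actual model call and input):

```python
import time
import torch

def time_gpu(fn, *args, warmup=5, iters=50):
    # Warm-up runs absorb one-time costs (cuDNN autotuning, memory pools).
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()  # drain queued work before starting the clock
    start = time.time()
    for _ in range(iters):
        fn(*args)
    torch.cuda.synchronize()  # wait for all kernels before stopping the clock
    return (time.time() - start) / iters

# Hypothetical usage with the SuperPoint model on a grayscale frame:
# avg = time_gpu(superpoint, {'image': frame_tensor})
# print(f"SuperPoint: {avg * 1000:.1f} ms")
```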

Wu-ZW commented 1 month ago

On the Jetson Orin NX, the C++ feature extraction and matching is only about 2× faster than the Python version. The improvement is not very significant.

xukuanHIT commented 1 month ago

That's strange. I am not sure whether it is also caused by the TensorRT version. We have also encountered efficiency problems when using TensorRT 8.5 on GeForce RTX GPUs below the 40 series. I recommend trying to downgrade TensorRT to 8.4.1.5 if possible.
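For reference, the installed TensorRT version can be checked from the Python bindings (a minimal sketch; on JetPack the bindings normally match the system libnvinfer):

```python
import tensorrt

# Prints the installed TensorRT version, e.g. '8.5.2.2';
# the suggestion above is to move to 8.4.1.5 if possible.
print(tensorrt.__version__)
```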

xukuanHIT commented 1 month ago

It may be similar to this issue.

Wu-ZW commented 1 month ago

Thanks a lot. I am also confused by the above results. In addition, could I ask how to export the PyTorch model to ONNX (INT64)? I saw the convert_int32.py script; does it convert INT64 to INT32, or FLOAT32 to INT32?

Wu-ZW commented 1 month ago

> It may be similar to this issue.

I suspect the "read image again" step; the same code runs normally on other platforms.

xukuanHIT commented 1 month ago

> Thanks a lot. I am also confused by the above results. In addition, could I ask how to export the PyTorch model to ONNX (INT64)? I saw the convert_int32.py script; does it convert INT64 to INT32, or FLOAT32 to INT32?

Yes, just run the script.
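For context, such a conversion typically rewrites the graph's INT64 initializers as INT32, since TensorRT does not support INT64 weights. A minimal sketch of the general idea (not the actual contents of convert_int32.py; the file paths are placeholders):

```python
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("superpoint.onnx")  # placeholder input path

# Cast every INT64 initializer (weights/constants) down to INT32.
for init in model.graph.initializer:
    if init.data_type == onnx.TensorProto.INT64:
        arr = numpy_helper.to_array(init).astype(np.int32)
        init.CopyFrom(numpy_helper.from_array(arr, init.name))

onnx.save(model, "superpoint_int32.onnx")  # placeholder output path
```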

xukuanHIT commented 1 month ago

> I suspect the "read image again" step; the same code runs normally on other platforms.

I mean that the different environments may affect the runtime.