nnstreamer / nnstreamer

Neural Network (NN) Streamer, Stream Processing Paradigm for Neural Network Apps/Devices.
https://nnstreamer.ai
GNU Lesser General Public License v2.1

Upgrade to tensorrt 10? #4477

Open · bveldhoen opened 4 weeks ago

bveldhoen commented 4 weeks ago

Hello,

I'm running into an issue w.r.t. the tensor_filter using tensorrt. I can build it, but only after reverting to Ubuntu 20.04 via the nvidia container nvcr.io/nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04. However, I would like to use a more recent version of this container (for instance, nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04).

From the meson.build file, line 276, I assume that cuda 11.0 is the most recent version supported.

1) Are there any plans to upgrade the version of tensorrt?

2) Is there an example showing the usage of tensorrt in the tensor_filter? (I couldn't find an example in nnstreamer-examples repository.)

Thank you in advance!

taos-ci commented 4 weeks ago

cibot: Thank you for posting issue #4477. The person in charge will reply soon.

myungjoo commented 3 weeks ago

Could you please build and test with recent versions of CUDA? I think it will probably work with recent CUDA; I just don't have test reports for it.

Then, after testing, you can add the newly supported versions to the cuda_vers list and upstream the change.
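
For instance, a minimal sketch of such a build-and-test run (the meson option name here is an assumption; please check meson_options.txt for the exact flag):

docker run --gpus all -it nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 bash
# inside the container: install meson, ninja, and the GStreamer/nnstreamer build deps first
meson setup build -Dtensorrt-support=enabled
ninja -C build
ninja -C build test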

bveldhoen commented 3 weeks ago

Hi @myungjoo, Thanks for your reply!

My issue isn't so much with the version of cuda as with the version of tensorrt. I couldn't find the UffParser anymore; it might have been removed after deprecation. Besides that, I'd really like nnstreamer to work with our existing tensorrt 10 .engine model files, which we currently use in a different solution.

I went ahead and made some changes in order to compile against tensorrt 10. Please find this branch: https://github.com/bveldhoen/nnstreamer/tree/feature/upgrade_tensorrt_to_10

It is still heavily work-in-progress, with much to be improved.

I'm currently trying (and failing) to test a pipeline with the appropriate elements and configuration, using a model with the following characteristics (output generated with polygraphy):

[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Explicit Batch Engine

    ---- 1 Engine Input(s) ----
    {images [dtype=float32, shape=(1, 3, 640, 640)]}

    ---- 1 Engine Output(s) ----
    {output0 [dtype=float32, shape=(1, 25200, 10)]}

If you have suggestions for how to test a pipeline with such a model, please let me know!

myungjoo commented 3 weeks ago
v4l2src ! videoconvert ! videoscale ! videorate !
video/x-raw,width=640,height=640,format=RGB !
tensor_converter !
tensor_transform mode=transpose option=1:2:0:3 !         ### This changes 3:640:640:1 (1, 640, 640, 3) --> 640:640:3:1 (1, 3, 640, 640)
tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 !  ### Assuming your model wants -1.0 ~ 1.0 float32; this differs per model, so please check the specification of your test model.
tensor_filter ${YOUR OPTIONS FOR TENSORRT} !
filesink location=${OUTPUT_FILE}

Then, you can verify with the stored ${OUTPUT_FILE}. If you want to create a test case for automated testing, you may create an input file and replace v4l2src with filesrc.
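
For example, a minimal sketch of such a file-based variant (the input file name, the normalization, and the engine path are assumptions; reuse the transforms from the pipeline above):

gst-launch-1.0 filesrc location=input.png ! pngdec ! videoconvert ! videoscale ! \
  video/x-raw,width=640,height=640,format=RGB ! \
  tensor_converter ! \
  tensor_transform mode=transpose option=1:2:0:3 ! \
  tensor_transform mode=arithmetic option=typecast:float32,add:-127.5,div:127.5 ! \
  tensor_filter framework=tensorrt model=model.engine ! \
  filesink location=result.out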

bveldhoen commented 3 weeks ago

Hi @myungjoo,

I managed to get it to work, after shooting myself in the foot a couple of times :). See the branch above.

Using the ultralytics yolov5su model from here: https://github.com/ultralytics/ultralytics

Command line to test it:

gst-launch-1.0 \
  v4l2src name=cam_src ! videoconvert ! videoscale ! \
    video/x-raw,width=1000,height=1000,format=RGB,pixel-aspect-ratio=1/1,framerate=30/1 ! tee name=t \
  t. ! queue leaky=2 max-size-buffers=2 ! videoscale ! \
    video/x-raw,width=640,height=640,format=RGB ! tensor_converter ! \
    tensor_transform mode=transpose option=1:2:0:3 ! \
    tensor_transform mode=arithmetic option=typecast:float32,div:255.0 ! \
    tensor_filter framework=tensorrt model=/mnt/projects/nnstreamer-example/models/yolov5su_gpu_1_3_640_640.engine latency=1 ! \
    other/tensors,num_tensors=1,types=float32,dimensions=8400:84:1,format=static ! \
    tensor_transform mode=transpose option=1:0:2:3 ! \
    tensor_decoder mode=bounding_boxes option1=yolov8 option2=/mnt/projects/nnstreamer-example/bash_script/example_yolo/coco.txt option3=1 option4=1000:1000 option5=640:640 ! \
    video/x-raw,width=1000,height=1000,format=RGBA ! mix.sink_0 \
  t. ! queue leaky=2 max-size-buffers=10 ! mix.sink_1 \
  compositor name=mix sink_0::zorder=2 sink_1::zorder=1 ! videoconvert ! autovideosink sync=false

What I still don't completely understand is why I need to use

tensor_decoder mode=bounding_boxes option1=yolov8

while it is a yolov5 model.

Anyway, I'll wrap things up by adding a test case with file input.

bveldhoen commented 2 weeks ago

I've added a basic runTest.sh to be able to invoke the tensorrt tensor_filter, but the test doesn't do any validation. Do you have a suggestion/pointer on how this test can be improved?

Also, I've encountered a rather strange issue when testing the example in nnstreamer-example. When I tested with a 640x640 model, the output video looked correct, with correct bounding boxes. However, when testing with a 320x320 model, the bounding boxes didn't show, while the object types were printed at the top of the screen:

[screenshot: detected class labels drawn on top of each other at the top of the output video]

This is from a recording of me holding up a cup, hence the string "cupson" ("person" overwritten with "cup").

Considering that the filter and decoding steps seem to work (because of the successful decoding of the object types, and I assume also the bounding boxes), I think this may be related to the way the video is displayed. Do you have any ideas?

myungjoo commented 2 weeks ago

> What I still don't completely understand is why I need to use tensor_decoder mode=bounding_boxes option1=yolov8 while it is a yolov5 model.

Maybe it is because the two have the same output tensor format and none of tensor_decoder's bounding-boxes contributors has ever cared to look at yolov5.
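
One way to confirm this (the engine file names are assumptions) would be to inspect both engines and compare their output layouts:

polygraphy inspect model yolov5su.engine
polygraphy inspect model yolov8s.engine

If both report a single float32 output of the same shape (e.g., (1, 84, 8400), matching the dimensions=8400:84:1 caps above), the yolov8 decoder path applies unchanged.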

myungjoo commented 2 weeks ago

> I've added a basic runTest.sh to be able to invoke the tensorrt tensor_filter, but the test doesn't do any validation. Do you have a suggestion/pointer on how this test can be improved?

As in https://github.com/nnstreamer/nnstreamer/blob/main/tests/nnstreamer_filter_lua/runTest.sh ?

You can run a small object classification model on a few small png/jpg files (e.g., apple, banana, pen, ...) and check the detected class; then you can ensure that tensorrt has run the given model properly. It is recommended to use really tiny models for such tests. (MNIST is perfect, too.)

For negative test cases, you may use an invalid model file, unavailable file paths, or invalid input streams.
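
For instance, a minimal sketch of such a golden-file check (the file and model names are hypothetical):

gst-launch-1.0 filesrc location=orange.png ! pngdec ! videoconvert ! videoscale ! \
  video/x-raw,width=224,height=224,format=RGB ! \
  tensor_converter ! \
  tensor_transform mode=arithmetic option=typecast:float32,div:255.0 ! \
  tensor_filter framework=tensorrt model=tiny_classifier.engine ! \
  filesink location=tensorrt.out.log
cmp tensorrt.out.log orange.golden && echo PASS || echo FAIL

The test passes only if the dumped output tensor matches the stored golden file byte for byte.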

> Also, I've encountered a rather strange issue when testing the example in nnstreamer-example. When I tested with a 640x640 model, the output video looked correct, with correct bounding boxes. However, when testing with a 320x320 model, the bounding boxes didn't show, while the object types were printed at the top of the screen. This is from a recording of me holding up a cup, hence the string "cupson" ("person" overwritten with "cup"). Considering that the filter and decoding steps seem to work, I think this may be related to the way the video is displayed. Do you have any ideas?

I guess it is caused by the text stream format of GStreamer. GStreamer's standard text streams do NOT append '\0' at the end of a string in a text stream. You need to handle this when converting strings between C/C++ and a GStreamer text stream.

bveldhoen commented 2 weeks ago

> > I've added a basic runTest.sh to be able to invoke the tensorrt tensor_filter, but the test doesn't do any validation. Do you have a suggestion/pointer on how this test can be improved?
>
> As in https://github.com/nnstreamer/nnstreamer/blob/main/tests/nnstreamer_filter_lua/runTest.sh ?

Please see runTest.sh in the PR below:
https://github.com/nnstreamer/nnstreamer/pull/4482/files#diff-6c39b882d0d1ad2d81487d1aed2d7875a2a4e5c8eb8e8678e7725b9ec5a1e879

It uses the yolov5 nano model of around 10 MB, which is similar in size to the other test model files (in tests/test_models/models). I also made this model file part of the commit:
https://github.com/nnstreamer/nnstreamer/pull/4482/files#diff-046f353916309673df1ba6ce199214d4bbe8b1a0996de06c7a448835ec43d12c

> You can run a small object classification model on a few small png/jpg files (e.g., apple, banana, pen, ...) and check the detected class; then you can ensure that tensorrt has run the given model properly. It is recommended to use really tiny models for such tests. (MNIST is perfect, too.)

The current implementation requires a tensorrt .engine file or a compatible .onnx file. There's no onnx mnist model file available among the nnstreamer test models, and the mobilenet onnx file couldn't be converted to a tensorrt engine, most likely due to opsets that are not supported by tensorrt:

$ polygraphy inspect model ../../tests/test_models/models/mobilenet_v2_quant.onnx
[I] Loading model: /mnt/projects/nnstreamer/tests/test_models/models/mobilenet_v2_quant.onnx
[I] ==== ONNX Model ====
    Name: torch-jit-export | ONNX Opset: 12 | Other Opsets: {'ai.onnx.ml': 2, 'ai.onnx.training': 1, 'ai.onnx.preview.training': 1, 'com.microsoft': 1, 'com.microsoft.experimental': 1, 'com.microsoft.nchwc': 1, 'com.microsoft.mlfeaturizers': 1}
...

I can look into it further, but I could also try to use the yolov5nu.onnx file. Which would you prefer?
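
For reference, the conversion step itself is a one-liner with trtexec (model names as discussed above), and an unsupported opset should surface as an error here:

trtexec --onnx=yolov5nu.onnx --saveEngine=yolov5nu.engine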

> For negative test cases, you may use an invalid model file, unavailable file paths, or invalid input streams.

> > Also, I've encountered a rather strange issue when testing the example in nnstreamer-example. When I tested with a 640x640 model, the output video looked correct, with correct bounding boxes. However, when testing with a 320x320 model, the bounding boxes didn't show, while the object types were printed at the top of the screen.

With the 320x320 model only, there is the additional display problem that the bounding boxes aren't drawn at all. Note that this seems to be a text (and bounding box) drawing problem, not a string termination problem.

> > This is from a recording of me holding up a cup, hence the string "cupson" ("person" overwritten with "cup"). Considering that the filter and decoding steps seem to work (because of the successful decoding of the object types, and I assume also the bounding boxes), I think this may be related to the way the video is displayed. Do you have any ideas?
>
> I guess it is caused by the text stream format of GStreamer. GStreamer's standard text streams do NOT append '\0' at the end of a string in a text stream. You need to handle this when converting strings between C/C++ and a GStreamer text stream.

The example uses the (unchanged) nnstreamer elements, such as tensor_decoder mode=bounding_boxes and ximagesink, to interpret and draw the model output, so I would assume that the tensor_decoder already performs the proper string conversions? Please see this example: https://github.com/nnstreamer/nnstreamer-example/pull/338/files#diff-2b84c14651d9a96e8767854fe61758668592c0703b16259e9195e591abdd7ca5