Closed: EnlNovius closed this issue 2 years ago.
> TensorRT does not currently handle 2D shape tensors
Interesting, since 95% of the network consists of either 1D convolutions or Linear layers. The network contains an internal normalization layer (with padding), which is most likely the cause of the problem, since it has always given us grief during exports. This is one of the reasons we decided to keep the PyTorch and ONNX formats.
> I am trying to translate the supplied ONNX network (files/silero_vad.onnx) to TensorRT (.trt).
But the main question is, why?
I'm trying to use silero-vad in real time on an embedded system that already runs several TensorRT neural networks. Switching from a PyTorch model to a TensorRT model typically allows optimization at the inference level (https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/).
I just tried onnx-simplifier (https://github.com/daquexian/onnx-simplifier). Simplifying the graph solves both the first and the second problem.
$ onnxsim files/silero_vad.onnx files/silero_vad_onnxsim.onnx
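(For reference, the same simplification can also be done from Python; a minimal sketch using onnx-simplifier's `simplify` API, with the same paths as above:)

```python
import onnx
from onnxsim import simplify  # pip install onnx-simplifier

model = onnx.load("files/silero_vad.onnx")
model_simplified, check_ok = simplify(model)
assert check_ok, "simplified model failed the validation check"
onnx.save(model_simplified, "files/silero_vad_onnxsim.onnx")
```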
However, a new problem arises:
$ trtexec --onnx=files/silero_vad_onnxsim.onnx
...
[08/03/2022-16:00:23] [E] Error[4]: [graphShapeAnalyzer.cpp::processCheck::587] Error Code 4: Internal Error (Conv_81: spatial dimension of convolution output cannot be negative (build-time output dimension of axis 2 is -5))
[08/03/2022-16:00:23] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
...
Problem solved:
Using onnx2trt, I got the following error:
[2022-08-04 09:00:13 ERROR] 4: [network.cpp::validate::2965] Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
I couldn't solve this problem with either tool, so I went back to the NVIDIA documentation (https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/ and https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes).
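(For reference, the dynamic-shapes documentation boils down to defining an optimization profile, and trtexec can take one directly on the command line via shape flags. The tensor name `input` and the shapes below are assumptions, to be checked against the actual model:)

$ trtexec --onnx=files/silero_vad_onnxsim.onnx --minShapes=input:1x1536 --optShapes=input:1x1536 --maxShapes=input:1x1536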
First, simplify the ONNX file:
$ onnxsim files/silero_vad.onnx files/silero_vad_onnxsim.onnx
Then convert the file to TensorRT using Python (or C++):
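(The conversion script itself did not survive in this copy of the thread; below is a minimal sketch of what such a conversion can look like with the TensorRT 8.x Python API. The tensor name `input` and the chunk shape `(1, 1536)` are assumptions, and the real model also has recurrent state inputs, so check the names and shapes against the parsed network:)

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("files/silero_vad_onnxsim.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()

# The network has dynamic input shapes, so at least one optimization
# profile is required (this is exactly what the onnx2trt error above
# complained about). The tensor name and shapes are assumptions:
# check network.get_input(i).name / .shape for the real values.
profile = builder.create_optimization_profile()
profile.set_shape("input", min=(1, 1536), opt=(1, 1536), max=(1, 1536))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)
with open("files/silero_vad.engine", "wb") as f:
    f.write(engine_bytes)
```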
Using the code for testing on a 4-minute audio file, the average time of the get_speech_timestamps function (averaged over 100 calls):
| Model | Average time to process a 4 min audio file |
| --- | --- |
| torch.jit.load('files/silero_vad.jit') | 4.34 s |
| OnnxWrapper('files/silero_vad.onnx') | 1.11 s |
| OnnxWrapper('files/silero_vad_onnxsim.onnx') | 0.93 s |
| TrtWrapper('files/silero_vad.engine') | 1.03 s |
> on a 4-minute audio file
One audio chunk should take ~1 ms on one CPU thread. ONNX was similar.
By a simple calculation (at ~1 ms per chunk), a 4-minute audio file should take only a few seconds to process. The fact that this appears ~40 times slower is strange.
import pycuda.autoinit # You need this to init cuda
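(For completeness, a minimal sketch of what inference through such an engine can look like with pycuda. The TrtWrapper used in the benchmark above is not shown in the thread; a single input and output and the chunk shape `(1, 1536)` are assumed here for illustration, whereas the real model also carries recurrent state tensors:)

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("files/silero_vad.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# With dynamic shapes, the input shape must be set before inference;
# the chunk shape (1, 1536) is an assumption.
context.set_binding_shape(0, (1, 1536))

chunk = np.zeros((1, 1536), dtype=np.float32)  # one audio chunk
output = np.empty(tuple(context.get_binding_shape(1)), dtype=np.float32)
d_input = cuda.mem_alloc(chunk.nbytes)
d_output = cuda.mem_alloc(output.nbytes)

stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, chunk, stream)
context.execute_async_v2([int(d_input), int(d_output)], stream.handle)
cuda.memcpy_dtoh_async(output, d_output, stream)
stream.synchronize()
print(output)
```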
There is very little reason to run VAD on CUDA, because there is very little to gain. The model is very fast, most likely it will just incur an overhead to copy to and from the GPU.
> The fact that this appears ~40 times slower is strange.
I wasn't clear in my results; I edited the post to correct that. The displayed results were for running the network 100 times over 4 min of audio, so we were at 0.5 ms/frame.
> There is very little reason to run VAD on CUDA, because there is very little to gain. The model is very fast, most likely it will just incur an overhead to copy to and from the GPU.
For my project, the audio data is already on the GPU, since it goes through other networks. According to the documentation, "Using batching or GPU can also improve performance considerably." With the data already on the GPU, and the GPU expected to speed up processing a bit more, I wanted to see what it could give, even though silero's CPU performance is already remarkable.
Since I managed to run silero on TensorRT, I think this issue can be closed. Thanks for the answers and for the very good work done on silero-vad :smiley:
> so we were at 0.5 ms/frame.
Seems in line with our benchmarks, albeit we did not test on GPU.
"Using batching or GPU can also improve performance considerably.. Having already the data on GPU and the GPU being supposed to accelerate the processing times a bit more,
Well, batching would work better for multiple streams at the same time. You can find more details in the discussion via this link: basically, each batch element is one separate stream.
In any case, many thanks for your input on this conversion. Hopefully someone finds it useful for their use case.
> Since I managed to run silero on TensorRT
Another question is whether the model outputs are similar.
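(A quick way to check that, assuming polygraphy is available, is to run the ONNX model through both ONNX Runtime and TensorRT and compare the outputs; for a model with dynamic inputs, the shapes may need to be pinned with --input-shapes:)

$ polygraphy run files/silero_vad_onnxsim.onnx --trt --onnxrt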
I will create a copy of this ticket as a discussion.
Export silero-vad ONNX to TensorRT
I am trying to translate the supplied ONNX network (files/silero_vad.onnx) to TensorRT (.trt). I tried two tools:

- trtexec from NVIDIA (https://github.com/NVIDIA/TensorRT/tree/main/samples/trtexec)
- onnx2trt (https://github.com/onnx/onnx-tensorrt)

With both tools, I get the same error.
From what I could find, the problem comes from the fact that TensorRT does not currently handle 2D shape tensors (https://forums.developer.nvidia.com/t/ishufflelayer-applied-to-shape-tensor-must-have-0-or-1-reshape-dimensions-dimensions-were-1-2/200183). A solution proposed in response is to use polygraphy with surgeon:
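(The exact command was lost from this copy of the thread; polygraphy's sanitize mode with constant folding is the usual suggestion, likely along these lines, with the output path chosen here as an example:)

$ polygraphy surgeon sanitize files/silero_vad.onnx --fold-constants -o files/silero_vad_sanitized.onnx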
Now if I apply one of my two tools, the problem seems to be solved, but another problem arises further on.
I haven't found a solution to this problem yet; does anyone have any idea how to solve it?