Open cwentland0 opened 4 years ago
The discrepancy between TRT and ONNX Runtime seems to be unrelated to the converter itself. It looks like the ConvTranspose operator's upgrade from opset 1 to opset 11 only clarifies some behavior, such as padding. Yes, the converter only tags the converted model with opset 11, but opset 9 is also fine. By the way, does the TRT engine support opset 11 now?
I am building and training a convolutional autoencoder in tf.keras, and converting the two halves of the network (the encoder and the decoder) separately, first to ONNX (via keras2onnx) and then to a serialized TensorRT engine (via trtexec). The network uses only a few fundamental elements: 2D convolution kernels (in the encoder), 2D transpose-convolution kernels (in the decoder), fully-connected layers, flatten layers, reshape layers, ELU activations, and linear activations. The software versions used are as follows:
TensorFlow (Keras) 2.0.0
keras2onnx 1.6.5
TensorRT 7.0.0.11
CUDA 10.2 (TensorFlow uses 10.1)
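For reference, the trtexec step in the pipeline described above is typically invoked along these lines (file names are hypothetical):

```shell
# Serialize the ONNX decoder into a TensorRT engine with trtexec:
trtexec --onnx=decoder.onnx --saveEngine=decoder.trt
```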
The tf.keras models are 32-bit float networks. When I test the inference of the ONNX model (with ONNX Runtime) and the TRT engine (with the TRT Python API) on random inputs, I get some very strange results. First, the TRT encoder exactly matches the output of the TF encoder, so apparently no problem there. However, the ONNX encoder reports O(1e-6) error compared to the TF encoder! How the TRT parser is able to fix this issue, I have no idea.
Even stranger are the decoder results. The TRT and ONNX decoders both report O(1e-7) error compared to the TF decoder. Strangely, the minimum and maximum error are often slightly different (O(1e-8)) between the TRT and ONNX decoder results. Even worse, on some rare occasions the minimum and maximum error values will change from test to test, despite the fact that the NumPy RNG is seeded with the same value every time. This would imply there is some stochastic behavior in the ONNX/TRT decoder inference.
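The error comparison described above (seeded random inputs, min/max elementwise absolute error between framework outputs) can be sketched in NumPy; the ONNX Runtime / TensorRT inference calls are elided, and the perturbed array below is only a stand-in for a real inference output:

```python
import numpy as np

def error_stats(reference, candidate):
    """Min/max elementwise absolute error between two framework outputs."""
    err = np.abs(reference.astype(np.float64) - candidate.astype(np.float64))
    return float(err.min()), float(err.max())

# Seed the RNG so every test run sees identical inputs (as in the issue).
rng = np.random.default_rng(0)
tf_out = rng.standard_normal((1, 8, 8, 4)).astype(np.float32)

# Stand-in for an ONNX Runtime / TensorRT output: the real values would come
# from an InferenceSession.run(...) call or a TRT execution context. Here the
# reference is perturbed slightly just to exercise the metric.
onnx_out = (tf_out + np.float32(1e-7)).astype(np.float32)

err_min, err_max = error_stats(tf_out, onnx_out)
print("min abs error:", err_min, "max abs error:", err_max)
```

With a fixed seed, repeated runs of a deterministic pipeline should reproduce these statistics exactly, which is why run-to-run variation in the min/max error suggests nondeterminism somewhere in the inference path.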
My testing code is given below:
Some sample output is shown below (the warnings are simply a given due to the fixed network size):
Interestingly, I note that keras2onnx reports that the maximum opset needed is opset 9, even though the ONNX documentation reports that ConvTranspose was last updated in opset 11 (ConvTranspose-1 existed in the initial release). Even if I target opset 11, the resulting model is built with opset 9 (according to trtexec).
Ultimately, I don’t care about the intermediate ONNX model as long as the eventual TRT engine is correct. But the fact that the intermediate format (ONNX) is incorrect leaves me with no guarantee that the final format (TRT) will be correct. Therefore, I’m asking here before I move on to the TensorRT GitHub page. Any insight would be very much appreciated.
I am happy to share the trained models (TF, TRT, and ONNX) if someone else would like to inspect them. As a side note, if anyone could explain the phrase "Weights must be an initializer" in the requirements for the ConvTranspose operation found at the onnx-tensorrt page, that would be immensely appreciated.