Open · FrancescoSaverioZuppichini opened 1 year ago
Has anyone ever created a mixed precision model?
See https://github.com/microsoft/onnxconverter-common/issues/251 and https://github.com/microsoft/onnxconverter-common/issues/252, but they are not TensorRT-related.
Hi @FrancescoSaverioZuppichini, if I may ask, which issue do you expect to fix? The warning can be ignored if your model runs correctly.
No it doesn't; it takes a lot of time every time I want to run the ONNX session due to the casting. The issue has shifted from getting TensorRT to work to correctly exporting a mixed precision model, and it looks like nobody to this date has ever successfully converted a model from PyTorch to ONNX in mixed precision.
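For reference, the recipe from the onnxconverter-common README (the library behind the two issues linked above) looks roughly like this. This is a sketch only; the model path, input name, and shape are placeholder assumptions:

```python
# Sketch: auto mixed precision with onnxconverter-common, per its README.
# Model path, input name, and input shape below are placeholder assumptions.
import numpy as np
import onnx
from onnxconverter_common import auto_mixed_precision

model = onnx.load("model.onnx")

# Dummy input used to validate which nodes tolerate FP16.
test_data = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}

# Converts nodes to FP16 only where outputs stay within rtol/atol of FP32.
model_fp16 = auto_mixed_precision.auto_convert_mixed_precision(
    model, test_data, rtol=0.01, atol=0.001, keep_io_types=True
)
onnx.save(model_fp16, "model_fp16.onnx")
```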
@FrancescoSaverioZuppichini Hi, I have the same problem as you: inference runs correctly but takes a long time, and I also get the warning that INT64 needs to be converted to INT32. How did you solve the long inference time in the end?
@FrancescoSaverioZuppichini Probably because of the forced precision conversion, TensorRT is not even as fast as regular CUDA inference.
@Bruce-320 TensorRT is normally faster than regular CUDA inference, usually at least 2x.
@Bruce-320 I don't know how to solve the problem; apparently no one in the industry has ever run a mixed precision model on ONNX, lol. So I am still trying to figure it out, but my approach is to ask the software devs directly.
Thank you for your answer, and I would appreciate it if you could share your progress.
> No it doesn't; it takes a lot of time every time I want to run the ONNX session due to the casting.
@FrancescoSaverioZuppichini Do you mean the weights casting? Could you please share a log of that?
@zhenhuaw-me Yes, it takes a lot of time to cast the weights EVERY time I need to run inference. Can you try to reproduce it with my code so we can double-check whether there is something weird in my stack?
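A quick way to capture such a log is to time session creation separately from the first run. A minimal sketch, assuming onnxruntime with the TensorRT execution provider (model path and input shape are placeholders):

```python
# Sketch: separate session-creation time (where the engine build and
# weight casting happen) from per-inference time. Path/shape are placeholders.
import time
import numpy as np
import onnxruntime as ort

t0 = time.perf_counter()
sess = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
print(f"session creation: {time.perf_counter() - t0:.1f}s")

x = np.random.rand(1, 3, 224, 224).astype(np.float32)
input_name = sess.get_inputs()[0].name

t0 = time.perf_counter()
sess.run(None, {input_name: x})
print(f"first run: {time.perf_counter() - t0:.1f}s")
```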
Description
Hello There!
I hope you are all doing well :)
There are other similar issues, but not even one of them has a fix for this problem.
TensorRT takes a lot of time casting INT64 to INT32, making it impractical to use in real life.
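One commonly suggested mitigation (not a guaranteed fix) is to fold constants before building the engine, since most INT64 tensors in PyTorch exports come from shape computations. A sketch using Polygraphy, with placeholder file names:

```bash
# Sketch: fold constant subgraphs (often the source of INT64 tensors)
# before handing the model to TensorRT. File names are placeholders.
polygraphy surgeon sanitize model.onnx --fold-constants -o model_folded.onnx
```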
My conversion code:
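A representative sketch of such an export (the exact model, input shape, opset, and FP16 settings are assumptions, not the actual script):

```python
# Representative sketch of a PyTorch -> ONNX export for a ConvNeXt;
# model choice, input shape, opset, and FP16 settings are assumptions.
import torch
import torchvision

model = torchvision.models.convnext_base(weights="DEFAULT").eval().cuda().half()
dummy = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.half)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)
```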
The code takes around 3–4 minutes to convert the weights, then it outputs the following:
Environment
I am using the latest NVIDIA container.

TensorRT Version: 8.5.1
ONNX-TensorRT Version / Branch:
GPU Type: RTX 3090
Nvidia Driver Version:
CUDA Version: 11.7
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.8
TensorFlow + TF2ONNX Version (if applicable): 1.13
PyTorch Version (if applicable):
Baremetal or Container (if container, which image + tag):
Relevant Files
My ONNX model (a ConvNeXt):
link to drive
Steps To Reproduce
Copy this code
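Then build an engine from the exported model, e.g. with trtexec (a representative invocation; the flags are assumptions about the original setup):

```bash
# Representative build; --fp16 enables mixed-precision kernels.
# The INT64 -> INT32 cast warning is emitted during this step.
trtexec --onnx=model.onnx --fp16 --saveEngine=model.engine --verbose
```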