mattroos opened this issue 3 years ago
Hi @mattroos - how often does case 4 (true/true) crash? I tried your code using the master branch dated 04/26 and couldn't repro it in 40 runs. Perhaps I can try again with 1.7.0 later when I get a chance. Now that we have released 1.7.2, I wonder if you could also try that? Thanks.
@ytaous, in my code the model is run for 100 trials, and it crashes on one of those trials on nearly every execution of the script. I'll install 1.7.2 and see if that changes anything. Thanks.
@ytaous rather than rebuild, I just did a pip install of 1.7.0 (I had previously been using my own build of 1.7.0). Specifically, I ran pip install onnxruntime-gpu. After doing so, the crashes stopped occurring. However, something is strange: I don't think it's actually creating the engines correctly (apologies if my terminology is off), and/or it may always be loading from cache. Yet it never seems to cache the engines even when requested to, or it is saving them somewhere other than my specified cache path. Even if I set the following ...
os.environ["ORT_TENSORRT_CACHE_PATH"] = os.path.expanduser('~') + '/.gatekeeper_cache/'
os.environ["ORT_TENSORRT_FP16_ENABLE"] = "0" # Disable/enable FP16 precision
os.environ["ORT_TENSORRT_INT8_ENABLE"] = "0" # Disable/enable INT8 precision
os.environ["ORT_TENSORRT_ENGINE_CACHE_ENABLE"] = "1" # Disable/enable engine caching
... prior to calling InferenceSession(), there are no files saved in the specified cache path afterwards. Any advice?
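For context, here is a minimal sketch of the kind of setup I mean. The model path is a placeholder, and explicitly requesting the TensorRT provider here is just for illustration; the point is that after creating the session, the cache directory stays empty.

import os
import onnxruntime as ort

cache_path = os.path.expanduser('~') + '/.gatekeeper_cache/'
os.environ["ORT_TENSORRT_CACHE_PATH"] = cache_path
os.environ["ORT_TENSORRT_ENGINE_CACHE_ENABLE"] = "1"

# Explicitly request the TensorRT EP, falling back to CUDA and then CPU.
sess = ort.InferenceSession(
    "model.onnx",  # placeholder path, not the actual model
    providers=["TensorrtExecutionProvider",
               "CUDAExecutionProvider",
               "CPUExecutionProvider"],
)

print("Providers in use:", sess.get_providers())
print("Cache dir contents:", os.listdir(cache_path) if os.path.isdir(cache_path) else "missing")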
[EDIT]: Oh, I see now that onnxruntime-gpu is a generic GPU build and doesn't use TensorRT. I'll try building from source again.
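(For anyone else hitting the same confusion, a quick sanity check is to print which execution providers the installed wheel actually exposes; a TensorRT-enabled build lists TensorrtExecutionProvider, while the stock onnxruntime-gpu wheel does not.)

import onnxruntime as ort

# A TensorRT-enabled build includes "TensorrtExecutionProvider" in this list;
# the generic onnxruntime-gpu wheel reports only the CUDA and CPU providers.
print(ort.get_available_providers())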
Describe the bug When I export and then run a model that includes an InstanceNorm2d layer, it often (but not always) crashes when using a dynamic width.
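The export uses a dynamic width axis, along the lines of the sketch below. The layer sizes and file name are illustrative only, not the actual model; the full reproduction code is further down.

import torch
import torch.nn as nn

# Toy model containing an InstanceNorm2d layer (sizes are illustrative).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.InstanceNorm2d(16),
    nn.ReLU(),
).eval()

dummy = torch.randn(1, 3, 64, 512)
torch.onnx.export(
    model, dummy, "instancenorm_dynamic_width.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {3: "width"}, "output": {3: "width"}},  # width is dynamic
    opset_version=11,
)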
Urgency I'm forced to abandon ONNX and try other methods for accelerating my model.
System information
To reproduce, and expected behavior
The error output from these four models (100 trials each) is below. The numbers printed are the image width for each data sample/trial. Note that on this particular run, the last model (with InstanceNorm2d and dynamic width) ran successfully on the first trial with a data width of 1180, then crashed on or after the next trial, which had a width of 512. There seems to be no discernible pattern relating data width to when a crash occurs. It happens regularly, but it must be related to the data values in the input or the model parameters (which are randomly initialized).
The code