microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

"Windows fatal exception: access violation" when trying to run custom ONNX model. #5872

Open eublefar opened 3 years ago

eublefar commented 3 years ago

Describe the bug I am exporting a combined Flowtron and WaveGlow model to produce a TTS ONNX model. The exported model fails silently during `run`; enabling `faulthandler` captures `Windows fatal exception: access violation`.
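For reference, a minimal sketch of how the fault handler was enabled to surface the crash (the onnxruntime calls are illustrative, not the reporter's exact script):

```python
import faulthandler

# Dump a Python traceback on fatal native errors (e.g. Windows access
# violations) instead of letting the process die silently.
faulthandler.enable()

# ...then run the session as usual, e.g. (illustrative):
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx")
# outputs = sess.run(None, inputs)
```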

Urgency Project-related deadline in 3 weeks.

System information

To Reproduce

Expected behavior The model should either run correctly or raise an explicit error.

Additional context WaveGlow weights: https://drive.google.com/file/d/1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF/view Flowtron weights: https://drive.google.com/file/d/1KhJcPawFgmfvwV7tQAOeC253rYstLrs8/view?usp=drive_open Exported model: https://drive.google.com/file/d/10Mujy0farKOJu4O1dXYV6wtN3_sKovAi/view?usp=sharing

Total output from the command with faulthandler.enable():

>python export_onnx.py -c config_onnx.json -f models/flowtron_libritts.pt -w models/waveglow_256channels_universal_v5.pt -i 83
C:\Users\evil_unicorn\Miniconda3\envs\onnxrt\lib\site-packages\torch\serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.conv.ConvTranspose1d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
C:\Users\evil_unicorn\Miniconda3\envs\onnxrt\lib\site-packages\torch\serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
C:\Users\evil_unicorn\Miniconda3\envs\onnxrt\lib\site-packages\torch\serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv1d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
Loaded checkpoint 'models/flowtron_libritts.pt')
Number of speakers : 123
F:\git_projects\pull_reqs\flowtron\flowtron_onnx.py:323: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  n_half = int(audio.size(1)/2)
C:\Users\evil_unicorn\Miniconda3\envs\onnxrt\lib\site-packages\torch\jit\_trace.py:966: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
With rtol=1e-05 and atol=1e-05, found 102198 element(s) (out of 102400) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.3796519637107849 (0.2296285480260849 vs. -0.1500234156847), which occurred at index (0, 10260).
  _module_class,
C:\Users\evil_unicorn\Miniconda3\envs\onnxrt\lib\site-packages\torch\onnx\symbolic_opset9.py:1805: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
  "or define the initial states (h0/c0) as inputs of the model. ")
Running test:
Loading model.
Model loaded, running tts.
Windows fatal exception: access violation

Current thread 0x000032fc (most recent call first):
  File "...site-packages\onnxruntime\capi\session.py", line 123 in run
  File "export_onnx.py", line 122 in export
  File "export_onnx.py", line 178 in <module>
zhanghuanrong commented 3 years ago

Thanks for reporting. We may need some time to investigate this.

eublefar commented 3 years ago

Found a workaround: replace the `while` loop with a `for` loop, and replace `total_output.append(output)` with `total_output += [output]` inside the loop. Diff with the fix: https://github.com/eublefar/flowtron/commit/5733f734f08f073210d1ab7424e67dfbe13fe2f4#diff-1f52f80ab66a787214526109625cba4ed6833b395f96198a324d3366438e969b
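A minimal sketch of the pattern behind this workaround, with `step_fn` as a hypothetical stand-in for one Flowtron decoder step (not the actual model code): `torch.jit.trace` cannot record data-dependent `while` loops, so a fixed trip count that the tracer can unroll, combined with `+=` concatenation, yields a static graph that exports cleanly.

```python
def generate(steps, step_fn, state):
    """Run a fixed number of decoder steps and collect the outputs.

    A `for` loop with a known trip count replaces the data-dependent
    `while` loop, and `total_output += [output]` replaces
    `total_output.append(output)`, mirroring the workaround above.
    """
    total_output = []
    for _ in range(steps):        # fixed bound instead of `while not done`
        state = step_fn(state)    # one (hypothetical) decoder step
        total_output += [state]   # workaround: += instead of .append
    return total_output
```

Usage: `generate(3, lambda x: x + 1, 0)` returns `[1, 2, 3]`.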

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.