Closed: duchy closed this issue 3 years ago.
The crash below happened when trying to parse ONNX models to the plan format in a multi-threaded environment. Environment: TensorRT 7.2.2.3, CUDA 11.1, OS: Windows 10 Pro.
TensorRT interface: the parser is created with:
TENSORRTAPI IParser* createParser(nvinfer1::INetworkDefinition& network, nvinfer1::ILogger& logger)
and the ONNX buffer is parsed with:
bool nvonnxparser::IParser::parse(const void *serialized_onnx_model, size_t serialized_onnx_model_size)
Below are the crash logs:
CONTEXT: (.ecxr)
rax=0000000000000000 rbx=00000000c0000374 rcx=0000000000000000
rdx=000000ad7edf8360 rsi=0000000000000001 rdi=00007ffb940d77f0
rip=00007ffb9406f0b9 rsp=000000ad7edf8960 rbp=0000000000000000
 r8=00007ffb11f9ff98  r9=000001dcf7b306c0 r10=00007ffb11f9d3d7
r11=000000ad7edf7ca0 r12=0000000000000000 r13=000001dc0abf72d0
r14=000001dc0abf72c0 r15=0000000000000001
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206
ntdll!RtlReportFatalFailure+0x9:
00007ffb`9406f0b9 eb00 jmp ntdll!RtlReportFatalFailure+0xb (00007ffb`9406f0bb)
Resetting default scope

EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ffb9406f0b9 (ntdll!RtlReportFatalFailure+0x0000000000000009)
ExceptionCode: c0000374
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 00007ffb940d77f0

PROCESS_NAME: Application.exe
ERROR_CODE: (NTSTATUS) 0xc0000374 - <Unable to get error code text>
EXCEPTION_CODE_STR: c0000374
EXCEPTION_PARAMETER1: 00007ffb940d77f0

STACK_TEXT:
000000ad`7edf8960 00007ffb`9406f083 : 00007ffb`46145c7c 000000ad`7edff2c0 000001dd`4c1b4c90 00007ffb`00000006 : ntdll!RtlReportFatalFailure+0x9
000000ad`7edf89b0 00007ffb`94077e02 : 00000000`00000005 00007ffb`940d77f0 00000000`00000003 000001dc`0e450000 : ntdll!RtlReportCriticalFailure+0x97
000000ad`7edf8aa0 00007ffb`940780ea : 00000000`00000003 00000000`00000000 000001dc`0e450000 00007ffb`11e1ccfc : ntdll!RtlpHeapHandleError+0x12
000000ad`7edf8ad0 00007ffb`9407dd71 : 000001dc`0e450000 000001dc`0e450000 000000ad`7edfbb70 00000000`00000007 : ntdll!RtlpHpHeapHandleError+0x7a
000000ad`7edf8b00 00007ffb`94017102 : 000001dc`0abf74b0 000000ad`7edf8bb9 000000ad`7edf8d80 00000000`0000000f : ntdll!RtlpLogHeapFailure+0x45
000000ad`7edf8b30 00007ffb`93f947b1 : 0000232f`c687da2e 000001dc`0e450000 000000ad`7edf8d90 00000000`00000000 : ntdll!RtlpFreeHeapInternal+0x819f2
000000ad`7edf8bf0 00007ffb`13039a94 : 000000ad`7edf8db0 000001dd`14dd2c40 000001dc`8d1dc010 00007ffb`122159d6 : ntdll!RtlFreeHeap+0x51
000000ad`7edf8c30 00007ffb`11e18ca1 : 000001dd`14dd2c40 000001dc`00000000 00000000`0000003f 000001dc`8d1e0c00 : nvinfer!cask_trt::WeightGradientShader::isNhwcOutput+0x38c004
000000ad`7edf8c60 00007ffb`11e7c876 : 000001dc`8d1dc010 00000000`0000001f 000000ad`7edf8d90 000001dc`0b27ec88 : nvinfer+0xb8ca1
000000ad`7edf8c90 00007ffb`11e7dd3b : 00000000`00000000 000001dc`0b27b328 000000ad`7edf9100 000001dc`0b27b328 : nvinfer!cask_trt::ShaderList<cask_trt::LinkableConvShader,cask_trt::Convolution>::end+0x1a56
000000ad`7edf9000 00007ffb`11f2b512 : 000001dd`14dd2ac0 00007ffb`93f947b1 000001dc`f79d5c80 000001dd`14dd2ac0 : nvinfer!cask_trt::TensorDesc::getDim+0x97b
000000ad`7edf9910 00007ffb`11eb1062 : 000000ad`7edf99f0 000001dc`f7b20b30 000001dd`14dd2ac0 000000ad`7edf9c70 : nvinfer!cask_trt::PoolingShader::outputScalarsPerElement+0x3ca92
000000ad`7edf9990 00007ffb`11eb96bd : 000000ad`7edfa918 00000000`00000010 000001dc`f7b2fa50 000001dc`f7b2fa50 : nvinfer!cask_trt::TensorDesc::getDim+0x33ca2
000000ad`7edfa860 00007ffb`11fb12d6 : 00007ffb`15880168 000000ad`7edfac60 000001dd`14dd2fa0 000000ad`7edfab90 : nvinfer!cask_trt::TensorDesc::getDim+0x3c2fd
000000ad`7edfaa70 00007ffb`11fafd4e : 000000ad`7edfadc0 000001dc`f793f7a8 000001dc`f793f7a8 000001dc`f7d34700 : nvinfer!cask_trt::Shader::getKernelInfo+0x22946
000000ad`7edfad90 00007ffb`11ebba5b : 000000ad`7edfb7c0 000000ad`7edfd3e0 000000ad`7edfd3e0 000000ad`7edfd3e0 : nvinfer!cask_trt::Shader::getKernelInfo+0x213be
000000ad`7edfaea0 00007ffb`11eac95b : 00007ffb`15880168 00000404`95bb5b00 00000404`95bb5b00 000000ad`7edfd3e0 : nvinfer!cask_trt::TensorDesc::getDim+0x3e69b
000000ad`7edfd180 00007ffb`11e6e97f : 000000ad`00000000 000001dc`4f322640 000001dc`7e328910 00007ffb`93f95ba1 : nvinfer!cask_trt::TensorDesc::getDim+0x2f59b
000000ad`7edfd370 00007ffb`11e6e8c4 : 000001dc`07ede430 000001dc`4f322510 000001dc`7e328910 000001dc`7e328910 : nvinfer!nvinfer1EnableInternalBuildFlags+0x316f
000000ad`7edfd3b0 00007ffb`45bfab08 : 000001dc`4f322510 00000000`00000000 000001dc`07f403b0 000000ad`7edfdb69 : nvinfer!nvinfer1EnableInternalBuildFlags+0x30b4
Is this a race condition? The models can be parsed successfully one by one from ONNX file paths in a single thread.
Reposted this issue to NVIDIA/TensorRT.
Attachment: tensorrt_crash.txt