microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Failed to use self created stream for a new cuda session #8578

Closed: sunhmy closed this issue 3 years ago

sunhmy commented 3 years ago

Discussed in https://github.com/microsoft/onnxruntime/discussions/8460

Originally posted by **sunhmy** July 22, 2021

Hi, I'd like to use the multi-stream feature that was added through the `has_user_compute_stream` flag. However, I run into a crash with the following code:

```cpp
Ort::SessionOptions session_options;
session_options.SetIntraOpNumThreads(10);
session_options.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);
#ifdef USE_CUDA
printf("Use cuda\n");
cudaStream_t stream;
cudaStreamCreate(&stream);
OrtCUDAProviderOptions cuda_options{
    0,                                   // device_id
    OrtCudnnConvAlgoSearch::EXHAUSTIVE,  // cuDNN convolution algorithm search
    std::numeric_limits<size_t>::max(),  // memory limit
    0,                                   // arena extend strategy
    false,                               // copy in default stream
    true,                                // has_user_compute_stream
    stream};                             // user compute stream
session_options.AppendExecutionProvider_CUDA(cuda_options);
#endif
```

My ONNX Runtime version is 1.7.0 and my CUDA Toolkit version is 10.2. The log and backtrace are below:

```
2021-07-22 19:57:46.640710391 [E:onnxruntime:, inference_session.cc:1294 operator()] Exception during initialization: /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:123 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:117 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] CUDNN failure 7: CUDNN_STATUS_MAPPING_ERROR ; GPU=0 ; hostname=dg01-baymax-k8s-test001-node-10-52-138-206 ; expr=cudnnSetStream(cudnn_handle_, stream);
terminate called after throwing an instance of 'Ort::Exception'
  what():  Exception during initialization: /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:123 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:117 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] CUDNN failure 7: CUDNN_STATUS_MAPPING_ERROR ; GPU=0 ; hostname=dg01-baymax-k8s-test001-node-10-52-138-206 ; expr=cudnnSetStream(cudnn_handle_, stream);
Aborted (core dumped)

Program received signal SIGABRT, Aborted.
0x00007fffeff5d207 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64
(gdb) bt
#0  0x00007fffeff5d207 in raise () from /lib64/libc.so.6
#1  0x00007fffeff5e8f8 in abort () from /lib64/libc.so.6
#2  0x00007ffff086c7d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007ffff086a746 in ?? () from /lib64/libstdc++.so.6
#4  0x00007ffff086a773 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007ffff086a993 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x000000000040672f in Ort::ThrowOnError(OrtApi const&, OrtStatus*) ()
#7  0x000000000040678e in Ort::ThrowOnError(OrtStatus*) ()
#8  0x0000000000406a87 in Ort::Session::Session(Ort::Env&, char const*, Ort::SessionOptions const&) ()
#9  0x0000000000405a44 in main ()
(gdb) ^CQuit
```

Any suggestion for how I should overcome this? Thanks!
yuslepukhin commented 3 years ago

It looks like `std::terminate()` is called because your `main()` does not handle the exception. This is what the C++ standard requires when an exception is not caught.
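A minimal sketch of catching the failure instead of letting it reach `std::terminate()`. `create_session` and its `simulate_failure` parameter are hypothetical stand-ins for the `Ort::Session` constructor, which throws `Ort::Exception` on initialization errors; because `Ort::Exception` derives from `std::exception`, catching `const std::exception&` also covers the real constructor.

```cpp
#include <cstdio>
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for `Ort::Session session(env, model_path, options);`.
// The real constructor throws Ort::Exception, a std::exception subclass.
void create_session(bool simulate_failure) {
  if (simulate_failure) {
    throw std::runtime_error("Exception during initialization: CUDNN failure 7");
  }
}

// Returns 0 on success and 1 on failure. The exception is caught here, so it
// never propagates out of main() and std::terminate() is not called.
int try_create_session(bool simulate_failure) {
  try {
    create_session(simulate_failure);
    return 0;
  } catch (const std::exception& e) {
    // e.what() carries the same text ONNX Runtime logs (e.g. the cuDNN error).
    std::fprintf(stderr, "Session creation failed: %s\n", e.what());
    return 1;
  }
}
```

In a real program, `main()` could wrap the `Ort::Session` constructor in the same `try`/`catch` and return a non-zero exit code, so the error message is printed instead of the process aborting.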

The cause for the exception is an error that cudnnSetStream() returns: CUDNN_STATUS_MAPPING_ERROR. There may be multiple reasons for this.

sunhmy commented 3 years ago

> The cause for the exception is an error that cudnnSetStream() returns: CUDNN_STATUS_MAPPING_ERROR. There may be multiple reasons for this.

Thanks for the reply. Any suggestion for how to overcome this error?

yuslepukhin commented 3 years ago

I would suggest first checking the error code returned by `cudaStreamCreate()` to make sure it did not fail and that you are passing a valid stream. Also, make sure you read this

sunhmy commented 3 years ago

Thank you for the suggestion. I finally see what's going on: I didn't realize that a CUDA stream is device-specific, so I have to set the correct device before creating and using streams for other devices. Now it's working for me, and I'm closing this issue.