Closed oscarriddle closed 5 years ago
---
Device: TITAN Xp
CUDA version: 9000
cuDNN version: 7005
u-cuDNN version: 1.1.0 (commit 0b4663f12f1205514933354bbc7a7571e1331247)
---
2019-09-06 17:04:48.845554: I tensorflow/stream_executor/stream.cc:697] Called Stream::ThenConvolveWithAlgorithm(input_descriptor=b16d3s299 299 , input_data=0x7f6659022000, filter_descriptor=od32id3s3 3 , filter_data=0x7f665a080b00, convolution_descriptor=p0:0_p1:0_s0:2_s1:2_d0:1_d1:1, output_descriptor=b16d32s149 149 , output=0x7f665a081900, algorithm_config=0, -1) stream=0xa9d79e0
2019-09-06 17:04:48.845585: I tensorflow/stream_executor/stream.cc:698] A Get in strema ThenConvolveWithAlgorithm()
2019-09-06 17:04:48.845630: I tensorflow/stream_executor/cuda/cuda_dnn.cc:2288] A Get in cuda_dnn ThenConvolveWithAlgorithm()
Environment variable UCUDNN_BATCH_SIZE_POLICY is not set. Using "powerOfTwo" instead.
Environment variable UCUDNN_BENCHMARK_DEVICES is not set. Using "" instead.
Environment variable UCUDNN_TOTAL_WORKSPACE_SIZE is not set. Using "0" instead.
Environment variable UCUDNN_DATABASE is not set. Using "" instead.
2019-09-06 17:04:48.845766: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2294] failed to set stream for cudnn handle: CUDNN_STATUS_MAPPING_ERROR
Aborted (core dumped)
The error comes from:
```
auto status = wrap::cudnnSetStream(parent_, ToHandle(dnn_handle_),
                                   AsCUDAStreamValue(stream));
```
According to the cuDNN developer guide:
4.206. cudnnSetStream
```
cudnnStatus_t cudnnSetStream(
    cudnnHandle_t handle,
    cudaStream_t  streamId)
```
This function sets the user's CUDA stream in the cuDNN handle. The new stream will be used to launch cuDNN GPU kernels or to synchronize to this stream when cuDNN kernels are launched in the internal streams. If the cuDNN library stream is not set, all kernels use the default (NULL) stream. Setting the user stream in the cuDNN handle guarantees the issue-order execution of cuDNN calls and other GPU kernels launched in the same stream.
Parameters
handle
Input. Pointer to the cuDNN handle.
streamID
Input. New CUDA stream to be written to the cuDNN handle.
Returns
CUDNN_STATUS_BAD_PARAM
Invalid (NULL) handle.
CUDNN_STATUS_MAPPING_ERROR
Mismatch between the user stream and the cuDNN handle context.
CUDNN_STATUS_SUCCESS
The new stream was set successfully.
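The documentation above implies that MAPPING_ERROR arises when the stream and the handle belong to different CUDA contexts. A minimal sketch of the usual single-context call order (pseudocode, not TensorFlow's actual code path; error handling omitted):

```
cudaSetDevice(dev)               // select device; establishes its primary context
cudnnCreate(&handle)             // the handle is bound to the current context
cudaStreamCreate(&stream)        // the stream is created in the same context
cudnnSetStream(handle, stream)   // succeeds: handle and stream share one context
```

If the handle was instead created while a different context was current (for example on another thread bound to another device), the same cudnnSetStream call returns CUDNN_STATUS_MAPPING_ERROR.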
Perhaps a wrapped API for cudnnSetStream is needed, or perhaps this call can be bypassed. I tried to bypass cudnnSetStream by commenting out the code below:
```
//auto status = wrap::cudnnSetStream(parent_, ToHandle(dnn_handle_), AsCUDAStreamValue(stream));//.getCudnnHandle(),
//if (status != CUDNN_STATUS_SUCCESS) {
//  LOG(FATAL) << "failed to set stream for cudnn handle: " << ToString(status);
//}
```
With that change, the cuDNN handle stays on the default stream (stream 0). cuda_dnn.cc then proceeds to status = wrap::cudnnGetConvolutionForwardWorkspaceSize() around line 2400. That call enters getWorkspaceSize in ucudnnHandle.cpp, which calls getAlgorithm in the same file, then the optimizer, and finally cudnnFindConvolutionGenericAlgorithm() in utils.cpp, which fails with cudnnStatus_t 3, i.e. CUDNN_STATUS_BAD_PARAM.
For cudnnFindConvolutionGenericAlgorithm(), CUDNN_STATUS_BAD_PARAM indicates:
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
handle is not allocated properly.
xDesc, wDesc or yDesc is not allocated properly.
xDesc, wDesc or yDesc has fewer than 1 dimension.
Either returnedCount or perfResults is nil.
requestedCount is less than 1.
So the failure occurs either at cudnnSetStream or at getAlgorithm; there must be some difference between how TensorFlow uses cuDNN and the usual way of calling it. Could you help address this problem, or offer some advice?
Solved by changing how TensorFlow deals with the cuDNN handle. The u-cuDNN API can now be called successfully from TensorFlow.
This is great. Would you like to upload a pull request?
@tbennun please refer to #6
Sys info: CentOS 7, TF 1.6, Python 3.6, CUDA 9.0, cuDNN 7.0.5, gcc 4.9.3
Description:
<1> I compiled u-cuDNN and ran the tests; everything seems to work well.
<2> I then patched TF 1.6 at the specified commit and compiled it successfully.
<3> I ran a Python script that builds a standard Inception v3 graph; it segfaults after printing:
```
---
Device: TITAN Xp
CUDA version: 9000
cuDNN version: 7005
u-cuDNN version: 1.1.0 (commit 0b4663f12f1205514933354bbc7a7571e1331247)
---
Thread 165 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffc86bfd700 (LWP 36259)]
0x00007fff6b1147bb in ?? () from /lib64/libcuda.so.1
```
<4> I then used gdb to debug it; here is the backtrace:
```
(gdb) bt
#0  0x00007fff6b1147bb in ?? () from /lib64/libcuda.so.1
#1  0x00007fff6b036dcd in ?? () from /lib64/libcuda.so.1
#2  0x00007fff6b1c253b in cuEventRecord () from /lib64/libcuda.so.1
#3  0x00007fff5a77c57b in ?? () from /home/web_server/xiaolun/cuda-9.0-cudnn-7.0.5/lib64/libcudnn.so.7
#4  0x00007fff5a7c302b in ?? () from /home/web_server/xiaolun/cuda-9.0-cudnn-7.0.5/lib64/libcudnn.so.7
#5  0x00007fff59b16bda in cudnnSetStream () from /home/web_server/xiaolun/cuda-9.0-cudnn-7.0.5/lib64/libcudnn.so.7
#6  0x00007fff759f005b in bool perftools::gputools::cuda::CudnnSupport::DoConvolveImpl
```