vlfeat / matconvnet

MatConvNet: CNNs for MATLAB

Problem vl::impl::nnconv_forward_cudnn #213

Open germanRos opened 9 years ago

germanRos commented 9 years ago

Hi everyone.

I have been working with matconvnet and cudnn for 4 months with no problems at all. However, today I discovered a cuDNN R2 bug that appears when calling "cudnnGetConvolutionForwardWorkspaceSize" for some specific kernel sizes (e.g., 7x7x64x11).

The problem seems to be the algorithm selected by the previous call to "cudnnGetConvolutionForwardAlgorithm". When the user selects "CUDNN_CONVOLUTION_FWD_PREFER_FASTEST", the amount of extra workspace memory required goes up to almost 20 GB (for input data of size 360x480x64x10). The output of this operation should take around 145 MB, not counting auxiliary memory and buffers, and I don't believe the required auxiliary memory is really 20 GB. Also, if the kernel size is slightly different, the amount of required memory is perfectly normal. So this looks like a bug, and NVIDIA already clarified back in February that this sort of bug existed in cuDNN.

The most surprising thing is that I was only able to reproduce this behaviour with Titan X but never with K40.

I managed to solve the problem by modifying vl::impl::nnconv_forward_cudnn a bit: I replaced CUDNN_CONVOLUTION_FWD_PREFER_FASTEST with CUDNN_CONVOLUTION_FWD_NO_WORKSPACE, and that did the trick.

A cleverer way of proceeding would be to call cudnnGetConvolutionForwardAlgorithm with CUDNN_CONVOLUTION_FWD_PREFER_FASTEST, then call cudnnGetConvolutionForwardWorkspaceSize and check the resulting workspace size. If the workspace is too large, we can call cudnnGetConvolutionForwardAlgorithm again with CUDNN_CONVOLUTION_FWD_NO_WORKSPACE.

What do you guys think?

Best regards, German Ros.

vedaldi commented 9 years ago

Hi, this seems like a good idea!


dbbert commented 8 years ago

@germanRos's solution of replacing CUDNN_CONVOLUTION_FWD_PREFER_FASTEST with CUDNN_CONVOLUTION_FWD_NO_WORKSPACE worked for me. Without this fix I get out-of-memory errors. Maybe it would be a good idea to include this fix in the next beta release?

germanRos commented 8 years ago

The thing is, I am not sure whether that problem is still present in cudnn3. Which cudnn version are you using?

Best regards, German.

dbbert commented 8 years ago

I am using cudnn3 with MatConvNet v1.0-beta16. Are you not seeing the issue anymore with cudnn3?

haoliyoupai commented 7 years ago

@germanRos I have also encountered the same problem: MATLAB reports the error "vl::impl::nnconv_cudnn::forward: cudnn error". In nnconv_cudnn.cu I can find the call to cudnnGetConvolutionForwardAlgorithm, but I cannot find the CUDNN_CONVOLUTION_FWD_PREFER_FASTEST you mentioned. Could you please help me? Thank you.