opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0

Cuda streams do not run concurrently when using the convolve function #11149

Open echoGee opened 6 years ago

echoGee commented 6 years ago
System information (version)
Detailed description

Multiple different convolutions issued on separate streams do not run concurrently. I have not used any compiler flags such as --default-stream per-thread.

Steps to reproduce

The code is in a .cpp file.

Ptr<cuda::Convolution> convolver = cuda::createConvolution();
cuda::Stream s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
// Issue each convolution on its own stream.
convolver->convolve(im_32fc1_d, x, a_d, false, s1);
convolver->convolve(im_32fc1_d, xy, b_d, false, s2);
convolver->convolve(im_32fc1_d, xx, c_d, false, s3);
convolver->convolve(im_32fc1_d, yy, d_d, false, s4);
convolver->convolve(im_32fc1_d, xy, e_d, false, s5);

[Image: cuda streams not concurrent]

tdakhran commented 6 years ago

@echoGee did you check the "Enable concurrent kernel profiling" checkbox in the Settings tab of the NVIDIA Visual Profiler?

echoGee commented 6 years ago

@tarkook: I see it referenced in http://docs.nvidia.com/cuda/profiler-users-guide/index.html, but I could not find that option in the Visual Profiler. Any pointers?
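
Whichever profiler setting is used, the question in this thread comes down to whether kernels from different streams overlap on the timeline. The following is a minimal sketch of that check in plain C++, assuming the kernel start and end timestamps have already been copied out of a profiler trace by hand. The KernelInterval struct, its field names, and the anyConcurrentKernels function are hypothetical names made up for illustration; they are not part of OpenCV or of any NVIDIA tool.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One kernel execution as read off a profiler timeline.
// Hypothetical type: times can be in any unit as long as it is consistent.
struct KernelInterval {
    int stream_id;  // stream the kernel was launched on
    double start;   // kernel start time
    double end;     // kernel end time
};

// Returns true if at least two kernels launched on different streams
// overlap in time. Intervals that merely touch (end == start) do not count.
bool anyConcurrentKernels(const std::vector<KernelInterval>& kernels) {
    for (std::size_t i = 0; i < kernels.size(); ++i) {
        assert(kernels[i].start <= kernels[i].end);
        for (std::size_t j = i + 1; j < kernels.size(); ++j) {
            const KernelInterval& a = kernels[i];
            const KernelInterval& b = kernels[j];
            const bool differentStreams = a.stream_id != b.stream_id;
            const bool overlap = a.start < b.end && b.start < a.end;
            if (differentStreams && overlap) {
                return true;
            }
        }
    }
    return false;
}
```

Feeding it the intervals for the five convolve calls above would return false for a fully serialized timeline like the one in the screenshot. Note that if the profiler itself serializes kernels, the timestamps will show no overlap regardless of how the application behaves, so the result is only as reliable as the trace.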