hughperkins / distro-cl

OpenCL Torch

Possible event leak causing CL_OUT_OF_HOST_MEMORY on intel-opencl-r3.0 (Intel HD GPU) #14

Open ache7 opened 7 years ago

ache7 commented 7 years ago

neural-style exits with an error after 90-100 iterations. Discussed here: https://software.intel.com/en-us/forums/opencl/topic/701907. Maybe it's not your fault, but I can't work it out by myself.

bashbaug commented 7 years ago

I haven't spent a lot of time looking at the OpenCL Torch code, so I may be mistaken, but I believe this is the issue:

In xdot.c, the doDot() function enqueues two kernels. The first is Sdot_kernel, which creates the event firstDotCall. This event is used in the wait list for the second kernel, Sred_sum_kernel, so it cannot be removed completely, but it does need to be released once the second kernel has been enqueued, and I don't see anywhere that this happens.

The same issue may be occurring in other clBLAS functions too, but at least for neural-style, this specific leak appears to be the most problematic.
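
The leak pattern described above can be illustrated with a small, self-contained model. This is only a sketch and not the clBLAS code: `mock_event`, `mock_enqueue`, `mock_release`, `do_dot`, `iterations_completed`, and `POOL_SIZE` are all hypothetical names, and the fixed-size pool is an assumption standing in for the driver's finite host memory. In real OpenCL the corresponding calls would be `clEnqueueNDRangeKernel` (which returns an event the caller owns) and `clReleaseEvent`; as far as I understand the spec, releasing an event right after using it in a wait list is safe, because the runtime keeps it alive until the dependent command no longer needs it.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary size; stands in for the driver's finite host memory (assumption). */
#define POOL_SIZE 64

/* Hypothetical stand-in for cl_event: a slot with a reference count. */
typedef struct { int refcount; } mock_event;

static mock_event pool[POOL_SIZE];

/* Stand-in for clEnqueueNDRangeKernel. If out_event is non-NULL, the caller
 * receives a new event holding one reference that the caller must release.
 * Returns 0 on success, -1 when no slot is free (analogous to
 * CL_OUT_OF_HOST_MEMORY). */
static int mock_enqueue(const mock_event *wait_for, mock_event **out_event)
{
    (void)wait_for; /* a real runtime would order execution on this event */
    if (out_event == NULL)
        return 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].refcount == 0) {
            pool[i].refcount = 1;
            *out_event = &pool[i];
            return 0;
        }
    }
    return -1;
}

/* Stand-in for clReleaseEvent: drop one reference; a slot with zero
 * references can be reused. */
static void mock_release(mock_event *ev)
{
    assert(ev != NULL && ev->refcount > 0);
    ev->refcount--;
}

/* Two-kernel dot product in the shape described above. */
static int do_dot(int release_intermediate)
{
    mock_event *first_dot_call = NULL;

    /* First kernel (Sdot_kernel in the real code) produces an event. */
    if (mock_enqueue(NULL, &first_dot_call) != 0)
        return -1;

    /* Second kernel (Sred_sum_kernel) waits on that event. */
    int status = mock_enqueue(first_dot_call, NULL);

    /* The fix: release our reference once the event has been handed over. */
    if (release_intermediate)
        mock_release(first_dot_call);

    return status;
}

/* Resets the pool, then returns how many of n iterations succeed. */
int iterations_completed(int n, int release_intermediate)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        pool[i].refcount = 0;

    int done = 0;
    while (done < n && do_dot(release_intermediate) == 0)
        done++;
    return done;
}
```

In this model, `iterations_completed(1000, 0)` stops after `POOL_SIZE` iterations because every call strands one event, while `iterations_completed(1000, 1)` completes all 1000. That would be consistent with a failure that only appears after a fixed number of iterations, although the real threshold depends on the driver.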

ache7 commented 7 years ago

Seems to work OK after fixing the leak, thanks to bashbaug. Here is a pull request: https://github.com/hughperkins/clBLAS/pull/2

ache7 commented 7 years ago

Here's the fix for xdot() and other functions with event leaks: https://github.com/clMathLibraries/clBLAS/pull/300/commits/03254e597e6649116a8bc249d8a45f973b7e32cf