Open ache7 opened 7 years ago
I haven't spent a lot of time looking at the OpenCL Torch code, so I may be mistaken, but I believe this is the issue:
In xdot.c, the doDot() function enqueues two kernels. The first is Sdot_kernel, which creates the event firstDotCall. That event is passed in the wait list of the second kernel, Sred_sum_kernel, so it cannot simply be dropped, but it does need to be released once the second kernel has been enqueued, and I don't see that happening anywhere.
The same issue may be occurring in other clBLAS functions too, but at least for neural-style, this specific leak appears to be the most problematic.
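To make the pattern concrete, here is a toy model in plain C; it is not the clBLAS code. Everything prefixed with `toy_`, the pool size, and the `dot_leaky` / `dot_fixed` functions are made up for illustration. `toy_enqueue_kernel` stands in for `clEnqueueNDRangeKernel` (which hands back an event the caller owns) and `toy_release_event` stands in for `clReleaseEvent`. The fixed-size pool is only an assumption used to make the leak visible; a real OpenCL runtime manages events differently.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_EVENTS 64  /* arbitrary pool size, chosen for the demo */

/* Toy stand-in for cl_event: a reference-counted handle. */
typedef struct { int refcount; } toy_event;

toy_event toy_pool[TOY_MAX_EVENTS];

/* Mark every slot in the pool as free. */
void toy_reset(void)
{
    for (size_t i = 0; i < TOY_MAX_EVENTS; i++) toy_pool[i].refcount = 0;
}

/* Number of events that are still referenced. */
int toy_live_events(void)
{
    int n = 0;
    for (size_t i = 0; i < TOY_MAX_EVENTS; i++) n += (toy_pool[i].refcount > 0);
    return n;
}

/* Stand-in for an enqueue call. If out_event is non-NULL, the caller
 * receives a new event with refcount 1 and is responsible for releasing it.
 * Returns 0 on success, -1 when the toy pool is exhausted. */
int toy_enqueue_kernel(size_t num_wait, toy_event *const *wait_list,
                       toy_event **out_event)
{
    (void)num_wait;
    (void)wait_list;  /* the toy model does not simulate waiting */
    if (out_event == NULL) return 0;
    for (size_t i = 0; i < TOY_MAX_EVENTS; i++) {
        if (toy_pool[i].refcount == 0) {
            toy_pool[i].refcount = 1;
            *out_event = &toy_pool[i];
            return 0;
        }
    }
    return -1;
}

/* Stand-in for a release call: drops the caller's reference. */
void toy_release_event(toy_event *ev)
{
    assert(ev != NULL && ev->refcount > 0);
    ev->refcount--;
}

/* Leaky shape: the first kernel's event is used in the wait list of the
 * second kernel but is never released. */
int dot_leaky(void)
{
    toy_event *first = NULL;
    if (toy_enqueue_kernel(0, NULL, &first) != 0) return -1;
    return toy_enqueue_kernel(1, &first, NULL);  /* 'first' leaks here */
}

/* Fixed shape: release the event once the dependent kernel is enqueued. */
int dot_fixed(void)
{
    toy_event *first = NULL;
    if (toy_enqueue_kernel(0, NULL, &first) != 0) return -1;
    int rc = toy_enqueue_kernel(1, &first, NULL);
    toy_release_event(first);
    return rc;
}

/* Call dot() up to n times; return how many calls succeeded. */
int run_iterations(int (*dot)(void), int n)
{
    int ok = 0;
    while (ok < n && dot() == 0) ok++;
    return ok;
}
```

In this model, repeated calls to `dot_leaky` fail once the pool is full, while `dot_fixed` can run indefinitely. In real OpenCL, as far as I understand the spec, calling `clReleaseEvent` right after enqueueing the dependent kernel is safe, because the event object is not deleted while a queued command still has to wait on it.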
It seems to work OK after the leak is fixed, thanks to bashbaug. Here is a pull request: https://github.com/hughperkins/clBLAS/pull/2
Here's the fix for xdot() and other functions with event leaks: https://github.com/clMathLibraries/clBLAS/pull/300/commits/03254e597e6649116a8bc249d8a45f973b7e32cf
neural-style exits with an error after 90-100 iterations. This was discussed here: https://software.intel.com/en-us/forums/opencl/topic/701907 Maybe it's not your fault, but I can't figure it out by myself.