krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities
Apache License 2.0

clpeak fails for local-size not a power of two #41

Closed (doe300 closed this issue 5 years ago)

doe300 commented 7 years ago

Given a local size that is not a power of two, all kernel-based clpeak tests fail with CL_INVALID_WORK_GROUP_SIZE, because the global work-size is not divisible by the local size.

Given this excerpt from compute_sp.cpp:

uint globalWIs = (devInfo.numCUs) * (devInfo.computeWgsPerCU) * (devInfo.maxWGSize);
uint t = MIN((globalWIs * sizeof(cl_float)), devInfo.maxAllocSize);
t = roundToPowOf2(t);
globalWIs = t / sizeof(cl_float);

The cause is the line t = roundToPowOf2(t), which forces the global work-size to a power of two regardless of whether the local size is one. Since the only divisors of a power of two are themselves powers of two, the global size cannot be a multiple of such a local size, and enqueueing a kernel with these values fails.
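To make the arithmetic concrete, here is a minimal sketch that mirrors the four quoted lines outside of OpenCL. It rests on assumptions: roundDownToPowOf2 is a hypothetical stand-in for clpeak's roundToPowOf2 (assumed here to round down), float replaces cl_float, and the helper names are made up for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for clpeak's roundToPowOf2. ASSUMPTION: it rounds
// DOWN to the nearest power of two (this version returns 1 for 0 and 1).
inline std::uint64_t roundDownToPowOf2(std::uint64_t n) {
    std::uint64_t p = 1;
    while (p <= n / 2) p *= 2;  // p * 2 <= n holds here, so no overflow
    return p;
}

// Mirrors the quoted excerpt: clamp the byte size to the allocation limit,
// round it to a power of two, then convert back to a work-item count.
inline std::uint64_t globalSizeAfterRounding(std::uint64_t globalWIs,
                                             std::uint64_t maxAllocSize) {
    std::uint64_t t = std::min<std::uint64_t>(globalWIs * sizeof(float), maxAllocSize);
    t = roundDownToPowOf2(t);
    return t / sizeof(float);
}

// Work-items left over after filling whole work-groups. A non-zero value
// is the condition that leads to CL_INVALID_WORK_GROUP_SIZE.
inline std::uint64_t leftoverWorkItems(std::uint64_t globalWIs,
                                       std::uint64_t localSize) {
    return globalWIs % localSize;
}
```

For example, 24576 work-items (2048 groups of 12) with a large allocation limit come out as 16384 after rounding, and leftoverWorkItems(16384, 12) is 4, so the last work-group would be incomplete.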

krrishnarraj commented 7 years ago

True. On which device are you seeing a local size that is not a power of 2? If that is a possibility, this needs to be fixed.

doe300 commented 7 years ago

I'm writing a custom OpenCL implementation, where I have a local size of 12 to maximize hardware use. If I comment out the line rounding to a power of two, the tests run successfully.

The same issue arises in the global-bandwidth test: the local size is correctly set to the maximum the device supports, but the global size is never checked to be a multiple of it.
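One possible way to add the missing check is to align the global size to the local size before enqueueing. This is only a sketch under assumptions: roundDownToMultiple is a hypothetical helper, not an existing clpeak function, and it is not necessarily how the project resolved the issue.

```cpp
#include <cstddef>

// Hypothetical helper (not part of clpeak): returns the largest multiple of
// localSize that does not exceed globalSize.
inline std::size_t roundDownToMultiple(std::size_t globalSize,
                                       std::size_t localSize) {
    if (localSize == 0) {
        return globalSize;  // nothing to align to
    }
    if (globalSize < localSize) {
        // Keep at least one full work-group. NOTE: this grows the size,
        // so the caller must make sure its buffers are large enough.
        return localSize;
    }
    return globalSize - (globalSize % localSize);
}
```

With a local size of 12, a global size of 16384 would become 16380, while a power-of-two local size such as 64 leaves it unchanged. Any buffer sized from the old work-item count would need to follow the adjusted value.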