doe300 closed this 5 years ago.
True. On which device are you seeing a local size that is not a power of 2? If that is a possibility, this needs to be fixed.
I'm writing a custom OpenCL implementation that uses a local size of 12 to maximize hardware utilization. If I comment out the line that rounds to a power of two, the tests run successfully.
The same issue arises with the global-bandwidth test: the local size is correctly set to the maximum the device supports, but the global size is never checked to be a multiple of it.
Given a local size that is not a power of two, all kernel-based clpeak tests fail with CL_INVALID_WORK_GROUP_SIZE, because the global work size is not divisible by the local size.
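A minimal sketch of the condition the runtime enforces here, assuming OpenCL 1.x semantics (OpenCL 2.0 can allow non-uniform work-groups unless the program is built with -cl-uniform-work-group-size). The helper name isUniformNDRange is hypothetical; it is not part of clpeak or the OpenCL API.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if a 1-D NDRange is "uniform", i.e. the
// global work size is an exact multiple of the local work size. Under
// OpenCL 1.x rules, violating this makes clEnqueueNDRangeKernel return
// CL_INVALID_WORK_GROUP_SIZE.
bool isUniformNDRange(std::size_t globalSize, std::size_t localSize)
{
    // A local size of 0 is never valid; guard against division by zero.
    return localSize != 0 && globalSize % localSize == 0;
}
```

For example, a power-of-two global size of 1 << 20 passes with a local size of 16 but fails with a local size of 12, since 12 contains the factor 3 and no power of two does.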
The reason for this is the following line in compute_sp.cpp:

t = roundToPowOf2(t)

which forces the global work size to be a power of two, regardless of whether the local size is one. The only divisors of a power of two are smaller powers of two, so a local size such as 12 (4 × 3) can never divide it, and enqueueing the kernel fails.
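One possible fix, sketched under stated assumptions: keep the existing power-of-two rounding, then snap the result down to a multiple of the local size. roundDownToPowOf2 is a hypothetical stand-in for clpeak's roundToPowOf2 and is assumed to round down (the real helper may behave differently); roundDownToMultipleOf and pickGlobalSize are likewise hypothetical names, not clpeak code.

```cpp
#include <cstdint>

// Hypothetical stand-in for roundToPowOf2(): assumed to return the largest
// power of two <= n (for n > 0).
std::uint64_t roundDownToPowOf2(std::uint64_t n)
{
    std::uint64_t p = 1;
    while (p <= n / 2)  // doubling p stays <= n
        p *= 2;
    return p;
}

// Round n down to a multiple of base (base > 0), but never below one full
// work-group, so the NDRange is never empty.
std::uint64_t roundDownToMultipleOf(std::uint64_t n, std::uint64_t base)
{
    std::uint64_t rounded = (n / base) * base;
    return rounded == 0 ? base : rounded;
}

// Keep the power-of-two rounding, then make the global size divisible by
// the local size. For power-of-two local sizes (up to the rounded value)
// this leaves the result unchanged.
std::uint64_t pickGlobalSize(std::uint64_t target, std::uint64_t localSize)
{
    return roundDownToMultipleOf(roundDownToPowOf2(target), localSize);
}
```

With a target of 1000000, a local size of 16 still yields 524288, while a local size of 12 yields 524280, which divides evenly. Rounding down rather than up is deliberate: as long as the rounded value is at least one work-group, the adjusted global size never exceeds it, so buffers sized for it remain large enough.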