Open: doe300 opened this issue 7 years ago
Thanks for reporting, @doe300. On which device did you encounter the problem? It's a known issue for some CPU implementations, although recently I have observed that some SDKs allow work-group sizes similar to those of GPUs.
It's a custom implementation I'm working on for an embedded graphics chip. The hardware is too limited to run 128 work-items in parallel.
OK, thanks. Two years ago we evaluated a couple of embedded SoCs and found that the default was mostly fine; in the cases where it was not, performance was really bad.
Regardless, I agree that the code should not crash by default and should honor the device's maximum work-group size.
For ViennaCL kernel executions, the local work size is set without checking the maximum work-group size of the device in use. As a result, almost every ViennaCL program or test case fails to execute on an OpenCL device whose maximum work-group size is smaller than the default of 128. This value is set, e.g., as the general default in 'viennacl::ocl::kernel::set_work_size_defaults()', and by various algorithms, e.g. in 'linalg/opencl/iterative_operations.hpp', line 93.
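A minimal sketch of the kind of check the report asks for, written as plain C++ so it compiles without an OpenCL SDK. The helpers `clamp_local_size` and `round_up_global_size` are hypothetical and not part of ViennaCL. In a real host program, `device_max` would come from `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE`, and ideally be further limited by the per-kernel value from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`. Rounding down to a power of two is an assumption, based on the guess that reduction-style kernels expect power-of-two work-group sizes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not ViennaCL API): returns the largest power of two
// that does not exceed either the requested local size or the device limit.
std::size_t clamp_local_size(std::size_t requested, std::size_t device_max)
{
  std::size_t limit = requested < device_max ? requested : device_max;
  if (limit == 0)
    return 1;  // defensive: never hand a zero local size to the runtime

  std::size_t p = 1;
  // Compare against limit / 2 so that p * 2 cannot overflow.
  while (p <= limit / 2)
    p *= 2;
  return p;
}

// Hypothetical helper: OpenCL 1.x requires the global size to be a multiple
// of the local size, so round the global size up to the next multiple.
std::size_t round_up_global_size(std::size_t global, std::size_t local)
{
  assert(local > 0 && "local size must be positive");
  return ((global + local - 1) / local) * local;
}
```

For instance, with the default local size of 128 and a hypothetical device limit of 12, `clamp_local_size(128, 12)` yields 8, and a global size of 1001 is rounded up to 1008. Since rounding up adds surplus work-items, the kernels would have to ignore them, e.g. via a bounds check on the global index.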