Local size does not check the maximum work-group size of the device

viennacl / viennacl-dev

Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases.

Local size does not check the maximum work-group size of the device #243

Open doe300 opened 7 years ago

doe300 commented 7 years ago

When ViennaCL launches a kernel, the local size is set without checking the maximum work-group size of the device in use. As a result, almost any ViennaCL program or test case fails to execute on an OpenCL device whose maximum work-group size is below the default of 128. This value is set as a general default in 'viennacl::ocl::kernel::set_work_size_defaults()' and also by various algorithms, e.g. at line 93 of linalg/opencl/iterative_operations.hpp.
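A minimal sketch of the kind of check that seems to be missing, not ViennaCL code. The helper `clamp_work_sizes` and the struct `work_sizes` are hypothetical names introduced for illustration. The device limit is passed in as a plain number; in a real program it would come from `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE`, or from the possibly lower per-kernel value that `clGetKernelWorkGroupInfo` reports for `CL_KERNEL_WORK_GROUP_SIZE`. Rounding down to a power of two is an assumption made here because tree-style reductions in local memory commonly rely on it.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper types, not part of ViennaCL.
struct work_sizes {
  std::size_t local;   // work-items per work-group
  std::size_t global;  // total work-items, a multiple of local
};

// Largest power of two that is <= n (n must be >= 1).
inline std::size_t floor_pow2(std::size_t n)
{
  std::size_t p = 1;
  while (p <= n / 2)
    p *= 2;
  return p;
}

// Clamp a requested local size to the device limit and adjust the
// global size so that it stays a multiple of the local size.
// device_max is assumed to be the value queried from the OpenCL runtime
// (CL_DEVICE_MAX_WORK_GROUP_SIZE or CL_KERNEL_WORK_GROUP_SIZE).
inline work_sizes clamp_work_sizes(std::size_t requested_local,
                                   std::size_t requested_global,
                                   std::size_t device_max)
{
  // Treat a reported limit of 0 as 1 so the result is always valid.
  const std::size_t limit = std::max<std::size_t>(1, device_max);

  // Never exceed the limit, never drop below one work-item.
  std::size_t local = std::max<std::size_t>(1, std::min(requested_local, limit));

  // Assumption: kernels expect a power-of-two local size.
  local = floor_pow2(local);

  // Round the global size up to a whole number of work-groups.
  // The kernel is assumed to tolerate the extra work-items.
  std::size_t groups = (requested_global + local - 1) / local;
  groups = std::max<std::size_t>(1, groups);

  return work_sizes{local, groups * local};
}
```

With a requested local size of 128 and a device limit of 12, this yields a local size of 8 instead of a launch failure. On a device that supports 128 or more, the requested sizes pass through unchanged.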

karlrupp commented 7 years ago

Thanks for reporting, @doe300. On which device did you encounter the problem? It's a known issue for some CPU implementations, yet in the recent past I observed that some SDKs allow for work-group sizes similar to those of GPUs.

doe300 commented 7 years ago

It's a custom implementation I'm working on for an embedded graphics chip. The hardware is too limited to run 128 work-items in parallel.

karlrupp commented 7 years ago

Ok, thanks. Two years ago we evaluated a couple of embedded SoCs and found that the default was mostly okay, whereas in the cases where it was not okay, the performance was really bad.

Regardless, I agree that the code should not crash by default and should honor the maximum work-group size limits.