vm3tr1c1 closed this issue 3 years ago
BTW, the OpenCL kernel attributes for the second case are here.
The host-side check is based on a clGetKernelWorkGroupInfo() call with the CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE argument.
@vm3tr1c1 Do you use the OPENCV_OCL4DNN_FORCE_AUTO_TUNING=1 environment variable?
I don't, but it doesn't seem to make any difference if I use it.
I'm not an expert in OpenCL, but please let me know how I can be of help (testing, debugging, whatever).
It looks like a problem on the IGC side, so I'm moving the issue to https://github.com/intel/intel-graphics-compiler
I can confirm similar behavior for the i7-11700K on Win10 with the 27.20.100.9127 driver. I will try to prepare a minimal reproducer for this problem.
Investigation shows that there is an incorrect condition in the OpenCV code.
We should not apply the clGetKernelWorkGroupInfo(CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE) check so strictly. The value is a hint, and it should not block kernel execution.
"Unblocked" kernels work fine without regressions.
This issue can be closed (there is no problem in the IGC compiler). Related OpenCV fixes can be tracked here: https://github.com/opencv/opencv/issues/20559
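For anyone reading this later, here is a rough sketch of the difference between treating the preferred work-group size multiple as a hard requirement and treating it as a hint. It is not the actual OpenCV patch. The names `SimdCheck` and `shouldRunKernel` are made up for illustration, and the `preferredMultiple` argument stands in for whatever the host gets back from clGetKernelWorkGroupInfo() with CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical policy names, used only for this illustration.
enum class SimdCheck { Strict, HintOnly };

// preferredMultiple: assumed to be the value reported by the OpenCL runtime
//                    for CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
// requiredSimd:      the SIMD width the kernel source was written for
//                    (for example 8 for a *_SIMD8 kernel).
// Returns true if the kernel should be executed, false if it should be skipped.
bool shouldRunKernel(std::size_t preferredMultiple,
                     std::size_t requiredSimd,
                     SimdCheck policy)
{
    if (preferredMultiple == requiredSimd)
        return true;  // values agree, nothing to decide

    if (policy == SimdCheck::Strict)
        return false; // strict behaviour: a mismatch causes the kernel to be skipped

    // Hint-only behaviour: report the mismatch but do not block execution.
    std::cerr << "note: preferred multiple (" << preferredMultiple
              << ") differs from kernel SIMD size (" << requiredSimd
              << "), running the kernel anyway\n";
    return true;
}
```

With the numbers from the log in this issue, `shouldRunKernel(16, 8, SimdCheck::Strict)` rejects the SIMD8 kernel, while `shouldRunKernel(16, 8, SimdCheck::HintOnly)` lets it run and only prints a note.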
@alalek I can confirm that the OpenCV patch resolves the problem. Thank you!
When running an OpenCV application which uses the ocl4dnn module, a lot of 'simd size' errors are generated, like these:
OpenCV(ocl4dnn): The OpenCL compiler chose a simd size (16) that does not equal the size (8) kernel source required. Skip this kernel U_GEMM_LIKE_CONV_k3x3_cn3_g1_s2x2_d1x1_b1_in256x256_p1x1_num1_M32_activ1_eltwise0_FP32_5_1_8_32_SIMD8
OpenCV(ocl4dnn): The OpenCL compiler chose a simd size (32) that does not equal the size (16) kernel source required. Skip this kernel IDLF_k3x3_cn3_g1_s2x2_d1x1_b1_in256x256_p1x1_num1_M32_activ1_eltwise0_FP32_2_7_3_1_SIMD16
If the CPU is replaced with an i5-10600, without changing anything else, the errors disappear and the GPU code runs without any issue at all.