Open hfp opened 3 years ago
We have reproduced the issue and placed the bug in our debug queue, but do not have an ETA for a fix.
Thank you very much!
@AdamCetnerowski Over a year has passed. Any updates?
Global memory updates using 32-bit atomics behave non-atomically on Intel HD Graphics integrated into the Celeron/Atom platform. Specifically, Intel(R) Celeron(R) CPU J3455 @ 1.50GHz (lscpu) with Intel(R) Graphics [0x5a85] (clinfo). It is likely reproducible on similar Celeron/Atom based CPUs with integrated HD Graphics, and is perhaps caused by a misconfiguration or missing enablement of features in the driver stack (either the i915 KMD or further up the stack, i.e., the compute runtime).

How to reproduce:
The console output of the command below looks like:
In the above output (max.error: abs=849.74 rel=1), the error appears to be due to data races or non-atomic updates. Generally, GEN9 based devices integrated into Core based processors work just fine (atomic flow). Like Core, the Celeron/Atom based OpenCL platform advertises sufficient support for atomic operations such as cl_khr_global_int32_base_atomics and cl_khr_global_int32_extended_atomics, which the reproducer uses.

The reproducer implements atomic FP32 updates using the usual flow based on cmpxchg or xchg. The atomic implementation can be toggled using OPENCL_LIBSMM_SMM_ATOMICS=cmpxchg (default on GEN9), OPENCL_LIBSMM_SMM_ATOMICS=xchg, or OPENCL_LIBSMM_SMM_ATOMICS=0. The last setting replaces the atomic flow with a plain FP32 add ("+=") and is meant for observing/studying performance differences. However, on Celeron/Atom based GEN9, the accumulated error due to data races is similar between the supposedly atomic flow and the non-atomic flow.
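For readers unfamiliar with the two flows, here is a minimal sketch of how an FP32 add is commonly built on top of 32-bit integer atomics. It is a CPU-side analogue written with C11 `<stdatomic.h>`, not the reproducer's actual OpenCL kernel code. The function names (`atomic_add_f32_cmpxchg`, `atomic_add_f32_xchg`) and the bit-cast helpers are hypothetical and chosen for illustration only. In OpenCL C, the corresponding built-ins would be `atomic_cmpxchg` and `atomic_xchg`.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Bit-casts between float and uint32_t (memcpy avoids aliasing issues). */
static inline uint32_t f32_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}
static inline float bits_f32(uint32_t u) {
  float f;
  memcpy(&f, &u, sizeof f);
  return f;
}

/* cmpxchg flow (hypothetical name): compute old + inc and publish it only
 * if the 32-bit word is still unchanged; otherwise retry with the value
 * that was observed. Returns the value seen before the update. */
static float atomic_add_f32_cmpxchg(_Atomic uint32_t *dst, float inc) {
  uint32_t expected = atomic_load_explicit(dst, memory_order_relaxed);
  uint32_t desired;
  do {
    desired = f32_bits(bits_f32(expected) + inc);
    /* On failure, 'expected' is refreshed with the current content. */
  } while (!atomic_compare_exchange_weak_explicit(
      dst, &expected, desired, memory_order_relaxed, memory_order_relaxed));
  return bits_f32(expected);
}

/* xchg flow (hypothetical name): take the accumulator (leaving 0.0f),
 * add the contribution, and put the sum back. If another thread deposited
 * a value in between, the second exchange returns it, and it is carried
 * into the next round so that no contribution is lost. */
static void atomic_add_f32_xchg(_Atomic uint32_t *dst, float inc) {
  const uint32_t zero = f32_bits(0.0f);
  float carry = inc;
  while (carry != 0.0f) {
    float taken = bits_f32(
        atomic_exchange_explicit(dst, zero, memory_order_relaxed));
    carry = bits_f32(atomic_exchange_explicit(
        dst, f32_bits(taken + carry), memory_order_relaxed));
  }
}
```

Both variants rely on the underlying 32-bit exchange or compare-exchange being truly atomic. If the hardware or driver does not honor that, either flow degrades to the same lost-update behavior as a plain "+=", which would match the similar error reported above for the atomic and non-atomic settings.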