Closed jeroenk closed 10 years ago
Thanks for bringing this to our attention. Can you let us know which compiler version you are testing with, so we can better replicate the issue and the fix?
--Jeff
Current head of llvm/clang repository and current head of the libclc repository (also part of the llvm project).
I don't think the compiler is the issue here though. Basically the implementation violates the way volatile parameters may be passed. This in combination with the cl_khr_global_int32_baseatomics being just plainly broken, due to their parameters not being volatile. OpenCL 1.1 fixed this with the introduction of the atomic* functions.
I've implemented this fix (commit dd307351a4) and tested it with the following setup:
M2090 GPU, CUDA 5.5, GCC 4.8
From testing this against the initial code, I could verify that the timing results are not affected by this change. I left the cl_khr_global_int32_base_atomics pragma in bfs_uiuc_spill.cl for the moment but can remove it if there is a pressing need.
The 32-bit atom_* functions used in src/opencl/level1/bfs/bfs_uiucspill.cl do not take volatile arguments [1], while the pointers passed to these functions are volatile. The results in some OpenCL compilers (e.g., vanilla clang with libclc headers) rejecting the kernel due to loss of volatile when invoking the atom* functions.
The proper fix would be switching to the 32-bit atomic_* functions introduced with OpenCL 1.1 [2], which do take volatile arguments.
[1] http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/cl_khr_global_int32_base_atomics.html [2] http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/atomicFunctions.html