jxy closed this issue 3 months ago
When I wrote this code, I thought I stuck to the standard. I'm not an OpenMP ninja though.
@jxy feel free to file a PR with a patch that keeps Intel OpenMP happy if you like.
I think this actually creates a race condition, because the threads are all updating the same `block`. I'll open a PR later today just to move the `block` creation inside the loop.
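For anyone following along, here is a minimal sketch of the kind of change I mean. It is not the actual QUDA code: `Dim3` and `run_blocks` are hypothetical names made up for illustration, and the real kernel launcher looks different. The point is only that a variable declared outside an OpenMP parallel loop is shared by default, so every thread writing to it is a data race, while a variable declared inside the loop body is private to each iteration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a CUDA-style dim3; not QUDA's actual type.
struct Dim3 {
  unsigned x, y, z;
};

// Racy pattern (shown only as a comment): `block` is declared outside the
// parallel loop, so it is shared and all threads write to it concurrently.
//
//   Dim3 block{0, 0, 0};
//   #pragma omp parallel for
//   for (int i = 0; i < n; i++) { block.x = i; /* use block */ }
//
// Safer pattern: create `block` inside the loop body, so each iteration
// (and therefore each thread) works on its own private copy.
std::vector<unsigned> run_blocks(unsigned n_blocks)
{
  std::vector<unsigned> out(n_blocks, 0);
  const int n = static_cast<int>(n_blocks);
#pragma omp parallel for
  for (int i = 0; i < n; i++) {
    Dim3 block{static_cast<unsigned>(i), 0, 0}; // private to this iteration
    assert(block.x < out.size());
    out[block.x] = 2 * block.x + 1; // each iteration writes a distinct element
  }
  return out;
}
```

The loop index here is a plain `int`, and the struct is built from it inside the body. Built without OpenMP enabled, the pragma is ignored and the loop runs serially with the same result.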
I got an error compiling this code with Intel's icpx compiler: https://github.com/lattice/quda/blob/d199bd36a7024f24c28ad007d540e35aa850b27e/include/targets/generic/block_reduction_kernel_host.h#L8-L11
The error is
I'm not entirely sure whether the code deviates from the standard or if this is a quirk in Intel’s OpenMP implementation. While I can work around the issue easily, I hope raising it here will attract insights from those more knowledgeable.