halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Semantics of gpu_thread_barrier() #4967

Open shoaibkamil opened 4 years ago

shoaibkamil commented 4 years ago

The intrinsic gpu_thread_barrier() currently has different memory-fence semantics on different platforms. We should decide on and document what the semantics are. In addition, if we want fences for global/device memory as well as shared memory, then we need to change the backends to implement those semantics correctly.

Currently, I believe the semantics are:

abadams commented 4 years ago

Ugh, so if we have a Func computed at gpu blocks (compute_at a gpu block variable), and we elect to store it in MemoryType::Heap instead of MemoryType::GPUShared because it doesn't fit in shared memory, then we may currently be generating incorrect code?

shoaibkamil commented 4 years ago

Yes, I believe that's right. The generated code would be correct for CUDA and OpenGLCompute, but not for any other backend :-/