Open shoaibkamil opened 4 years ago
Ugh, so if we have a Func compute_at gpu blocks, and we elect to store it in MemoryType::Heap instead of MemoryType::GPUShared, because it doesn't fit in shared, then we may not be currently generating correct code?
Yes, I believe that's correct. It would be correct for CUDA and OpenGLCompute, but not anything else :-/
The intrinsic `gpu_thread_barrier()` currently has different memory fence semantics on different platforms. We should decide & document what the semantics are. In addition, if we want fences for global/device memory as well as shared memory, then we need to change backends to respect the semantics correctly.

Currently, I believe the semantics are: