libocca / occa

Portable and vendor neutral framework for parallel programming on heterogeneous platforms.
https://libocca.org
MIT License

Warp/sub-group barriers #516

Open kris-rowe opened 3 years ago

kris-rowe commented 3 years ago

A related issue: if the inner size is <= warpSize, a warp-wide barrier should be added. Currently no @barrier is added at all. That's tricky, at least for Nvidia's Volta and later architectures, where you can no longer assume that the threads in a warp run in lock-step.

Originally posted by @stgeke in https://github.com/libocca/occa/issues/484#issuecomment-919249600
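
The rule proposed above can be sketched as a small selection function. This is only an illustration of the idea, not OCCA code: `BarrierKind` and `select_barrier` are hypothetical names, and the warp size is assumed to be passed in by the caller (32 on current Nvidia GPUs).

```cpp
#include <cstddef>

// Hypothetical names for illustration only; not part of the OCCA API.
enum class BarrierKind { Warp, Block };

// Pick the narrowest barrier that still covers every thread of the inner loops.
// inner_size: total number of @inner iterations (threads per block).
// warp_size:  hardware warp width, supplied by the caller.
constexpr BarrierKind select_barrier(std::size_t inner_size,
                                     std::size_t warp_size) {
  // If all inner threads fit in one warp, a warp-wide barrier is enough.
  // Emitting no barrier at all is never an option here, because threads
  // in a warp are not guaranteed to run in lock-step on Volta and later.
  return inner_size <= warp_size ? BarrierKind::Warp : BarrierKind::Block;
}
```

With a warp size of 32, an inner size of 16 or 32 would select the warp-wide barrier, while 64 would fall back to the block-wide one.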

kris-rowe commented 3 years ago

This is also relevant for OpenCL and SYCL/DPC++, since the innermost @inner loop will be mapped to a sub-group. Recent versions of both standards support sub-group barriers.
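
For reference, a rough sketch of how a warp/sub-group barrier is spelled in each backend's kernel language. The `Backend` enum and `subgroup_barrier` helper are hypothetical and do not reflect how OCCA's code generator is structured; the SYCL line also assumes the enclosing kernel has an `nd_item` named `item`.

```cpp
#include <string>

// Hypothetical enum for illustration only; not part of the OCCA API.
enum class Backend { CUDA, OpenCL, SYCL };

// Return the source text of a warp/sub-group barrier for a given backend.
inline std::string subgroup_barrier(Backend backend) {
  switch (backend) {
    case Backend::CUDA:
      // CUDA 9 and later: synchronizes the threads of a warp.
      return "__syncwarp();";
    case Backend::OpenCL:
      // OpenCL C sub-group built-in (requires sub-group support).
      return "sub_group_barrier(CLK_LOCAL_MEM_FENCE);";
    case Backend::SYCL:
      // SYCL 2020 group function; `item` is an assumed nd_item variable.
      return "sycl::group_barrier(item.get_sub_group());";
  }
  return "";  // Unreachable for valid enum values.
}
```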