omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License

Add device_sync #106

Open luraess opened 12 months ago

luraess commented 12 months ago

With JuliaGPU switching to task-local state, `synchronize` only syncs the default stream of the current task. This may lead to conflicts between array programming operations executed on the default stream and kernel programming executed on a custom stream. Adding support for a (heavy) device sync may be needed in some cases:

- CUDA: `CUDA.device_synchronize()`
- AMDGPU: `AMDGPU.HIP.device_synchronize()`
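To illustrate the kind of conflict meant here, below is a minimal sketch using CUDA.jl directly, not ParallelStencil's API. It assumes a functional NVIDIA GPU. The kernel `add_one!`, the array `A`, and the stream `s` are hypothetical names chosen for the example. The AMDGPU case would presumably follow the same pattern with the call listed above.

```julia
using CUDA  # assumption: CUDA.jl is installed and a GPU is available

# Hypothetical kernel used only for illustration: adds 1 to every element.
function add_one!(A)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(A)
        @inbounds A[i] += 1f0
    end
    return nothing
end

A = CUDA.zeros(Float32, 1024)   # allocated and filled on the task-local stream
CUDA.synchronize()              # make sure the fill is done before using another stream

# Kernel programming on a custom stream.
s = CuStream()
@cuda threads=256 blocks=4 stream=s add_one!(A)

# Syncs only the current task's stream; the kernel on `s` may still be running.
CUDA.synchronize()

# Heavy sync: waits for all outstanding work on the device, including stream `s`.
CUDA.device_synchronize()

# Array programming on the task-local stream, now safe to run after the kernel.
A .*= 2f0
```

Synchronizing the custom stream explicitly with `CUDA.synchronize(s)` would be a lighter alternative when the stream handle is available. The device-wide sync is the fallback when it is not.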