With JuliaGPU switching to task-local state, synchronize only syncs the default stream of the current task. This may lead to conflicts between array programming operations executed on the default stream and kernel programming executed on a custom stream. Adding support for a (heavy) device-wide sync may be needed in some cases:
CUDA: CUDA.device_synchronize()
AMDGPU: AMDGPU.HIP.device_synchronize()
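
A minimal sketch of the situation described above, using CUDA.jl only. The kernel `add_one!` and the array size are hypothetical and chosen purely for illustration. The task-local sync is used after the broadcast, and the device-wide sync after the launch on a custom stream. It needs a working CUDA GPU to run, so the body is guarded by `CUDA.functional()`.

```julia
using CUDA

# Hypothetical kernel for illustration: add 1 to every element of `a`.
function add_one!(a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds a[i] += 1f0
    end
    return nothing
end

if CUDA.functional()
    a = CUDA.zeros(Float32, 1024)

    # Array programming: the broadcast runs on the current task's stream.
    a .+= 1f0

    # Wait for the task-local stream before another stream touches `a`.
    CUDA.synchronize()

    # Kernel programming: launch on a custom stream.
    s = CuStream()
    @cuda threads=256 blocks=4 stream=s add_one!(a)

    # A plain `synchronize()` would only wait for the task-local stream,
    # so use the heavier device-wide sync to also cover stream `s`.
    CUDA.device_synchronize()

    @assert all(Array(a) .== 2f0)
end
```

Where only one custom stream is involved, `CUDA.synchronize(s)` on that stream is likely a cheaper alternative to a full device sync. The device-wide call is the fallback when the set of active streams is not known.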