For wave sizes smaller than 8, the blur compute shader had a race condition in the horizontal blur pass: each thread reads several pixels from group shared memory and then overwrites them with the blurred value. At SIMD widths of 8 or larger, the reads and writes were implicitly synchronized, so the race never surfaced. WARP, however, has a pseudo-SIMD width of 4, and artifacts were detected there. The fix is to add a GroupMemoryBarrierWithGroupSync() after loading the pixels and before storing the convolved result.
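The barrier placement described above can be sketched as follows. This is a minimal illustration and not the actual shader: the resource names (InputTex, OutputTex, gs_Pixels), the group width of 64, the 5-tap kernel, and the edge clamping are all assumptions made for the example. The point it shows is that every thread finishes reading its neighbors from group shared memory before any thread overwrites a slot.

```hlsl
// Hypothetical sketch: all names, sizes, and weights are illustrative
// assumptions, not taken from the original blur shader.
#define GROUP_WIDTH 64
#define RADIUS 2

Texture2D<float4>   InputTex  : register(t0);
RWTexture2D<float4> OutputTex : register(u0);

// One pixel per thread, staged in group shared memory.
groupshared float4 gs_Pixels[GROUP_WIDTH];

// Assumed 5-tap binomial kernel; the weights sum to 1.
static const float kWeights[2 * RADIUS + 1] = { 0.0625, 0.25, 0.375, 0.25, 0.0625 };

[numthreads(GROUP_WIDTH, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID, uint GI : SV_GroupIndex)
{
    // Stage this thread's pixel, then wait until the whole group has stored.
    gs_Pixels[GI] = InputTex[DTid.xy];
    GroupMemoryBarrierWithGroupSync();

    // Load the neighboring pixels and convolve into a register.
    // Indices are clamped at the group edges to keep the sketch short.
    float4 blurred = 0;
    [unroll]
    for (int i = -RADIUS; i <= RADIUS; ++i)
    {
        int idx = clamp((int)GI + i, 0, GROUP_WIDTH - 1);
        blurred += kWeights[i + RADIUS] * gs_Pixels[idx];
    }

    // The added barrier: no thread may overwrite its slot until every
    // thread in the group has finished reading its neighbors.
    GroupMemoryBarrierWithGroupSync();

    // Now the in-place overwrite is safe at any wave size.
    gs_Pixels[GI] = blurred;

    // Each thread reads back only its own slot, so no further barrier
    // is needed before this write.
    OutputTex[DTid.xy] = gs_Pixels[GI];
}
```

Without the second barrier, a thread in one wave could store its blurred value while a thread in another wave is still reading that slot as a neighbor. That is the kind of ordering that narrower wave sizes can expose.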