microsoft / DirectX-Graphics-Samples

This repo contains the DirectX Graphics samples that demonstrate how to build graphics intensive applications on Windows.
MIT License

MiniEngine blur shader has a data race #870

Open jenatali opened 1 month ago

jenatali commented 1 month ago

The blur shader has a data race. This race is particularly problematic on WARP, but it can be reproduced on any GPU with small wave sizes. The access pattern for the groupshared memory is:

This pattern correctly inserts barriers to prevent hazards from write -> read (readers must wait until writes complete), but is missing barriers to prevent hazards from read -> write (writers must wait until all readers complete before overwriting data).
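The read -> write hazard can be illustrated with a small CPU-side toy model. This is only a sketch, not the MiniEngine shader: the function `run_group`, the neighbor-read pattern, and the 8-thread group are hypothetical, and it assumes the waves of a group run one after another between barriers. Each groupshared slot carries a format tag so it is visible when a reader picks up data from the wrong stage.

```python
def run_group(values, wave_size, barrier_before_overwrite):
    """Toy model of one thread group; waves run sequentially between barriers.

    Hypothetical access pattern (NOT the actual blur shader):
      1. every thread writes a slot tagged "f16x2"
      2. barrier (covers write -> read)
      3. every thread reads a slot owned by a thread in the other half
      4. every thread overwrites its own slot with data tagged "f32"
    """
    n = len(values)
    shared = [None] * n
    results = [None] * n
    waves = [range(s, min(s + wave_size, n)) for s in range(0, n, wave_size)]

    # Step 1: all waves store their first-stage data.
    for wave in waves:
        for t in wave:
            shared[t] = ("f16x2", values[t])
    # Step 2: barrier. Falling out of the loop above means every wave is done.

    def read(t):
        results[t] = shared[(t + n // 2) % n]

    def overwrite(t):
        shared[t] = ("f32", values[t] * 2)

    if barrier_before_overwrite:
        # All waves finish reading before any wave overwrites.
        for wave in waves:
            for t in wave:
                read(t)
        for wave in waves:
            for t in wave:
                overwrite(t)
    else:
        # No barrier: an early wave overwrites its slots before a later
        # wave has read them.
        for wave in waves:
            for t in wave:
                read(t)
            for t in wave:
                overwrite(t)
    return results


racy = run_group(list(range(8)), wave_size=4, barrier_before_overwrite=False)
safe = run_group(list(range(8)), wave_size=4, barrier_before_overwrite=True)
one_wave = run_group(list(range(8)), wave_size=8, barrier_before_overwrite=False)
```

In this model the second wave of `racy` reads slots that the first wave has already overwritten, so some results carry the `"f32"` tag where `"f16x2"` was expected. Adding the barrier, or making the wave as large as the group, hides the problem.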

Since WARP executes 4-channel waves sequentially, it will deterministically hit a problematic case where some readers try to load 2xf16 data, but instead they read 1xf32 data. Trying to unpack this f32 as f16s produces NaNs and other garbage. Theoretically, any GPU with a wave size smaller than 64 can hit this: the blur uses 8x8 thread groups (64 threads), so with a smaller wave size a group spans more than one wave.
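To see why reinterpreting an f32 as two f16s yields garbage, the bit patterns can be inspected on the CPU. The following is a minimal sketch using Python's `struct` module; the helper name `f32_bits_as_two_f16` and the sample bit pattern `0x3F807E00` are chosen purely for illustration and do not come from the shader.

```python
import math
import struct


def f32_bits_as_two_f16(value):
    """Reinterpret the 4 bytes of a float32 as two little-endian float16s."""
    raw = struct.pack("<f", value)      # 32-bit float -> 4 bytes
    return struct.unpack("<2e", raw)    # same 4 bytes -> (low f16, high f16)


# 1.0f is 0x3F800000: the low half is 0x0000, the high half is 0x3F80.
low, high = f32_bits_as_two_f16(1.0)

# An ordinary f32 slightly above 1.0 whose low 16 bits are 0x7E00.
# As an f16, 0x7E00 has an all-ones exponent and a nonzero mantissa: NaN.
ordinary = struct.unpack("<f", struct.pack("<I", 0x3F807E00))[0]
nan_low, nan_high = f32_bits_as_two_f16(ordinary)

print(low, high)                 # neither half equals the original 1.0
print(ordinary, nan_low, nan_high)
```

Even a perfectly finite f32 can therefore decode to a NaN in one half, because the low mantissa bits of the f32 land in the exponent field of the f16.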