webgpu / webgpu-samples

WebGPU Samples
https://webgpu.github.io/webgpu-samples/
BSD 3-Clause "New" or "Revised" License
1.77k stars 301 forks source link

Address Particles HDR sample blue trail #458

Open beaufortfrancois opened 5 days ago

beaufortfrancois commented 5 days ago

The following blue trail is a pet peeve of me in the Particles (HDR) sample:

Screenshot 2024-10-08 at 10 17 46

I can reproduce on macOS (Apple M1 Pro device with Chrome and Safari), but not on Android. Could this be a Metal driver issue or something else? FYI @mwyrzykowski

mwyrzykowski commented 4 days ago

I have also noticed this and it seems to occur in different locations in Chrome (left, trail in far top-left) compared to Safari (right, trail in middle top-left)

Screenshot 2024-10-08 at 8 11 06 AM

I have not debugged the sample first, so I would start there before thinking it is a metal bug. The issue is present on AMD Intel Mac as well, so if it is a Metal issue it is not hardware specific.

mwyrzykowski commented 4 days ago

that looks better:

Screenshot 2024-10-08 at 8 23 50 AM

It is an issue in the sample, I don't reproduce when I disable fastMath (options.fastMathEnabled = NO;) in WebKit

The WGSL specification is supposed to allow for variances due to having fast math enabled, so likely these functions:

fn init_rand(invocation_id : u32, seed : vec4f) {
  rand_seed = seed.xz;
  rand_seed = fract(rand_seed * cos(35.456+f32(invocation_id) * seed.yw));
  rand_seed = fract(rand_seed * cos(41.235+f32(invocation_id) * seed.xw));
}

fn rand() -> f32 {
  rand_seed.x = fract(cos(dot(rand_seed, vec2f(23.14077926, 232.61690225))) * 136.8168);
  rand_seed.y = fract(cos(dot(rand_seed, vec2f(54.47856553, 345.84153136))) * 534.7645);
  return rand_seed.y;
}

are making assumptions about numerical accuracy which are not guaranteed by WGSL.

greggman commented 4 days ago

There's at least one other issue. The code is using workgroups of size 64 and then dispatching Math.ceil(numParticles / 64) workgroups. It's using 50000 particles. That means it's calculating and writing to 50048 particles, 48 of which are out of bounds. Out of bounds writes could corruprt other particles. That they aren't is just luck.

The sample should either round up/down the number of particles to a multiple of 64, or it should check in the shader and not process out of bounds particles.

mwyrzykowski commented 4 days ago

Indeed, I think both Chrome and Safari are clamping, but that is not the only behavior allowed by the specification

beaufortfrancois commented 3 days ago

I can confirm with https://chromium-review.googlesource.com/c/chromium/src/+/5920336 that setting strict math removes the blue trail. Thank you!

I'm not sure yet though how to fix the root issue. Setting numParticles to 50048 alone was not enough by itself.

greggman commented 3 days ago

sorry, I didn't mean to suggest the out-of-bounds writting was the source of the problem. I only meant to point out it is yet another problem that may show a different issue on some implementations

mwyrzykowski commented 3 days ago

I don't have time right now to look at it, maybe later in the week, but we could try annotating certain math functions in the metal with:

#pragma METAL fp math_mode([relaxed | safe | fast])

to determine which math function is causing the logic to break down

mwyrzykowski commented 14 hours ago

In WebKit for the following WGSL:

fn init_rand(invocation_id : u32, seed : vec4f) {
  rand_seed = seed.xz;
  rand_seed = fract(rand_seed * cos(35.456+f32(invocation_id) * seed.yw));
  rand_seed = fract(rand_seed * cos(41.235+f32(invocation_id) * seed.xw));
}

we generate:

void function0(unsigned parameter0, vec<float, 4> parameter1, thread vec<float, 2>& global0)
{
    global0 = parameter1.xz;
    global0 = fract((global0 * cos((35.45600128173828 + (float(parameter0) * parameter1.yw)))));
    global0 = fract((global0 * cos((41.23500061035156 + (float(parameter0) * parameter1.xw)))));
}

replacing that with (note precise:: in front of the call to cos):

void function0(unsigned parameter0, vec<float, 4> parameter1, thread vec<float, 2>& global0)
{
    global0 = parameter1.xz;
    global0 = fract((global0 * precise::cos((35.45600128173828 + (float(parameter0) * parameter1.yw)))));
    global0 = fract((global0 * precise::cos((41.23500061035156 + (float(parameter0) * parameter1.xw)))));
}

corrects the issue @beaufortfrancois

mwyrzykowski commented 13 hours ago

@beaufortfrancois it looks like the value being passed to cos call is too large, I didn't debug the actual values but dividing parameter0 by 100 seems to resolve the issue:

void function0(unsigned parameter0, vec<float, 4> parameter1, thread vec<float, 2>& global0)
{
    global0 = parameter1.xz;
    global0 = fract((global0 * fast::cos((35.45600128173828 + (float(parameter0)*0.01f * float2(parameter1.y, parameter1.w))))));
    global0 = fract((global0 * fast::cos((41.23500061035156 + (float(parameter0)*0.01f * float2(parameter1.x, parameter1.w))))));
}