Open beaufortfrancois opened 5 days ago
I have also noticed this and it seems to occur in different locations in Chrome (left, trail in far top-left) compared to Safari (right, trail in middle top-left)
I have not debugged the sample first, so I would start there before thinking it is a metal bug. The issue is present on AMD Intel Mac as well, so if it is a Metal issue it is not hardware specific.
that looks better:
It is an issue in the sample, I don't reproduce when I disable fastMath (options.fastMathEnabled = NO;
) in WebKit
The WGSL specification is supposed to allow for variances due to having fast math enabled, so likely these functions:
fn init_rand(invocation_id : u32, seed : vec4f) {
rand_seed = seed.xz;
rand_seed = fract(rand_seed * cos(35.456+f32(invocation_id) * seed.yw));
rand_seed = fract(rand_seed * cos(41.235+f32(invocation_id) * seed.xw));
}
fn rand() -> f32 {
rand_seed.x = fract(cos(dot(rand_seed, vec2f(23.14077926, 232.61690225))) * 136.8168);
rand_seed.y = fract(cos(dot(rand_seed, vec2f(54.47856553, 345.84153136))) * 534.7645);
return rand_seed.y;
}
are making assumptions about numerical accuracy which are not guaranteed by WGSL.
There's at least one other issue. The code is using workgroups of size 64 and then dispatching Math.ceil(numParticles / 64)
workgroups. It's using 50000 particles. That means it's calculating and writing to 50048 particles, 48 of which are out of bounds. Out of bounds writes could corruprt other particles. That they aren't is just luck.
The sample should either round up/down the number of particles to a multiple of 64, or it should check in the shader and not process out of bounds particles.
Indeed, I think both Chrome and Safari are clamping, but that is not the only behavior allowed by the specification
I can confirm with https://chromium-review.googlesource.com/c/chromium/src/+/5920336 that setting strict math removes the blue trail. Thank you!
I'm not sure yet though how to fix the root issue. Setting numParticles
to 50048
alone was not enough by itself.
sorry, I didn't mean to suggest the out-of-bounds writting was the source of the problem. I only meant to point out it is yet another problem that may show a different issue on some implementations
I don't have time right now to look at it, maybe later in the week, but we could try annotating certain math functions in the metal with:
#pragma METAL fp math_mode([relaxed | safe | fast])
to determine which math function is causing the logic to break down
In WebKit for the following WGSL:
fn init_rand(invocation_id : u32, seed : vec4f) {
rand_seed = seed.xz;
rand_seed = fract(rand_seed * cos(35.456+f32(invocation_id) * seed.yw));
rand_seed = fract(rand_seed * cos(41.235+f32(invocation_id) * seed.xw));
}
we generate:
void function0(unsigned parameter0, vec<float, 4> parameter1, thread vec<float, 2>& global0)
{
global0 = parameter1.xz;
global0 = fract((global0 * cos((35.45600128173828 + (float(parameter0) * parameter1.yw)))));
global0 = fract((global0 * cos((41.23500061035156 + (float(parameter0) * parameter1.xw)))));
}
replacing that with (note precise:: in front of the call to cos
):
void function0(unsigned parameter0, vec<float, 4> parameter1, thread vec<float, 2>& global0)
{
global0 = parameter1.xz;
global0 = fract((global0 * precise::cos((35.45600128173828 + (float(parameter0) * parameter1.yw)))));
global0 = fract((global0 * precise::cos((41.23500061035156 + (float(parameter0) * parameter1.xw)))));
}
corrects the issue @beaufortfrancois
@beaufortfrancois it looks like the value being passed to cos
call is too large, I didn't debug the actual values but dividing parameter0
by 100 seems to resolve the issue:
void function0(unsigned parameter0, vec<float, 4> parameter1, thread vec<float, 2>& global0)
{
global0 = parameter1.xz;
global0 = fract((global0 * fast::cos((35.45600128173828 + (float(parameter0)*0.01f * float2(parameter1.y, parameter1.w))))));
global0 = fract((global0 * fast::cos((41.23500061035156 + (float(parameter0)*0.01f * float2(parameter1.x, parameter1.w))))));
}
The following blue trail is a pet peeve of me in the Particles (HDR) sample:
I can reproduce on macOS (Apple M1 Pro device with Chrome and Safari), but not on Android. Could this be a Metal driver issue or something else? FYI @mwyrzykowski