The alpha-discard example you showed should be fixed (at least in the spec), because `discard` now has `demote_to_helper` semantics instead of `terminate`. Can you share some of the other examples that were not fixed by the change in `discard` semantics?
Thanks. Yes, I'm in the process of putting together a catalog of all the types of errors we're getting. I'll share that here soon.
In general, we know that the analysis is going to need various adjustments to accommodate common patterns that the analysis isn't smart enough to recognize are fine. But this has to happen on a case-by-case basis, so posting about the problems you run into is very helpful. Obviously, the details matter, but our hope is that in the end, mass conversions should only require a few adjustments --- assuming the code is actually correct.
I'm still looking through the 1,865 shader variants flagged with errors from a single project I've been testing, to identify the root causes of the analysis errors. So far, many of them are caused by the aforementioned discard pattern, and by false-positive errors on global variables. A local variable that is set to a non-uniform value and then set to a uniform value will still be considered non-uniform. At least for us, it's possible these could be resolved in the shader translation process, since the global variables are introduced when Tint translates SPIR-V to WGSL. The "every global var is non-uniform" rule is generally problematic, though I understand the increased scanning complexity that relaxing it would impose. I would argue the problems it causes have a greater cost than the safety it promises.
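For illustration, here is a minimal hand-written WGSL sketch of the local-variable pattern described above (the names and bindings are hypothetical, not from our shaders; this shows the reported Tint behavior, not necessarily what the spec's analysis requires):

```wgsl
@group(0) @binding(0) var tex: texture_2d<f32>;
@group(0) @binding(1) var samp: sampler;

@fragment
fn main(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    var cutoff: f32 = uv.x; // assigned a non-uniform value...
    cutoff = 0.5;           // ...then overwritten with a uniform one
    // The value actually read here is uniform, but the reported behavior is
    // that the branch is still treated as non-uniform, making this
    // textureSample call an error.
    if (cutoff > 0.0) {
        return textureSample(tex, samp, uv);
    }
    return vec4<f32>(0.0);
}
```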
@brendan-duncan I just realized my comment above was a bit dry. I thought Tint had the uniformity of discard up to spec, but that seems not to be the case, so I can definitely imagine that it would be hard to filter out false negatives from true negatives. Let's collaborate a bit more on making Tint better so we can see which problematic cases aren't just the ecosystem missing features (I'm sure there will be some, since you showed one issue earlier where, iirc, there was a sample in a condition depending on the value of another sample).
As it is, I'm not confident we can reasonably port our engine to WebGPU.
I would consider that an utter failure of WebGPU.
I would argue the problems it causes have a greater cost than the safety it promises.
This was understood from the beginning as a potential risk, since no other shader language or tooling enforced this level of strictness. We spent a lot of innovation tokens on this, and it's arguably the densest part of the spec. Your feedback is tremendously valuable.
There are a few issues at play:
The "programmer knows best" scenarios (1 and 2) are legitimate and can't be fixed by smarter WebGPU implementations.
To me this is very strong evidence that we need an escape hatch of some sort.
Usability matters a lot.
That escape hatch should be usable for Unity's use case: starting with something like HLSL and going through a chain of translations into SPIR-V and then WGSL. The chain-of-translations flow represents a very important class of applications, definitely not only Unity.
Another example of a shader that falls in the "yes, this is non-uniform, but we know what we're doing" category is from one of our UI blit shaders (example in HLSL, but it looks similar in WGSL):
```hlsl
fixed4 frag(v2f i) : SV_Target
{
    fixed4 color = fixed4(1, 1, 1, 1);
    switch (i.texcoord.z)
    {
        case 0: color = tex2D(_MainTex0, i.texcoord.xy); break;
        case 1: color = tex2D(_MainTex1, i.texcoord.xy); break;
        case 2: color = tex2D(_MainTex2, i.texcoord.xy); break;
        //case 3 .... 8:
    }
    return color;
}
```
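For reference, a rough hand-written WGSL reduction of the same pattern (the bindings and names are assumptions, and only two cases are shown):

```wgsl
@group(0) @binding(0) var main_tex0: texture_2d<f32>;
@group(0) @binding(1) var main_tex1: texture_2d<f32>;
@group(0) @binding(2) var samp: sampler;

@fragment
fn frag(@location(0) texcoord: vec3<f32>) -> @location(0) vec4<f32> {
    var color = vec4<f32>(1.0, 1.0, 1.0, 1.0);
    // texcoord.z varies per fragment as far as the analysis knows, so every
    // textureSample below sits in non-uniform control flow and gets flagged.
    switch (i32(texcoord.z)) {
        case 0: { color = textureSample(main_tex0, samp, texcoord.xy); }
        case 1: { color = textureSample(main_tex1, samp, texcoord.xy); }
        default: {}
    }
    return color;
}
```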
Another problem is that in other shading languages, implicit derivatives are allowed if control flow is uniform among all fragments for a single primitive, but WGSL requires uniformity across the entire draw call. So something like this is totally allowed in GLSL, but not in WGSL:

```glsl
vec4 color = vec4(1, 0, 0, 1);
if (gl_InstanceID == 0)
    color = texture(sampler2D(tex, linear), texcoord.xy);
```
(In a more real-world example, you might use gl_InstanceID to index into a uniform array and then branch on that value instead.)
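In WGSL terms, a rough analog might be the following (hand-written sketch; the flat varying carrying the instance index, plus the names and bindings, are assumptions):

```wgsl
@group(0) @binding(0) var tex: texture_2d<f32>;
@group(0) @binding(1) var linear_samp: sampler;

@fragment
fn main(
    @location(0) @interpolate(flat) instance: u32, // passed down from the vertex stage
    @location(1) texcoord: vec2<f32>,
) -> @location(0) vec4<f32> {
    var color = vec4<f32>(1.0, 0.0, 0.0, 1.0);
    if (instance == 0u) {
        // rejected: `instance` is uniform within a primitive (it's flat),
        // but not across the whole draw, which is the scope WGSL requires
        color = textureSample(tex, linear_samp, texcoord);
    }
    return color;
}
```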
This is likely to be an issue in piet-gpu as well. I'm linking a gist of a mostly automated translation of coarse.comp, not intended to be correct but intended to highlight the uniformity issue. It fails with the following error:
```
coarse.wgsl:1610:5 error: 'workgroupBarrier' must only be called from uniform control flow
  workgroupBarrier();
  ^^^^^^^^^^^^^^^^

coarse.wgsl:2136:5 note: control flow depends on non-uniform value
  if (((x_2196 >= x_2197) & (x_2199 >= x_2200))) {
  ^^

coarse.wgsl:1392:28 note: reading from workgroup storage variable 'sh_part_count' may result in a non-uniform value
  let x_1270 : u32 = sh_part_count[255i];
```
This is basically another manifestation of #2321. I'm trying to broadcast a value from a shared memory location to all threads. This is uniform as long as there's no race condition, so basically uniformity analysis depends on proving the absence of races. It is possible to work around this but with some effort and loss of performance. This idiom helps load-balance work across threads.
(For completeness, repro steps for generating that gist: coarse.comp is compiled to coarse.spv using glslangValidator (the spv is in the gen/ subdir of the piet-gpu repo). A few constructs were hand-patched out of coarse.comp, mostly atomics, but also an instance of `findLSB`, which fails with `error: unhandled GLSL.std.450 instruction 73` even though `firstTrailingBit` is available. Rename `ref` and `mat`, as those are reserved words. Run tip-of-tree Tint over the resulting SPIR-V. Remove `@stride(4)`. Fix a couple of type mismatches caused by multiple structs being structurally equal.)
Ah, I added that before I saw/read @raphlinus' comment... :-) I was triaging something else.
It's going to be tricky to design a targeted escape hatch that can be used by translation chains, because the upstream languages won't carry any information that could indicate when the escape hatch should be applied.
For example, annotations on expressions or struct members aren't something a translator could produce from HLSL, because there's no way for the author of the HLSL to indicate where to insert them.
It's going to be tricky to design a targeted escape hatch that can be used by translation chains, because the upstream languages won't carry any information that could indicate when the escape hatch should be applied.
Agreed.
I think we have distinct audiences: people writing WGSL by hand, who can annotate the exact spots where they know better than the analysis, and people producing WGSL at the end of a translation chain, who have nowhere to attach such annotations. These are clearly different use cases.
So basically, it looks like two kinds of escape hatches for the two classes of audience.
If we agree on that, then the rest is spelling.
Yes, that's very much in line with my thoughts. I don't think we should be afraid of a bulk opt-out flag, because we intend to make the uniformity analysis valuable: people who can leave the checks on will want to do so. If everyone just flips the bulk switch, then that shows that we failed to provide feedback that they found helpful.
(By analogy: Rust devs don't simply slap `unsafe` on everything - but that is because safe Rust is quite usable.)
I'm a little concerned that making it useful for your second category of users may require a good bit of further work on the standard. It may not be ergonomic to simply provide (say) an expression form that coerces a value to uniform in the eyes of the analysis. We may instead need attributes on variables, struct members, and (considering the example shown earlier where the author "knows" that some vector's z value is uniform across primitives) vector components, to avoid forcing people to repetitively annotate each use of a value that they know is uniform. That is "spelling" - but it's a lot of it.
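As a rough sketch of how much surface area that is (every spelling below is hypothetical; WGSL has none of these attributes today):

```wgsl
struct FragIn {
    // hypothetical: assert this member is uniform at every read,
    // rather than annotating each use site
    @uniform @location(0) layer: f32,
    @location(1) uv: vec2<f32>,
}

// hypothetical: assert any value read from this variable is uniform
@uniform var<private> material_index: u32;
```

A per-component assertion, as in the earlier texcoord.z example, would need yet another spelling.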
I took a closer look at @raphlinus' example.
There's a function-scope variable `ready_ix` that triggers the uniformity violation. The loop exit depends on `ready_ix`, near line 2135:

```wgsl
let x_2197 : u32 = ready_ix; // Propagated into x_2197
let x_2199 : u32 = partition_ix;
let x_2200 : u32 = n_partitions;
if (((x_2196 >= x_2197) & (x_2199 >= x_2200))) { // and exit condition depends on x_2197
    break;
}
```

And `ready_ix` is updated in exactly one place inside the loop:

```wgsl
workgroupBarrier();
let x_1270 : u32 = sh_part_count[255i];
ready_ix = x_1270;
```
But it occurs immediately after a workgroupBarrier(). Then I noticed that Raph's algorithm appears to split execution into alternating phases that write to sh_part_count and phases that read from it, with the phases separated by workgroupBarrier. So yes, there is no race on sh_part_count, and when reading, all invocations see the same values. This is firmly in the "programmer knows best" camp, confirming what @raphlinus wrote.
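A condensed, hand-written sketch of that phase structure (not Raph's actual code; the loop body and exit bound are made up to keep it self-contained, and the analysis is expected to reject it even though it is race-free):

```wgsl
var<workgroup> sh_part_count: array<u32, 256>;

@compute @workgroup_size(256)
fn main(@builtin(local_invocation_id) lid: vec3<u32>) {
    var iter: u32 = 0u;
    loop {
        // write phase: each invocation writes only its own slot
        sh_part_count[lid.x] = iter + lid.x;
        workgroupBarrier();
        // read phase: every invocation reads the same slot, so in practice the
        // value is uniform, but the analysis can't prove the absence of a race
        let ready_ix = sh_part_count[255];
        iter = iter + 1u;
        if (ready_ix >= 1000u) {
            break;
        }
        // this is the barrier that gets flagged: it now sits in control flow
        // that depends on ready_ix, which the analysis treats as non-uniform
        workgroupBarrier();
    }
}
```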
Yes, that's very much in line with my thoughts.
Ok, so some kind of big switch.
I'm a little concerned that making it useful for your second category of users may require a good bit of further work on the standard. It may not be ergonomic to simply provide (say) an expression form that coerces a value to uniform in the eyes of the analysis. We may instead need attributes on variables, struct members, and (considering the example shown earlier where the author "knows" that some vector's z value is uniform across primitives) vector components, to avoid forcing people to repetitively annotate each use of a value that they know is uniform. That is "spelling" - but it's a lot of it.
Let's make this concrete:

1. Add a `uniform` builtin: `uniform(e)` asserts that `e` is uniform, i.e. it cuts a path in the uniformity analysis.
2. Add a `@uniform` attribute that can be applied to `var`, `let`, and formal parameter declarations. It's an assertion that any value read from them is uniform. Cuts the path.
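In use, the two spellings might look something like this (hypothetical syntax; fragments only, shown against Raph's example above):

```wgsl
// 1. builtin form: assert that one read is uniform, cutting the path there
let ready_ix = uniform(sh_part_count[255]);

// 2. attribute form: assert every read of the declaration is uniform
@uniform var<workgroup> sh_part_count: array<u32, 256>;
```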
FYI: SPIR-V has `Uniform` and `UniformId` decorations, which are like the `uniform` builtin, i.e. applied to expressions. The "Id" part is for expressing a scope other than 'subgroup'; by default, `Uniform` asserts "this value is uniform across the subgroup".
I just want to second the point that a shader author needs an "I know what I am doing" option: one should be able to declare that a use of a derivative (or implicit derivative) under non-uniform control is okay and handled. One idea is perhaps to give those functions, `dFdx`, `dFdy`, `fwidth`, `texture`, a variant that can be used in a non-uniform situation, maybe something like `dFdxNonUniform`.
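In WGSL spelling, that might look like the sketch below. The `*NonUniform` names are hypothetical; the live line shows what already works today, since explicit-LOD sampling such as `textureSampleLevel` takes no implicit derivatives and is allowed in non-uniform control flow:

```wgsl
@group(0) @binding(0) var t: texture_2d<f32>;
@group(0) @binding(1) var s: sampler;

@fragment
fn main(
    @location(0) uv: vec2<f32>,
    @location(1) @interpolate(flat) sel: u32,
) -> @location(0) vec4<f32> {
    if (sel == 0u) {
        // hypothetical "unsafe" variants might read:
        //   let d = dpdxNonUniform(uv.x);
        //   return textureSampleNonUniform(t, s, uv);

        // already legal today in non-uniform control flow, because
        // explicit-LOD sampling takes no implicit derivatives:
        return textureSampleLevel(t, s, uv, 0.0);
    }
    return vec4<f32>(1.0);
}
```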
One example where there is non-uniform control but the derivative functions still give good values is the following (excuse the use of GLSL):
```glsl
flat in int I;
in vec2 f;
uniform sampler2D S;
out vec4 out_color;

void main(void)
{
    if (I == 0)
    {
        vec2 ff;
        ff = sin(f) * 0.5 + 0.5; // or any other non-trivial formula
        out_color = texture(S, ff);
    }
    else
    {
        out_color = vec4(1.0, 1.0, 1.0, 1.0);
    }
}
```
The derivative operations will give correct values, because the if() is from a flat value which is the same across all fragments in a triangle.
The derivative operations will give correct values, because the if() is from a flat value which is the same across all fragments in a triangle.
The example seems reasonable, and resembles the one given in Brendan's earlier comment.
At the moment, the only scope of uniformity that WGSL discusses is the entire draw call. I wonder if it would make sense to have two scopes: uniformity across the whole draw (or, in compute, the workgroup), which is what barriers require; and uniformity within a single primitive, which is all that derivatives require.
Since barriers can only be used in compute shaders, and derivative operations can only be used in fragment shaders, WGSL could simply assume the appropriate scope for each one. A user-defined function that uses barriers or derivatives is already restricted to use by one kind of entry point or the other, so we should always be able to select a single kind of uniformity to apply for each function.
There are a few bits about uniform control flow, and it is worth spelling them out fully. The main point was for SIMD architectures, so that if a value depended on another slot within the SIMD group, the value was actually there; another one that is a bit rarer is that the value is the same for all invocations across the SIMD group (the one I remember the most was for `sampler2D something[N]`, so that when indexing into `something`, the array index was the same across the entire SIMD group).
In terms of kinds of uniformity there are more than the two above; several show up just for drawing. When we get into compute there are more, coming naturally from workgroup geometry, the most obvious being same-across-the-entire-dispatch vs. same-across-a-single-workgroup... and there are more still if one thinks about each of the dimensions to some extent. Though what would be more ideal is to somehow "know" the SIMD-ness (or SIMT-ness) of things; however, that is terribly GPU-specific. Sighs.
I'd advise having "strict/safe" and "liberal/unsafe" versions of those functions that take derivatives.
On a related note, here is a real-world example of using early-out-like semantics in a way compatible with derivative ops:
```glsl
bool
do_early_out(void)
{
    float f;
    f = some_smooth_enough_function_dependent_on_smooth_varyings();
    return f + fwidth(f) < 0.0;
}

out vec4 v;

void
main(void)
{
    if (do_early_out())
    {
        v = vec4(0.0);
        return;
    }
    v = some_expensive_stuff();
}
```
and then there is this, which is even better:
```glsl
bool
do_early_out(void)
{
    float f;
    f = some_smooth_enough_function_dependent_on_smooth_varyings();
    return f < 0.0;
}

out vec4 v;

void
main(void)
{
    if (allInvocationsARB(do_early_out()))
    {
        v = vec4(0.0);
        return;
    }
    v = some_expensive_stuff();
}
```
What information are you looking for? I have literally thousands of shader variants with errors that could be provided to help catalog some of the analysis issues. Most of the shader errors come from common patterns that are identified as non-uniform, but I'd rather not spend a lot more time sorting through the shaders to catalog the error sources. The shaders are quite large, and translated through multiple compilers before they get to WGSL, so they are also not very readable.
A number of the fundamental issues of analysis being too strict have been discussed. There are two main categories of analysis issues: false errors due to scanner implementations, and shaders that are purposefully non-uniform.
The first category can possibly be fixed by the implementations, but the requirement that analysis scanning be in linear time and performant makes it difficult.
The second category needs an escape hatch. The escape hatch needs to be at both the global level, for shaders like in Unity that are not written by hand but go through multiple levels of translation; and ideally at the shader code level, so a shader author could tag a block or variable as being acceptably non-uniform. The global level escape hatch is a must-have for Unity, since we don't author WGSL directly.
There's also the third case mentioned above of fragment shaders which are non-uniform according to WGSL's definition, but valid according to the requirements of the low-level APIs.
But really the more high-level concern is about the false positive rate. If users are constantly getting non-uniformity errors for valid code, they're just going to start habitually replacing `if (...) {}` with `if (uniform(...)) {}` whenever they see one (or flipping the global escape hatch if one is provided) without even stopping to think whether their shader is correct or not. At which point the uniformity analysis isn't actually providing any value at all...
they're just going to start habitually replacing `if (...) {}` with `if (uniform(...)) {}` whenever they see one (or flipping the global escape hatch if one is provided) without even stopping to think whether their shader is correct or not. At which point the uniformity analysis isn't actually providing any value at all...
I disagree with that sentiment; most Rust code is not massively decorated with `unsafe` just to get it to compile. I also don't think an interface of `if (uniform())` is anywhere near the right way to go, as declaring that a block of code is uniform when it is not has further consequences beyond whether it is legal to use certain functions. I'd rather see "unsafe" versions of the functions `dFdx`, `dFdy`, `fwidth`, `texture` and `barrier` that can be used regardless of uniformity analysis. That would be the escape hatch for "I know what I am doing". For shaders that originated from other sources (such as Unity) and are translated to WGSL, the translation would just use the unsafe versions, since those shaders were already working anyway. Though that option requires some kind of option in the translator. I admit I am quite uneasy about a global option to turn it all off for a whole shader; it seems to go too far... though maybe I am wrong on that.
For developers like us who don't write WGSL directly, marking WGSL blocks as "unsafe" or "uniform" isn't really an option. Unsafe versions of the offending functions might work, in that I could just filter/replace the safe versions with unsafe versions when generating the WGSL if it's determined the shader is intentionally non-uniform or hit by false errors. Otherwise a global option would be needed in our case.
I disagree with that sentiment; most Rust code is not massively decorated with unsafe just to get it to compile.
An enormous amount of effort went into designing the Rust language so that unsafe was rarely needed and that the borrow checker had a very low false positive rate. Just because WGSL's uniformity analysis looks superficially similar doesn't mean that it will also automatically be ergonomic to use and have a low false positive rate. It would be great if it were! But that requires work now, not just for designing escape hatches, but also finding ways to make sure those escape hatches are almost never needed.
So, I don't think this is a case where the uniformity analysis checker is completely at fault. The issue is that clever things are not feasible to detect, and shaders, especially non-trivial ones, have a tendency to do clever things that work. The truth is that the escape hatches are going to be needed a great deal in performance-sensitive code. Early outs alone are a big deal by themselves. As a strong example, the derivative operations are not necessarily broken by non-uniform control flow. They are broken when the control flow for the neighbouring pixels is not the same; see the example I wrote above, where the derivatives will still work although there is an early-out breaking uniform control flow. By not marking a block of code, but just having an unsafe function version, the escape hatch stays completely local to the offending function call.
Just because WGSL's uniformity analysis looks superficially similar doesn't mean that it will also automatically be ergonomic to use and have a low false positive rate. It would be great if it were! But that requires work now, not just for designing escape hatches, but also finding ways to make sure those escape hatches are almost never needed.
I agree with all this. I didn't mean to imply that WGSL is fine as it is. My point was similar to yours: people don't simply pursue the lowest possible level of interference, and a well-designed static analysis can be welcomed.
For developers like us who don't write WGSL directly, marking WGSL blocks as "unsafe" or "uniform" isn't really an option. Unsafe versions of the offending functions might work, in that I could just filter/replace the safe versions with unsafe versions when generating the WGSL if it's determined the shader is intentionally non-uniform or hit by false errors. Otherwise a global option would be needed in our case.
If your translator can translate e.g. `x = foo()` to `x = foo_but_assume_uniform()`, it could also instead generate `x = unsafe { foo() }`, right?
@brendan-duncan wrote:
For developers like us who don't write WGSL directly, marking WGSL blocks as "unsafe" or "uniform" isn't really an option.
Like Kelsey, I'm surprised by this. Couldn't you just always generate WGSL whose functions' bodies are entirely wrapped in an `unsafe` block? (Assuming WGSL grew such a thing.)
@jimblandy We use Tint to generate the WGSL from SPIR-V, so perhaps it could be extended to tag generated functions for us if such a tag existed. Otherwise the other option would be to postprocess the WGSL, scan for functions and insert tags after the fact. If tagging functions as unsafe were the only option, I would make it work. But it would be much simpler to have a GPUShaderModuleCompilationHint to skip analysis for the given shader. If a study of shaders causing analysis issues is something you're interested in, we could work with you to provide a large corpus of shaders for study, similar to what we've done for Google.
Yes, that sort of change to Tint is what I had in mind.
People writing WGSL directly want to be able to say, "I know what I'm doing here" for one piece of code without giving up the benefits of the analysis for the rest. Something like an unsafe block supports that sort of use while still being straightforward for a generator like Tint or Naga to apply to the whole program.
If a study of shaders causing analysis issues is something you're interested in, we could work with you to provide a large corpus of shaders for study, similar to what we've done for Google.
Mozilla is very interested in this.
Tint can be made to emit precise block annotations, or function annotations based on UA violations. It would have to be done (internally) after the WGSL is generated, likely via a transform. That said, transforms are a big perf hit, and if this were automatically done by the compiler, then the compiler might as well just wrap everything with this annotation and avoid another pass.
Either way, a block or function annotation to silence UA would work for tint's SPIR-V ingestion path.
I don't follow why the current proposal is to mark blocks of code with "ignore uniformity analysis issues" instead of just having variants of those functions, i.e. `dFdx_unsafe` or something like that, with the possibility to still emit a warning on using `_unsafe` variants outside of uniform control.
I don't follow why the current proposal is to mark blocks of code with "ignore uniformity analysis issues" instead of just having variants of those functions, i.e. `dFdx_unsafe` or something
+1, I think this is a much better solution than the idea of marking blocks of code with an "unsafe" tag. It much more accurately captures the information that the author is trying to convey to the compiler, is much simpler to spec and implement, and is trivial to separate between fragment and compute stages. I would vote against having any escape hatch for barriers.
FWIW I think a module-level `enable allow_non_uniform_sampling;` switch (or similar) is even more preferable, but I gather that there is significant pushback against this.
Let's move discussion of the opt out design to #3554.
Looking at the WGSL 2022-10-25 minutes, it starts with trying to make the escape hatch as local as possible, and then it returns to marking blocks as unsafe-like. In contrast, just having unsafe function versions maximizes locality.
What is the motivation for marking blocks of code as unsafe-like instead of just having unsafe functions? I do not see any benefit at all from marking blocks of code over having unsafe function variants.
@brendan-duncan How's it feeling now?
@kdashg This solved our problems with vertex and fragment shaders! Still having a problem with a compute shader (early out causes barriers to be considered non-uniform). I was going to submit that as a Tint bug initially, to determine if it's an implementation issue or a spec issue, or a me issue. I'll file a new issue here if it turns out to be a spec issue.
Thanks for the great work to find a workable compromise on this issue!
Thank you very much for the clear and actionable feedback. It's been super helpful!
I'll file a new issue here if it turns out to be a spec issue.
Closing this one.
You can get around it showing up as a naga error by wrapping the texture sample in a helper function and calling that where you would normally sample:
```wgsl
fn tsw(t_diffuse: texture_2d<f32>, s_diffuse: sampler, uv: vec2<f32>) -> vec4<f32> {
    return textureSample(t_diffuse, s_diffuse, uv);
}
```
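A call site then looks like this (sketch; `uv` and the binding names are assumed to match the surrounding shader):

```wgsl
let color = tsw(t_diffuse, s_diffuse, uv);
```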
Now that uniformity analysis is being treated as an error instead of a warning, many of the shaders we've been working with are being flagged as uniformity analysis errors. Modifying the shaders to pass analysis is a monumental task, with an existing engine, where we translate existing shaders from an enormous library, where shaders use libraries of includes, or are otherwise composited from multiple sources. We cannot just "rewrite the shader" to pass the analysis check.
I believe the analysis is being far too strict, particularly for non-vertex shaders. One example of an analysis error in a fragment shader:
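(The original example isn't reproduced here; the following is an illustrative hand-written reduction of the pattern, with hypothetical names and bindings.)

```wgsl
@group(0) @binding(0) var base_tex: texture_2d<f32>;
@group(0) @binding(1) var detail_tex: texture_2d<f32>;
@group(0) @binding(2) var samp: sampler;

@fragment
fn main(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    let base = textureSample(base_tex, samp, uv);
    if (base.a < 0.5) {
        discard; // conditional discard (alpha test)
    }
    // flagged: control flow here is considered non-uniform after the discard
    return textureSample(detail_tex, samp, uv);
}
```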
Any textureSample call after that conditional discard line is considered non-uniform.
No other graphics API or platform has a problem with this pattern. But now WebGPU will error on it.
This is just one example, but other errors I've been hitting are of a similar pattern. We need to loosen up the analysis and what is flagged as an error. As it is, I'm not confident we can reasonably port our engine to WebGPU.