The alpha-discard example you showed should be fixed (at least in the spec), because `discard` now has `demote_to_helper` semantics instead of `terminate`. Can you share some of the other examples that were not fixed by the change in `discard` semantics?
Thanks. Yes, I'm in the process of putting together a catalog of all the types of errors we're getting. I'll share that here soon.
In general, we know that the analysis is going to need various adjustments to accommodate common patterns that the analysis isn't smart enough to recognize are fine. But this has to happen on a case-by-case basis, so posting about the problems you run into is very helpful. Obviously, the details matter, but our hope is that in the end, mass conversions should only require a few adjustments --- assuming the code is actually correct.
I'm still looking through the 1,865 shader variants flagged with errors from a single project I've been testing, to identify the root causes of the analysis errors. So far, many of them are caused by the aforementioned discard pattern, and by false-positive errors on global variables. A local variable that is set to a non-uniform value and then set to a uniform value will still be considered non-uniform. At least for us, it's possible these could be resolved in the shader translation process, since the global variables are introduced when Tint translates SPIR-V to WGSL. The "every global var is non-uniform" rule is generally problematic, though I understand the increased scanning complexity that relaxing it would impose. I would argue the problems it causes have a greater cost than the safety it promises.
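For illustration, here is a minimal hand-written WGSL sketch of the local-variable pattern described above (the names and bindings are hypothetical, not from our shaders; this shows the reported Tint behavior, not necessarily what the spec's analysis requires):

```wgsl
@group(0) @binding(0) var tex: texture_2d<f32>;
@group(0) @binding(1) var samp: sampler;

@fragment
fn main(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    var cutoff: f32 = uv.x; // assigned a non-uniform value...
    cutoff = 0.5;           // ...then overwritten with a uniform one
    // The value actually read here is uniform, but the reported behavior is
    // that the branch is still treated as non-uniform, making this
    // textureSample call an error.
    if (cutoff > 0.0) {
        return textureSample(tex, samp, uv);
    }
    return vec4<f32>(0.0);
}
```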
@brendan-duncan I just realized my comment above was a bit dry. I thought Tint had the uniformity of discard up to spec, but that seems not to be the case, so I can definitely imagine that it would be hard to filter out false negatives from true negatives. Let's collaborate a bit more on making Tint better so we can see which problematic cases aren't just the ecosystem missing features (I'm sure there will be some, since you showed one issue earlier where, iirc, there was a sample in a condition depending on the value of another sample).
As it is, I'm not confident we can reasonably port our engine to WebGPU.
I would consider that an utter failure of WebGPU.
I would argue the problems it causes have a greater cost than the safety it promises.
This was understood from the beginning as a potential risk, since no other shader language or tooling enforced this level of strictness. We spent a lot of innovation tokens on this, and it's arguably the densest part of the spec. Your feedback is tremendously valuable.
There are a few issues at play:
The "programmer knows best" scenarios (1 and 2) are legitimate and can't be fixed by smarter WebGPU implementations.
To me this is very strong evidence that we need an escape hatch of some sort.
Usability matters a lot.
That escape hatch should be usable for Unity's use case: starting with something like HLSL and going through a chain of translations into SPIR-V and then WGSL. The chain-of-translations flow represents a very important class of applications, definitely not only Unity.
Another example of a shader that falls in the "yes, this is non-uniform, but we know what we're doing" category is from one of our UI blit shaders (example in HLSL, but it looks similar in WGSL):
```hlsl
fixed4 frag(v2f i) : SV_Target
{
    fixed4 color = fixed4(1, 1, 1, 1);
    switch (i.texcoord.z)
    {
        case 0: color = tex2D(_MainTex0, i.texcoord.xy); break;
        case 1: color = tex2D(_MainTex1, i.texcoord.xy); break;
        case 2: color = tex2D(_MainTex2, i.texcoord.xy); break;
        //case 3 .... 8:
    }
    return color;
}
```
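For reference, a rough hand-written WGSL reduction of the same pattern (the bindings and names are assumptions, and only two cases are shown):

```wgsl
@group(0) @binding(0) var main_tex0: texture_2d<f32>;
@group(0) @binding(1) var main_tex1: texture_2d<f32>;
@group(0) @binding(2) var samp: sampler;

@fragment
fn frag(@location(0) texcoord: vec3<f32>) -> @location(0) vec4<f32> {
    var color = vec4<f32>(1.0, 1.0, 1.0, 1.0);
    // texcoord.z varies per fragment as far as the analysis knows, so every
    // textureSample below sits in non-uniform control flow and gets flagged.
    switch (i32(texcoord.z)) {
        case 0: { color = textureSample(main_tex0, samp, texcoord.xy); }
        case 1: { color = textureSample(main_tex1, samp, texcoord.xy); }
        default: {}
    }
    return color;
}
```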
Another problem is that in other shading languages, implicit derivatives are allowed if control flow is uniform among all fragments for a single primitive, but WGSL requires uniformity across the entire draw call. So something like this is totally allowed in GLSL, but not in WGSL:

```glsl
vec4 color = vec4(1, 0, 0, 1);
if (gl_InstanceID == 0)
    color = texture(sampler2D(tex, linear), texcoord.xy);
```
(In a more real-world example, you might use gl_InstanceID to index into a uniform array and then branch on that value instead.)
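In WGSL terms, a rough analog might be the following (hand-written sketch; the flat varying carrying the instance index, plus the names and bindings, are assumptions):

```wgsl
@group(0) @binding(0) var tex: texture_2d<f32>;
@group(0) @binding(1) var linear_samp: sampler;

@fragment
fn main(
    @location(0) @interpolate(flat) instance: u32, // passed down from the vertex stage
    @location(1) texcoord: vec2<f32>,
) -> @location(0) vec4<f32> {
    var color = vec4<f32>(1.0, 0.0, 0.0, 1.0);
    if (instance == 0u) {
        // rejected: `instance` is uniform within a primitive (it's flat),
        // but not across the whole draw, which is the scope WGSL requires
        color = textureSample(tex, linear_samp, texcoord);
    }
    return color;
}
```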
This is likely to be an issue in piet-gpu as well. I'm linking a gist of a mostly automated translation of coarse.comp, not intended to be correct but intended to highlight the uniformity issue. It fails with the following error:
```
coarse.wgsl:1610:5 error: 'workgroupBarrier' must only be called from uniform control flow
  workgroupBarrier();
  ^^^^^^^^^^^^^^^^

coarse.wgsl:2136:5 note: control flow depends on non-uniform value
  if (((x_2196 >= x_2197) & (x_2199 >= x_2200))) {
  ^^

coarse.wgsl:1392:28 note: reading from workgroup storage variable 'sh_part_count' may result in a non-uniform value
  let x_1270 : u32 = sh_part_count[255i];
```
This is basically another manifestation of #2321. I'm trying to broadcast a value from a shared memory location to all threads. This is uniform as long as there's no race condition, so basically uniformity analysis depends on proving the absence of races. It is possible to work around this but with some effort and loss of performance. This idiom helps load-balance work across threads.
(For completeness, repro steps for generating that gist: coarse.comp is compiled to coarse.spv using glslangValidator (the spv is in the gen/ subdir of the piet-gpu repo). A few constructs were hand-patched out of coarse.comp, mostly atomics, but also an instance of `findLSB`, which fails with `error: unhandled GLSL.std.450 instruction 73` even though `firstTrailingBit` is available. Rename `ref` and `mat`, as those are reserved words. Run tip-of-tree Tint over the resulting SPIR-V. Remove `@stride(4)`. Fix a couple of type mismatches caused by multiple structs being structurally equal.)
Ah, I added that before I saw/read @raphlinus' comment... :-) I was triaging something else.
It's going to be tricky to design a targeted escape hatch that can be used by translation chains, because the upstream languages won't carry any information that could indicate when the escape hatch should be applied.
For example, annotations on expressions or struct members aren't something a translator could produce from HLSL, because there's no way for the author of the HLSL to indicate where to insert them.
It's going to be tricky to design a targeted escape hatch that can be used by translation chains, because the upstream languages won't carry any information that could indicate when the escape hatch should be applied.
Agreed.
I think we have distinct audiences: people writing WGSL by hand, who can annotate the exact spots where they know better than the analysis, and people producing WGSL at the end of a translation chain, who have nowhere to attach such annotations. These are clearly different use cases.
So basically, it looks like two kinds of escape hatches for the two classes of audience.
If we agree on that, then the rest is spelling.
Yes, that's very much in line with my thoughts. I don't think we should be afraid of a bulk opt-out flag, because we intend to make the uniformity analysis valuable: people who can leave the checks on will want to do so. If everyone just flips the bulk switch, then that shows that we failed to provide feedback that they found helpful.
(By analogy: Rust devs don't simply slap `unsafe` on everything - but that is because safe Rust is quite usable.)
I'm a little concerned that making it useful for your second category of users may require a good bit of further work on the standard. It may not be ergonomic to simply provide (say) an expression form that coerces a value to uniform in the eyes of the analysis. We may instead need attributes on variables, struct members, and (considering the example shown earlier where the author "knows" that some vector's z value is uniform across primitives) vector components, to avoid forcing people to repetitively annotate each use of a value that they know is uniform. That is "spelling" - but it's a lot of it.
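As a rough sketch of how much surface area that is (every spelling below is hypothetical; WGSL has none of these attributes today):

```wgsl
struct FragIn {
    // hypothetical: assert this member is uniform at every read,
    // rather than annotating each use site
    @uniform @location(0) layer: f32,
    @location(1) uv: vec2<f32>,
}

// hypothetical: assert any value read from this variable is uniform
@uniform var<private> material_index: u32;
```

A per-component assertion, as in the earlier texcoord.z example, would need yet another spelling.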
I took a closer look at @raphlinus' example.
There's a function-scope variable `ready_ix` that triggers the uniformity violation. The loop exit depends on `ready_ix`, near line 2135:

```wgsl
let x_2197 : u32 = ready_ix; // Propagated into x_2197
let x_2199 : u32 = partition_ix;
let x_2200 : u32 = n_partitions;
if (((x_2196 >= x_2197) & (x_2199 >= x_2200))) { // and exit condition depends on x_2197
    break;
}
```

And `ready_ix` is updated in exactly one place inside the loop:

```wgsl
workgroupBarrier();
let x_1270 : u32 = sh_part_count[255i];
ready_ix = x_1270;
```
But it occurs immediately after a workgroupBarrier(). Then I noticed that Raph's algorithm appears to split execution into alternating phases that write to sh_part_count and phases that read from it, with the phases separated by workgroupBarrier. So yes, there is no race on sh_part_count, and when reading, all invocations see the same values. This is firmly in the "programmer knows best" camp, confirming what @raphlinus wrote.
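A condensed, hand-written sketch of that phase structure (not Raph's actual code; the loop body and exit bound are made up to keep it self-contained, and the analysis is expected to reject it even though it is race-free):

```wgsl
var<workgroup> sh_part_count: array<u32, 256>;

@compute @workgroup_size(256)
fn main(@builtin(local_invocation_id) lid: vec3<u32>) {
    var iter: u32 = 0u;
    loop {
        // write phase: each invocation writes only its own slot
        sh_part_count[lid.x] = iter + lid.x;
        workgroupBarrier();
        // read phase: every invocation reads the same slot, so in practice the
        // value is uniform, but the analysis can't prove the absence of a race
        let ready_ix = sh_part_count[255];
        iter = iter + 1u;
        if (ready_ix >= 1000u) {
            break;
        }
        // this is the barrier that gets flagged: it now sits in control flow
        // that depends on ready_ix, which the analysis treats as non-uniform
        workgroupBarrier();
    }
}
```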
Yes, that's very much in line with my thoughts.
Ok, so some kind of big switch.
I'm a little concerned that making it useful for your second category of users may require a good bit of further work on the standard. It may not be ergonomic to simply provide (say) an expression form that coerces a value to uniform in the eyes of the analysis. We may instead need attributes on variables, struct members, and (considering the example shown earlier where the author "knows" that some vector's z value is uniform across primitives) vector components, to avoid forcing people to repetitively annotate each use of a value that they know is uniform. That is "spelling" - but it's a lot of it.
Let's make this concrete:

1. Add a `uniform` builtin: `uniform(e)` asserts that `e` is uniform, i.e. it cuts a path in the uniformity analysis.
2. Add a `@uniform` attribute that can be applied to `var`, `let`, and formal parameter declarations. It's an assertion that any value read from them is uniform. Cuts the path.
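In use, the two spellings might look something like this (hypothetical syntax; fragments only, shown against Raph's example above):

```wgsl
// 1. builtin form: assert that one read is uniform, cutting the path there
let ready_ix = uniform(sh_part_count[255]);

// 2. attribute form: assert every read of the declaration is uniform
@uniform var<workgroup> sh_part_count: array<u32, 256>;
```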
FYI: SPIR-V has `Uniform` and `UniformId` decorations, which are like the `uniform` builtin, i.e. applied to expressions. The "Id" part is for expressing a scope other than 'subgroup'; by default, `Uniform` asserts "this value is uniform across the subgroup".
I just want to second the point that a shader author needs an "I know what I am doing" option: one should be able to declare that a use of a derivative (or implicit derivative) under non-uniform control is okay and handled. One idea is perhaps to give those functions, `dFdx`, `dFdy`, `fwidth`, `texture`, a variant that can be used in a non-uniform situation, maybe something like `dFdxNonUniform`.
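In WGSL spelling, that might look like the sketch below. The `*NonUniform` names are hypothetical; the live line shows what already works today, since explicit-LOD sampling such as `textureSampleLevel` takes no implicit derivatives and is allowed in non-uniform control flow:

```wgsl
@group(0) @binding(0) var t: texture_2d<f32>;
@group(0) @binding(1) var s: sampler;

@fragment
fn main(
    @location(0) uv: vec2<f32>,
    @location(1) @interpolate(flat) sel: u32,
) -> @location(0) vec4<f32> {
    if (sel == 0u) {
        // hypothetical "unsafe" variants might read:
        //   let d = dpdxNonUniform(uv.x);
        //   return textureSampleNonUniform(t, s, uv);

        // already legal today in non-uniform control flow, because
        // explicit-LOD sampling takes no implicit derivatives:
        return textureSampleLevel(t, s, uv, 0.0);
    }
    return vec4<f32>(1.0);
}
```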
One example where there is non-uniform control but the derivative functions still give good values is the following (excuse the use of GLSL):
```glsl
flat in int I;
in vec2 f;
uniform sampler2D S;
out vec4 out_color;

void main(void)
{
    if (I == 0)
    {
        vec2 ff;
        ff = sin(f) * 0.5 + 0.5; // or any other non-trivial formula
        out_color = texture(S, ff);
    }
    else
    {
        out_color = vec4(1.0, 1.0, 1.0, 1.0);
    }
}
```
The derivative operations will give correct values, because the if() is from a flat value which is the same across all fragments in a triangle.
The derivative operations will give correct values, because the if() is from a flat value which is the same across all fragments in a triangle.
The example seems reasonable, and resembles the one given in Brendan's earlier comment.
At the moment, the only scope of uniformity that WGSL discusses is the entire draw call. I wonder if it would make sense to have two scopes: uniformity across the whole draw (or, in compute, the workgroup), which is what barriers require; and uniformity within a single primitive, which is all that derivatives require.
Since barriers can only be used in compute shaders, and derivative operations can only be used in fragment shaders, WGSL could simply assume the appropriate scope for each one. A user-defined function that uses barriers or derivatives is already restricted to use by one kind of entry point or the other, so we should always be able to select a single kind of uniformity to apply for each function.
There are a few bits about uniform control flow, and it is worth spelling them out fully. The main point was for SIMD architectures, so that if a value depended on another slot within the SIMD group, the value was actually there; another one that is a bit rarer is that the value is the same for all invocations across the SIMD group (the one I remember the most was for `sampler2D something[N]`, so that when indexing into `something`, the array index was the same across the entire SIMD group).
In terms of kinds of uniformity there are more than the two above; several show up just for drawing. When we get into compute there are more, coming naturally from workgroup geometry, the most obvious being same-across-the-entire-dispatch vs. same-across-a-single-workgroup... and there are more still if one thinks about each of the dimensions to some extent. Though what would be more ideal is to somehow "know" the SIMD-ness (or SIMT-ness) of things; however, that is terribly GPU-specific. Sighs.
I'd advise having "strict/safe" and "liberal/unsafe" versions of those functions that take derivatives.
On a related note, here is a real-world example of using early-out-like semantics in a way compatible with derivative ops:
```glsl
bool
do_early_out(void)
{
    float f;
    f = some_smooth_enough_function_dependent_on_smooth_varyings();
    return f + fwidth(f) < 0.0;
}

out vec4 v;

void
main(void)
{
    if (do_early_out())
    {
        v = vec4(0.0);
        return;
    }
    v = some_expensive_stuff();
}
```
and then there is this, which is even better:
```glsl
bool
do_early_out(void)
{
    float f;
    f = some_smooth_enough_function_dependent_on_smooth_varyings();
    return f < 0.0;
}

out vec4 v;

void
main(void)
{
    if (allInvocationsARB(do_early_out()))
    {
        v = vec4(0.0);
        return;
    }
    v = some_expensive_stuff();
}
```
What information are you looking for? I have literally thousands of shader variants with errors that could be provided to help catalog some of the analysis issues. Most of the shader errors come from common patterns that are identified as non-uniform, but I'd rather not spend a lot more time sorting through the shaders to catalog the error sources. The shaders are quite large, and translated through multiple compilers before they get to WGSL, so they are also not very readable.
A number of the fundamental issues of analysis being too strict have been discussed. There are two main categories of analysis issues: false errors due to scanner implementations, and shaders that are purposefully non-uniform.
The first category can possibly be fixed by the implementations, but the requirement that analysis scanning be in linear time and performant makes it difficult.
The second category needs an escape hatch. The escape hatch needs to be at both the global level, for shaders like in Unity that are not written by hand but go through multiple levels of translation; and ideally at the shader code level, so a shader author could tag a block or variable as being acceptably non-uniform. The global level escape hatch is a must-have for Unity, since we don't author WGSL directly.
There's also the third case mentioned above of fragment shaders which are non-uniform according to WGSL's definition, but valid according to the requirements of the low-level APIs.
But really the more high-level concern is about the false positive rate. If users are constantly getting non-uniformity errors for valid code, they're just going to start habitually replacing `if (...) {}` with `if (uniform(...)) {}` whenever they see one (or flipping the global escape hatch if one is provided) without even stopping to think whether their shader is correct or not. At which point the uniformity analysis isn't actually providing any value at all...
they're just going to start habitually replacing `if (...) {}` with `if (uniform(...)) {}` whenever they see one (or flipping the global escape hatch if one is provided) without even stopping to think whether their shader is correct or not. At which point the uniformity analysis isn't actually providing any value at all...
I disagree with that sentiment; most Rust code is not massively decorated with `unsafe` just to get it to compile. I also don't think an interface of `if (uniform())` is anywhere near the right way to go, as declaring that a block of code is uniform when it is not has further consequences beyond whether it is legal to use certain functions. I'd rather see "unsafe" versions of the functions `dFdx`, `dFdy`, `fwidth`, `texture` and `barrier` that can be used regardless of uniformity analysis. That would be the escape hatch for "I know what I am doing". For shaders that originated from other sources (such as Unity) and are translated to WGSL, the translation would just use the unsafe versions, since those shaders were already working anyway. Though that option requires some kind of option in the translator. I admit I am quite uneasy about a global option to turn it all off for a whole shader; it seems to go too far... though maybe I am wrong on that.
For developers like us who don't write WGSL directly, marking WGSL blocks as "unsafe" or "uniform" isn't really an option. Unsafe versions of the offending functions might work, in that I could just filter/replace the safe versions with unsafe versions when generating the WGSL if it's determined the shader is intentionally non-uniform or hit by false errors. Otherwise a global option would be needed in our case.
I disagree with that sentiment; most Rust code is not massively decorated with unsafe just to get it to compile.
An enormous amount of effort went into designing the Rust language so that unsafe was rarely needed and that the borrow checker had a very low false positive rate. Just because WGSL's uniformity analysis looks superficially similar doesn't mean that it will also automatically be ergonomic to use and have a low false positive rate. It would be great if it were! But that requires work now, not just for designing escape hatches, but also finding ways to make sure those escape hatches are almost never needed.
So, I don't think this is a case where the uniformity analysis checker is completely at fault. The issue is that clever things are not feasible to detect, and shaders, especially non-trivial ones, have a tendency to do clever things that work. The truth is that the escape hatches are going to be needed a great deal in performance-sensitive code. Early outs alone are a big deal by themselves. As a strong example, the derivative operations are not necessarily broken by non-uniform control flow. They are broken when the control flow for the neighbouring pixels is not the same; see the example I wrote above, where the derivatives will still work although there is an early-out breaking uniform control flow. By not marking a block of code, but just having an unsafe function version, the escape hatch stays completely local to the offending function call.
Just because WGSL's uniformity analysis looks superficially similar doesn't mean that it will also automatically be ergonomic to use and have a low false positive rate. It would be great if it were! But that requires work now, not just for designing escape hatches, but also finding ways to make sure those escape hatches are almost never needed.
I agree with all this. I didn't mean to imply that WGSL is fine as it is. My point was similar to yours: people don't simply pursue the lowest possible level of interference, and a well-designed static analysis can be welcomed.
For developers like us who don't write WGSL directly, marking WGSL blocks as "unsafe" or "uniform" isn't really an option. Unsafe versions of the offending functions might work, in that I could just filter/replace the safe versions with unsafe versions when generating the WGSL if it's determined the shader is intentionally non-uniform or hit by false errors. Otherwise a global option would be needed in our case.
If your translator can translate e.g. `x = foo()` to `x = foo_but_assume_uniform()`, it could also instead generate `x = unsafe { foo() }`, right?
@brendan-duncan wrote:
For developers like us who don't write WGSL directly, marking WGSL blocks as "unsafe" or "uniform" isn't really an option.
Like Kelsey, I'm surprised by this. Couldn't you just always generate WGSL whose functions' bodies are entirely wrapped in an `unsafe` block? (Assuming WGSL grew such a thing.)
@jimblandy We use Tint to generate the WGSL from SPIR-V, so perhaps it could be extended to tag generated functions for us if such a tag existed. Otherwise the other option would be to postprocess the WGSL, scan for functions and insert tags after the fact. If tagging functions as unsafe were the only option, I would make it work. But it would be much simpler to have a GPUShaderModuleCompilationHint to skip analysis for the given shader. If a study of shaders causing analysis issues is something you're interested in, we could work with you to provide a large corpus of shaders for study, similar to what we've done for Google.
Yes, that sort of change to Tint is what I had in mind.
People writing WGSL directly want to be able to say, "I know what I'm doing here" for one piece of code without giving up the benefits of the analysis for the rest. Something like an unsafe block supports that sort of use while still being straightforward for a generator like Tint or Naga to apply to the whole program.
If a study of shaders causing analysis issues is something you're interested in, we could work with you to provide a large corpus of shaders for study, similar to what we've done for Google.
Mozilla is very interested in this.
Tint can be made to emit precise block annotations, or function annotations based on UA violations. It would have to be done (internally) after the WGSL is generated, likely via a transform. That said, transforms are a big perf hit, and if this were automatically done by the compiler, then the compiler might as well just wrap everything with this annotation and avoid another pass.
Either way, a block or function annotation to silence UA would work for tint's SPIR-V ingestion path.
I don't follow why the current proposal is to mark blocks of code with "ignore uniformity analysis issues" instead of just having variants of those functions, i.e. `dFdx_unsafe` or something like that, with the possibility to still emit a warning on using `_unsafe` variants outside of uniform control.
I don't follow why the current proposal is to mark blocks of code with "ignore uniformity analysis issues" instead of just having variants of those functions, i.e. `dFdx_unsafe` or something
+1, I think this is a much better solution than the idea of marking blocks of code with an "unsafe" tag. It much more accurately captures the information that the author is trying to convey to the compiler, is much simpler to spec and implement, and is trivial to separate between fragment and compute stages. I would vote against having any escape hatch for barriers.
FWIW I think a module-level `enable allow_non_uniform_sampling;` switch (or similar) is even more preferable, but I gather that there is significant pushback against this.
Let's move discussion of the opt out design to #3554.
Looking at the WGSL 2022-10-25 minutes, it starts with trying to make the escape hatch as local as possible, and then it returns to marking blocks as unsafe-like. In contrast, just having unsafe function versions maximizes locality.
What is the motivation for marking blocks of code as unsafe-like instead of just having unsafe functions? I do not see any benefit at all from marking blocks of code over having unsafe function variants.
@brendan-duncan How's it feeling now?
@kdashg This solved our problems with vertex and fragment shaders! Still having a problem with a compute shader (early out causes barriers to be considered non-uniform). I was going to submit that as a Tint bug initially, to determine if it's an implementation issue or a spec issue, or a me issue. I'll file a new issue here if it turns out to be a spec issue.
Thanks for the great work to find a workable compromise on this issue!
Thank you very much for the clear and actionable feedback. It's been super helpful!
I'll file a new issue here if it turns out to be a spec issue.
Closing this one.
You can get around it showing up as a naga error by wrapping the texture sample in a helper function and calling that where you would normally sample:
```wgsl
fn tsw(t_diffuse: texture_2d<f32>, s_diffuse: sampler, uv: vec2<f32>) -> vec4<f32> {
    return textureSample(t_diffuse, s_diffuse, uv);
}
```
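A call site then looks like this (sketch; `uv` and the binding names are assumed to match the surrounding shader):

```wgsl
let color = tsw(t_diffuse, s_diffuse, uv);
```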
Now that uniformity analysis is being treated as an error instead of a warning, many of the shaders we've been working with are being flagged as uniformity analysis errors. Modifying the shaders to pass analysis is a monumental task, with an existing engine, where we translate existing shaders from an enormous library, where shaders use libraries of includes, or are otherwise composited from multiple sources. We cannot just "rewrite the shader" to pass the analysis check.
I believe the analysis is being far too strict, particularly for non-vertex shaders. One example of an analysis error in a fragment shader:
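(The original example isn't reproduced here; the following is an illustrative hand-written reduction of the pattern, with hypothetical names and bindings.)

```wgsl
@group(0) @binding(0) var base_tex: texture_2d<f32>;
@group(0) @binding(1) var detail_tex: texture_2d<f32>;
@group(0) @binding(2) var samp: sampler;

@fragment
fn main(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    let base = textureSample(base_tex, samp, uv);
    if (base.a < 0.5) {
        discard; // conditional discard (alpha test)
    }
    // flagged: control flow here is considered non-uniform after the discard
    return textureSample(detail_tex, samp, uv);
}
```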
Any textureSample call after that conditional discard line is considered non-uniform.
No other graphics API or platform has a problem with this pattern. But now WebGPU will error on it.
This is just one example, but other errors I've been hitting are of a similar pattern. We need to loosen up the analysis and what is flagged as an error. As it is, I'm not confident we can reasonably port our engine to WebGPU.