gpuweb / gpuweb

Where the GPU for the Web work happens!
https://webgpu.io
Other
4.83k stars 319 forks source link

Uniform control flow vs textures for flat interpolated varyings #3668

Closed unconed closed 1 year ago

unconed commented 1 year ago

I'm trying to implement a deferred renderer. As part of this, I figured I'd batch the lights by type, for efficiency, with one instance per light.

However, as some lights have shadows and some don't, this triggers a warning about non-uniform control flow and texture sampling. The light index is @interpolate(flat) but this is still apparently considered non-uniform. Nevertheless, the shader works fine on desktop (macos). It just throws an ugly warning in the console that I can't suppress. But I suspect it's there because tiled GPUs might not work.

I'm however at a loss for how to solve this efficiently. Without push constants, the only way I can pass dynamic data to my shader is via a buffer. To avoid creating a unique buffer per draw call, I can use dynamic buffer offsets. But, buffer bindings must be aligned to 256 bytes.

This means that not only do I have to split a perfectly batchable call into one call per instance, but, in order to pass a mere uint32 to a light, so it can index from a shared light buffer, I need to allocate 256 bytes per draw call. This seems ridiculous.

At first, I thought maybe I could use firstInstanceIndex as a push constant, but this would also have to be passed from vert to frag, and thus be non-uniform. It is also an optional feature so I'm not even sure it's commonly available.

TLDR:

dneto0 commented 1 year ago

Thanks for the feedback!

At first, I thought maybe I could use firstInstanceIndex as a push constant, but this would also have to be passed from vert to frag, and thus be non-uniform. It is also an optional feature so I'm not even sure it's commonly available.

firstInstance is not an optional feature. It's an optional parameter to the 'draw' call. It defaults to zero. See https://gpuweb.github.io/gpuweb/#dom-gpurendercommandsmixin-draw

TLDR:

  • are flat interpolated varyings supposed to be non uniform?

Yes. By the letter of the Vulkan spec, they are non-uniform. Yes, this is surprising. See #3489

  • is there some other way of having push constants that I've missed?

Push constants are not in v1. See the discussion/investigation in #75

  • wtf is the point of throwing warnings about shaders that work fine?

This is the intense debate of the last few months. See #3479 #3554 #3644

  • why does texturing suck so much in WebGPU? not having bindless is one thing but not even being able to put an if around a texture fetch is retarded and makes this a sad repeat of WebGL

This is getting less actionable. Please keep within boundaries of professional conduct and congenial communications.

Pulling the signal out of the noise: it is customary to put an operation requiring an implicit derivative inside non-uniform control flow. But the spec currently says this should be banned. We are debating how to weaken that rule.

fintelia commented 1 year ago

are flat interpolated varyings supposed to be non uniform?

Yes. By the letter of the Vulkan spec, they are non-uniform. Yes, this is surprising. See https://github.com/gpuweb/gpuweb/issues/3489

That's not the whole story. Vulkan doesn't actually require total uniformity across the invocation group, it only requires that derivatives are executed within "control flow that is uniform within a derivative group". A couple quotes from the spec that explain things:

Derivative operations operate on a set of invocations known as a derivative group as defined in the SPIR-V specification. A derivative group is equivalent to the primitive scope instance for a fragment shader invocation.

Any input variables decorated with Flat are uniform within a primitive scope instance.

All dynamic instances of explicit derivative instructions (OpDPdx, OpDPdy, and OpFwidth*) must be executed in control flow that is uniform within a derivative group. For other derivative operations, results are undefined if a dynamic instance is executed in control flow that is not uniform within the derivative group.

unconed commented 1 year ago

I was mistaken about firstInstance, it's only an optional feature for indirect calls. Useful here would be to know why a particular feature couldn't just be made part of the spec, and to document or link this.

Quoting from one of the linked threads:

As it is, I'm not confident we can reasonably port our engine to WebGPU.

This is exactly what I'm getting at. There is a lot of rules lawyering but I see very little focus on actually making sure well-established rendering techniques can be implemented. Whether you dress this up nicely is less important than the recognition that webgpu 1.0 will be dead on arrival if these kinds of issues are not fixed. Maybe a rude wake up call is exactly what is needed, rather than talk about "spending innovation tokens" (also a quote). That is the sort of "professional" language that actual professionals know is meaningless.

I have a repo with months of webgpu code in it that shows how untenable all these feature cuts can become when they all intersect, because of the many workarounds that all have to be deployed in concert. Even something as simple as rendering a GLTF—a format designed for GPU consumption, on the web—doesn't work unless you write code to split and realign data at 256 byte offsets. It's a death of a thousand cuts even if you have infrastructure like a shader linker and polyfills.

In my case, the offending texture sampler is a textureSampleCompare on a depth map without MIPs, so I don't even understand why derivatives would be needed.

Deferred isn't a particularly complex rendering technique. If you imagine e.g. a forward+ renderer, then the uniformity requirement pretty much blocks that from being realized altogether unless the developer emulates a tiled/froxel based hardware renderer entirely in software, with tons of redundant draw calls. This is not a viable thing to do, nor would it be sane. Obviously forward+ renderers have been used for years now, so it is WebGPU which is wrong.

With regards to throwing warnings: the question is a) whether warnings signify shaders that will break e.g. on mobile platforms or b) is a false alarm from an overzealous strictness check. Right now it seems like the latter, in which case the ability to silence it should've been there from the get go. This should be clearly communicated.

dneto0 commented 1 year ago

are flat interpolated varyings supposed to be non uniform?

Yes. By the letter of the Vulkan spec, they are non-uniform. Yes, this is surprising. See #3489

That's not the whole story. Vulkan doesn't actually require total uniformity across the invocation group, it only requires that derivatives are executed within "control flow that is uniform within a derivative group". A couple quotes from the spec that explain things:

Derivative operations operate on a set of invocations known as a derivative group as defined in the SPIR-V specification. A derivative group is equivalent to the primitive scope instance for a fragment shader invocation.

Any input variables decorated with Flat are uniform within a primitive scope instance.

All dynamic instances of explicit derivative instructions (OpDPdx, OpDPdy, and OpFwidth*) must be executed in control flow that is uniform within a derivative group. For other derivative operations, results are undefined if a dynamic instance is executed in control flow that is not uniform within the derivative group.

I understandt that. The issue is about reconvergence, and that it doesn't happen when most programmers expect it to. Sorry it isn't explained in #3489, and I didn't explain it above.

Example:

void main() {
  if ( cond_divergent_between_different_primitives )  {
          if ( cond_divergent_within_a_primitive ) {
                ...
          }
          // Most programmers think the invocations *within* the primitive
          // guaranteed to be reconverged here.
          // The spec does not guarantee this is the case.
          vec4 v = texture(sampler2D(t,s,coords)); // not required to be in uniform control flow.
   }
   // The way Vulkan/SPIR-V reconvergence works, this is the first point at which
   // control flow is uniform.
}

In a previous discussion I gave an executable example showing possibly-surprising non-reconvergence.
https://github.com/gpuweb/gpuweb/pull/954#issuecomment-680287661

dneto0 commented 1 year ago

In my case, the offending texture sampler is a textureSampleCompare on a depth map without MIPs, so I don't even understand why derivatives would be needed.

Can you use textureSampleCompareLevel and give it level 0?

dneto0 commented 1 year ago

Maybe a rude wake up call is exactly what is needed, rather than talk about "spending innovation tokens" (also a quote). That is the sort of "professional" language that actual professionals know is meaningless.

Perhaps you missed the part where I said we're debating exactly how to relax the rule.

Rudeness and name-calling is not helping your case. You've overstepped your welcome.

kdashg commented 1 year ago

All: It's fine to be frustrated, but let's not be rude, thanks! (this includes veiled insults) Please keep in mind that we are collaborating here, in accordance with https://docs.github.com/en/site-policy/github-terms/github-community-guidelines.


I expect that we will be adding a local opt-out (allowing suppression of warnings) in https://github.com/gpuweb/gpuweb/pull/3644 or shortly after.

Unfortunately for us, we're choosing to to (or at least strongly considering being) the bearer of bad news with regards to the minimal and strict guarantees of uniformity coming up to us from our underlying APIs. We are still figuring out exactly what form this takes, and ask that you please bear with us (and give us feedback about what works and what doesn't) while we figure out what we think is the best way to surface this very tricky and surprising class of issues. As noted by @dneto0, this is an area of very active discussion right now.

fintelia commented 1 year ago

A lot of the discussion seems to be conflating the example @dneto0 provided with something like the following (which I think may more closely match what's described in the original post):

void main() {
  if (cond_divergent_between_different_primitives)  {
        // Texture sample is OK here, but current analysis rejects it.
        vec4 v = texture(sampler2D(t,s,coords));
   }
}

Right now, the uniformity analysis cannot distinguish between the two examples, so it rejects both. That's an unfortunate state to be in and will hopefully change.

Either way, it seems worthwhile for the messaging to distinguish between true positives (where the system would rightfully be the bearer of bad news) and false positives (where the code is fine, and the programmer just needs to invoke the opt-outs once they're ready).

kdashg commented 1 year ago
WGSL 2022-12-13 Minutes * KG: Think this is again where developers don’t understand the surprising lack of guarantees. * AB: Think your summary is correct. * AB: If it helps, I’ve received lots of feedback that things not being intuitive to the user should be “fixed in the spec”. Thanks but can’t do that. Reality is what must be described. * KG: Want to be able to convey that many vendors can reliably say the guarantees actually are not provided. * AB: There’s an extension available for Vulkan for subgroup uniformity, and need to query what stages it’s allowed on (and must include compute, but fragment is optional). * JB: Telemetry about how widely adopted those extensions are? * AB: Have the telemetry but can’t share it. * AB: Look at experimental CTS tests for “maximal reconvergence” in the Vulkan CTS (which is public). You can read it and play with it. I can’t share experimental data about it. I can give you a link to that file. \ [https://github.com/KhronosGroup/VK-GL-CTS/blob/main/external/vulkancts/modules/vulkan/reconvergence/vktReconvergenceTests.cpp](https://github.com/KhronosGroup/VK-GL-CTS/blob/main/external/vulkancts/modules/vulkan/reconvergence/vktReconvergenceTests.cpp) * KG: **So resolution: as much as authors want the better analysis, we can’t give a better analysis that guarantees what they want. ** * AB: And the opt-out we now have will let them proceed if they are willing to take the risk.
kdashg commented 1 year ago
WGSL 2023-01-24 Minutes * KG: Think the resolution we came to is that this is something that is reasonable.** Many users expect this, not guaranteed, it's our job to be the messenger that this doesn't work**. Think the only thing we wanted to do was, in the error message, is to distinguish between these things. If we can tell there is a uniformy error that relies on flat and would be ok if flat was valid, then we can update the error to say the thing you want isn't real. It's a nice to have UA thing but nothing to do in the spec, per say, then have the issue to point folks at. Does that sound reasonable? * DN: Sounds reasonable to me. Also, since then, have diagnostic opt-out so they can turn it off if they want, but up to the UA to do better * MM: Why is flat non-uniform? * KG: Doesn't have guarantees you want. Reconvergence in SPIRV for things that are dynamic uniform is not guaranteed to reconverge. * MM: What does that have to do with flat? * KG: Flat is not dynamically uniform. Might think all things in quad get the same value, but the uniformity model in the spec says these values may not reconverge. * AB: Reconvergence is at a different scope then flat * KG: Have to return to the root of the execution * TR: Flat is quad uniform but not subgroup uniform which is needed for the reconvergence * KG: This is the secret difference, where technically we have these two things. Reconviergence is not guaranteed to happen on a flat, at least not in the cases they think it will. Does that make sense? * MM: Not really …. But ok * KG: Yep, pretty much “because Vulkan”