Closed unconed closed 1 year ago
Thanks for the feedback!
At first, I thought maybe I could use firstInstanceIndex as a push constant, but this would also have to be passed from vert to frag, and thus be non-uniform. It is also an optional feature so I'm not even sure it's commonly available.
firstInstance is not an optional feature. It's an optional parameter to the 'draw' call. It defaults to zero. See https://gpuweb.github.io/gpuweb/#dom-gpurendercommandsmixin-draw
TLDR:
- are flat interpolated varyings supposed to be non uniform?
Yes. By the letter of the Vulkan spec, they are non-uniform. Yes, this is surprising. See #3489
- is there some other way of having push constants that I've missed?
Push constants are not in v1. See the discussion/investigation in #75
- wtf is the point of throwing warnings about shaders that work fine?
This is the intense debate of the last few months. See #3479 #3554 #3644
- why does texturing suck so much in WebGPU? not having bindless is one thing but not even being able to put an if around a texture fetch is retarded and makes this a sad repeat of WebGL
This is getting less actionable. Please keep within boundaries of professional conduct and congenial communications.
Pulling the signal out of the noise: it is customary to put an operation requiring an implicit derivative inside non-uniform control flow. But the spec currently says this should be banned. We are debating how to weaken that rule.
are flat interpolated varyings supposed to be non uniform?
Yes. By the letter of the Vulkan spec, they are non-uniform. Yes, this is surprising. See https://github.com/gpuweb/gpuweb/issues/3489
That's not the whole story. Vulkan doesn't actually require total uniformity across the invocation group, it only requires that derivatives are executed within "control flow that is uniform within a derivative group". A couple quotes from the spec that explain things:
Derivative operations operate on a set of invocations known as a derivative group as defined in the SPIR-V specification. A derivative group is equivalent to the primitive scope instance for a fragment shader invocation.
Any input variables decorated with Flat are uniform within a primitive scope instance.
All dynamic instances of explicit derivative instructions (OpDPdx, OpDPdy, and OpFwidth*) must be executed in control flow that is uniform within a derivative group. For other derivative operations, results are undefined if a dynamic instance is executed in control flow that is not uniform within the derivative group.
I was mistaken about firstInstance, it's only an optional feature for indirect calls. Useful here would be to know why a particular feature couldn't just be made part of the spec, and to document or link this.
Quoting from one of the linked threads:
As it is, I'm not confident we can reasonably port our engine to WebGPU.
This is exactly what I'm getting at. There is a lot of rules lawyering but I see very little focus on actually making sure well-established rendering techniques can be implemented. Whether you dress this up nicely is less important than the recognition that webgpu 1.0 will be dead on arrival if these kinds of issues are not fixed. Maybe a rude wake up call is exactly what is needed, rather than talk about "spending innovation tokens" (also a quote). That is the sort of "professional" language that actual professionals know is meaningless.
I have a repo with months of webgpu code in it that shows how untenable all these feature cuts can become when they all intersect, because of the many workarounds that all have to be deployed in concert. Even something as simple as rendering a GLTF—a format designed for GPU consumption, on the web—doesn't work unless you write code to split and realign data at 256 byte offsets. It's a death of a thousand cuts even if you have infrastructure like a shader linker and polyfills.
In my case, the offending texture sampler is a textureSampleCompare on a depth map without MIPs, so I don't even understand why derivatives would be needed.
Deferred isn't a particularly complex rendering technique. If you imagine e.g. a forward+ renderer, then the uniformity requirement pretty much blocks that from being realized altogether unless the developer emulates a tiled/froxel based hardware renderer entirely in software, with tons of redundant draw calls. This is not a viable thing to do, nor would it be sane. Obviously forward+ renderers have been used for years now, so it is WebGPU which is wrong.
With regards to throwing warnings: the question is a) whether warnings signify shaders that will break e.g. on mobile platforms or b) is a false alarm from an overzealous strictness check. Right now it seems like the latter, in which case the ability to silence it should've been there from the get go. This should be clearly communicated.
are flat interpolated varyings supposed to be non uniform?
Yes. By the letter of the Vulkan spec, they are non-uniform. Yes, this is surprising. See #3489
That's not the whole story. Vulkan doesn't actually require total uniformity across the invocation group, it only requires that derivatives are executed within "control flow that is uniform within a derivative group". A couple quotes from the spec that explain things:
Derivative operations operate on a set of invocations known as a derivative group as defined in the SPIR-V specification. A derivative group is equivalent to the primitive scope instance for a fragment shader invocation.
Any input variables decorated with Flat are uniform within a primitive scope instance.
All dynamic instances of explicit derivative instructions (OpDPdx, OpDPdy, and OpFwidth*) must be executed in control flow that is uniform within a derivative group. For other derivative operations, results are undefined if a dynamic instance is executed in control flow that is not uniform within the derivative group.
I understandt that. The issue is about reconvergence, and that it doesn't happen when most programmers expect it to. Sorry it isn't explained in #3489, and I didn't explain it above.
Example:
void main() {
if ( cond_divergent_between_different_primitives ) {
if ( cond_divergent_within_a_primitive ) {
...
}
// Most programmers think the invocations *within* the primitive
// guaranteed to be reconverged here.
// The spec does not guarantee this is the case.
vec4 v = texture(sampler2D(t,s,coords)); // not required to be in uniform control flow.
}
// The way Vulkan/SPIR-V reconvergence works, this is the first point at which
// control flow is uniform.
}
In a previous discussion I gave an executable example showing possibly-surprising non-reconvergence.
https://github.com/gpuweb/gpuweb/pull/954#issuecomment-680287661
In my case, the offending texture sampler is a textureSampleCompare on a depth map without MIPs, so I don't even understand why derivatives would be needed.
Can you use textureSampleCompareLevel and give it level 0?
Maybe a rude wake up call is exactly what is needed, rather than talk about "spending innovation tokens" (also a quote). That is the sort of "professional" language that actual professionals know is meaningless.
Perhaps you missed the part where I said we're debating exactly how to relax the rule.
Rudeness and name-calling is not helping your case. You've overstepped your welcome.
All: It's fine to be frustrated, but let's not be rude, thanks! (this includes veiled insults) Please keep in mind that we are collaborating here, in accordance with https://docs.github.com/en/site-policy/github-terms/github-community-guidelines.
I expect that we will be adding a local opt-out (allowing suppression of warnings) in https://github.com/gpuweb/gpuweb/pull/3644 or shortly after.
Unfortunately for us, we're choosing to to (or at least strongly considering being) the bearer of bad news with regards to the minimal and strict guarantees of uniformity coming up to us from our underlying APIs. We are still figuring out exactly what form this takes, and ask that you please bear with us (and give us feedback about what works and what doesn't) while we figure out what we think is the best way to surface this very tricky and surprising class of issues. As noted by @dneto0, this is an area of very active discussion right now.
A lot of the discussion seems to be conflating the example @dneto0 provided with something like the following (which I think may more closely match what's described in the original post):
void main() {
if (cond_divergent_between_different_primitives) {
// Texture sample is OK here, but current analysis rejects it.
vec4 v = texture(sampler2D(t,s,coords));
}
}
Right now, the uniformity analysis cannot distinguish between the two examples, so it rejects both. That's an unfortunate state to be in and will hopefully change.
Either way, it seems worthwhile for the messaging to distinguish between true positives (where the system would rightfully be the bearer of bad news) and false positives (where the code is fine, and the programmer just needs to invoke the opt-outs once they're ready).
I'm trying to implement a deferred renderer. As part of this, I figured I'd batch the lights by type, for efficiency, with one instance per light.
However, as some lights have shadows and some don't, this triggers a warning about non-uniform control flow and texture sampling. The light index is @interpolate(flat) but this is still apparently considered non-uniform. Nevertheless, the shader works fine on desktop (macos). It just throws an ugly warning in the console that I can't suppress. But I suspect it's there because tiled GPUs might not work.
I'm however at a loss for how to solve this efficiently. Without push constants, the only way I can pass dynamic data to my shader is via a buffer. To avoid creating a unique buffer per draw call, I can use dynamic buffer offsets. But, buffer bindings must be aligned to 256 bytes.
This means that not only do I have to split a perfectly batchable call into one call per instance, but, in order to pass a mere uint32 to a light, so it can index from a shared light buffer, I need to allocate 256 bytes per draw call. This seems ridiculous.
At first, I thought maybe I could use firstInstanceIndex as a push constant, but this would also have to be passed from vert to frag, and thus be non-uniform. It is also an optional feature so I'm not even sure it's commonly available.
TLDR: