Open cart opened 4 years ago
More context here: #306
WebGPU supports Char2
and UChar2
. Why do you want to align those to 32 bits?
My recommendation is - don't have a separate code path, just change pathfinder to use those 2-byte formats where the original used 1-byte. Last time I checked, pathfinder does most of the work in fragment shaders, so I don't think this 1 more byte per vertex is going to change any benchmarks.
Those are vector types right? You're right to say that "32 bit alignment" isnt required given that 16 bit types exist. But wouldn't that complicate the code more given that we'd be dealing with vec2s when the code assumes a single int/uint?
My plan for the wgpu backend (untested!) was to use Char2 and UChar2 where pathfinder wants the scalar variants (same with Char3, etc) and just let the let the extra components of one value overlap the next value (hopefully none of the wgpu/gfx-hal implementations with check and complain!). If this doesn't work, @kvark's proposal of splatting (or zeroing) the pathfinder buffers to a supported WebGPU format would work fine, I think.
I'm also hoping to take advantage of some of the jank of shader inputs binding... in my experience the cpu vertex attribute layout and shader declarations don't need to agree. I think this quickly goes into undefined behavior territory, but should (??) be safe if the shader declares fewer components than the cpu vertex attribute. @kvark do you have thoughts/ideas on this? My goal here is to avoid needing to modify the shaders for WebGPU if I can.
@cart
But wouldn't that complicate the code more given that we'd be dealing with vec2s when the code assumes a single int/uint?
My proposal is to make this code assume vec2
/uvec2
. It's not a big complication.
@tangmi
but should (??) be safe if the shader declares fewer components than the cpu vertex attribute
We haven't discussed this particular aspect of the validation with WebGPU group. And when we will, such an idea would be hard to sell: if the user specifies vec4
in the vertex attribute format, they might as well specify it in the shader input, just not use all the components of it. Validating these two sides to match is a simpler spec, simpler implementation, and no obvious downsides to the users.
So, in the end, we may end up allowing that. I don't think you should be betting on that though. As I said, PathFinder shouldn't be bottle-necking on vertex fetches (correct me if that's wrong!), so that extra byte isn't going to harm anybody.
I think all of the options discussed are viable, but they all introduce weirdness somewhere:
leave rust datatypes intact, but change shaders (and their descriptors) to expect vec2/uvec2
Vec::<OriginalRustDataType>::get_bytes(self)
. We now need to add padding.Does that seem correct (and fair)?
I'm happy to make any of those options work. As my ultimate goal here is to get a solution to this problem merged so I don't need to fork pathfinder, the opinion that matters the most to me is pcwalton's. (I'm not @-ing him anymore because i've done that enough already).
edit: forgot to mention the relatively "big" gpu data types this pr uses as a con
Yeah, I was thinking that to work around this limitation we would just pack all the vertex attributes into three uint
s (each instance is 12 bytes long, right?) and unpack them manually in the shader. It's not just WebGPU; Metal has limitations on vertex attributes too (though not as severe) and I'm tired of working around them :)
I would try to avoid making the tiles longer; the CPU side is very memory bound and every byte I could shave off improved performance. Though fills are more important to optimize than tiles.
That makes sense to me!
Some good-ish news: I have a semi-working backend for my engine/render graph built on top of wgpu:
The bad news is that the image above is supposed to be a house :smile:
Note: This is using the "vulkan-style" shaders from the other pr, but I'm not using the "32bit alignment" code from this pr. Instead I just added some offsets / vec2s like kvark suggested.
I'm honestly at a bit of a loss. The offset changes / vulkan-style shaders work fine with the GL backend, so this is almost certainly an issue with my backend.
Support Strict 32 Bit Alignment Platforms
Pathfinder currently uses 1-byte and 2-byte data types in many of its shaders, but some graphics apis (such as WebGPU) do not support those types. In preparation for a WebGPU backend, I have added a
shader_alignment_32_bits
feature flag. When enabled, it will treat both 1-byte and 2-byte datatypes as 4-byte datatypes.I added the
AlignedU8
,AlignedU16
,AlignedI8
, andAlignedI16
type aliases, which will point to either their real type (ex:u8
) or their aligned type (ex:u32
) based on the feature flags. I also added their corresponding attribute types (ALIGNED_U8_ATTR
), which resolve to the correctVertexAttrType
.This change means that the strides for some shader layouts change based on the feature flag. To make this more straightforward, I added a
VertexBufferDescriptor
type, which automatically calculates the offsets and strides of its VertexAttrDescriptors. It also sets their index and divisor, which will always be the same within a vertex buffer. This change significantly reduces the boilerplate required for vertex attributes and also eliminates a number of classes of errors:The only place I didn't apply this change to was
StencilVertexArray
, which has a stride that does not line up with its datatype. @pcwalton is this intentional or is it a bug?To decrease the odds of breaking things and to retain clarity, this PR doesn't try to optimize the layouts for 32 bit alignment, however there is plenty of potential in that area.
Without the
shader_alignment_32_bits
flag everything works as expected:Unfortunately, I have a few artifacts when I enable the
shader_alignment_32_bits
flag:I am investigating this now, but if anyone has ideas let me know.
@pcwalton because hes the expert here :) @kvark because he is probably invested in wgpu compatibility @tangmi because he started work on a wgpu backend and i'm assuming he will eventually be blocked by this