cart commented 4 years ago

Support Strict 32 Bit Alignment Platforms

Pathfinder currently uses 1-byte and 2-byte data types in many of its shaders, but some graphics apis (such as WebGPU) do not support those types. In preparation for a WebGPU backend, I have added a shader_alignment_32_bits feature flag. When enabled, it will treat both 1-byte and 2-byte datatypes as 4-byte datatypes.

I added the AlignedU8, AlignedU16, AlignedI8, and AlignedI16 type aliases, which will point to either their real type (ex: u8) or their aligned type (ex: u32) based on the feature flags. I also added their corresponding attribute types (ALIGNED_U8_ATTR), which resolve to the correct VertexAttrType.

This change means that the strides for some shader layouts change based on the feature flag. To make this more straightforward, I added a VertexBufferDescriptor type, which automatically calculates the offsets and strides of its VertexAttrDescriptors. It also sets their index and divisor, which will always be the same within a vertex buffer. This change significantly reduces the boilerplate required for vertex attributes and also eliminates a number of classes of errors:

let fill_attrs= &[
    device.get_vertex_attr(&fill_program.program, "FromSubpx").unwrap(),
    device.get_vertex_attr(&fill_program.program, "ToSubpx").unwrap(),
    device.get_vertex_attr(&fill_program.program, "FromPx").unwrap(),
    device.get_vertex_attr(&fill_program.program, "ToPx").unwrap(),
    device.get_vertex_attr(&fill_program.program, "TileIndex").unwrap(),
];

static FILL_BUFFER: Lazy<VertexBufferDescriptor> = Lazy::new(|| {
    let mut descriptor = VertexBufferDescriptor{
        index: 1,
        divisor: 1,
        vertex_attrs: vec![
            VertexAttrDescriptor::datatype_only(VertexAttrClass::FloatNorm, VertexAttrType::U8, 2),
            VertexAttrDescriptor::datatype_only(VertexAttrClass::FloatNorm, VertexAttrType::U8, 2),
            VertexAttrDescriptor::datatype_only(VertexAttrClass::Int, VertexAttrType::U8, 1),
            VertexAttrDescriptor::datatype_only(VertexAttrClass::Int, VertexAttrType::U8, 1),
            VertexAttrDescriptor::datatype_only(VertexAttrClass::Int, VertexAttrType::U16, 1),
        ]
    };
    descriptor.update_attrs();
    descriptor
});

device.bind_buffer(&vertex_array, &vertex_buffer, BufferTarget::Vertex);
FILL_BUFFER.configure_vertex_attrs(device, &vertex_array, fill_attrs);

The only place I didn't apply this change to was StencilVertexArray, which has a stride that does not line up with its datatype. @pcwalton is this intentional or is it a bug?

device.configure_vertex_attr(&vertex_array, &position_attr, &VertexAttrDescriptor {
    size: 3,
    class: VertexAttrClass::Float,
    attr_type: VertexAttrType::F32,
    // this would normally be 4 * 3
    stride: 4 * 4,
    offset: 0,
    divisor: 0,
    buffer_index: 0,
});

To decrease the odds of breaking things and to retain clarity, this PR doesn't try to optimize the layouts for 32 bit alignment, however there is plenty of potential in that area.

Without the `shader_alignment_32_bits` flag everything works as expected:

normal

Unfortunately, I have a few artifacts when I enable the `shader_alignment_32_bits` flag:

32bitaligned

I am investigating this now, but if anyone has ideas let me know.

@pcwalton because hes the expert here :) @kvark because he is probably invested in wgpu compatibility @tangmi because he started work on a wgpu backend and i'm assuming he will eventually be blocked by this

cart commented 4 years ago

More context here: #306

kvark commented 4 years ago

WebGPU supports Char2 and UChar2. Why do you want to align those to 32 bits? My recommendation is - don't have a separate code path, just change pathfinder to use those 2-byte formats where the original used 1-byte. Last time I checked, pathfinder does most of the work in fragment shaders, so I don't think this 1 more byte per vertex is going to change any benchmarks.

cart commented 4 years ago

Those are vector types right? You're right to say that "32 bit alignment" isnt required given that 16 bit types exist. But wouldn't that complicate the code more given that we'd be dealing with vec2s when the code assumes a single int/uint?

tangmi commented 4 years ago

My plan for the wgpu backend (untested!) was to use Char2 and UChar2 where pathfinder wants the scalar variants (same with Char3, etc) and just let the let the extra components of one value overlap the next value (hopefully none of the wgpu/gfx-hal implementations with check and complain!). If this doesn't work, @kvark's proposal of splatting (or zeroing) the pathfinder buffers to a supported WebGPU format would work fine, I think.

I'm also hoping to take advantage of some of the jank of shader inputs binding... in my experience the cpu vertex attribute layout and shader declarations don't need to agree. I think this quickly goes into undefined behavior territory, but should (??) be safe if the shader declares fewer components than the cpu vertex attribute. @kvark do you have thoughts/ideas on this? My goal here is to avoid needing to modify the shaders for WebGPU if I can.

kvark commented 4 years ago

@cart

But wouldn't that complicate the code more given that we'd be dealing with vec2s when the code assumes a single int/uint?

My proposal is to make this code assume vec2/uvec2. It's not a big complication.

@tangmi

but should (??) be safe if the shader declares fewer components than the cpu vertex attribute

We haven't discussed this particular aspect of the validation with WebGPU group. And when we will, such an idea would be hard to sell: if the user specifies vec4 in the vertex attribute format, they might as well specify it in the shader input, just not use all the components of it. Validating these two sides to match is a simpler spec, simpler implementation, and no obvious downsides to the users.

So, in the end, we may end up allowing that. I don't think you should be betting on that though. As I said, PathFinder shouldn't be bottle-necking on vertex fetches (correct me if that's wrong!), so that extra byte isn't going to harm anybody.

cart commented 4 years ago

I think all of the options discussed are viable, but they all introduce weirdness somewhere:

this pr:
- pros: original code path / shaders largely untouched. cpu and gpu datatypes in sync
- cons: two code paths and therefore easier to regress and harder to maintain. type aliases that change size based on a cargo feature reduce clarity. 8 bit datatypes become 32 bit datatypes, which will negatively affect performance (notably more than the vec2 types mentioned below would).
replace all cases of int/uint with vec2/uvec2 in shaders and rust datatypes
- pros: single code path for all backends. cpu and gpu datatypes in sync
- cons: slight loss of clarity as "vectorness" is not relevant to the functionality. probably a tiny performance loss.
define overlapping vec types in the wgpu backend's shader layouts whenever int/unit is encountered
- pros: doesn't touch existing code paths / scoped to wgpu backend.
- cons: maybe unsupported by wgpu in the future? adds jank to the wgpu backend.
leave rust datatypes intact, but change shaders (and their descriptors) to expect vec2/uvec2
- pros: rust side stays mostly the same
- cons: loss of clarity in shaders. probably a tiny performance loss. cpu/gpu datatypes out of sync. complicates buffer creation because we can't just do Vec::<OriginalRustDataType>::get_bytes(self). We now need to add padding.
Does that seem correct (and fair)?

I'm happy to make any of those options work. As my ultimate goal here is to get a solution to this problem merged so I don't need to fork pathfinder, the opinion that matters the most to me is pcwalton's. (I'm not @-ing him anymore because i've done that enough already).

edit: forgot to mention the relatively "big" gpu data types this pr uses as a con

pcwalton commented 4 years ago

Yeah, I was thinking that to work around this limitation we would just pack all the vertex attributes into three uints (each instance is 12 bytes long, right?) and unpack them manually in the shader. It's not just WebGPU; Metal has limitations on vertex attributes too (though not as severe) and I'm tired of working around them :)

I would try to avoid making the tiles longer; the CPU side is very memory bound and every byte I could shave off improved performance. Though fills are more important to optimize than tiles.

cart commented 4 years ago

That makes sense to me!

Some good-ish news: I have a semi-working backend for my engine/render graph built on top of wgpu:

The bad news is that the image above is supposed to be a house :smile:

Note: This is using the "vulkan-style" shaders from the other pr, but I'm not using the "32bit alignment" code from this pr. Instead I just added some offsets / vec2s like kvark suggested.

I'm honestly at a bit of a loss. The offset changes / vulkan-style shaders work fine with the GL backend, so this is almost certainly an issue with my backend.

servo / pathfinder