gpuweb / gpuweb

Investigate possible solution for cheaper opaque canvases #1988

Open kainino0x opened 3 years ago

kainino0x commented 3 years ago

Tabling #1871 while we get some implementation experience. Here are some options that we should try to get experience with:

Options:

  1. Don't do anything (just keep the compositingAlphaMode: "opaque" we already have, which requires an alpha-clear and can't do so inside an existing render pass).
  2. storeOp: "present" which takes away access immediately at the end of the render pass, and can either clear to 1 at the end of the pass (with a quad) or clear later. https://github.com/gpuweb/gpuweb/issues/1425#issuecomment-809830117
    • storeOp: "present" which ALWAYS clears to 1 at the end of the pass (regardless of platform), avoiding the extra tracking within command buffers to take away access.
  3. An RGBX texture format that can be used for canvases, that must be used with loadOp: "clear", and either (a) overrides (or validates) the alpha write mask to 0, or (b) overrides (or validates) the blend mode to always write 1.
    • Can have an alternate implementation that modifies the shader instead of the alpha write mask, for hardware that has performance issues with alpha mask (WebGL has seen issues with alpha write mask performance on Intel GPUs).
  4. Allow unportable compositing results, but "good enough for security" (e.g. insert a blank extra compositor layer behind the canvas).
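To make option 3 more concrete, here is a minimal sketch of what "overriding" a pipeline's color target state could look like for a hypothetical RGBX canvas format. The bit values mirror WebGPU's `GPUColorWrite` flags and the blend factor strings are existing `GPUBlendFactor` values; `TargetState`, `overrideForRgbxA`, and `overrideForRgbxB` are illustrative names only, not proposed API. Both variants assume the attachment was cleared to alpha 1 at the start of the pass.

```typescript
// Bit values mirror WebGPU's GPUColorWrite flags.
const ColorWrite = { RED: 0x1, GREEN: 0x2, BLUE: 0x4, ALPHA: 0x8, ALL: 0xf } as const;

// Hypothetical, simplified stand-in for the alpha-related part of a color target state.
interface TargetState {
  writeMask: number;
  alphaBlend?: { srcFactor: string; dstFactor: string; operation: string };
}

// Option 3(a): strip the alpha bit from the write mask, so the alpha value
// produced by the load-op clear is never overwritten.
function overrideForRgbxA(t: TargetState): TargetState {
  return { ...t, writeMask: t.writeMask & ~ColorWrite.ALPHA };
}

// Option 3(b): leave the write mask alone but force an alpha blend of
// src * 0 + dst * 1, which writes back the destination alpha
// (1 after the clear).
function overrideForRgbxB(t: TargetState): TargetState {
  return { ...t, alphaBlend: { srcFactor: "zero", dstFactor: "one", operation: "add" } };
}
```

Variant (b) is the one that would sidestep the alpha write mask performance concern mentioned above, at the cost of forcing blending on for the alpha channel.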

See: #1871, #1425

kainino0x commented 3 years ago

Something to think about while investigating this: We may be able to do something analogous for compositingAlphaMode: "premultiplied", which already has unportable (but secure) compositing results, like option 4.

kvark commented 3 years ago

Thanks for filing this!

  1. I think it's acceptable for MVP, it's not the worst place to be in.
  2. This would really help to bring the overhead to near-zero where it matters (mobile/tiling GPUs). Bonus points if this is mandatory.
  3. Not feeling positive about this one for multiple reasons:
    • Setting color mask on each pipeline has significant cost, as Ken showed previously from WebGL land
    • I can't use the same shader for rendering into the canvas and non-canvas any more (since non-canvas isn't going to use RGBX).
    • Injecting shader code is a bit invasive, also prevents generation of the shader code ahead of time.
    • We have this "preferred format" semantics precisely because we don't know what format we'll actually want users to pick. And RGBX is only making it more complicated.
    • Alpha channel is generally useful to have (citation needed)
  4. No opinion on this one. What's the perf cost of doing an extra layer?

Kangz commented 3 years ago

Setting color mask on each pipeline has significant cost, as Ken showed previously from WebGL land

@kainino0x hinted at it in the meeting, but I think if we clear with alpha at the beginning of the pass, and force some blend modes, then we can avoid setting writeMask. (alpha blending would always keep the one in the render target)
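As a rough illustration of why certain blend modes keep the 1 in the render target, this sketch evaluates the "add" blend equation for the alpha channel only. The factor names are a subset of WebGPU's `GPUBlendFactor` values; `factorValue` and `blendAlpha` are illustrative helpers, not API.

```typescript
// Subset of WebGPU's GPUBlendFactor values that matter for the alpha channel.
type AlphaFactor =
  | "zero" | "one"
  | "src-alpha" | "one-minus-src-alpha"
  | "dst-alpha" | "one-minus-dst-alpha";

// Numeric value of a blend factor for the alpha channel.
function factorValue(f: AlphaFactor, srcA: number, dstA: number): number {
  switch (f) {
    case "zero": return 0;
    case "one": return 1;
    case "src-alpha": return srcA;
    case "one-minus-src-alpha": return 1 - srcA;
    case "dst-alpha": return dstA;
    case "one-minus-dst-alpha": return 1 - dstA;
  }
}

// "add" operation: result = src * srcFactor + dst * dstFactor,
// clamped to [0, 1] as it would be when stored in a unorm target.
function blendAlpha(
  srcA: number, dstA: number,
  srcFactor: AlphaFactor, dstFactor: AlphaFactor,
): number {
  const r = srcA * factorValue(srcFactor, srcA, dstA)
          + dstA * factorValue(dstFactor, srcA, dstA);
  return Math.min(1, Math.max(0, r));
}

// Premultiplied "source-over" on a target cleared to alpha 1:
// srcA + 1 * (1 - srcA) is 1 for any srcA.
const kept = blendAlpha(0.25, 1, "one", "one-minus-src-alpha");

// Plain replace (src * 1 + dst * 0) overwrites alpha with the shader output.
const lost = blendAlpha(0.25, 1, "one", "zero");
```

With the first configuration the alpha write mask can stay enabled and the target still ends the pass with alpha 1; with the second it does not, so such a pipeline would need to be rejected or overridden.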

I can't use the same shader for rendering into the canvas and non-canvas any more (since non-canvas isn't going to use RGBX).

Do you mean the same pipeline? This seems like a tiny concern.

Injecting shader code is a bit invasive, also prevents generation of the shader code ahead of time.

No shader code injection needed.

We have this "preferred format" semantics precisely because we don't know what format we'll actually want users to pick. And RGBX is only making it more complicated.

It just gives the format to the application, I'm not sure how it makes it more complicated.

IMHO 3) seems like the lowest overhead solution in part because we're doing the optimization in collaboration with the app, so only some pipelines need to be modified. It is more complicated to spec and does require the app to be aware that RGBX is a thing though.

But +1 that for MVP 1) is probably enough and we have time to see for the rest.

kainino0x commented 3 years ago

It just gives the format to the application, I'm not sure how it makes it more complicated.

Because now the preferred format is either rgbx8unorm/bgrx8unorm or rgba8unorm/bgra8unorm depending on which one the user wants. However, I don't think this really makes things more complicated because we are going to have the same problem for rgba8unorm-srgb/bgra8unorm-srgb.

kainino0x commented 3 years ago

Tentatively moving this to post-MVP. I think we are most interested in option 2 (storeOp: "present") but it doesn't have to be done right now.

kainino0x commented 2 years ago

More thoughts while exploring various canvas issues:

  • 1. Don't do anything (just keep the compositingAlphaMode: "opaque" we already have, which requires an alpha-clear and can't do so inside an existing render pass).

We might be able to detect when the alpha channel is definitely 1, by construction of the most recently executed render pass (clear value + blend mode or write mask), and elide the clear.

Or we could make it explicit and put a new flag on the GPURenderPassColorAttachment that says "alphaReadOnly" and validate the pipelines used with it, very similar to depthReadOnly/stencilReadOnly.
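Both ideas above (implicit detection, and an explicit flag) come down to a predicate over the pass and pipeline state. A minimal sketch, assuming simplified hypothetical types: `PassAttachment` and `PipelineTarget` are stand-ins invented for this example, `alphaReadOnly` is only the flag name floated in this comment, and the rule in `validateAlphaReadOnly` (no alpha writes at all, by analogy with depthReadOnly) is an assumption about how such validation might work.

```typescript
const ALPHA_BIT = 0x8; // same value as GPUColorWrite.ALPHA

// Hypothetical, simplified pipeline color target state.
interface PipelineTarget {
  writeMask: number;
  alphaBlend?: { srcFactor: string; dstFactor: string; operation: string };
}

// Hypothetical, simplified render pass color attachment.
interface PassAttachment {
  loadOp: "clear" | "load";
  clearAlpha: number;
}

// True if this pipeline can never change an alpha of 1 already in the target.
// Deliberately conservative: unknown blend states return false.
function preservesOpaqueAlpha(p: PipelineTarget): boolean {
  if ((p.writeMask & ALPHA_BIT) === 0) return true; // alpha is never written
  const b = p.alphaBlend;
  if (!b || b.operation !== "add") return false;
  // src * 0 + dst * 1 leaves the destination alpha untouched.
  if (b.srcFactor === "zero" && b.dstFactor === "one") return true;
  // src * 1 + dst * (1 - src) equals 1 whenever dst is 1.
  if (b.srcFactor === "one" && b.dstFactor === "one-minus-src-alpha") return true;
  return false;
}

// Implicit detection: alpha is 1 by construction, so the extra clear can be elided.
function alphaIsProvablyOne(pass: PassAttachment, pipelines: PipelineTarget[]): boolean {
  return pass.loadOp === "clear"
    && pass.clearAlpha === 1
    && pipelines.every(preservesOpaqueAlpha);
}

// Explicit flag: reject any pipeline that writes alpha (assumed rule).
function validateAlphaReadOnly(p: PipelineTarget): void {
  if ((p.writeMask & ALPHA_BIT) !== 0) {
    throw new Error("pipeline writes alpha but the attachment is alphaReadOnly");
  }
}
```

The detection path accepts more pipelines than the explicit flag would under this assumed rule, but it only helps when the previous pass happened to clear alpha to 1; the explicit flag trades flexibility for a predictable fast path.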

  • 2. storeOp: "present" which takes away access immediately at the end of the render pass, and can either clear to 1 at the end of the pass (with a quad) or clear later. https://github.com/gpuweb/gpuweb/issues/1425#issuecomment-809830117
    • storeOp: "present" which ALWAYS clears to 1 at the end of the pass (regardless of platform), avoiding the extra tracking within command buffers to take away access.

Just a note: if we take away access to the texture, we also need to take it away from the "canvas's bitmap" (used when the canvas is used as an image source - in drawImage/texImage2D/toDataURL/etc).

If not, and this is implemented as a clear on all platforms, then it's fine, the current texture and the canvas's bitmap will just be alpha-cleared.

storeOp: "present" which ALWAYS clears to 1 at the end of the pass (regardless of platform)

This would play into the detection idea above - this would be an alternate way to trigger that fast-path without limiting yourself to specific clear values and blend/mask states. (A probably unhelpful idea is: this could also be a "clearAlpha(attachmentIndex)" command in the render pass, rather than a new storeOp.)

That said, it does still have a cost on platforms that don't need alpha cleared to 1, so I don't think this is preferable.