immersive-web / proposals

Initial proposals for future Immersive Web work (see README)

WebXR/WebGPU Binding #58

Closed toji closed 3 years ago

toji commented 4 years ago

As WebGPU gets closer to a shippable state, I think it's time we begin looking seriously at what the WebXR/WebGPU interface should be. For anyone who's been following the Layers module work, it should be unsurprising that my proposal is to build on those mechanisms with a proposed XRGPUBinding interface that mirrors the existing XRWebGLBinding.

I don't think anything here is too controversial, but I wanted to put this up in proposals prior to requesting a repo for it to get some preliminary feedback. There are a couple of things worth pointing out:

Below is a first pass at explainer text for the proposed module, which was relatively simple to produce given that it borrows so much from the Layers explainer.


WebXR/WebGPU binding

WebXR is well understood to be a demanding API in terms of graphics rendering performance, a task that has previously fallen entirely to WebGL. The WebGL API, while capable, is based on relatively outdated native APIs that have since been overtaken by more modern equivalents. As a result, it can sometimes be a struggle to implement various recommended XR rendering techniques in a performant way.

The WebGPU API is an upcoming API for utilizing the graphics and compute capabilities of a device's GPU more efficiently than WebGL allows, with an API that better matches both GPU hardware architecture and the modern native APIs that interface with them, such as Vulkan, Direct3D 12, and Metal. As such, it offers the potential for developers to get significantly better performance in their WebXR applications.

This module aims to allow the existing WebXR Layers module to interface with WebGPU by providing WebGPU swap chains for each layer type.

WebGPU binding

As with the existing WebGL path described in the Layers module, all WebGPU resources required by WebXR would be supplied by an XRGPUBinding instance, created with an XRSession and GPUDevice like so:

const gpuAdapter = await navigator.gpu.getAdapter({xrCompatible: true});
const gpuDevice = await gpuAdapter.requestDevice();
const xrGpuBinding = new XRGPUBinding(xrSession, gpuDevice);

Note that the GPUAdapter must be requested with the xrCompatible option set to true. This mirrors the WebGL context creation argument of the same name, and ensures that the returned adapter will be one that is compatible with the UA's selected XR device.

Once the XRGPUBinding instance has been created, it can be used to create the various XRCompositorLayers, just like XRWebGLBinding:

const gpuAdapter = await navigator.gpu.getAdapter({xrCompatible: true});
const gpuDevice = await gpuAdapter.requestDevice();
const xrGpuBinding = new XRGPUBinding(xrSession, gpuDevice);
const projectionLayer = xrGpuBinding.createProjectionLayer('texture-array', { alpha: false });

This allocates a layer that supplies a 2d-array GPUTexture as its output surface.

As with the base XR Layers module, XRGPUBinding is only required to support XRProjectionLayers unless the layers feature descriptor is supplied at session creation and supported by the UA/device. If the layers feature descriptor is requested and supported, however, all other XRCompositionLayer types must be supported. Layers are still set via XRSession's updateRenderState method, as usual:

const quadLayer = xrGpuBinding.createQuadLayer('texture-array', {
  space: xrReferenceSpace,
  viewPixelWidth: 1024,
  viewPixelHeight: 768,
  layout: 'stereo'
});

xrSession.updateRenderState({ layers: [projectionLayer, quadLayer] });
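
Note that non-projection layers like the quad layer above are only available when the layers feature descriptor was granted at session creation. A minimal sketch of requesting it, assuming an immersive-vr session:

// Sketch: request the 'layers' feature (from the WebXR Layers module) so that
// non-projection layer types are available to the binding.
const xrSession = await navigator.xr.requestSession('immersive-vr', {
  optionalFeatures: ['layers']
});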

Rendering

During XRFrame processing each layer can be updated with new imagery. Calling getViewSubImage() with a view from the XRFrame will return an XRGPUSubImage indicating the textures to use as the render target and what portion of the texture will be presented to the XRView's associated physical display.

WebGPU layers allocated with the 'texture' type will provide sub images with a viewport and an imageIndex of 0 for each XRView. Note that the colorTexture and depthStencilTexture can be different between the views.

// Render Loop for a projection layer with a WebGPU texture source.
const xrGpuBinding = new XRGPUBinding(xrSession, gpuDevice);
const layer = xrGpuBinding.createProjectionLayer('texture');

xrSession.updateRenderState({ layers: [layer] });
xrSession.requestAnimationFrame(onXRFrame);

function onXRFrame(time, xrFrame) {
  xrSession.requestAnimationFrame(onXRFrame);

  // Get the viewer pose so we can iterate over its views for this frame.
  const viewerPose = xrFrame.getViewerPose(xrReferenceSpace);

  const commandEncoder = gpuDevice.createCommandEncoder({});

  for (const view of viewerPose.views) {
    const subImage = xrGpuBinding.getViewSubImage(layer, view);

    // Render to the subImage's color and depth textures
    const passEncoder = commandEncoder.beginRenderPass({
        colorAttachments: [{
          attachment: subImage.colorTexture.createView(),
          loadValue: 'load',
        }],
        depthStencilAttachment: {
          attachment: subImage.depthStencilTexture.createView(),
          depthLoadValue: 'load',
          depthStoreOp: 'store',
          stencilLoadValue: 'load',
          stencilStoreOp: 'store',
        }
      });

    let viewport = subImage.viewport;
    passEncoder.setViewport(viewport.x, viewport.y, viewport.width, viewport.height, 0.0, 1.0);

    // Render from the viewpoint of xrView

    passEncoder.endPass();
  }

  gpuDevice.defaultQueue.submit([commandEncoder.finish()]);
}

WebGPU layers allocated with the 'texture-array' type will provide sub images with the same viewport and a unique imageIndex indicating the texture layer to render to for each XRView. Note that the colorTexture and depthStencilTexture are the same between views; only the imageIndex differs.

// Render Loop for a projection layer with a WebGPU texture source.
const xrGpuBinding = new XRGPUBinding(xrSession, gpuDevice);
const layer = xrGpuBinding.createProjectionLayer('texture-array');

xrSession.updateRenderState({ layers: [layer] });
xrSession.requestAnimationFrame(onXRFrame);

function onXRFrame(time, xrFrame) {
  xrSession.requestAnimationFrame(onXRFrame);

  // Get the viewer pose so we can iterate over its views for this frame.
  const viewerPose = xrFrame.getViewerPose(xrReferenceSpace);

  const commandEncoder = gpuDevice.createCommandEncoder({});

  for (const view of viewerPose.views) {
    const subImage = xrGpuBinding.getViewSubImage(layer, view);

    // Render to the subImage's color and depth textures
    const passEncoder = commandEncoder.beginRenderPass({
        colorAttachments: [{
          attachment: subImage.colorTexture.createView({
            baseArrayLayer: subImage.imageIndex,
            arrayLayerCount: 1,
          }),
          loadValue: 'load',
        }],
        depthStencilAttachment: {
          attachment: subImage.depthStencilTexture.createView({
            baseArrayLayer: subImage.imageIndex,
            arrayLayerCount: 1,
          }),
          depthLoadValue: 'load',
          depthStoreOp: 'store',
          stencilLoadValue: 'load',
          stencilStoreOp: 'store',
        }
      });

    let viewport = subImage.viewport;
    passEncoder.setViewport(viewport.x, viewport.y, viewport.width, viewport.height, 0.0, 1.0);

    // Render from the viewpoint of xrView

    passEncoder.endPass();
  }

  gpuDevice.defaultQueue.submit([commandEncoder.finish()]);
}

Non-projection layers, such as XRQuadLayer, may only have 1 sub image for 'mono' layers and 2 sub images for 'stereo' layers, which may not align exactly with the number of XRViews reported by the device. To avoid rendering the same view multiple times in these scenarios, non-projection layers must use the XRGPUBinding's getSubImage() method to get the XRSubImage to render to.

For mono textures the XRSubImage can be queried using just the layer and XRFrame:

// Render loop for a mono quad layer with a WebGPU texture source.
const xrGpuBinding = new XRGPUBinding(xrSession, gpuDevice);
const quadLayer = xrGpuBinding.createQuadLayer('texture', {
  space: xrReferenceSpace,
  viewPixelWidth: 512,
  viewPixelHeight: 512,
  layout: 'mono'
});

// Position 2 meters away from the origin with a width and height of 1.5 meters
quadLayer.transform = new XRRigidTransform({z: -2});
quadLayer.width = 1.5;
quadLayer.height = 1.5;

xrSession.updateRenderState({ layers: [quadLayer] });
xrSession.requestAnimationFrame(onXRFrame);

function onXRFrame(time, xrFrame) {
  xrSession.requestAnimationFrame(onXRFrame);

  const commandEncoder = gpuDevice.createCommandEncoder({});

  const subImage = xrGpuBinding.getSubImage(quadLayer, xrFrame);

  // Render to the subImage's color texture.
  const passEncoder = commandEncoder.beginRenderPass({
      colorAttachments: [{
        attachment: subImage.colorTexture.createView(),
        loadValue: 'load',
      }]
      // Many times simple quad layers won't require a depth attachment, as they're often just
      // displaying a pre-rendered 2D image.
    });

  // When rendering to a mono layer or a non-projection texture-array layer it's not necessary to
  // explicitly set the viewport, since they're guaranteed to always be the full texture dimensions.

  // Render the mono content.

  passEncoder.endPass();

  gpuDevice.defaultQueue.submit([commandEncoder.finish()]);
}

For stereo textures the target XREye must be given to getSubImage() as well:

// Render loop for a stereo quad layer with a WebGPU texture source.
const xrGpuBinding = new XRGPUBinding(xrSession, gpuDevice);
const quadLayer = xrGpuBinding.createQuadLayer('texture', {
  space: xrReferenceSpace,
  viewPixelWidth: 512,
  viewPixelHeight: 512,
  layout: 'stereo'
});

// Position 2 meters away from the origin with a width and height of 1.5 meters
quadLayer.transform = new XRRigidTransform({z: -2});
quadLayer.width = 1.5;
quadLayer.height = 1.5;

xrSession.updateRenderState({ layers: [quadLayer] });
xrSession.requestAnimationFrame(onXRFrame);

function onXRFrame(time, xrFrame) {
  xrSession.requestAnimationFrame(onXRFrame);

  const commandEncoder = gpuDevice.createCommandEncoder({});

  for (const eye of ['left', 'right']) {
    const subImage = xrGpuBinding.getSubImage(quadLayer, xrFrame, eye);

    // Render to the subImage's color texture.
    const passEncoder = commandEncoder.beginRenderPass({
        colorAttachments: [{
          attachment: subImage.colorTexture.createView(),
          loadValue: 'load',
        }]
        // Many times simple quad layers won't require a depth attachment, as they're often just
        // displaying a pre-rendered 2D image.
      });

    let viewport = subImage.viewport;
    passEncoder.setViewport(viewport.x, viewport.y, viewport.width, viewport.height, 0.0, 1.0);

    // Render content for the given eye.

    passEncoder.endPass();
  }

  gpuDevice.defaultQueue.submit([commandEncoder.finish()]);
}

Proposed IDL

partial dictionary GPURequestAdapterOptions {
    boolean xrCompatible = false;
};

[Exposed=Window] interface XRGPUSubImage : XRSubImage {
  [SameObject] readonly attribute GPUTexture colorTexture;
  [SameObject] readonly attribute GPUTexture? depthStencilTexture;
  readonly attribute unsigned long? imageIndex;
  readonly attribute unsigned long textureWidth;
  readonly attribute unsigned long textureHeight;
};

[Exposed=Window] interface XRGPUBinding {
  constructor(XRSession session, GPUDevice device);

  readonly attribute double nativeProjectionScaleFactor;

  XRProjectionLayer createProjectionLayer(XRTextureType textureType,
                                          optional XRProjectionLayerInit init);
  XRQuadLayer createQuadLayer(XRTextureType textureType,
                              optional XRQuadLayerInit init);
  XRCylinderLayer createCylinderLayer(XRTextureType textureType,
                                      optional XRCylinderLayerInit init);
  XREquirectLayer createEquirectLayer(XRTextureType textureType,
                                      optional XREquirectLayerInit init);
  XRCubeLayer createCubeLayer(optional XRCubeLayerInit init);

  XRGPUSubImage getSubImage(XRCompositionLayer layer, XRFrame frame, optional XREye eye = "none");
  XRGPUSubImage getViewSubImage(XRProjectionLayer layer, XRView view);
};
Kangz commented 4 years ago

WebGPU-wise it looks mostly good.

imageIndex matches the arrayLayer concept in WebGPU, so since XRGPUSubImage is more in the WebGPU world, maybe it would make sense to make the concept name match?

Another thing is that GPUTexture must have a known format and usage, where is it specified for the textures in XRGPUSubImage? The format will be important to create pipelines that render to the textures (WebGPU has a validation rule that the pipeline's color attachment format must match the render pass's color attachment format). And the usage is important if it needs to support more than just OUTPUT_ATTACHMENT.

What's the initial content of textures in XRSubImage? Is it fair to assume they are going to start (lazy) zeroed?

toji commented 4 years ago

imageIndex matches the arrayLayer concept in WebGPU, ... maybe it would make sense to make the concept name match?

I'd be fine with that! Also (heh) I just realized that we have the opportunity to make this even simpler by providing a GPUTextureViewDescriptor directly. The only thing it'll really have to specify is the baseArrayLayer and arrayLayerCount, but the fact that we can just say "This is the subresource you want" will make devs' lives easier while not preventing them from inspecting the values and doing their own thing if they really need to.
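
For illustration, a sketch of what the per-view attachment setup could collapse to (using the viewDescriptor attribute proposed in the updated IDL further down; names still tentative):

// Sketch only: the UA-provided viewDescriptor already selects the correct array layer(s),
// so per-view attachment creation becomes a one-liner.
const colorView = subImage.colorTexture.createView(subImage.viewDescriptor);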

GPUTexture must have a known format and usage, where is it specified for the textures in XRGPUSubImage?

Oh! Good point! A given layer should have the same texture properties for its lifetime, so I'd imagine we'd want to just pass the format and usage in to the create____Layer methods. I was originally thinking that usage would have to include OUTPUT_ATTACHMENT, but then realized that in some cases it would be perfectly reasonable to only need COPY_DST, so we're probably better off just leaving the developer to give us the full usage flags. (The user agent may need its own usage internally, like COPY_SRC, so we'd have to figure out how to handle that too.)

Along a similar line, I've realized that maybe we don't need the XRTextureType here like we do in WebGL, since the textures will always have a dimensionality of 2d, and whether or not you treat it as an array (or a cube map) happens when you create the GPUTextureView over the top of it. In fact, given that we can guarantee texture array availability with WebGPU, we could maybe also ensure that only one texture is returned per frame: either a single layer with distinct viewports (largely for pre-rendered side-by-side layout content) or a layer array. That could even allow us to start using GPUSwapChains if we really wanted to:

// Render Loop for a projection layer with a WebGPU texture source.
const xrGpuBinding = new XRGPUBinding(xrSession, gpuDevice);
const layer = xrGpuBinding.createProjectionLayer({
  colorFormat: xrGpuBinding.preferredColorFormat,
  depthStencilFormat: xrGpuBinding.preferredDepthStencilFormat
}, { alpha: false });
const layerColorSwapChain = xrGpuBinding.getColorSwapChain(layer);
const layerDepthSwapChain = xrGpuBinding.getDepthSwapChain(layer);

xrSession.updateRenderState({ layers: [layer] });
xrSession.requestAnimationFrame(onXRFrame);

function onXRFrame(time, xrFrame) {
  xrSession.requestAnimationFrame(onXRFrame);

  // Get the viewer pose so we can iterate over its views for this frame.
  const viewerPose = xrFrame.getViewerPose(xrReferenceSpace);

  const commandEncoder = gpuDevice.createCommandEncoder({});

  const colorTexture = layerColorSwapChain.getCurrentTexture();
  const depthTexture = layerDepthSwapChain.getCurrentTexture();

  for (const view of viewerPose.views) {
    // Still have to do this to get the region of the texture to render to.
    const subImage = xrGpuBinding.getViewSubImage(layer, view);

    // Render to the color and depth textures
    const passEncoder = commandEncoder.beginRenderPass({
        colorAttachments: [{
          attachment: colorTexture.createView(subImage.viewDescriptor),
          loadValue: 'load',
        }],
        depthStencilAttachment: {
          attachment: depthTexture.createView(subImage.viewDescriptor),
          depthLoadValue: 'load',
          depthStoreOp: 'store',
          stencilLoadValue: 'load',
          stencilStoreOp: 'store',
        }
      });

    let viewport = subImage.viewport;
    passEncoder.setViewport(viewport.x, viewport.y, viewport.width, viewport.height, 0.0, 1.0);

    // Render from the viewpoint of xrView

    passEncoder.endPass();
  }

  gpuDevice.defaultQueue.submit([commandEncoder.finish()]);
}

That's a bit awkward, as far as I'm concerned, so I'd probably avoid it unless there's a compelling current or future reason to use the GPUSwapChain mechanism that I'm not aware of.

What's the initial content of textures in XRSubImage? Is it fair to assume they are going to start (lazy) zeroed?

Yeah, that would be the direction I'd like to go.

So given the above (and ignoring the potential GPUSwapChain integration for a moment), we could update the proposed IDL to be something like this:

[Exposed=Window] interface XRGPUSubImage : XRSubImage {
  [SameObject] readonly attribute GPUTexture colorTexture;
  [SameObject] readonly attribute GPUTexture? depthStencilTexture;
  readonly attribute GPUTextureViewDescriptor viewDescriptor;
  readonly attribute unsigned long textureWidth;
  readonly attribute unsigned long textureHeight;
};

dictionary XRGPULayerTextureDescriptor {
  required GPUTextureFormat colorFormat;
  GPUTextureFormat? depthStencilFormat;
  GPUTextureUsageFlags usage = 0x10; // GPUTextureUsage.OUTPUT_ATTACHMENT
};

[Exposed=Window] interface XRGPUBinding {
  constructor(XRSession session, GPUDevice device);

  readonly attribute double nativeProjectionScaleFactor;

  readonly attribute GPUTextureFormat preferredColorFormat;
  readonly attribute GPUTextureFormat preferredDepthStencilFormat;

  XRProjectionLayer createProjectionLayer(XRGPULayerTextureDescriptor descriptor,
                                          optional XRProjectionLayerInit init);
  XRQuadLayer createQuadLayer(XRGPULayerTextureDescriptor descriptor,
                              optional XRQuadLayerInit init);
  XRCylinderLayer createCylinderLayer(XRGPULayerTextureDescriptor descriptor,
                                      optional XRCylinderLayerInit init);
  XREquirectLayer createEquirectLayer(XRGPULayerTextureDescriptor descriptor,
                                      optional XREquirectLayerInit init);
  XRCubeLayer createCubeLayer(XRGPULayerTextureDescriptor descriptor,
                              optional XRCubeLayerInit init);

  XRGPUSubImage getSubImage(XRCompositionLayer layer, XRFrame frame, optional XREye eye = "none");
  XRGPUSubImage getViewSubImage(XRProjectionLayer layer, XRView view);
};
cabanier commented 4 years ago

Another thing is that GPUTexture must have a known format and usage, where is it specified for the textures in XRGPUSubImage? The format will be important to create pipelines that render to the textures (WebGPU has a validation rule that the pipeline's color attachment format must match the render pass's color attachment format).

Is it necessary to allow the author to create any type of format for the swapchain? If we allow this, should there also be a feature to query which formats are supported?

const gpuAdapter = await navigator.gpu.getAdapter({xrCompatible: true});
const gpuDevice = await gpuAdapter.requestDevice();
const xrGpuBinding = new XRGPUBinding(xrSession, gpuDevice);

Could this all be collapsed into a single call?

I think this proposal looks very reasonable! If it's accepted, should we merge it into current layers spec?

Kangz commented 4 years ago

In the latest proposal, the viewDescriptor seems to be for both the color and the depth. Are there cases where the color and the depth would have different descriptors?

The XRGPULayerTextureDescriptor is nice, but are there any constraints on the format and usages that can be used with the platform APIs? If the platform APIs are very strict, maybe there could be a preferred format (and usage?) exposed a bit like GPUCanvasContext.getPreferredFormat. (or the XRGPULayer could tell the application which format it wants it to use).

toji commented 4 years ago

Is it necessary to allow the author to create any type of format for the swapchain? If we allow this, should there also be a feature to query which formats are supported?

I do have an attribute to get the preferred format, but if we allow developers to specify any format we'll probably need an xrEnumerateSwapchainFormats equivalent.

Could this all be collapsed into a single call?

Not clear on how, or why that would be desirable. (Please note the exact WebGPU initialization sequence is still undergoing some discussion.)

If it's accepted, should we merge it into current layers spec?

Given that WebGPU still isn't shipped I'd be hesitant to make it a dependency of the base layers API. I think they can stay separate for now.

In the latest proposal, the viewDescriptor seems to be for both the color and the depth. Are there cases where the color and the depth would have different descriptors?

No, that shouldn't occur in this context. Given that a depth texture requested this way is allowed to be used during compositing, you'll never have anything but a 1:1 relationship between the color and depth sub-resources.

The XRGPULayerTextureDescriptor is nice, but are there any constraints on the format and usages that can be used with the platform APIs?

There are some limits, as Rik mentioned. (I should have researched a bit more before updating my proposal.) I do have preferredColor/DepthStencilFormat attributes, but it seems like we'll need a bit more than that in the end. Probably a way to enumerate the supported formats ordered by preference. We could always just let the UA pick the format, the way we do with WebGL, but I think we want to embrace the increased flexibility of WebGPU where we can.
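
As a purely hypothetical sketch of what that enumeration could look like (none of these names are part of the proposal; it just illustrates the ordered-by-preference idea):

// Hypothetical sketch: 'supportedColorFormats' is an illustrative name only.
// The app intersects the UA's preference-ordered list with the formats it can render to.
const appFormats = ['rgba8unorm', 'bgra8unorm'];
const colorFormat = xrGpuBinding.supportedColorFormats.find(f => appFormats.includes(f));
const layer = xrGpuBinding.createProjectionLayer({ colorFormat });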

Final thought: Just realized that currently the layer init indicates things like whether or not you want alpha or depth buffers, but in this environment that would be implicit in the formats you provide, so we'll want to re-structure that.
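
Purely as an illustration of that last point (not proposed IDL), dropping the explicit alpha/depth flags could look something like:

// Hypothetical: no separate alpha/depth booleans in the layer init; the presence and
// choice of formats in the texture descriptor implies them instead.
const layer = xrGpuBinding.createQuadLayer({
  colorFormat: 'rgba8unorm' // alpha support comes from the color format...
  // ...and omitting depthStencilFormat means the layer gets no depth buffer.
}, { space: xrReferenceSpace, viewPixelWidth: 512, viewPixelHeight: 512 });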

Kangz commented 4 years ago

I do have an attribute to get the preferred format, but if we allow developers to specify any format we'll probably need an xrEnumerateSwapchainFormats equivalent.

Ah yes, I missed it. The idea for the GPUSwapchain is that there will be a small list of allowed formats in the specification in addition to the preferred format, but they might cause an extra conversion copy. (currently it's only bgra8-unorm).

Probably a way to enumerate the supported formats ordered by preference.

Or just preferred + a fixed allow-list in the spec. Maybe the usage could allow just OUTPUT_ATTACHMENT to start, and see if we need to add COPY_DST later (we'll need to look at whether the platform APIs allow it).

toji commented 3 years ago

I've taken the feedback from this thread so far and updated the explainer text I posted above, which I've now pushed to https://github.com/toji/webxr-webgpu-binding/blob/main/explainer.md for the purposes of previewing and discussion.

/agenda to ask about creating an official Immersive Web repo for the feature. Discussion here seems positive and I doubt anyone is against seeing this integration happen at some point.

kvark commented 3 years ago

Why not skip the GPUTexture completely, like:

  [SameObject] readonly attribute GPUTextureView colorView;
  [SameObject] readonly attribute GPUTextureView? depthStencilView;

This way you don't need a descriptor, and you don't need that awkward view creation on every frame.

Kangz commented 3 years ago

This would completely prevent using the textures as COPY_DST, maybe that's fine given it is a rare usecase and giving the views directly would be a good usability improvement.

cabanier commented 3 years ago

Why not skip the GPUTexture completely, like:

  [SameObject] readonly attribute GPUTextureView colorView;
  [SameObject] readonly attribute GPUTextureView? depthStencilView;

Would that work with multiview?

toji commented 3 years ago

COPY_DST usage seems like it would be desirable for a lot of non-projection layer types, which will frequently be populated directly from an ImageBitmap or similar. (In fact, we may even want to make COPY_DST part of the default usage for those layer types... hm.)

kvark commented 3 years ago

I would expect in XR/VR to see all the work happening inside a single render pass (or one pass per eye, at least). Anything that you'd need to copy to screen would be drawn as quads, so that render pass is not disrupted, and mobile GPUs can do their tiling efficiently.

toji commented 3 years ago

That's true of anything rendered into what we call "projection layers", which is what's used to render your typical immersive content. Where COPY_DST usage comes in is Quad/Cylinder/Equirect/Cube layers, which are frequently updated just once or very infrequently, with positioning handled by the XR compositor after that. For example: loading an Equirect as a skybox. It'll be a pretty natural code path to upload that directly from an Image tag/ImageBitmap into the layer texture and then never touch it again.
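
A rough sketch of that upload path, assuming the layer's color texture was created with COPY_DST usage and using the queue-level ImageBitmap copy from the WebGPU drafts of the time (copyImageBitmapToTexture, which has since been renamed/reshaped):

// Rough sketch: decode the image ahead of time, then copy it into the layer once inside
// a frame callback (getSubImage() needs an XRFrame). 'equirectLayer' and the URL are
// illustrative; assumes the layer's color texture includes COPY_DST usage.
const imageBitmap = await createImageBitmap(await (await fetch('skybox-equirect.png')).blob());

// ...later, inside onXRFrame(time, xrFrame):
const subImage = xrGpuBinding.getSubImage(equirectLayer, xrFrame);
gpuDevice.defaultQueue.copyImageBitmapToTexture(
    { imageBitmap },
    { texture: subImage.colorTexture },
    [imageBitmap.width, imageBitmap.height, 1]);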

himorin commented 3 years ago

Is this moving to the https://github.com/immersive-web/WebXR-WebGPU-Binding/ repository? (housekeeping)

toji commented 3 years ago

Yes