Interoperability between WebGL and WebGPU within WebXR?

mwyrzykowski commented 3 weeks ago

Hi, currently the explainer does not seem to prevent using WebGL one frame and WebGPU the next. However, this is problematic because WebGL's coordinate system is inverted relative to WebGPU's, and I'm not sure that all backends which implement WebGL (via OpenGL) can share textures with WebGPU if it is implemented with Vulkan, for instance. It is additionally problematic because the shader modules might already have been compiled with WebGPU prior to entering the immersive session, and they would then have to be at least partially recompiled to handle the inverted coordinate system.

I would like to know if the backend should be specified in the XRSession prior to entering the immersive session? E.g., something like:

partial interface XRSession {
    Promise<undefined> useOnlyWithWebGPU();
};

where an exception is thrown if WebGPU is not supported, otherwise, the XRSession uses WebGPU's coordinate system. We could alternatively name it setPreferredBackend(name) where name is one of "webgl", "webgpu" or any number of similar methods.

And similarly, creating an XRGPUBinding from an XRSession which does not call this method would be an error.

If we do not have something like this, then the WGSL shaders which pass vertical positions will be inverted in WebXR's default coordinate system which matches WebGL. I haven't fully considered if an implementation can invert that without significant performance cost or runtime shader recompilation.
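To make the concern concrete, here is a minimal sketch of one workaround that is common in general graphics code: negating the Y row of the projection matrix on the CPU, so that already-compiled shaders can stay as they are. The helper name `flipProjectionY` is hypothetical, and the sketch assumes column-major 4x4 matrices like the ones WebXR exposes through `XRView.projectionMatrix`. It is not something the explainer specifies.

```javascript
// Hypothetical helper: returns a copy of a column-major 4x4 projection
// matrix whose clip-space Y output is negated. The input is not modified.
function flipProjectionY(projectionMatrix) {
  const out = Float32Array.from(projectionMatrix);
  // In column-major layout, row 1 (the Y row) lives at indices 1, 5, 9, 13.
  for (let col = 0; col < 4; col++) {
    out[col * 4 + 1] = -out[col * 4 + 1];
  }
  return out;
}
```

Note that flipping Y this way also reverses the apparent triangle winding order, so a pipeline that culls back faces would need its front-face or cull-mode setting adjusted to match.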

cc @toji as the author of the document.

While implementing the explainer, this is the one issue I came across. I only tried it with a very simple head-tracked triangle test page, but it seems great 👍. Initially the triangle was upside down, because I used the same shader from my hello triangle WebGPU page without accounting for the y-axis inversion.

toji commented 3 weeks ago

This is a good point, and something that I think we've only touched on briefly in past discussions IIRC. +@cabanier, as I would love to get his thoughts on this too!

From the point of view of OpenXR, which pretty much all of the non-Apple devices will be using, there is a (very reasonable) requirement that only a single graphics API be used at a time. See the "Session Creation" section of the spec.

During XrSession creation the application must provide information about which graphics API it intends to use by adding an XrGraphicsBinding* struct of one (and only one) of the enabled graphics API extensions to the next chain of XrSessionCreateInfo.

(Emphasis mine)

In an ideal world it would be nice to be able to intermix WebGL and WebGPU layers for the sake of library composability, i.e., the main scene is rendered with WebGPU but there's a window displayed as an XRQuadLayer that is rendered with WebGL because it's coming from an older GUI library. It's possible that some browsers might be able to facilitate that since they may be translating everything to the same underlying native API, but I don't think that's an implementation detail that we want to require.

Because of this, at minimum we would need to validate that all the layers passed to the session are from the same API. As mentioned, though, that would still allow developers to thrash between APIs from frame to frame, which sounds like a terrible idea even if we can facilitate it. A mode switch, like @mwyrzykowski suggested, would maybe be better, but an OpenXR-based backend that changes its graphics binding based on the JS graphics API in use would have to tear down and restart the entire session to do so. Probably too disruptive.
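As a rough illustration of that minimum validation step, the sketch below rejects a layer list that mixes graphics APIs. The `graphicsApi` tag on each layer and the function name are hypothetical, used here only for illustration. Real XRLayer objects expose no such property, and an implementation would track this internally.

```javascript
// Hypothetical validation: every layer handed to the session must come
// from the same graphics API. `graphicsApi` is an illustrative tag only.
function validateLayersShareApi(layers) {
  const apis = new Set(layers.map((layer) => layer.graphicsApi));
  if (apis.size > 1) {
    throw new TypeError(`Layers mix graphics APIs: ${[...apis].join(', ')}`);
  }
  // Returns the single API in use, or null for an empty layer list.
  return apis.size === 1 ? [...apis][0] : null;
}
```

As noted above, a per-call check like this would not stop an application from switching APIs between frames.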

So it's probably best if we indicate the API that will be used at session creation time. I can think of a couple of ways to do this. One is that we could have a "webgpu" feature. It's not clear to me, though, if that would be a mutually exclusive feature with "layers" or require "layers" but forbid part of its use? That could get pretty messy. It may be simpler to add a new enum to the XRSessionInit that indicates the API to be used, defaulting to "webgl". Then only layers using the API in question could be created with the session. (Question that we must have already addressed for WebGL: Do we allow multiple devices/contexts for that API type to be used?) For prior art on adding new keys to XRSessionInit we can look at the DOM overlay module which both requires a feature string and configures it with a new dictionary entry, domOverlay.

Having this set at session creation time also would solve a related problem I've been wondering about during my prototyping, which is that WebGPU projection matrices have a [0, 1] depth range, whereas WebGL projection matrices have a [-1, 1] depth range. This would allow sessions to return the right depth range for the API without developer intervention, which is nice.
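For reference, the "developer intervention" that would otherwise be needed looks roughly like the sketch below. It remaps clip-space z as z' = (z + w) / 2, which takes an NDC depth of -1 to 0 and leaves 1 at 1. The helper name `toZeroToOneDepth` is hypothetical, and column-major 4x4 matrices are assumed.

```javascript
// Hypothetical helper: converts a column-major 4x4 projection matrix that
// targets a [-1, 1] NDC depth range into one that targets [0, 1].
function toZeroToOneDepth(projectionMatrix) {
  const out = Float32Array.from(projectionMatrix);
  for (let col = 0; col < 4; col++) {
    const z = projectionMatrix[col * 4 + 2]; // row 2 produces clip-space z
    const w = projectionMatrix[col * 4 + 3]; // row 3 produces clip-space w
    out[col * 4 + 2] = 0.5 * z + 0.5 * w;    // z' = (z + w) / 2
  }
  return out;
}
```

If the session already knows which graphics API is in use, it could hand back matrices in the right range and this step would be unnecessary.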

AdaRoseCannon commented 3 weeks ago

/tpac I'm going to tentatively mark this for TPAC unless you want to talk about it sooner (in 2 weeks)

mwyrzykowski commented 3 weeks ago

So it's probably best if we indicate the API that will be used at session creation time.

Sounds great to me, that would be my preference as well. I am also fine discussing at TPAC

toji commented 3 weeks ago

After giving it some additional thought, I'm going to advocate for using a feature string to perform the WebGL/WebGPU mode switch. So:

xrSession = await navigator.xr.requestSession('immersive-vr', {
  requiredFeatures: ['webgpu'] // Exact string subject to discussion
});

This likely provides a better developer experience, because existing WebXR implementations will actively reject that session if WebGPU integration isn't supported. (The spec says that any unrecognized requiredFeatures string causes the session request to fail.) Using a new dictionary key, on the other hand, would succeed on browsers that haven't implemented the feature yet, making it easy for developers to set the flag and assume that any session they get will now be WebGPU-based when in fact many of them will still be WebGL-based.
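A minimal sketch of how an app might lean on that rejection behavior follows. The function name and the fallback policy are assumptions for illustration, and 'webgpu' is the placeholder feature string from this thread. The `xr` argument is expected to be `navigator.xr`, passed in so the logic can be exercised with a stand-in object.

```javascript
// Sketch: ask for a WebGPU-backed session and fall back to a default
// session only when the required feature is rejected as unsupported.
// 'webgpu' is a placeholder feature string, not a finalized name.
async function requestSessionPreferWebGPU(xr, mode = 'immersive-vr') {
  try {
    const session = await xr.requestSession(mode, { requiredFeatures: ['webgpu'] });
    return { session, graphicsApi: 'webgpu' };
  } catch (err) {
    // Unsupported required features reject with a NotSupportedError.
    if (err.name !== 'NotSupportedError') throw err;
    const session = await xr.requestSession(mode);
    return { session, graphicsApi: 'webgl' };
  }
}
```

Whether the second request is still allowed under the same user gesture is up to the browser, so a real page may need to surface the fallback as a separate user action.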

This also allows developers to use { optionalFeatures: ['webgpu'] } if they have render paths that can support either mode. The developer can then choose which one to use based on whether webgpu shows up in the xrSession.enabledFeatures list. (Though to be honest this path is relatively unlikely because I expect most apps to initialize their graphics content before a session is requested, not after. In those cases developers can check for the existence of the XRGPUBinding class to estimate whether or not the session will support WebGPU.)
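Both checks described above can be sketched as small helpers. The function names are hypothetical, and 'webgpu' is again the placeholder feature string under discussion here.

```javascript
// After session creation: pick a render path from the granted features.
// 'webgpu' is a placeholder feature string, not a finalized name.
function selectRenderPath(xrSession) {
  // enabledFeatures may be missing on older implementations.
  const enabled = xrSession.enabledFeatures ?? [];
  return enabled.includes('webgpu') ? 'webgpu' : 'webgl';
}

// Before session creation: a rough estimate only. The binding class
// existing does not guarantee that a session will be WebGPU-based.
function mightSupportWebGPUSessions(globalObject = globalThis) {
  return typeof globalObject.XRGPUBinding === 'function';
}
```

An app that builds its renderer before requesting a session would call `mightSupportWebGPUSessions()` first, then confirm with `selectRenderPath(xrSession)` once the session exists.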

cabanier commented 3 weeks ago

After giving it some additional thought, I'm going to advocate for using a feature string to perform the WebGL/WebGPU mode switch. So:
xrSession = await navigator.xr.requestSession('immersive-vr', {
  requiredFeatures: ['webgpu'] // Exact string subject to discussion
});
This likely provides a better developer experience, because existing WebXR implementations will actively reject that session if WebGPU integration isn't supported.

I could live with this but would prefer a new API (i.e., requestWebGPUSession) because it's such a big functionality switch. I agree that it's unlikely that authors want to mix WebGPU and WebGL in their WebXR session so it's ok to have a hard choice at session startup time.

Should we also add an enum attribute to the XRSession to indicate the graphics interface type, or create a whole new interface? Both approaches will require substantial changes to the WebXR and WebXR Layers specs. A new interface might be more work but would result in a cleaner (or potentially a whole new) spec.

toji commented 3 weeks ago

Should we also add an enum attribute to the XRSession to indicate the graphics interface type, or create a whole new interface?

By a whole new interface do you mean introducing something like an XRWebGPUSession that you would use instead of an XRSession? I'm not in favor of that at all. XRSession has almost nothing that is graphics-API specific. Accepting an XRWebGLLayer as the baseLayer in updateRenderState() is the only thing, and it's already easy to avoid with layers. I don't see any benefit to creating a new variant of the session, as it'll basically be the exact same thing with a different interface name.

A new session request API (requestWebGPUSession()) is more palatable, but I'm not sold on the benefit of it. I do think that an enum attribute to indicate the graphics API is useful, even if we went the webgpu feature route, since it's easier/faster than if (xrSession.enabledFeatures.includes('webgpu')) { /* ... */ }.

As for changes to the WebXR and Layers spec, I don't think it'll be too bad? We would simply indicate (probably in the WebGPU bindings module spec itself) that the constructors for XRWebGLLayer and XRWebGLBinding throw an exception if the session is not a WebGL-based session. We'd also indicate that the projection matrices need to be tweaked to account for WebGPU's [0, 1] depth range (see #8) and that's about it?
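A toy model of that constructor check is sketched below. These classes only stand in for XRSession and XRWebGLBinding; the `graphicsApi` attribute and the exception type are assumptions, since neither has been specified.

```javascript
// Toy model only: stand-ins for XRSession and XRWebGLBinding that
// illustrate the proposed check. `graphicsApi` is a hypothetical attribute.
class ToySession {
  constructor(graphicsApi = 'webgl') {
    this.graphicsApi = graphicsApi; // 'webgl' by default, or 'webgpu'
  }
}

class ToyWebGLBinding {
  constructor(session) {
    // Proposed rule: WebGL bindings refuse sessions that are not WebGL-based.
    if (session.graphicsApi !== 'webgl') {
      throw new Error('Session is not WebGL-based');
    }
    this.session = session;
  }
}
```

The same attribute could back a public enum on the session, which would give developers a cheaper check than scanning `enabledFeatures`.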

cabanier commented 3 weeks ago

Should we also add an enum attribute to the XRSession to indicate the graphics interface type, or create a whole new interface?

By a whole new interface do you mean introducing something like an XRWebGPUSession that you would use instead of an XRSession? I'm not in favor of that at all. XRSession has almost nothing that is graphics-API specific. Accepting an XRWebGLLayer as the baseLayer in updateRenderState() is the only thing, and it's already easy to avoid with layers. I don't see any benefit to creating a new variant of the session, as it'll basically be the exact same thing with a different interface name.

It would just be to keep the spec clean and avoid having to add if/else conditions all over the place, since the class name would be checked by the IDL. Maybe it won't be too bad...

A new session request API (requestWebGPUSession()) is more palatable, but I'm not sold on the benefit of it. I do think that an enum attribute to indicate the graphics API is useful, even if we went the webgpu feature route, since it's easier/faster than if (xrSession.enabledFeatures.includes('webgpu')) { /* ... */ }.

OK. We can always iterate on this later.

As for changes to the WebXR and Layers spec, I don't think it'll be too bad? We would simply indicate (probably in the WebGPU bindings module spec itself) that the constructors for XRWebGLLayer and XRWebGLBinding throw an exception if the session is not a WebGL-based session.

True. Will the XRGPUBinding class be in that module and refer to the Layers spec for the different layer types? To check for the session type, we would need an attribute on XRSession. The attribute could just be internal.

immersive-web / WebXR-WebGPU-Binding, issue #7