Problem: Supporting WebXR responsively across the web is hard.

Current WebXR/WebVR development requires handling multiple types of XR experiences (3DOF mobile, 6DOF room, different controller types), as well as when no native device is supported (desktop with no VR display, mobile with no VR display), or when the API doesn't exist at all. Not even including the complexities and uncertainties of future AR devices, there are a lot of possibilities to handle.

Code Forking

Currently, if you want to support both a non-XR experience and a native XRDevice, you need some forks in your code. In WebVR 1.1, this wasn't too bad: either apply some manipulation to a camera representation when no VRDisplay is found, or otherwise update VRFrameData, and then follow mostly the same render path using the same WebGL context. WebXR has a different structure, which makes this code forking more complex.

Valid use cases that run into forked code:

A responsive experience that is designed for mouse/keyboard on desktop, touch on mobile, and full immersive XR.
A "teaser" non-XR WebGL experience before entering XR, e.g. use mouse/keyboard to try out an experience, before potentially deciding to grab a headset and "enter XR".
A "teaser" non-XR WebGL experience before entering XR due to user-gesture locks on creating XRSession.

Forked code causes the following difficulties:

In WebXR, one needs to change how to bind input events; a desktop/no-XR experience must bind to the input WebGL canvas, whereas a WebXR supported experience must bind to the output XRPresentationContext.
Differences in rendering due to things like framebuffer binding in XR. Originally working with WebXR and three.js resulted in three's render targets (shadows, custom render targets) not working when using the XR path.
Future tooling/libraries could help here, but the differences between a standard "WebGL non-XR path" and "XR path" mean that it is extra work to support XR, if the non-XR WebGL path is the baseline. Targeting a single path could mean new experiences in the future would get XR "for free".
Supporting a single-context (non-XR) is just very different from supporting double-context (WebXR).

Can the polyfill solve this?

Previously in the webvr-polyfill, there was support for VRDisplays that were controlled via mouse and keyboard on desktop, or potentially a touch panner for mobile (swipe to look around). This gave developers an out-of-the-box (after including the polyfill) control scheme for non-VR platforms, although it was not customizable. While the polyfill could support ways for developers to provide their own controls, this is still something developers must be aware of, and it is difficult to inject your own controls into a (WebXR) polyfill implementation.
In WebXR, there are two contexts (WebGL input, XRPresentationContext output), which, when polyfilled, will always have some overhead of copying a canvas in user-land.
The polyfill is another library that developers have to be aware of and include in their WebXR stack, with the lines between aframe/three/polyfill/platform APIs blurring for developers. It feels necessary to be aware of the history and current status of WebVR/WebXR in order to use it, which should not be the case. It feels silly to clobber the entire WebXR Device API with a JavaScript version when a browser supports the API but has no native XRDevice. We want more consistency across platforms while debugging, rather than having to figure out, "So is navigator.xr.requestDevice() native? From the polyfill? Which version?"
In webvr-polyfill, there's an option, enabled by default, under which we clobber the native WebVR 1.1 navigator.getVRDisplays() on mobile even when it exists, because we need to first see if getVRDisplays() returns a native VRDisplay and, if not, provide a cardboard fallback in order to try our best to support some experience. Monkey-patching getVRDisplays to return either the native display or a cardboard fallback feels a bit like throwing the baby out with the bathwater. The webxr-polyfill does similar monkey-patching, except that, due to the complexities compared to 1.1's mostly-VRDisplay API, the patching is more involved and more likely to differ from native implementations.

Potential Solution: Poseless XRSessions

If it were possible to have an XRDevice on non-supported platforms that does not provide poses, developers could use a single render path, eliminating confusing forks and potentially increasing XR-first application development as well as vendor adoption. This would essentially allow desktop WebGL experiences to use the WebXR rendering pipeline.

I'm confident that the problem is something that should be addressed, while less confident in this specific solution. The proposed names are just used to indicate the idea and for sure need work.

If we have the concept of an XRDevice representing the rendering flow, with supportsSession and XRSession having a 'poseless' value, which functions the same as a standard non-exclusive XRSession except the pose is always null, we could expose a way for developers to plug into the XR rendering path without needing access to a pose-generating XR system, and to build responsive experiences.

Example

A quick, non-detailed example to illustrate the idea. This falls back to a poseless XRSession when WebXR exists but there is no valid XR system, as on a desktop. Another example could be a poseless, magic-window XRSession on page load that, upon clicking a button, switches to an immersive, real-pose XRSession.

let session;

async function init() {
  const xrDevice = await navigator.xr.requestDevice();
  // `ctx` is an XRPresentationContext created elsewhere
  try {
    // Try a regular magic window session first. This rejects if there is a
    // user gesture gate or no native pose support.
    session = await xrDevice.requestSession({ outputContext: ctx });
  } catch (e) {
    // Create a controlled (poseless) XRSession if no platform support
    session = await xrDevice.requestSession({ outputContext: ctx, poseless: true });
  }
  // set up render loop, bind appropriate input events
}

// Only used on poseless sessions
function onMouseMove(e) {
  // `camera` is the same shared camera object used in onXRFrame below
  camera.quaternion.copy(calculatePoseFromMouseMovement(e));
}

function onXRFrame(time, frame) {
  let pose;
  // In this case, we render based off of `camera` values, and when poseless,
  // use the camera's pose for rendering, otherwise, copy the XRDevicePose into
  // the camera. We could also convert the custom `camera` into an XRDevicePose-ish
  // object for rendering, same results.
  if (session.poseless) {
    camera.updateMatrix();
  } else {
    pose = frame.getDevicePose(frameOfRef);
    copyXRDevicePoseToCamera(pose, camera);
  }
  // render
}
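
The example above only names calculatePoseFromMouseMovement without defining it. As a minimal sketch of what such a helper might look like, the following accumulates yaw and pitch from mouse deltas and returns a plain { x, y, z, w } quaternion. The factory name createMouseLook, the sensitivity parameter, and the pitch clamp are all assumptions made for illustration and are not part of the proposal.

```javascript
// Hypothetical helper: the issue text never defines this function.
// Returns a stateful calculatePoseFromMouseMovement(e) that turns mouse
// deltas into an orientation quaternion (yaw about Y, then pitch about X).
function createMouseLook(radiansPerPixel = 0.002) {
  const MAX_PITCH = Math.PI / 2; // assumed clamp: do not flip over the poles
  let yaw = 0;
  let pitch = 0;

  return function calculatePoseFromMouseMovement(e) {
    // Moving the mouse right/down turns the view right/down.
    yaw -= e.movementX * radiansPerPixel;
    pitch -= e.movementY * radiansPerPixel;
    pitch = Math.max(-MAX_PITCH, Math.min(MAX_PITCH, pitch));

    // Quaternion product qYaw * qPitch, written out by hand.
    const cy = Math.cos(yaw / 2), sy = Math.sin(yaw / 2);
    const cp = Math.cos(pitch / 2), sp = Math.sin(pitch / 2);
    return { x: cy * sp, y: sy * cp, z: -sy * sp, w: cy * cp };
  };
}
```

In a browser this would be wired up with something like `const calculatePoseFromMouseMovement = createMouseLook();` before registering the mousemove listener, and only for poseless sessions.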

Rough IDL

partial dictionary XRSessionCreationOptions {
  boolean poseless = false;
};
partial [SecureContext, Exposed=Window] interface XRSession : EventTarget {
  readonly attribute boolean poseless;
};

Hopes & Dreams & Assumptions

With platform XR support not necessary to provide a WebXR Device API implementation, this could increase/accelerate vendor adoption, reducing the need for the polyfill, and reduce the number of caveats needed in the answer to, "So I can do VR on the web?"
Ecosystem of WebXR controls easily usable, as they just affect APIs on an XRSession, rather than needing to propagate through the WebXR system. Probably can leverage many three.js controls out of the box, e.g. using OrbitControls on a camera, and constructing an XRDevicePose from a THREE.Camera.
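
To make the last point a little more concrete, here is a rough sketch of deriving a pose-style model matrix from a camera's position and orientation. It takes plain { x, y, z } and { x, y, z, w } objects (the shape that THREE.Camera's position and quaternion properties expose) and returns a column-major 4x4 matrix. The function name poseMatrixFromCamera is hypothetical, and how such a matrix would actually be handed to an XRSession is exactly what this issue leaves open.

```javascript
// Hypothetical helper, not part of WebXR or three.js.
// Builds a column-major 4x4 rigid transform from a unit quaternion and a
// position, i.e. the camera's "model" matrix in world space.
function poseMatrixFromCamera(position, quaternion) {
  const { x, y, z, w } = quaternion;
  const x2 = x + x, y2 = y + y, z2 = z + z;
  const xx = x * x2, xy = x * y2, xz = x * z2;
  const yy = y * y2, yz = y * z2, zz = z * z2;
  const wx = w * x2, wy = w * y2, wz = w * z2;

  return new Float32Array([
    1 - (yy + zz), xy + wz,       xz - wy,       0, // column 0 (local X axis)
    xy - wz,       1 - (xx + zz), yz + wx,       0, // column 1 (local Y axis)
    xz + wy,       yz - wx,       1 - (xx + yy), 0, // column 2 (local Z axis)
    position.x,    position.y,    position.z,    1, // column 3 (translation)
  ]);
}
```

A view matrix for rendering would be the inverse of this matrix.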

Open Questions

Any syncing issues for implementation?
Would this make implementation easy for browser vendor when no native XRDevice is supported?
Are there better names/implementations of this "poseless" XRSession, or another approach?

Some questions/comments/alternative solutions from today's call:

What is an XRDevice if it doesn't guarantee XR? - via @RafaelCintron
Default Controls: Provide something like a touch panner control for swiping a 3DOF experience; provides consistent controls/experience, useful even when you have full VR capabilities (e.g. sitting on a bus) - via @toji
Inject Pose: This initial idea started as an exposed setPose(poseMatrix) function that allows a developer to use any sort of control handling, passing the result into the XR system and receiving the pose again on every frame. After some riffing, I arrived at poseless sessions, since it seemed redundant to put the pose in the system just to get the same thing back, but @toji mentioned today there are advantages in doing this, since the XR system would then be aware of poses and use relative rays.
@NellWaliczek has some ideas around this issue that she'll be sharing in the coming weeks
Potentially a single framebuffer in the future, removing (?) the double canvas workflow

Been thinking about this quite a bit, and wanted to post something before I left for vacation. Sorry, Nell, if this disrupts anything you were working on.

After going through a lot of way-too-complicated schemes in my head, it occurs to me that the core thing we really need is just to set a transform that applies to any pose we get. Since poses are always retrieved relative to a coordinate system, it makes sense to apply the transform there. Combine this with the desire for a totally synthetic (I'll use Jordan's "poseless" term for the time being) session and a small bit of consideration for API efficiency, and I arrived at something that looks like this:

// IDL changes

interface XRPoseController {
  void setPoseTransform(Float32Array transformMatrix);
};

dictionary XRSessionCreationOptions {
  boolean immersive = false;
  XRPresentationContext outputContext;
  boolean poseless = false; // Not necessary to use the poseTransformController
};

dictionary XRFrameOfReferenceOptions {
  boolean disableStageEmulation = false;
  double stageEmulationHeight = 0.0;
  XRPoseController? poseController = null;
};

// Use

class TouchPoseController extends XRPoseController {
  constructor (canvas) {
    super(); // must run before `this` is used in a derived class
    this.canvas = canvas;
    this.yaw = 0;
    this.lastTouchX = 0;
    canvas.addEventListener('touchmove', (ev) => {
      // Handwave handwave
      let touchX = ev.touches[0].screenX;
      this.yaw += this.lastTouchX - touchX;
      this.lastTouchX = touchX;
      let matrix = mat4.create();
      mat4.rotateY(matrix, matrix, this.yaw);
      this.setPoseTransform(matrix);
    });
  }
}

let outputCanvas = document.createElement('canvas');
let ctx = outputCanvas.getContext('xrpresent');
let poseController = new TouchPoseController(outputCanvas);

xrDevice.requestSession({ outputContext: ctx, poseless: true }).then((session) => {
  // requestFrameOfReference returns a promise, so chain on it
  return session.requestFrameOfReference('stage', { poseController: poseController });
}).then((xrFrameOfRef) => {
  // And everything else works as normal.
});

To preemptively answer a question I know is coming: I feel like the pose controller should be an interface rather than a simple callback that returns a transform, because it's easier to validate the transform once upon setting rather than upon every callback, and the implementation is more efficient both for cases where multiple poses are queried per frame (like with controllers) and for apps where pose transforms are sparse (like touch panning).
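
The validate-once argument can be illustrated with a user-land stand-in. This is only a sketch: SketchPoseController is a hypothetical class, not the proposed XRPoseController, and the specific checks (a 16-element Float32Array of finite values) are assumptions about what a browser might validate.

```javascript
// Hypothetical user-land stand-in for the proposed XRPoseController.
// The transform is validated and copied once, when it is set, so that any
// number of per-frame reads can use the cached value without re-checking.
class SketchPoseController {
  constructor() {
    // Start with the identity transform (column-major 4x4).
    this._transform = new Float32Array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]);
  }

  setPoseTransform(transformMatrix) {
    if (!(transformMatrix instanceof Float32Array) || transformMatrix.length !== 16) {
      throw new TypeError('expected a Float32Array of 16 elements');
    }
    if (!transformMatrix.every(Number.isFinite)) {
      throw new TypeError('transform contains non-finite values');
    }
    // Copy so that later mutation by the caller cannot bypass validation.
    this._transform = transformMatrix.slice();
  }

  // Cheap read path: called for the head pose and for each input source pose.
  get transform() {
    return this._transform;
  }
}
```

With a callback-based design, the equivalent checks would have to run on every invocation, including frames where the transform has not changed.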

"Poseless" simply returns an identity pose at all times, so the only way it's really useful is if you combine it with some other method for controlling the pose, hence the XRPoseController. But by separating the poseless request from the pose transforms we also enable a really simple mechanism for handling things like artificial movement in the VR space while still having local transforms be accurate.

Would this approach support controls which are rate based? A slider which determines how fast the scene should spin for instance. Will we have access to the frame timing when we do the pose calculation?

Rate based controls: Yeah, you could definitely do this since it would give the developer full control over the pose.
Frame timing: My proposal doesn't explicitly expose this, but I'm sure we could work it out if needed. Biggest issue is that we haven't figured out what kind of timing information about the frame we should expose. (See #347)
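
As a rough illustration of a rate-based control, the sketch below integrates a spin rate over the timestamps passed to the frame callback (the `time` argument of onXRFrame(time, frame) in the earlier example). It assumes that timestamp is in milliseconds and is a good enough clock for this purpose, which is precisely the open question referenced above. createSpinController and its methods are hypothetical names.

```javascript
// Hypothetical rate-based control: a slider sets a spin rate, and the yaw
// is advanced by rate * elapsed time on every frame.
function createSpinController(radiansPerSecond = 0) {
  let yaw = 0;
  let lastTimeMs = null;

  return {
    // Called from UI code, e.g. a slider's input event.
    setRate(newRadiansPerSecond) {
      radiansPerSecond = newRadiansPerSecond;
    },
    // Called once per frame with the frame callback's timestamp (assumed ms).
    update(timeMs) {
      if (lastTimeMs !== null) {
        yaw += radiansPerSecond * (timeMs - lastTimeMs) / 1000;
      }
      lastTimeMs = timeMs;
      return yaw;
    },
  };
}
```

The returned yaw could then feed a rotation matrix passed to setPoseTransform, in the same way the touch example does.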

One other thought that I had after reviewing what I posted last week is that the poseless flag should probably be passed when requesting the XRDevice, not the XRSession. There are a few reasons for this:

If we delegate poseless to the session then we must always return a device, which means that you have no mechanism for detecting when actual XR hardware is unavailable.
Specifying a poseless device is logically a bit more consistent, and allows the system to completely avoid even checking for hardware, which means it should be more lightweight and potentially more appealing.
Since WebGL context compatibility is done off the device, if poseless was handled at the session level you may end up making the context compatible with a physical piece of XR hardware only to not actually use it, which may hurt the performance of the poseless session.
Signaling the desire for poseless sessions as early as possible may allow some browsers to avoid "This page wants to use XR" prompts when that's the only mechanism requested, or allow the page to gracefully upgrade: You can start showing poseless content while the prompt is up, and switch to magic window when the prompt is accepted.

So the new proposed code, expanded a bit to account for more of the initialization, looks like:

navigator.xr.requestDevice({ poseless: true }).then((xrDevice) => {
  let glCanvas = document.createElement('canvas');
  let gl = glCanvas.getContext('webgl', { compatibleXRDevice: xrDevice });

  let outputCanvas = document.createElement('canvas');
  let ctx = outputCanvas.getContext('xrpresent');

  xrDevice.requestSession({ outputContext: ctx }).then((xrSession) => {
    let poseController = new TouchPoseController(outputCanvas);

    xrSession.baseLayer = new XRWebGLLayer(xrSession, gl);
    xrSession.requestFrameOfReference('stage', { poseController: poseController }).then((xrFrameOfRef) => {
      // And everything else works as normal.
      xrSession.requestAnimationFrame(onXRFrame);
    });
  });
});

I don't want to derail the discussion too much, but I have a related question about the pose data and making responsive WebXR experiences. Consider a magic window session with VR content, which you use to tease the user into entering immersive mode. Right now you get a 3DOF experience on most phones: you can look around the scene (using the phone's sensors) and that's it. That is great for 360 content such as videos or pictures, but not really for more advanced content where you are intended to move around. Just like you can provide a mouse/keyboard-based experience on a desktop, one would think you could provide a virtual d-pad in a magic window session to let the user move freely around the scene (not just a click-a-point-to-move experience). I don't believe it's possible to support that today because the device pose is not intended to be modified. I think the poseController proposal would handle this use case as well.

@darktears for that use case, you can always add the transformations that come from the phone's pose to the transformations generated by the d-pad. I think the WebXR API should stick to providing the data that comes from the devices.

I tend to agree with @AlbertoElias ... this feels like something that should be handled in JavaScript, by a framework.

I'm actually a big fan of the idea of allowing pages to provide the browser with an updated pose, but for the purpose of allowing multiple pages to be composited (e.g., so I can create a "VR" page that I can overlay an "AR" page on). For that to ever work, the base page would need to be able to tell the browser what its global pose is, so that this could be provided to the other pages. But that's a very different use case.

Here, the page is already dealing with how it wants to move around with the dpad; it doesn't seem to be a huge win to provide this to the WebXR API, vs just using it internally.
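
For reference, the user-land approach described in the comments above (combining a d-pad offset with the device-provided pose inside the page) might look roughly like the following. It is a sketch under assumptions: matrices are column-major 4x4 Float32Arrays, and multiply4x4, translationMatrix, and applyDpadOffset are hypothetical helpers rather than WebXR APIs.

```javascript
// Returns a * b for column-major 4x4 matrices.
function multiply4x4(a, b) {
  const out = new Float32Array(16);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += a[k * 4 + row] * b[col * 4 + k];
      }
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

// Column-major translation matrix.
function translationMatrix(x, y, z) {
  return new Float32Array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1]);
}

// Moves the device pose through the scene by the accumulated d-pad offset
// (world space), leaving the sensor-derived orientation untouched.
function applyDpadOffset(devicePoseMatrix, offset) {
  return multiply4x4(translationMatrix(offset.x, offset.y, offset.z), devicePoseMatrix);
}
```

What this cannot do, and what the pose controller proposal is aiming at, is make the XR system itself aware of the adjusted pose.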

This was discussed at the July f2f without reaching a conclusion. Below is an attempt to summarize the issue and why we should do something to address it. Once we've settled that question, we can discuss details about how applications use it.

Note: Some of the modes and session creation work is already heading in this direction.

Observations

First, a couple observations:

There is nothing inherent in immersive: false that says there must be sensors.
- If you think about the opposite of immersive as "inline", there is no implication of sensors/degrees of freedom. Similarly, "immersive" doesn't specify the degrees of freedom.
- While this is mostly a mental exercise, it may help to reset assumptions.
The polyfill won't solve this.
- Technically, a polyfill exactly implements a specification. Thus, it should do exactly what the specification says would happen if there are no sensors. While frameworks or boilerplate could handle this, the polyfill wouldn't.

Pros

The following are (potential) advantages of returning a non-immersive session regardless of whether the device has sensors.

Single rAF: It is much easier for developers if they only need to worry about one rAF.
- The application follows the same flow in all cases (including the polyfill for user agents without any WebXR support).
- Avoids rAF transition bugs.
  - As described in https://github.com/immersive-web/webxr/issues/352, it is easy to overlook an issue that will cause the page to "hang" on some platform (primarily on mobile) when switching between XRSession.rAF and window.rAF because the page will appear to run fine on other platforms (i.e., desktop).
Input: In addition to unifying the rAF, developers can use a single input solution.
- If we don't allow creation of an XRSession, then (the graphics part of the) application needs to handle touch, etc. rather than always relying on the abstraction provided by XRInputSource.
- The single "activation" input already supported by XRInputSource is a great fit for these no-sensor-or-runtime scenarios.
Layers: If we want to add support for additional layers to WebXR in the future, we'll want those to also work for users without headsets. Since "magic window" uses the same APIs, we should be able to make that work. However, without a unified rendering path, authors would need to create a parallel mechanism to support such content on devices without sensors. By allowing WebXR to always be used, any layer work done for non-immersive sessions would work for all clients and users.
It encourages a better experience for everyone without extra developer effort.
- Similar to how accessibility solutions often improve experiences for everyone, efforts to add keyboard/mouse/touch support for sensorless devices can naturally apply to devices with sensors. (At the July f2f, @toji gave an example of viewing a 360 video on a phone on the bus.)
- If the rendering path is separate, developers might only enable this on the WebXR-less path. (While they could similarly only do this if the device is indicated to be sensorless - see Detection below - they might be more likely to in this case, especially with appropriate guidance.)
Even if a device has sensors, the user may not actually use them, making the device effectively sensorless. For example, a laptop or tablet that is docked.
- Thus, it is even more important that developers always provide the alternate navigation mechanisms mentioned above (at least for non-immersive sessions).
Similarly, sensors might come and go.
- I'm not sure how likely this is, but it seems better to avoid baking in an assumption that once sensorless, always sensorless (and vice versa). (We should also think about this if we ever expose degrees of freedom.)
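
To illustrate the rAF transition hazard mentioned under "Single rAF", here is a sketch of the bookkeeping an application needs today when it moves between window.requestAnimationFrame and an XRSession's requestAnimationFrame. createFrameScheduler is a hypothetical helper; it only assumes that each frame source exposes requestAnimationFrame and cancelAnimationFrame.

```javascript
// Hypothetical helper: keeps exactly one frame loop alive while the
// application switches between frame sources (window <-> XRSession).
function createFrameScheduler(initialSource) {
  let source = initialSource;
  let handle = null;
  let callback = null;

  function tick(time, frame) {
    // Re-arm on the *current* source before running the app's frame logic.
    handle = source.requestAnimationFrame(tick);
    callback(time, frame);
  }

  return {
    start(onFrame) {
      callback = onFrame;
      handle = source.requestAnimationFrame(tick);
    },
    // Forgetting either half of this (cancel the old loop, re-arm on the new
    // source) is what makes a page "hang" or double-render after a switch.
    switchTo(newSource) {
      if (handle !== null) {
        source.cancelAnimationFrame(handle);
      }
      source = newSource;
      if (callback) {
        handle = source.requestAnimationFrame(tick);
      }
    },
  };
}
```

If a non-immersive session were always available, the application could schedule every frame through the session and this switching logic would not be needed for the sensorless case.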

Cons

One of the arguments against this at the July f2f is that this is already possible in user space and frameworks could just handle it and work out the rAF kinks.
- I think there are additional compelling reasons for unification and platform consistency.
A related concern was whether this might mess with what those frameworks want to do.
- In a related discussion, it was noted that frameworks probably already have an XR/non-XR flag and path and that not unifying the rendering paths might lead to three paths in the framework.
- We can probably mitigate this by being clear to the developer about the current state.

Next Steps

Hopefully, we can agree that WebXR-capable implementations should always satisfy requests for a non-immersive session. Then we can make appropriate spec changes (or make this assumption in ongoing modes and session creation work) and close this issue.

Then, there are some additional conversations we should have. I think it would be most effective to discuss these in separate issues.

How should such devices behave?
- Return a null pose or an identity pose?
- Consider:
  - What applications already need to handle.
  - That sensors may come and go.
  - Alternative navigation (see below).
Detection: Should it be possible for applications to detect that there will never be a real pose?
- This was discussed at the July f2f, but I'm not sure this is actually necessary or a good thing.
  - See the text in Pros above about:
    - Sensors not being used or coming and going.
    - Encouraging developers to provide keyboard/mouse/touch overrides for all (non-immersive) sessions.
- If detection is possible, it should only be after the session is created.
- If we need detection, how should it be communicated?
  - Returning a null pose might cause some poorly-written/tested applications to break instead of just rendering "forward." Adding a separate attribute/flag would be unfortunate, though.
Alternative navigation: How should applications allow users to explore the full 360 space (and maybe even move)?
- We should discuss this separately (and probably not block landing the basic fix to the spec).
- Ideas include the application applying a matrix or an explicit setPose() method.
- Related: Should there be UA affordances for navigation?
Session upgrades / progressive enhancement: Ensure the mechanisms work well and are consistent.
Ensuring we handle all cases:
- Issues related to multiple graphics adapters?
- Adding/removing (i.e., desktop) headsets
Is XRDevice still necessary? (#385)
Accessibility: Ensure these changes don't remove options, and consider how things like alternative navigation might actually improve support.

immersive-web / webxr

Unified Render Paths: Poseless XRSessions #367