immersive-web / webxr

Repository for the WebXR Device API Specification.
https://immersive-web.github.io/webxr/

Unified Render Paths: Poseless XRSessions #367

Closed jsantell closed 6 years ago

jsantell commented 6 years ago

Problem: Supporting WebXR responsively across the web is hard.

Current WebXR/WebVR development requires handling multiple types of XR experiences (3DOF mobile, 6DOF room, different controller types), as well as when no native device is supported (desktop with no VR display, mobile with no VR display), or when the API doesn't exist at all. Not even including the complexities and uncertainties of future AR devices, there are a lot of possibilities to handle.

Code Forking

Currently, if you want to support both a non-XR experience and a native XRDevice, you need some forks in your code. In WebVR 1.1, this wasn't too bad: either apply some manipulation to a camera representation when no VRDisplay is found, or otherwise update VRFrameData, and mostly use the same render path with the same WebGL context. WebXR has a different structure, which makes this code forking more complex.

Valid use cases that run into forked code:

Forked code causes the following difficulties:

Can the polyfill solve this?

Potential Solution: Poseless XRSessions

If it were possible to have an XRDevice on non-supported platforms that does not provide poses, developers could use a single render path, eliminating confusing forks and potentially increasing XR-first application development as well as vendor adoption. This would essentially allow desktop WebGL experiences to use the WebXR rendering pipeline.

I'm confident that the problem is something that should be addressed, but less confident in this specific solution. The proposed names are just used to indicate the idea and certainly need work.

Suppose we have the concept of an XRDevice representing the rendering flow, with supportsSession and XRSession accepting a 'poseless' value. A poseless session functions the same as a standard non-exclusive XRSession, except that the pose is always null. With that, we could expose a way for developers to plug into the XR rendering path without needing access to a pose-generating XR system, and to build responsive experiences.

Example

A quick, non-detailed example to illustrate the idea. This falls back to a poseless XRSession when WebXR exists but there is no valid XR system, as on a desktop. Another example could be a poseless, magic-window XRSession on page load that, upon clicking a button, engages an immersive, real-pose XRSession.

let session;

// `ctx` is assumed to be an 'xrpresent' canvas context created elsewhere.
async function init() {
  const xrDevice = await navigator.xr.requestDevice();
  try {
    // This can fail if we cannot create a magic window session due to a
    // user gesture gate or lack of native pose support
    session = await xrDevice.requestSession({ outputContext: ctx });
  } catch (e) {
    // Create a controlled (poseless) XRSession if there is no platform support
    session = await xrDevice.requestSession({ outputContext: ctx, poseless: true });
  }
  // set up render loop, bind appropriate input events
}

// Only used on poseless sessions
function onMouseMove(e) {
  camera.quaternion.copy(calculatePoseFromMouseMovement(e));
}

function onXRFrame(time, frame) {
  let pose;
  // In this case, we render based off of `camera` values, and when poseless,
  // use the camera's pose for rendering, otherwise, copy the XRDevicePose into
  // the camera. We could also convert the custom `camera` into an XRDevicePose-ish
  // object for rendering, same results.
  if (session.poseless) {
    camera.updateMatrix();
  } else {
    pose = frame.getDevicePose(frameOfRef);
    copyXRDevicePoseToCamera(pose, camera);
  }
  // render
}
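
The example above calls `calculatePoseFromMouseMovement` without defining it. A minimal sketch of what such a helper might look like follows, assuming pointer-lock style `movementX`/`movementY` deltas and a plain `{x, y, z, w}` quaternion as the return value. The factory name `createMouseLook` and the sensitivity constant are made up for illustration and are not part of any proposal.

```javascript
// Hypothetical helper, not part of WebXR: turns accumulated mouse deltas
// into a unit quaternion. The sensitivity value is an arbitrary assumption.
const RADIANS_PER_PIXEL = 0.005;
const MAX_PITCH = Math.PI / 2;

function createMouseLook() {
  let yaw = 0;   // rotation about the Y axis, in radians
  let pitch = 0; // rotation about the X axis, in radians

  return function calculatePoseFromMouseMovement(e) {
    yaw -= e.movementX * RADIANS_PER_PIXEL;
    pitch -= e.movementY * RADIANS_PER_PIXEL;
    // Clamp pitch so the view cannot flip over the vertical.
    pitch = Math.max(-MAX_PITCH, Math.min(MAX_PITCH, pitch));

    // q = qYaw * qPitch: pitch is applied in local space, then yaw.
    const cy = Math.cos(yaw / 2), sy = Math.sin(yaw / 2);
    const cp = Math.cos(pitch / 2), sp = Math.sin(pitch / 2);
    return { x: cy * sp, y: sy * cp, z: -sy * sp, w: cy * cp };
  };
}
```

An application would create the helper once (`const calculatePoseFromMouseMovement = createMouseLook();`) and call it from its `mousemove` handler.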

Rough IDL

partial dictionary XRSessionCreationOptions {
  boolean poseless = false;
};
[SecureContext, Exposed=Window] partial interface XRSession {
  readonly attribute boolean poseless;
};

Hopes & Dreams & Assumptions

Open Questions

jsantell commented 6 years ago

Some questions/comments/alternative solutions from today's call:

toji commented 6 years ago

Been thinking about this quite a bit, and wanted to post something before I left for vacation. Sorry, Nell, if this disrupts anything you were working on.

After going through a lot of way-too-complicated schemes in my head, it occurs to me that the core thing we really need is just to set a transform that applies to any pose we get. Since poses are always retrieved relative to a coordinate system, it makes sense to apply the transform there. Combine this with the desire for a totally synthetic (I'll use Jordan's "poseless" term for the time being) session and a small bit of consideration for API efficiency, and I arrived at something that looks like this:

// IDL changes

interface XRPoseController {
  void setPoseTransform(Float32Array transformMatrix);
};

dictionary XRSessionCreationOptions {
  boolean immersive = false;
  XRPresentationContext outputContext;
  boolean poseless = false; // Not necessary to use the poseTransformController
};

dictionary XRFrameOfReferenceOptions {
  boolean disableStageEmulation = false;
  double stageEmulationHeight = 0.0;
  XRPoseController? poseController = null;
};

// Use

class TouchPoseController extends XRPoseController {
  constructor (canvas) {
    super(); // must be called before using `this` in a derived class
    this.canvas = canvas;
    this.yaw = 0;
    this.lastTouchX = 0;
    canvas.addEventListener('touchmove', (ev) => {
      // Handwave handwave
      let touchX = ev.touches[0].screenX;
      this.yaw += this.lastTouchX - touchX;
      this.lastTouchX = touchX;
      let matrix = mat4.create();
      mat4.rotateY(matrix, matrix, this.yaw);
      this.setPoseTransform(matrix);
    });
  }
}

let outputCanvas = document.createElement('canvas');
let ctx = outputCanvas.getContext('xrpresent');
let poseController = new TouchPoseController(outputCanvas);

xrDevice.requestSession({ outputContext: ctx, poseless: true }).then((session) => {
  // requestFrameOfReference returns a promise, so chain on it
  return session.requestFrameOfReference('stage', { poseController: poseController });
}).then((frameOfRef) => {
  xrFrameOfRef = frameOfRef;
  // And everything else works as normal.
});

To answer a question I know is coming preemptively: I feel like the pose controller should be an interface rather than a simple callback that returns a transform because it's easier to validate the transform once upon setting rather than upon every callback, and the implementation is more efficient both for cases where multiple poses are queried per frame (like with controllers) and for apps where pose transforms are sparse (like touch panning).

"Poseless" simply returns an identity pose at all times, so the only way it's really useful is if you combine it with some other method for controlling the pose, hence the XRPoseController. But by separating the poseless request from the pose transforms we also enable a really simple mechanism for handling things like artificial movement in the VR space while still having local transforms be accurate.
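
To make the two points above concrete (validate once on set, then reuse the stored transform for every pose query), here is a rough sketch using a stand-in class. `SketchPoseController` and its `applyTo` method are hypothetical names for illustration only. Matrices are assumed to be column-major 4x4 `Float32Array`s, and pre-multiplying the device pose by the transform is an assumption, since the proposal does not state the multiplication order.

```javascript
// Stand-in for the proposed XRPoseController; not a real browser API.
class SketchPoseController {
  constructor() {
    // Column-major identity: a "poseless" session with no transform set.
    this.transform = new Float32Array([1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1]);
  }

  // Validation happens once here, so per-pose queries can skip it.
  setPoseTransform(transformMatrix) {
    if (!(transformMatrix instanceof Float32Array) || transformMatrix.length !== 16) {
      throw new TypeError('transformMatrix must be a Float32Array of length 16');
    }
    if (!transformMatrix.every(Number.isFinite)) {
      throw new TypeError('transformMatrix must contain only finite values');
    }
    this.transform = new Float32Array(transformMatrix); // defensive copy
  }

  // Returns transform * devicePoseMatrix (assumed order, see lead-in).
  applyTo(devicePoseMatrix) {
    const a = this.transform, b = devicePoseMatrix;
    const out = new Float32Array(16);
    for (let col = 0; col < 4; col++) {
      for (let row = 0; row < 4; row++) {
        let sum = 0;
        for (let k = 0; k < 4; k++) sum += a[k * 4 + row] * b[col * 4 + k];
        out[col * 4 + row] = sum;
      }
    }
    return out;
  }
}
```

With an identity device pose this yields just the controller's transform, which is the fully synthetic case. With a real device pose, the same call offsets head and controller poses together, so artificial movement does not disturb their relative positions.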

bricetebbs commented 6 years ago

Would this approach support controls which are rate based? A slider which determines how fast the scene should spin for instance. Will we have access to the frame timing when we do the pose calculation?
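
As an illustration of the rate-based case, here is a sketch that assumes the application can read the timestamp passed to its frame callback (the `time` argument of `onXRFrame` in the earlier example, assumed to be in milliseconds). `SpinRateController` and `yawMatrix` are hypothetical names, and whether the resulting matrix could be handed to `setPoseTransform` depends on how the proposal settles.

```javascript
// Hypothetical rate-based controller: integrates an angular velocity
// (for example, from a slider) over the time elapsed between frames.
class SpinRateController {
  constructor() {
    this.radiansPerSecond = 0;
    this.yaw = 0;
    this.lastTime = null;
  }

  // `time` is a frame timestamp in milliseconds.
  update(time) {
    if (this.lastTime !== null) {
      const dtSeconds = (time - this.lastTime) / 1000;
      this.yaw += this.radiansPerSecond * dtSeconds;
    }
    this.lastTime = time;
    return yawMatrix(this.yaw);
  }
}

// Column-major 4x4 rotation about the Y axis.
function yawMatrix(yaw) {
  const c = Math.cos(yaw), s = Math.sin(yaw);
  return new Float32Array([c,0,-s,0, 0,1,0,0, s,0,c,0, 0,0,0,1]);
}
```

Because the yaw is integrated from elapsed time rather than incremented per frame, the spin speed stays the same regardless of frame rate.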

toji commented 6 years ago

One other thought that I had after reviewing what I posted last week is that the poseless flag should probably be passed when requesting the XRDevice, not the XRSession. There are a few reasons for this:

So the new proposed code, expanded a bit to account for more of the initialization, looks like:

navigator.xr.requestDevice({ poseless: true }).then((xrDevice) => {
  let glCanvas = document.createElement('canvas');
  let gl = glCanvas.getContext('webgl', { compatibleXRDevice: xrDevice });

  let outputCanvas = document.createElement('canvas');
  let ctx = outputCanvas.getContext('xrpresent');

  xrDevice.requestSession({ outputContext: ctx }).then((xrSession) => {
    let poseController = new TouchPoseController(outputCanvas);

    xrSession.baseLayer = new XRWebGLLayer(xrSession, gl);
    xrSession.requestFrameOfReference('stage', { poseController: poseController }).then((xrFrameOfRef) => {
        // And everything else works as normal.
        xrSession.requestAnimationFrame(onXRFrame);
    });
  });
});

darktears commented 6 years ago

I don't want to derail the discussion too much, but I have a related question about pose data and making responsive WebXR experiences. Consider a magic window session with VR content, which you use to tease the user into immersive mode. Right now you get a 3DOF experience on most phones: you can look around the scene (using the phone's sensors) and that's it. That is great for 360 content such as videos or pictures, but not really for more advanced content where you are intended to move around. Just as you can provide a mouse/keyboard-based experience on a desktop, one would think you could provide a virtual d-pad in a magic window session to let the user move freely around the scene (not just a click-a-point-to-move experience). I don't believe it's possible to support that today because the device pose is not intended to be modified. I think the poseController proposal would handle this use case as well.

AlbertoElias commented 6 years ago

@darktears for that use case, you can always add the transformations that come from the phone's pose to the transformations generated by the d-pad. I think the WebXR API should stick to providing the data that comes from the devices.
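
A rough sketch of that application-side combination, assuming the device orientation is available as a unit quaternion `{x, y, z, w}` and the d-pad reports values in the range -1 to 1. The function names, the d-pad shape, and the "-Z is forward" convention are assumptions for illustration.

```javascript
// Rotate vector v = [x, y, z] by unit quaternion q = {x, y, z, w}.
function rotateByQuaternion(v, q) {
  // t = 2 * cross(q.xyz, v); result = v + q.w * t + cross(q.xyz, t)
  const tx = 2 * (q.y * v[2] - q.z * v[1]);
  const ty = 2 * (q.z * v[0] - q.x * v[2]);
  const tz = 2 * (q.x * v[1] - q.y * v[0]);
  return [
    v[0] + q.w * tx + (q.y * tz - q.z * ty),
    v[1] + q.w * ty + (q.z * tx - q.x * tz),
    v[2] + q.w * tz + (q.x * ty - q.y * tx),
  ];
}

// Hypothetical app-side step: move in the direction the phone is facing.
// dpad: { x: strafe, y: forward }, each in [-1, 1]; speed in units/second.
function stepPosition(position, dpad, deviceOrientation, speed, dtSeconds) {
  const local = [dpad.x, 0, -dpad.y]; // -Z is treated as forward
  const world = rotateByQuaternion(local, deviceOrientation);
  return position.map((p, i) => p + world[i] * speed * dtSeconds);
}
```

The application keeps the accumulated position itself and combines it with each device pose before rendering, so nothing in the API needs to change for this approach.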

blairmacintyre commented 6 years ago

I tend to agree with @AlbertoElias ... this feels like something that should be handled in JavaScript, by a framework.

I'm actually a big fan of the idea of allowing pages to provide the browser with an updated pose, but for the purpose of allowing multiple pages to be composited (e.g., so I can create a "VR" page that I can overlay an "AR" page on). For that to ever work, the base page would need to be able to tell the browser what its global pose is, so that this could be provided to the other pages. But that's a very different use case.

Here, the page is already dealing with how it wants to move around with the dpad; it doesn't seem to be a huge win to provide this to the WebXR API, vs just using it internally.

ddorwin commented 6 years ago

This was discussed at the July f2f without reaching a conclusion. Below is an attempt to summarize the issue and why we should do something to address it. Once we've settled that question, we can discuss details about how applications use it.

Note: Some of the modes and session creation work is already heading in this direction.

Observations

First, a couple observations:

Pros

The following are (potential) advantages of returning a non-immersive session regardless of whether the device has sensors.

Cons

Next Steps

Hopefully, we can agree that WebXR-capable implementations should always satisfy requests for a non-immersive session. Then we can make appropriate spec changes (or make this assumption in ongoing modes and session creation work) and close this issue.

Then, there are some additional conversations we should have. I think it would be most effective to discuss these in separate issues.

NellWaliczek commented 6 years ago

Fixed by #409