Closed: jsantell closed this issue 6 years ago
Some questions/comments/alternative solutions from today's call:
A setPose(poseMatrix) function that allows a developer to use any sort of control handling, passing the pose into the XR system and receiving it again on every frame. After some riffing, I arrived at poseless sessions, since it seemed redundant to put the pose into the system just to get the same thing back, but @toji mentioned today that there are advantages to doing this, since the XR system would then be aware of poses and could use relative rays.

Been thinking about this quite a bit, and wanted to post something before I left for vacation. Sorry, Nell, if this disrupts anything you were working on.
After going through a lot of way-too-complicated schemes in my head, it occurs to me that the core thing we really need is just to set a transform that apply to any pose we get. Since poses are always retrieved relative to a coordinate system, the transform makes sense to be applied there. Combine this with the desire for a totally synthetic (I'll use Jordan's "poseless" term for the time being) session and a small bit of consideration for API efficiency, and I arrived at something that looks like this:
```webidl
// IDL changes
interface XRPoseController {
  void setPoseTransform(Float32Array transformMatrix);
};

dictionary XRSessionCreationOptions {
  boolean immersive = false;
  XRPresentationContext outputContext;
  boolean poseless = false; // Not necessary to use the XRPoseController
};

dictionary XRFrameOfReferenceOptions {
  boolean disableStageEmulation = false;
  double stageEmulationHeight = 0.0;
  XRPoseController poseController = null;
};
```
```js
// Use
class TouchPoseController extends XRPoseController {
  constructor(canvas) {
    super();
    this.canvas = canvas;
    this.yaw = 0;
    this.lastTouchX = 0;
    canvas.addEventListener('touchmove', (ev) => {
      // Handwave handwave
      let touchX = ev.touches[0].screenX;
      this.yaw += this.lastTouchX - touchX;
      this.lastTouchX = touchX;
      let matrix = mat4.create();
      mat4.rotateY(matrix, matrix, this.yaw);
      this.setPoseTransform(matrix);
    });
  }
}

let outputCanvas = document.createElement('canvas');
let ctx = outputCanvas.getContext('xrpresent');
let poseController = new TouchPoseController(outputCanvas);

xrDevice.requestSession({ outputContext: ctx, poseless: true }).then((session) => {
  session.requestFrameOfReference('stage', { poseController: poseController }).then((xrFrameOfRef) => {
    // And everything else works as normal.
  });
});
```
To preemptively answer a question I know is coming: I feel like the pose controller should be an interface rather than a simple callback that returns a transform, because it's easier to validate the transform once upon setting rather than on every callback, and the implementation is more efficient both for cases where multiple poses are queried per frame (like with controllers) and for apps where pose transform updates are sparse (like touch panning).
"Poseless" simply returns an identity pose at all times, so the only way it's really useful is if you combine it with some other method for controlling the pose, hence the XRPoseController. But by separating the poseless request from the pose transforms we also enable a really simple mechanism for handling things like artificial movement in the VR space while still having local transforms be accurate.
Would this approach support controls which are rate-based? A slider which determines how fast the scene should spin, for instance. Will we have access to the frame timing when we do the pose calculation?
One other thought that I had after reviewing what I posted last week is that the poseless flag should probably be passed when requesting the XRDevice, not the XRSession. There are a few reasons for this:
So the new proposed code, expanded a bit to account for more of the initialization, looks like:
```js
navigator.xr.requestDevice({ poseless: true }).then((xrDevice) => {
  let glCanvas = document.createElement('canvas');
  let gl = glCanvas.getContext('webgl', { compatibleXRDevice: xrDevice });

  let outputCanvas = document.createElement('canvas');
  let ctx = outputCanvas.getContext('xrpresent');

  xrDevice.requestSession({ outputContext: ctx }).then((xrSession) => {
    let poseController = new TouchPoseController(outputCanvas);
    xrSession.baseLayer = new XRWebGLLayer(xrSession, gl);
    xrSession.requestFrameOfReference('stage', { poseController: poseController }).then((xrFrameOfRef) => {
      // And everything else works as normal.
      xrSession.requestAnimationFrame(onXRFrame);
    });
  });
});
```
I don't want to derail the discussion too much, but I have a related question about pose data and making responsive WebXR experiences. Consider a magic window session with VR content, which you use to tease the user into entering immersive mode. Right now you get a 3DOF experience on most phones: you can look around the scene (using the phone's sensors) and that's it. That is great for 360 content such as videos or pictures, but not really for more advanced content where you are intended to move around. Just as you can provide a mouse/keyboard based experience on desktop, one would think you could provide a virtual d-pad in a magic window session to let the user move freely around the scene (not just a click-a-point-to-move experience). I don't believe it's possible to support that today, because the device pose is not intended to be modified. I think the poseController proposal would handle this use case as well.
@darktears for that use case, you can always add the transformations that come from the phone's pose to the transformations generated by the d-pad. I think the WebXR API should stick to providing the data that comes from the devices.
I tend to agree with @AlbertoElias ... this feels like something that should be handled in Javascript, by a framework.
I'm actually a big fan of the idea of allowing pages to provide the browser with an updated pose, but for the purpose of allowing multiple pages to be composited (e.g., so I can create a "VR" page that I can overlay an "AR" page on). For that to ever work, the base page would need to be able to tell the browser what its global pose is, so that it could be provided to the other pages. But that's a very different use case.
Here, the page is already dealing with how it wants to move around with the dpad; it doesn't seem to be a huge win to provide this to the WebXR API, vs just using it internally.
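The framework-side approach @AlbertoElias describes can be sketched roughly like this (illustrative names only, no WebXR API involved): the app keeps an artificial movement offset driven by the d-pad and layers it on top of whatever position the device sensors report, entirely in page JavaScript.

```javascript
// App-level locomotion: WebXR keeps reporting raw device poses, and the
// page composes its own d-pad movement on top before rendering.
class DpadLocomotion {
  constructor(speed = 1.5) { // meters per second; default is an assumption
    this.speed = speed;
    this.offset = [0, 0, 0]; // accumulated artificial movement
  }

  // direction: unit vector from the virtual d-pad; dtSeconds: frame delta.
  step(direction, dtSeconds) {
    for (let i = 0; i < 3; i++) {
      this.offset[i] += direction[i] * this.speed * dtSeconds;
    }
  }

  // Combine the sensor-reported position with the artificial offset.
  apply(devicePosition) {
    return devicePosition.map((v, i) => v + this.offset[i]);
  }
}

const locomotion = new DpadLocomotion(2);       // 2 m/s, assumed
locomotion.step([0, 0, -1], 0.5);               // half a second of forward input
const rendered = locomotion.apply([1, 1.6, 0]); // sensor position -> [1, 1.6, -1]
```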
This was discussed at the July f2f without reaching a conclusion. Below is an attempt to summarize the issue and why we should do something to address it. Once we've settled that question, we can discuss details about how applications use it.
Note: Some of the modes and session creation work is already heading in this direction.
First, a couple of observations:
- There's nothing about a request for a non-immersive session (immersive: false) that says there must be sensors.
The following are (potential) advantages of returning a non-immersive session regardless of whether the device has sensors.
Hopefully, we can agree that WebXR-capable implementations should always satisfy requests for a non-immersive session. Then we can make appropriate spec changes (or make this assumption in ongoing modes and session creation work) and close this issue.
Then, there are some additional conversations we should have. I think it would be most effective to discuss these in separate issues.
- A setPose() method.

Fixed by #409
Problem: Supporting WebXR responsively across the web is hard.
Current WebXR/WebVR development requires handling multiple types of XR experiences (3DOF mobile, 6DOF room-scale, different controller types), as well as cases where no native device is available (desktop or mobile with no VR display) or where the API doesn't exist at all. Even before considering the complexities and uncertainties of future AR devices, there are a lot of possibilities to handle.
Code Forking
Currently, if you want to support both a non-XR experience and a native XRDevice, you need some forks in your code. In WebVR 1.1, this wasn't too bad: either apply some manipulation to a camera representation when no VRDisplay is found, or otherwise update from VRFrameData, with mostly the same render path using the same WebGL context. WebXR has a different structure, which makes this code forking more complex.
Valid use cases that run into forked code:
Forked code causes the following difficulties:
Can the polyfill solve this?
We wrap navigator.getVRDisplays() because we need to first see if getVRDisplays() returns a native VRDisplay, and if not, provide a Cardboard fallback, in order to try our best to support some experience. Monkey-patching getVRDisplays to return either the native display or a Cardboard fallback feels a bit like throwing the baby out with the bathwater. The webxr-polyfill does similar monkey-patching, except that due to the complexities compared to 1.1's mostly-VRDisplay API, the patching is more involved and more likely to differ from native implementations.

Potential Solution: Poseless XRSessions
If it were possible to have an XRDevice on unsupported platforms that does not provide poses, developers could use a single render path, eliminating confusing forks and potentially encouraging XR-first application development as well as vendor adoption. This would essentially allow desktop WebGL experiences to use the WebXR rendering pipeline.
I'm confident that the problem is something that should be addressed, though less confident in this specific solution. The proposed names are just meant to indicate the idea and definitely need work.
If we had the concept of an XRDevice representing the rendering flow, with supportsSession and XRSession accepting a 'poseless' value (functioning the same as a standard non-exclusive XRSession, except that the pose is always null), we could expose a way for developers to plug into the XR rendering path without needing access to a pose-generating XR system, and to build responsive experiences.
Example
A quick, non-detailed example to illustrate the idea: fall back to a poseless XRSession when WebXR exists but there is no valid XR system, like on a desktop. Another example could be a poseless, magic-window XRSession on page load that, upon clicking a button, engages an immersive, real-pose XRSession.
Rough IDL
Hopes & Dreams & Assumptions
Open Questions