Address confusion regarding XRFrameOfReference types

NellWaliczek commented 6 years ago

A few weeks back, @toji and I discovered that we didn't have the same understanding of the definitions of the "eye-level", "stage", and "head-model" XRFrameOfReference types. We did a poll on one of the weekly calls and it sounded like there were a handful of others that had various interpretations as well.
This issue tracks the need to reach agreement on the definitions of these types. It also covers clarifying the explainer/spec text to reflect the expected behaviors on various devices, such as 3DOF/6DOF devices or those which might need to emulate the floor offset. It is also related to issue #389, filed by @Artyom17.

RafaelCintron commented 6 years ago

Thank you for bringing this up, @NellWaliczek.

Reading through the current crop of frames of reference, I was confused by how they're partially defined in terms of each other.

For head-model, it says: "An XRFrameOfReference with a frame of reference type of "head-model" describes a coordinate system identical to an eye-level frame of reference, but where the device is always located at the origin."

For eye-level, it says: "Describes a coordinate system with an origin that corresponds to the first device pose acquired by the XRSession after the "head-model" frame of reference is created"

Does this mean that you need to make a head-model frame of reference before you make an eye-level frame of reference?

lincolnfrog commented 6 years ago

As per conversation in the f2f, maybe we can simplify this by splitting the role of FOR types apart into ~3 different concerns:

1) Getting a view matrix: Since the main use-case of these seems to be for generating a view matrix, we can just have a method XRSession.getViewMatrix(frameOfReferenceType, offsetTransform). This type parameter would likely not need to include "stage", as the stage bounds could be separated out and things like emulated height could be included in the offsetTransform (after having been queried separately - see below). Likely then we just need two options for frameOfReferenceType, if we make the "world" type, where accuracy of tracking is best in the immediate vicinity of the headset, the standard behavior, and we require ubiquitous anchoring of all virtual content (with emulated anchors for 3DOF or outside-in systems). The two options would basically be "head" and "world", where head has the translation zeroed out and world does not. Note: we might need a third setting for whether a neck-model should be used or not, since we agreed we want to avoid people hacking the view matrix post-hoc to remove translations.

2) Stage bounds / emulated height: Just move these to XRSession as well - XRSession.getStageBounds() and XRSession.getEmulatedHeight(). The emulated height could then be optionally passed into the getViewMatrix() function as an offset transform to differentiate between seated and standing modes.

3) Feature detection: In order to determine whether the user's system supports 3DOF/6DOF/etc., we would make that something you query independently on the session and/or request as part of the session.
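
To make the "head" vs. "world" distinction in point 1 concrete, here is a rough TypeScript sketch of what such a view-matrix helper might compute. It is only an illustration of the proposal, not an existing WebXR API: the standalone getViewMatrix function, the DevicePose shape, and the simplification of offsetTransform to a plain translation vector are all assumptions made for this example. It also assumes the offset is still applied in "head" mode, which the proposal does not spell out.

```typescript
// Hypothetical sketch of the proposed API shape; none of these names exist
// in a shipped WebXR API.
type Vec3 = [number, number, number];
type Quat = [number, number, number, number]; // x, y, z, w (unit quaternion)
type FrameOfReferenceType = "head" | "world";

// Assumed representation of a tracked device pose.
interface DevicePose {
  position: Vec3;
  orientation: Quat;
}

// Returns a column-major 4x4 view matrix (the inverse of the device pose).
// Assumption: offsetTransform is reduced to a translation, e.g. [0, h, 0]
// for an emulated height h.
function getViewMatrix(
  pose: DevicePose,
  type: FrameOfReferenceType,
  offset: Vec3 = [0, 0, 0],
): number[] {
  const [x, y, z, w] = pose.orientation;

  // "head" zeroes out the tracked translation; "world" keeps it.
  const tracked: Vec3 = type === "head" ? [0, 0, 0] : pose.position;
  const p: Vec3 = [
    tracked[0] + offset[0],
    tracked[1] + offset[1],
    tracked[2] + offset[2],
  ];

  // Rows of the rotation matrix R built from the unit quaternion.
  const r: Vec3[] = [
    [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
    [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
    [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
  ];

  // View translation is -R^T * p.
  const t = [0, 1, 2].map(
    (i) => 0 - (r[0][i] * p[0] + r[1][i] * p[1] + r[2][i] * p[2]),
  );

  // Column-major layout of [R^T | t]: each row of R becomes a column.
  return [
    r[0][0], r[0][1], r[0][2], 0,
    r[1][0], r[1][1], r[1][2], 0,
    r[2][0], r[2][1], r[2][2], 0,
    t[0], t[1], t[2], 1,
  ];
}
```

Under these assumptions, a 3DOF-style caller would request "head" and pass something like [0, emulatedHeight, 0] as the offset, while a 6DOF caller would request "world" and leave the offset at its default.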

NellWaliczek commented 6 years ago

Fixed by #409

immersive-web / webxr

Address confusion regarding XRFrameOfReference types #396