Closed: lincolnfrog closed this issue 6 years ago
As we've talked about offline, I'm still very enthusiastic about having a hitTest method in WebXR, on any session that knows about the world. @speigg and I have been talking a lot about this recently as he works on moving Argon4 to WebXR instead of its argon.js framework.
Two comments.
This is looking good, IMHO.
A couple of refinements could be made to further describe the results:
@blairmacintyre hitTest() on session seems good to me; it's less opinionated and more tolerant of timing/frames.
To your second point, I agree hitTest could be implemented across any type of XR session. I am not sure what use-case there would be for the user-agent to provide hit-testing in VR, but I could imagine that happening.
@kearwood You make a bunch of great points! I will try to address your points in the readme, and I'll let you know when I am done so you can review. Thanks!!!
Based on the conversation here and on the CG calls, this proposal meets the criteria for an individual repo. One has been created here: https://github.com/immersive-web/hit-test
Further discussion or issues regarding this proposal should take place on the new repository.
Title: Expose hit-testing (raycasting) capability for WebXR
Background: In order for web applications to make use of Augmented Reality (AR) capabilities, they must be able to identify real-world geometry. For example, a web application may wish to detect a horizontal plane (e.g., the floor) in the camera feed, and render an object (e.g., a chair) on that plane.
There are many ways that real-world geometry could be exposed through a web API. We propose starting by adding a hit-test API. This API would allow the developer to cast a ray into the real world and return a list of intersection points for that ray against whatever world understanding the underlying system gathers.
This approach abstracts the understanding of the world behind a high-level primitive that will work across many underlying technologies. A hit-test API would unlock a significant number of use cases for AR while allowing the work to expose other types of world understanding in a web-friendly way to proceed in parallel.
Use Cases: Use cases enabled by such an API include:
Place a virtual object in the real world: The most common form of real-world geometry used is a horizontal surface on which apps would like to place virtual objects. In order for those virtual objects to appear anchored in the real world, they must be placed at the same height as the ground or table in the real world. Placement usually happens in response to a user gesture such as a tap. On tap, the app wants to cast a ray into the world emanating from the touch location and get a hit result representing the real-world location and orientation that the ray intersects, so the object can be placed realistically. A hitTest API would allow the developer to detect geometry in response to a user gesture, and use the results to determine where to place/render the virtual object. Frequency: this action is usually performed sparsely, i.e., once every several seconds or even minutes, in response to user input.
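To make the tap-to-place case concrete, the tap first has to become a ray. A minimal sketch of that step, assuming the tap is already in normalized device coordinates and that the app has the inverse of its combined projection * view matrix (column-major, as WebXR matrices are); the helper names here are hypothetical, not part of the proposal:

```javascript
// Multiply a column-major 4x4 matrix by an [x, y, z, w] vector.
function transform(m, v) {
  const out = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    out[row] = m[row] * v[0] + m[4 + row] * v[1] +
               m[8 + row] * v[2] + m[12 + row] * v[3];
  }
  return out;
}

// Unproject the tap at the near (z = -1) and far (z = 1) clip planes;
// the ray runs from the near point toward the far point.
function tapToRay(ndcX, ndcY, invViewProj) {
  const near = transform(invViewProj, [ndcX, ndcY, -1, 1]);
  const far = transform(invViewProj, [ndcX, ndcY, 1, 1]);
  // Perspective divide back into 3D space.
  const o = [near[0] / near[3], near[1] / near[3], near[2] / near[3]];
  const f = [far[0] / far[3], far[1] / far[3], far[2] / far[3]];
  const d = [f[0] - o[0], f[1] - o[1], f[2] - o[2]];
  const len = Math.hypot(d[0], d[1], d[2]);
  return { origin: o, direction: [d[0] / len, d[1] / len, d[2] / len] };
}
```

The resulting origin/direction pair is exactly the information the proposed hitTest input pose would carry.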
Show a reticle that appears to track the real-world surfaces the device or controller is pointed at: AR apps often want to show a reticle that appears to stick to real-world surfaces (sometimes as part of the above functionality). To do this, the app could perform a hit test every frame, usually based on a ray emanating from the center of the screen, and render the reticle appropriately on real-world surfaces as the scene changes. Frequency: this action is performed every single frame, based on a consistent ray.
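For the reticle case, the per-frame ray is simply the device's own pose. A sketch of deriving it from a column-major 4x4 pose matrix (the WebXR convention); the helper name is hypothetical, since the proposal leaves the exact XRPose shape open:

```javascript
// Derive a raycast origin/direction from a 4x4 pose matrix
// (column-major). Hypothetical helper, not part of the proposed API.
function rayFromPoseMatrix(m) {
  // Translation lives in elements 12..14 of a column-major matrix.
  const origin = [m[12], m[13], m[14]];
  // "Forward" is the negative Z basis vector (elements 8..10).
  const direction = [-m[8], -m[9], -m[10]];
  return { origin, direction };
}

// A device sitting at the world origin looking down -Z:
const identityPose = [
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  0, 0, 0, 1,
];
const ray = rayFromPoseMatrix(identityPose);
// ray.origin is [0, 0, 0]; ray.direction points along -Z.
```

Recomputing this ray from the latest device pose each frame, and handing it to the hit-test call, is all the reticle use case needs.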
Proposed Approach: Technologies for identifying real-world geometry from the camera input are becoming available on mobile devices, and the user agent could use these to implement a hit-test API. The simplicity of this API also enables a wide range of implementation choices and input types. The intent is to explore an extension to the WebXR Device API - one abstracted from world understanding and closely connected to device pose and frame production - because the ability to render data over the real world (whether in passthrough or see-through mode) requires a strong connection between pose and world understanding.
For illustration purposes, such an API might look like the following:
`XRPresentationFrame::hitTest(XRPose rayPose) -> Promise<sequence<XRHitResult>>`
The input is a pose (position/orientation or matrix) whose position represents the origin of the raycast and whose orientation represents its direction. The return value is a sequence of XRHitResult objects, each containing an XRPose representing a location where the ray intersected the world. In the future, a hit result may also contain other fields (such as the object that was hit).
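To illustrate the proposed shape end to end, here is a mock of the method — a sketch, not a real implementation — whose entire "world understanding" is a single horizontal floor plane at y = 0, roughly the first thing a plane-detecting AR system would report. The result shape (a pose holding just a position) is an assumption for illustration:

```javascript
// Synchronous intersection helper: ray is { origin, direction }, both
// [x, y, z] arrays; direction need not be normalized for this math.
function intersectFloorPlane(ray) {
  const { origin, direction } = ray;
  // A ray level with the plane, or pointing away from it, never hits.
  if (direction[1] >= 0) return [];
  // Solve origin.y + t * direction.y = 0 for the distance t along the ray.
  const t = -origin[1] / direction[1];
  const position = [
    origin[0] + t * direction[0],
    0, // on the plane by construction
    origin[2] + t * direction[2],
  ];
  // One hit result per intersection; the { pose: { position } } shape is
  // an assumption, since the proposal leaves XRHitResult open.
  return [{ pose: { position } }];
}

// The proposed method is asynchronous, so the mock resolves a Promise,
// matching Promise<sequence<XRHitResult>>.
const mockHitTest = (rayPose) => Promise.resolve(intersectFloorPlane(rayPose));

// Tap-to-place usage: a ray from head height (y = 1) angled down and
// forward hits the floor one meter ahead, at [0, 0, -1].
mockHitTest({ origin: [0, 1, 0], direction: [0, -1, -1] }).then((hits) => {
  if (hits.length > 0) {
    // Place the virtual object at hits[0].pose.position here.
  }
});
```

An empty sequence (rather than a rejected promise) signals "nothing hit", which keeps the per-frame reticle loop free of exception handling.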