Extending WebExtensions for XR

TrevorFSmith commented 5 years ago

Basic problem:

The WebXR Device API supports exclusive XR sessions created on demand from UAs' tabs or windows, but there is another important use case for headset-accessed XR: long-lived applications that run simultaneously.

Examples of such applications include:

personal agents that stay with you all day
language translation of signs while traveling
billboard advertising blockers
virtual art anchored to the walls of your home

There is an existing API for hosting simultaneously running and long-lived applications outside of the context of a single web view: WebExtensions.

This is a proposal that we work toward defining and eventually standardizing extensions to the WebExtension API to support long-lived XR applications.

I tend to envision these applications as living around users as a sort of personal "flock" that appears whenever they're in the "home environment" of their headset web browsers (so, not in a WebXR session). As with the WebXR Device API and WebExtensions, though, there will be a lot of room for UAs (or XR OS creators) to choose different uses of the technology.

To avoid locking this discussion into the "flock" concept this Issue will use the term "WebExtension XR" (WEXR) apps.

Open questions to be addressed:

What types of sensor and UA data are necessary for WEXR apps to address common use cases?

How can UAs balance that access with security and safety expectations?

How can UAs render multiple WEXR apps while maintaining acceptable performance?

What information passing between WEXR apps and origins is necessary and acceptable?

How can UAs route user input (hand gesture, voice command, etc) to the appropriate WEXR app and prevent inappropriate access to input data by other WEXR apps?

Straw dog proposal

For the purpose of spawning discussion, here's a quick and incomplete idea for one way that these extensions could be defined and used:

manifest permissions are extended to include 'xr-forward-camera' and 'xr-input' values, indicating that the WEXR app requests access to a headset's forward facing camera and to the UA managed stream of input events.

manifest background is extended to include 'xr-script' values that indicate that the WEXR app would like to spawn a persistent background script for XR.

A message protocol and new methods in the WebExtension runtime API are available to xr-scripts in order to provide environment data, input events, life-cycle events, and inter-app messaging.

A new API is created that allows the WEXR app to give the UA a reference to a glTF file that the UA loads and renders in the UA's own graphics context. The WEXR app can use runtime messaging to request changes to the rendered data such as asking that a sub-mesh's position be translated, that a texture be swapped, or that a new uniform be passed to a shader.

The UA provides the user with ways to manage which WEXR apps are active, how and where they appear, and how to route their input. For example, a UA may only allow WEXR apps to appear inside a finite space in a known position with markers indicating the edges of the space. The UA might only pass on input events to a WEXR app if the user is looking directly at the app and makes a "finger-gun" gesture to activate the app.
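To make the straw dog a bit more concrete, here is a minimal TypeScript sketch of what a manifest and the scene-change messages might look like. Only the 'xr-forward-camera', 'xr-input', and 'xr-script' values come from the list above; the type names, message kinds, field names, and the `MockSceneHost` class are all hypothetical and exist only to illustrate the shape of the idea.

```typescript
type Vec3 = [number, number, number];
const ZERO: Vec3 = [0, 0, 0];

// Hypothetical manifest shape. Only the three 'xr-*' strings are from the proposal.
type XRPermission = "xr-forward-camera" | "xr-input";

interface WexrManifest {
  name: string;
  permissions: XRPermission[];
  background: { "xr-script": string[] };
}

// Hypothetical messages a WEXR app could send to ask the UA to change
// the glTF scene that the UA renders on the app's behalf.
type SceneRequest =
  | { kind: "load-scene"; gltfUrl: string }
  | { kind: "translate-node"; node: string; offset: Vec3 }
  | { kind: "swap-texture"; material: string; textureUrl: string }
  | { kind: "set-uniform"; material: string; uniform: string; value: number };

const manifest: WexrManifest = {
  name: "sign-translator",
  permissions: ["xr-forward-camera", "xr-input"],
  background: { "xr-script": ["translator.js"] },
};

// Toy stand-in for the UA side: it tracks node positions and applies requests.
class MockSceneHost {
  loaded: string | null = null;
  positions = new Map<string, Vec3>();

  // Returns true if the request was accepted.
  handle(request: SceneRequest): boolean {
    if (request.kind === "load-scene") {
      this.loaded = request.gltfUrl;
      return true;
    }
    if (this.loaded === null) return false; // nothing to modify yet
    if (request.kind === "translate-node") {
      const [x, y, z] = this.positions.get(request.node) ?? ZERO;
      const [dx, dy, dz] = request.offset;
      this.positions.set(request.node, [x + dx, y + dy, z + dz]);
    }
    // Texture and uniform changes are accepted but not modeled in this sketch.
    return true;
  }
}
```

The point of the discriminated union is that the UA stays in charge: the app can only describe changes, and the UA decides whether and how to apply them.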

Possible next steps

It would be possible to experiment with these ideas using only WebVR (eventually WebXR) by building a prototype environment that emulates a UA's "flat" user interface elements and home environment and writing a mock WebExtensions API. WEXR apps could then be written to run inside this test environment, enabling quick discovery of at least the basic developer experience and expectations.

blairmacintyre commented 5 years ago

One alternative way of looking at this:

Stand-alone AR/VR displays are creating ideas like "prisms" or "holograms": regions of space controlled by the UA into which small-ish apps run and render. (Hololens holograms are not so flexible, but could be; various VR "homes" have similar concepts, or could.)
If we extended the way WebXR is initialized, we could allow a web context (page, extension) to create and render into one, and the UA could mix and control the content as it sees fit. The current UA "shells" on ML1 and Hololens are pretty ill-suited to keeping long-running things following you around, but they could (and probably will) evolve in this way.
Assuming these things, we could build on the ideas presented in the context of "dioramas" (divs within pages) to include the idea that a diorama could be in a "shell area". WebXR would have to report the bounds (sounds like stage bounds), and would report the views relative to wherever the shell puts the thing.
It could / should then be possible to initialize multiple WebXR sessions per page (which will be needed if we ever support dioramas within a page, anyway), if we wanted a single page or extension to support multiple prisms/holograms/etc.

I wasn't keen on this path when folks suggested it to me in the context of "webxr extensions on a stand-alone AR display", but it's grown on me. While the current top-level shells of these displays (ML1, Hololens) are pretty ill-suited for it (placement of prisms/holograms is manual and tedious, and fixed in space, for example), it's incredibly easy to see that MSFT and ML will improve these, and the idea that a web page (or web extension) can create content bits that mix into the top level shell with other native apps is really appealing in the long run.

So, all this is to say that if WebXR supported creating these sorts of sessions, the webxr extensions simply need to be able to access webxr, and be limited to creating such sessions. Perhaps?

asajeffrey commented 5 years ago

This looks like an interesting problem, though it seems like web extensions allow a lot more power than would be needed. In particular they allow arbitrary web content to be modified, so web extensions aren't subject to the usual web security model. @avadacatavra might have opinions?

blairmacintyre commented 5 years ago

Good point @asajeffrey. In a sense, what we really want is 3D-only agents that can be instantiated and run without an associated 2D page.

@TrevorFSmith would another option be to consider a non-DOM alternative? For example, going to a .html file opens a page right now. Some browsers used to (still do?) open a .xml RSS feed and display it. Perhaps we need to consider a file format for pure 3D. The UA would (obviously) need to provide handles and controls to know what's running, etc.

jdm commented 5 years ago

That sounds like you're describing 3d service workers.

blairmacintyre commented 5 years ago

Perhaps. Are Service Workers meant to run for long periods, and is it ok if they consume a lot of resources (e.g., they'd potentially need to run at high frame rates to re-render)? (From the web page: "Service workers are generic, event-driven, time-limited script contexts that run at an origin.")

That said, the concept aligns pretty well. You'd need a way to instantiate it, and then it could open a webxr session and render to it in the background.

AlbertoElias commented 5 years ago

Really interested in this! I think this would be an essential part of the platform and its success. Other native platforms like Oculus, Steam and Vive Reality System already support different options, 2D screens and such by pressing their system button from any app.

I do think them working from any app and not just the browser's home is crucial. Browsers should have a button, like the menu button, or a 2s press or something similar that opens up the Web Extensions environment on top of the browser's home or whatever WebXR content is rendering underneath.

I think it should be up to these WEXR designers to prepare them to work in whatever environment they may be in. Some WEXR may just be HUDs.

One idea for input would be that once you open up the Web Extensions environment, the UA provides a Ray the user controls to pick the WEXR.

I'd say a V1 of WEXR can work with the current way Web Extensions permissions go.

+1 to running experiments. Nailing a good UX for this is really important and this issue is great to collectively brainstorm.

cwilso commented 5 years ago

Definitely NOT a match for service workers. This is more like a background app; service workers are short-lived and transient by comparison.

AlbertoElias commented 5 years ago

Based on what @blairmacintyre said, I think it is worth considering this 3D file format, which would also give us freedom to build a 3D web without the constraints of the DOM-based 2D Web.

jdm commented 5 years ago

Actually it seems like a hybrid of worklets and Service Workers to me.

blairmacintyre commented 5 years ago

> Based on what @blairmacintyre said, I think it is worth considering this 3D file format, which would also give us freedom to build a 3D web without the constraints of the DOM-based 2D Web.

At some point, I and others advocated using a model format like glTF, with embedded scripts. This has pros and cons:

pro: matches certain use cases (e.g., similar to Hololens Holograms) quite well
pro: very clean and simple to conceive of
con: very limited in what can be rendered
con: requires the UA to render, adding complexity and requiring fixed agreement on formats, script APIs for manipulation, and other things
pro: can imagine a format having multiple representations to support scaling naturally, can imagine easier management of performance, and resources, etc.

On the other hand, if we want to leave the "work" to the javascript app, then something simpler like a JS file with access to webxr seems easy.

AlbertoElias commented 5 years ago

I really like the idea, though as I see it right now, following the Extensible Web approach, it feels like we should first focus on the lower-level, more capable implementation and then build simpler APIs on top of that.

TrevorFSmith commented 5 years ago

@cwilso @jdm WebExtensions already have a mechanism for long-running scripts that aren't tied to a tab/window: background scripts

TrevorFSmith commented 5 years ago

@asajeffrey WebExtensions specify in their manifest.json permissions what kind of access they require. A WEXR app that doesn't need access to tabs, history, etc would leave those permissions out of its manifest.json and thus not have access.
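As a rough illustration of that gating, here is a small TypeScript sketch. The "tabs" and "history" permissions exist in WebExtensions today and the "xr-*" names come from the straw dog proposal; the API names on the right-hand side of the table and the `exposedApis` / `canCall` helpers are hypothetical.

```typescript
// Hypothetical mapping from manifest permissions to the API surface a UA exposes.
const API_FOR_PERMISSION: Record<string, string> = {
  tabs: "tabs",
  history: "history",
  "xr-forward-camera": "xrCamera", // assumed name, not a real API
  "xr-input": "xrInput",           // assumed name, not a real API
};

// Compute the set of APIs an app can reach from its declared permissions.
function exposedApis(permissions: string[]): Set<string> {
  // Assumption: runtime messaging needs no permission, as in WebExtensions today.
  const apis = new Set<string>(["runtime"]);
  for (const permission of permissions) {
    const api = API_FOR_PERMISSION[permission];
    if (api !== undefined) apis.add(api);
  }
  return apis;
}

// True only if the manifest permissions unlock the named API.
function canCall(permissions: string[], api: string): boolean {
  return exposedApis(permissions).has(api);
}

// A WEXR app that only asks for XR input never sees tabs or history.
const inputOnly = ["xr-input"];
const seesTabs = canCall(inputOnly, "tabs");     // false
const seesInput = canCall(inputOnly, "xrInput"); // true
```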

TrevorFSmith commented 5 years ago

@blairmacintyre The WebXR Device API is currently (and I think correctly) focused on experiences that have full control over a rendering context and how content is positioned around the user. App code can choose which rendering engines to use, etc.

WEXR apps, on the other hand, can't be given that level of control because the UA will need to orchestrate rendering.

The straw dog proposal gives the UA full control over rendering by only allowing WEXRs to request that a glTF scene be rendered and then to send requests for changes. The UAs can then make decisions to do things like reduce texture size, disable expensive shaders, or stop animations to ensure that the multi-app environment hits a steady frame-rate.
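One way a UA might make those decisions, sketched in TypeScript. Everything here is an assumption for discussion: the `AppRenderState` fields, the cost multipliers, and the order of the degradation steps are invented, and a real UA would measure costs rather than estimate them.

```typescript
interface AppRenderState {
  id: string;
  textureScale: number;       // 1 = full resolution
  expensiveShaders: boolean;
  animationsRunning: boolean;
  costMs: number;             // estimated per-frame cost (made-up model)
}

// Apply the next available degradation step, or return null if none is left.
function degradeOnce(app: AppRenderState): AppRenderState | null {
  if (app.expensiveShaders) {
    return { ...app, expensiveShaders: false, costMs: app.costMs * 0.7 };
  }
  if (app.textureScale > 0.25) {
    return { ...app, textureScale: app.textureScale / 2, costMs: app.costMs * 0.8 };
  }
  if (app.animationsRunning) {
    return { ...app, animationsRunning: false, costMs: app.costMs * 0.9 };
  }
  return null;
}

// Repeatedly degrade the most expensive app that can still be degraded
// until the total estimated cost fits the frame budget (or nothing is left to cut).
function fitToBudget(apps: AppRenderState[], budgetMs: number): AppRenderState[] {
  const result = apps.map((app) => ({ ...app })); // leave the input untouched
  const total = () => result.reduce((sum, app) => sum + app.costMs, 0);
  while (total() > budgetMs) {
    let bestIndex = -1;
    let bestNext: AppRenderState | null = null;
    for (let i = 0; i < result.length; i++) {
      const next = degradeOnce(result[i]);
      if (next !== null && (bestIndex === -1 || result[i].costMs > result[bestIndex].costMs)) {
        bestIndex = i;
        bestNext = next;
      }
    }
    if (bestNext === null) break; // every app is already fully degraded
    result[bestIndex] = bestNext;
  }
  return result;
}
```

Because the apps only ever hand the UA a scene description, the UA can apply a policy like this without any cooperation from app code.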

avaer commented 5 years ago

> WEXR apps, on the other hand, can't be given that level of control because the UA will need to orchestrate rendering.

I somewhat disagree; the following flow seems reasonable:

WEBEX/background scripts listen for an activation event -- decided by UA based on proximity, user environment, etc.
in response, full WebXR context spins up, with an origin offset decided by the user/UA
all active webex contexts render in parallel rafs (may or may not be in sync)
UA composites/reprojects/throttles/chastises contexts as needed
UA kills contexts whenever it decides to, possibly with a warning
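The flow above can be read as a small per-context state machine. The following TypeScript sketch is only one interpretation: the state names, event names, and transition table are hypothetical, and it ignores the actual rendering and compositing.

```typescript
type ContextState = "dormant" | "active" | "throttled" | "killed";
type UaEvent = "activate" | "over-budget" | "recovered" | "kill";

// Allowed transitions; any event not listed for a state leaves the state unchanged.
const TRANSITIONS: Record<ContextState, Partial<Record<UaEvent, ContextState>>> = {
  dormant: { activate: "active", kill: "killed" },
  active: { "over-budget": "throttled", kill: "killed" },
  throttled: { recovered: "active", kill: "killed" },
  killed: {}, // terminal in this sketch: a killed context never comes back
};

// Compute the state a context moves to when the UA raises an event.
function nextState(state: ContextState, event: UaEvent): ContextState {
  return TRANSITIONS[state][event] ?? state;
}

// Replay a sequence of UA decisions for one context, starting dormant by default.
function run(events: UaEvent[], start: ContextState = "dormant"): ContextState {
  return events.reduce<ContextState>((state, event) => nextState(state, event), start);
}
```

For example, `run(["activate", "over-budget", "recovered"])` ends in "active", while any sequence containing "kill" ends in "killed" no matter what follows.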

I can't speak to other browser architectures, but I've run enough experiments on Magic Leap/OpenVR to see this as workable.

In general, I'm wary of adding new app styles to the ecosystem (such as programmatic GLTF models) if something existing (like straight up WebXR device) could be slotted in instead.

Of course, things like security would need to be figured out as well, but I think WebExtensions with manifests would be a good place to jump off from.

AlbertoElias commented 5 years ago

Should this PR be merged with #15?

TrevorFSmith commented 5 years ago

@AlbertoElias This Issue has a smaller scope than #15 since this one is addressing a specific design path (WebExtensions) but you're right that they're related.

TrevorFSmith commented 5 years ago

@modulesio I'm hearing conflicting ideas about how possible it would be to allow each WEXR app to use its own rendering engine and GL context and still maintain a good experience when there are more than one or two apps. There's some discussion of this in #15.

I'll add it to the agenda for the February 12th CG call.

TrevorFSmith commented 5 years ago

There's a summary of conversations from the F2F in https://github.com/immersive-web/proposals/issues/15#issuecomment-463068053

AdaRoseCannon commented 2 years ago

/tpac to talk about combining Immersive Sessions in general or via Extensions

AdaRoseCannon commented 2 years ago

Use case: 3rd party payment providers

AdaRoseCannon commented 2 years ago

Composability gives more utility - greater than the sum of parts

AdaRoseCannon commented 2 years ago

Issue: Input routing and security

AdaRoseCannon commented 2 years ago

compositing multiple fullscreen items gets expensive fast

AdaRoseCannon commented 2 years ago

Via @nbutko: it would be important, as with iframes, to be able to control the volume in which the embedded content can exist.

It would also be great to control the position and scale of the embedded content
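A minimal TypeScript sketch of that kind of control, assuming the embedder describes the allowed volume as an axis-aligned box and the embedded content reports its own size. The `Box` type and the `fitScale` / `placeInVolume` helpers are hypothetical, not part of any existing API.

```typescript
type Vec3 = [number, number, number];

interface Box {
  min: Vec3;
  max: Vec3;
}

// Largest uniform scale, capped at the requested scale, at which content of
// the given size still fits inside the volume on every axis.
function fitScale(contentSize: Vec3, volume: Box, requestedScale: number): number {
  let scale = requestedScale;
  for (let axis = 0; axis < 3; axis++) {
    const available = volume.max[axis] - volume.min[axis];
    if (contentSize[axis] > 0) {
      scale = Math.min(scale, available / contentSize[axis]);
    }
  }
  return scale;
}

// Clamp the requested center so the scaled content stays entirely inside the volume.
function placeInVolume(
  center: Vec3,
  contentSize: Vec3,
  volume: Box,
  requestedScale = 1
): { center: Vec3; scale: number } {
  const scale = fitScale(contentSize, volume, requestedScale);
  const clamped = center.map((c, axis) => {
    const half = (contentSize[axis] * scale) / 2;
    return Math.min(Math.max(c, volume.min[axis] + half), volume.max[axis] - half);
  }) as Vec3;
  return { center: clamped, scale };
}
```

The embedder sets the box, position, and scale; the embedded content never gets to render outside it, much like an iframe cannot draw outside its rectangle.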
