immersive-web / webxr

Repository for the WebXR Device API Specification.

https://immersive-web.github.io/webxr/


Revisit session creation options and flow #330

Closed blairmacintyre closed 6 years ago

blairmacintyre commented 6 years ago

XR session creation is structured as it was in WebVR, which had a much more limited set of use cases and possible display structures.

I would suggest we need to do a few things to make session creation more flexible and usable. Here is a list of possibilities, to spur discussion.

First, the combination of supportsSession and requestSession seems reasonable, but we need a way to handle the case where an external action may activate a session. Is there an implication that the UA can fire off an XRSession event when there is not currently a session? For example, a UA may have its own UI for activating VR or AR, which might include allowing the user to specify the sort of session. Or, if a user follows a link while in AR or VR, the next page should auto-create a session corresponding to the previous page's session.

Second, we should update the XRSessionCreationOptions to reflect the diversity of sessions we might get, and have a corresponding set of attributes on the XRSession.

Get rid of “exclusive” … it always seemed like something of a hack to deal with the fact that we may want different sorts of connections to devices.
There are three sorts of session types we've been talking about displays supporting: "immersive" (i.e., 3D, probably head-worn displays), "portal" (i.e., the magic window type, where a non-headworn display is tracked and presents either AR or VR on a display), and "flat" (e.g., just 3D graphics in a DIV in the page on the display, with no WebXR-specific exclusive rendering, but possibly using some form of sensing). Additional ones may be added in the future, depending on platform. E.g., “projective” may support projection AR (like Microsoft's RoomAlive) if that was something someone wanted to support. Not sure what to call this: "xrType"?
To support some of the current use of "exclusive", add a capability to distinguish between VR and AR. Not sure what to call it, perhaps “reality”? With possible values “virtual, augmented”?
Add “worldAligned” as a capability to request that the local tracking frame of reference is aligned with geospatial components.
A capability to differentiate between 3DOF and 6DOF. Again, not sure what to call it, perhaps "spatialTracking"?

One of the assumptions I'm making here is that a given implementation would support one or more of these combinations, as they see fit and are able. A 6DOF device could support 3DOF if it wanted; it's not required to. A device that doesn't have a magnetometer may not support "worldAligned".

A Daydream- and ARCore-capable Android phone could end up supporting a range of realities that provide some combinations of

reality = virtual AND augmented
spatialTracking = 3DOF AND 6DOF
xrType = immersive AND portal AND flat
perhaps it DOES NOT support "worldAligned" (can't recall if ARCore supports that out of the box)

By exposing these as options, and properties, we also allow developers to inspect the current session to see what the setup is, and create an appropriate UI.
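A minimal sketch of how such options might be checked against what a device advertises. Every name here is hypothetical: the option keys (`reality`, `spatialTracking`, `xrType`, `worldAligned`) come from the proposal above, and `deviceCapabilities` and `supportsOptions` are illustrative helpers, not WebXR API.

```javascript
// Hypothetical capability set for a Daydream + ARCore capable phone,
// using the option names floated in this comment (not spec'd API).
const deviceCapabilities = {
  reality: ["virtual", "augmented"],
  spatialTracking: ["3DOF", "6DOF"],
  xrType: ["immersive", "portal", "flat"],
  worldAligned: false,
};

// Returns true when every requested option is one the device advertises.
function supportsOptions(capabilities, requested) {
  return Object.entries(requested).every(([key, value]) => {
    const supported = capabilities[key];
    if (Array.isArray(supported)) {
      return supported.includes(value);
    }
    // Boolean capabilities: asking for `false` is always satisfiable.
    return value === false || supported === true;
  });
}

console.log(supportsOptions(deviceCapabilities, {
  reality: "augmented", xrType: "portal", spatialTracking: "6DOF",
})); // true
console.log(supportsOptions(deviceCapabilities, {
  reality: "augmented", worldAligned: true,
})); // false
```

The same check could back both a "would this work?" query and inspection of the session that was actually created.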

speigg commented 6 years ago

I actually think these various use cases can be addressed much more simply. I’m also worried about the session creation being overburdened with too many options. I do however think that it is very important to distinguish between AR and VR sessions, and that this is something each app should specify at session creation time.

My proposal is that we leave the existing session creation parameters as-is, and simply add one more:

type: “reality” vs “augmentation”.

The rationale is that any XR app does one of two things at any point in time: create a virtual reality, or augment an existing reality. An app will request which one of these two things it wants to do with this session “type” parameter.

For world aligned content, I think this can be a new frame of reference, or an option on the “stage” frame of reference, or a special anchor. If this requires special permissions, however, I can see how it might be more sensible to get it all in one go in the session creation parameters.

For spatial tracking (3DOF vs 6DOF), I think this is already handled adequately in the current spec. For example, devices without 6DOF will fail to provide an “eyeLevel” frame of reference. 3DOF is provided by the “headModel” frame of reference. I don’t think this really matters at session creation time, as the application should do its best to use whatever capabilities are available for the session it’s given.
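A minimal sketch of that "use whatever is available" approach, assuming a `requestFrameOfReference` method that returns a promise and rejects for unsupported types. The session here is a mock, and `requestBestFrameOfReference` is a hypothetical helper.

```javascript
// Sketch only: tries the 6DOF frame first and falls back to the 3DOF one.
// The "eyeLevel" / "headModel" names follow the draft discussed in this thread.
function requestBestFrameOfReference(session) {
  return session.requestFrameOfReference("eyeLevel")
    .catch(() => session.requestFrameOfReference("headModel"));
}

// Mock 3DOF session for illustration: it rejects "eyeLevel".
const mock3DofSession = {
  requestFrameOfReference(type) {
    return type === "headModel"
      ? Promise.resolve({ type })
      : Promise.reject(new Error(`${type} not supported`));
  },
};

requestBestFrameOfReference(mock3DofSession).then((frame) => {
  console.log(frame.type); // "headModel"
});
```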

For display modes, I think this should be made available on the XRSession (e.g. displayMode property that can be “headworn”, “handheld”, “fixed”, “projection”, etc. ), but I’m not sure I see any compelling reason why apps need to be in control (the UA should be able to handle switching between “headworn” and “handheld” modes as necessary). But if there is a compelling reason, I think this should be a method on the XRSession, such as requestDisplayMode(“headmounted” or “handheld”).

toji commented 6 years ago

Thanks for writing up your thoughts on this, Blair! I have buckets full of opinions on this subject, and am struggling to convey them in a way that's not just a massive braindump. Thankfully @speigg already hit on a couple of my initial thoughts (world alignment seems better handled at FrameOfReference/Anchor creation time, same for 3DoF vs 6DoF.)

I guess a good place to start is to try and establish a concrete reason why we should prefer to initialize something at session creation time vs. later.

In an ideal world exposing AR and VR capabilities would be something that could happen entirely post-creation with zero overhead and perfect expressiveness. The Ultimate XR Device™ would be able to be completely opaque or transparent at a moment’s notice, could switch tracking capabilities on and off at a whim, and would only track things like environmental geometry when explicitly asked. The idealized API for such a device would look something like:

xrDevice.requestSession().then((xrSession) => {
  xrSession.transparent = true;
  xrSession.positionalTracking = true;
  xrSession.getPlanes().then((planes) => { /* … */ });
});

So it's worth examining what prevents us from having that API. In my opinion it's three things: Permissions, mutual capability exclusivity, and hardware limitations.

Permissions: Exposing some capabilities may require showing a permission prompt to the user. We don’t ever want to automatically opt the developer into capabilities that will display a permissions prompt without the developer explicitly indicating that it was their intent to do so. (Even if a prompt is ultimately not produced.) This is probably the most flexible point, in that a great many items that might require permissions can still be handled post-create by utilizing a request->promise pattern that defers until a permission has been granted. There's something to be said for not spreading out the permissions prompts over N different calls, though. Avoiding permission fatigue is a very real and worthwhile goal.

Mutual capability exclusivity: As an example, current Android devices can do phone AR or headset VR, but not both at once. This is enforced today by making the libraries that drive the capabilities separate, but even if it were not so, the realities of mobile performance would dictate that 6DoF tracking with a video passthrough prevents stereo rendering at high enough framerate for headset use. This is technically a temporary issue, but the availability of low-powered devices is not likely to go away short term and so this issue will persist. This means that some decisions must be made early in the XR app's lifetime about what set of capabilities need to be spun up.

Hardware limitations: Devices like HoloLens or Meta 2 have displays that are permanently transparent and cannot display opaque content. Thus the same capability that requires opt-in and permissions on mobile is unavoidable and permissionless on those devices. This one is funny, because it means that something like AR is both a feature you request and a limitation you have to code around, depending on your application's context.

In any case, I tend to feel that anything that directly affects one of the above categories is a candidate for handling at session creation time, and everything else should be deferred.

Tying that back to Blair's list:

session type has enough implications for capabilities and permissions that it definitely belongs at session creation time. I'm not sure how I feel about the exact types Blair proposed, but I agree with the general sentiment.
Augmented vs. Virtual is also something that seems to require up-front definition, especially for the ARCore/Daydream scenarios where we need to initialize one or the other. We also need to consider how/if we should support cases where the user says "No, I demand Virtual. My content can't work on a transparent display."
World Aligned may incur a permission, so it seems like session creation material, but I wonder how much we can treat this like a progressive enhancement. We should be able to return reasonable poses even without world alignment, and have the UA be able to gently prompt for the user's permission to be more accurate in parallel. In any case, that fits the existing creation model of the Frames of Reference pretty well.
3DoF vs 6DoF is something that I feel more strongly shouldn't be blocked on. As in, I don't think we want to allow developers to say "I demand 6DoF!" because I feel you'll get a lot of artificially restricted apps. (Does your 360 photo viewer really need Vive-level tracking?) I don't object to the idea of saying "I prefer a 6DoF, non-AR magic window over a 3DoF one, understanding that may incur permissions prompts and higher battery usage." but I'm not so convinced of its utility that I'd jump through crazy API design hoops to get it.

(Part 1 of N)

blairmacintyre commented 6 years ago

@speigg @toji thanks. Some comments.

One thing I should probably have taken more time to separate is "things we want to know about the session we've gotten", "things we would like to check if a device might support", and "things we're asking for when requesting a session." Following on the single example of "exclusive", I just lumped them all into "options for sessions query/creation" and "properties on the resulting session".

I agree that we should have minimal options, and I agree with the categories or reasons you list @toji. I was thinking about permissions as one reason for having these as options ("we get one permission popup that asks for geo-orientation, permission to use the device's sensors, etc"). I was also thinking about device limitations: ARKit has "align the tracking coordinates with geospatial coordinates" as an initialization option, and this seems like the right place to put it since it's a session-level thing.

To expand on geo-alignment: what this is asking is if the local coordinates can be geoaligned, not for access to geolocation, and the alignment of the local coordinates seems like a "lifetime of the session" kind of choice. Now, it may be that we could change it over time (e.g., nobody is supposed to use local coordinates directly, but rather everything should be anchored), but surely an app knows if it's using geo data or not? But, geoalignment also seems purely additive, so we can probably move this over to a "proposal" repo. I really want this discussed.

3DOF vs 6DOF may be something you want to ask ("does this device support 6DOF?") and may be something we want to set as a property or otherwise be able to query about a session (since right now I see people doing hacks like "is the position always 0,0,0"?). But this may not be something we want folks to be able to explicitly request. I was actually imagining it more as a hint ("All I need is 3DOF, but 6DOF is fine"). In the end, it may be enough to just somehow notify the programmer that this is a property of the session they got, and not have it be an option.

Finally, w.r.t. AR/VR and display modes: I'd tend to agree that requesting AR vs VR might be reasonable, and then having properties on the session to help you understand what you got might also be reasonable. But, considering @toji's Android example: if I have ARCore and Daydream, how do I create a "VR" session? It could be ARCore-tracking with a VR magic window, or HMD Daydream. Who is deciding, and how is this decision presented to the user?

Perhaps we have two devices: a Daydream device and a Magic Window device? An app can do supportsSession({reality: VR}) on both, see they both support VR, and then present the user with the choice of "devices"?
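A minimal sketch of that two-device idea, assuming `supportsSession` resolves when the options are supported and rejects otherwise. The devices are mocks, `{ reality: "virtual" }` is the hypothetical option from this thread, and `makeMockDevice` and `devicesSupporting` are illustrative helpers.

```javascript
// Sketch only: builds a mock device that supports a fixed list of realities.
function makeMockDevice(name, realities) {
  return {
    name,
    supportsSession(options) {
      return realities.includes(options.reality)
        ? Promise.resolve()
        : Promise.reject(new Error(`${name}: unsupported`));
    },
  };
}

// Resolves to the names of the devices that accept the given options.
function devicesSupporting(devices, options) {
  const checks = devices.map((device) =>
    device.supportsSession(options).then(() => device.name, () => null));
  return Promise.all(checks).then((names) => names.filter((n) => n !== null));
}

const devices = [
  makeMockDevice("Daydream", ["virtual"]),
  makeMockDevice("Magic Window", ["virtual", "augmented"]),
];

devicesSupporting(devices, { reality: "virtual" }).then((names) => {
  console.log(names); // one choice per entry could be offered to the user
});
```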

speigg commented 6 years ago

The Ultimate XR Device™ would be able to be completely opaque or transparent at a moment’s notice, could switch tracking capabilities on and off at a whim, and would only track things like environmental geometry when explicitly asked. The idealized API for such a device would look something like:

xrDevice.requestSession().then((xrSession) => {
  xrSession.transparent = true;
  xrSession.positionalTracking = true;
  xrSession.getPlanes().then((planes) => { /* … */ });
});

@toji I like the capabilities of your hypothetical Ultimate XR Device™ :), however I would suggest an API that requires apps to be reactive, rather than allowing apps to assume explicit control over the state of the XR device. The problem is that if we give apps explicit control, it becomes harder to backtrack in the future and give some of that control back to the user/user-agent in order to allow for more complex (experimental) use cases, such as (!) multiple simultaneous applications which can conflict with one another if they each assume control over the device state. So, perhaps a minor difference, but I’d prefer to see a combination of hints and requests even if we had the Ultimate XR Device™:

xrDevice.requestSession().then((xrSession) => {
  // request a transparent layer for AR or an opaque layer for VR
  xrSession.requestLayer("transparent" /* or "opaque" */).then((xrLayer) => { /* ... */ });

  xrSession.requestDisplayMode("handheld" /* or "headworn" */).then(() => { /* ... */ });
  xrSession.ondisplaymodechange = () => { /* ... */ };
  // etc.... plus corresponding properties on XRSession

  xrSession.hints.positionalTracking = true;
  xrSession.getPlanes().then((planes) => { /* … */ });
});

Of course, since we don’t have the Ultimate XR Device™, there are some API quirks here, as you pointed out, such as the fact that on some devices a “transparent” layer (video-see-thru in this case) won’t work while the display mode is “headworn” (stereo in this case), while using positional tracking (6DOF). If it’s request/promise based, doing these things could result in a rejected promise of course (on devices where such capabilities are mutually exclusive), but it’s also possibly more difficult for a developer to understand why it might fail, or why positional tracking might suddenly stop working when the display mode changes to “headworn” (but only when the layer is “transparent”?!). Potentially very confusing.

It would be nice if we could avoid the failure cases altogether. I personally don’t think something like “requestDisplayMode” is even necessary (I think it’s better if the UA controls the display mode exclusively), and leaving it out could simplify things.

blairmacintyre commented 6 years ago

@speigg I agree with the reactive comments, but the need is more mundane and closer afield. I would drop

The problem is that if we give apps explicit control, it becomes harder to backtrack in the future and give some of that control back to the user/user-agent in order to allow for more complex (experimental) use cases, such as (!) multiple simultaneous applications which can conflict with one another if they each assume control over the device state.

and instead say that the problem is that having explicit control of what kind of session, and assuming apps will always create UIs to control the session they want, is problematic for various reasons:

it dissuades apps from supporting the user's choice of modality. Sure, I may be visiting a site best suited for AR, but what if I don't have (or want to use) an AR device, but do have VR? It would be better if apps got used to reacting.
following that, it gets in the way of UAs presenting "enter AR" and "enter VR" UIs to their users. If a UA (like the Argon4 app you built, @speigg) has a UI for entering VR (your "Viewer Mode") and the app is built to react, it can just react.
the more we build "enter VR!" UI's into apps, the more they are coupled to today's tech.
when users are in AR or VR and follow a link, we will want to be able to give the next page the same sort of session (at the user's discretion) without leaving AR/VR.
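A minimal sketch of what "built to react" could look like: the app supplies one handler and adapts to whatever session it is given. The `reality` and `display` properties and the `onSessionStarted` handler are hypothetical names, not spec'd API.

```javascript
// Sketch only: one renderer per kind of session the app knows how to handle.
const renderers = {
  augmented: (session) => `rendering AR content on ${session.display}`,
  virtual: (session) => `rendering VR content on ${session.display}`,
};

// Reacts to whatever session the UA hands over instead of demanding one kind.
function onSessionStarted(session) {
  const render = renderers[session.reality];
  if (!render) {
    return "falling back to the 2D page";
  }
  return render(session);
}

console.log(onSessionStarted({ reality: "virtual", display: "headworn" }));
console.log(onSessionStarted({ reality: "augmented", display: "handheld" }));
```

With this shape, the same handler would serve a page button, a UA-provided button, or a session carried over from the previous page.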

speigg commented 6 years ago

To expand on geo-alignment: what this is asking is if the local coordinates can be geoaligned, not for access to geolocation, and the alignment of the local coordinates seems like a "lifetime of the session" kind of choice. Now, it may be that we could change it over time (e.g., nobody is supposed to use local coordinates directly, but rather everything should be anchored), but surely an app knows if it's using geo data or not? But, geoalignment also seems purely additive, so we can probably move this over to a "proposal" repo. I really want this discussed.

Good points. Given that it’s tied to session creation in the underlying API, I suppose there isn’t really anywhere else it can go. Or perhaps we make the coordinate system geoaligned automatically, whenever the underlying platform supports it? Then we can just have a geoaligned property (true or false) on XRSession that tells the developer whether or not the coordinate system is geoaligned. (I’m also assuming that geoalignment doesn’t have a huge impact on performance/battery relative to the rest of ARKit, but the fact that Apple made it an opt-in feature might signify otherwise).

toji commented 6 years ago

I would suggest an API that requires apps to be reactive, rather than allowing apps to assume explicit control over the state of the XR device.

I very much agree with this, and think we definitely want to encourage reactive development across the board as much as possible. But there's probably a line to be drawn here.

For example, I'm the sort of gung-ho pie-in-the-sky optimist who'd like to just assume that the developer can say "give me whatever you've got" and we could alternately return a VR device with 6DoF input or a phone with passthrough and tap-to-interact input only OR a zSpace-like desktop and the web app will happily feature detect its way to a working state once the session is spun up. And in many cases I think that you can actually do something reasonable across that entire spectrum. But the reality of web development is that people will actually want to establish their own baselines for these things, with the all important decision being "do I advertise this feature or not?"

(Let's put aside the differentiation between a button the developer adds to the page and a button hosted in the UA. In both cases the developer will have to make a decision about whether or not the page should support XR content on the device in question.)

Let's take an AR app for visualizing underground pipes so you don't dig into them during construction. This app has no value in VR. This app has no value (and in many ways negative value) if you can't properly align the visualization with the world. Upon visiting that page, even with an otherwise XR capable device, if the developer finds that those capabilities are missing they probably want to show a "Your device is not compatible" message rather than a button which tries to spin up an XR session, then feature detects, then kicks you back out of XR and says "Oops! Turns out your device just can't do it. Sorry!"

So this gets to Blair's "things we would like to check if a device might support", and I think we can all agree that there's a delicate balance to be struck. On the one hand, we want to support interesting use cases like the above scenario, so we can't make our "supports" calls too high-level or they're useless outside of the simplest cases. On the other hand, we don't want to encourage developer behavior of "If you don't have a 9DoF system with tactile simulation and neural interfacing then you simply don't deserve to see this content."

I'll be the first to admit that I don't know where to draw that line, but I think that our platform being the web dictates that we start out conservative in what we expose. Both because it's easier to add API surface to the web than to remove it, and because every bit we do add is fingerprintable.

I'd also suggest that with a minor modification the supportsSession() mechanism we've already defined is likely the right approach, regardless of what feature bits we end up exposing. The method already tells us whether or not the options we pass in are likely to result in a valid session (communicated via the promise resolving or rejecting). We can make it more useful by having the promise resolution return a minimal view of the properties of the session that you will get back when you call requestSession() with the given options.

For example:

let sessionOptions = {
  type: "ar",
  outputContext: context,
};

xrDevice.supportsSession(sessionOptions).then((features) => {
  if (features.worldAligned) {
    // xrDevice supports the given sessionOptions, and the resulting session will provide
    // the required features. Advertise XR content.
    addButtonAndTellTheUAWeAreXRReady();
  } else {
    // xrDevice supports the given sessionOptions, but won't provide a required feature.
    // Don't advertise XR content.
  }
}).catch(() => {
  // xrDevice doesn't support the given sessionOptions at all.
  // Don't advertise XR content.
});

blairmacintyre commented 6 years ago

React vs Request is an unclear line, I totally agree.

I think we need both. In Argon4/argon.js, we opted to be as reactive as possible, and to have developers express their preferences. You'd initialize, and eventually be handed a session, but we left the specific session up to the UA. This did make it a bit frustrating sometimes. But, in the end, I think we'll have to deal with that here to some degree, since you might not get what you request, based on what the UA does in response to user permissions requests.

Following on your example, @toji, I like the idea of expanding supportsSession to give the info people need to decide if they want to expose XR; that's a great idea, especially if we formulate it as "The results express possibilities for this device, and are not guaranteed to be granted by the user".

I'm not sure if I prefer if we return a template of the session, or if we have a set of options that are usable in supportsSession but not in requestSession. Or, perhaps, the features you return is just that: an expanded set of possible features that might be available on the device + sessionOptions you asked about.

The key is that we decide on the major features to include: features that would be "make or break" for some apps. worldAligned (for AR) is one, and (eventually) things like worldStructure (when we make info about the world available) and visionSensors (it's easy to imagine that a class of applications will want to do custom CV and will need access to this, or not do anything).

The "very nice thing" about this approach is that this naturally extends to different UAs exposing platform specific things. The WebXR version of Argon that @speigg is working on could include features.vuforia to let the dev know that UA has Vuforia built in, and it's usable in that mode on that display.

At the same time, I would like to continue thinking about how we combine this with reactive elements.

I'm less concerned about the overconstrained app that really really only works in one specific case (I need world alignment AND custom computer vision on an AR display, and nothing else; I need 6DOF VR with a room area of at least 6' x 6').

I'm more concerned with underconstrained mass market apps that want to try and do something everywhere. What will their UI be expected to look like? Do they need to query every display made available by the UA for AR and VR, and then make N buttons (one for each combo): "Magic Window AR" "Magic Window VR" "DayDream VR" "External AR Display" .... ?

Or, do we want them to be able to check if at least ONE thing they support is active, create one button, and then when the user presses the button, the permissions dialog (much like the camera dialog in WebRTC) lets them select the display + type combo they want, and then answer any appropriate permission prompts?

Or, do we want an app to be able to tell WebXR what they support, and let the UA present the button itself? So, web pages could have the "Enter XR" button, OR they could signal "XR capable" and the UA could present the option? Obviously, we could also support both.

toji commented 6 years ago

I'm not sure if I prefer if we return a template of the session, or if we have a set of options that are usable in supportsSession but not in requestSession.

I'd be pretty strongly against making the options passed to supportsSession and requestSession different. I think the pattern of "If I asked for these options, what would I get?" is a powerful one, and splitting them up is asking for developer confusion.

That said, if there was a super strong need for something additional in supportsSession I'd advocate for it being of the form supportsSession(sessionOptions, timeout = -1) or similar so that we don't have to change the contents of the sessionOptions.

Or, perhaps, the features you return is just that: an expanded set of possible features that might be available on the device + sessionOptions you asked about.

This is definitely more inline with what I was thinking. And here I'm not sure if the returned features should be part of the eventual session or not. Like so?

interface XRSessionFeatures {
  readonly attribute boolean worldAligned;
  readonly attribute XRDisplayType displayType; // Opaque, passthrough, transparent?
  // Etc.
};

partial interface XRSession {
  readonly attribute XRSessionFeatures features;
};

partial interface XRDevice {
  Promise<XRSessionFeatures> supportsSession(XRSessionOptions sessionOptions);
  Promise<XRSession> requestSession(XRSessionOptions sessionOptions);
};

I can see that being a nice pattern, but I can also see it being more restricting than we'd prefer.

blairmacintyre commented 6 years ago

I'd be pretty strongly against making the options passed to supportsSession and requestSession different. I think the pattern of "If I asked for these options, what would I get?" is a powerful one, and splitting them up is asking for developer confusion.

I think I tend to agree. Just wanted to make this clear.

This is definitely more inline with what I was thinking. And here I'm not sure if the returned features should be part of the eventual session or not. Like so?

Do you have an example of a returned feature that you wouldn't see being part of the session object? I would think we'd want it to be, for the same developer confusion reason, especially if they might not get a session option that is possible.

Consider a display that can provide video/sensor data, or geospatial data. The features returned from supportsSession would have them true, but if the user says "Hell no, you can't have these, you untrustworthy little web page!", they should be in the session with false.
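A minimal sketch of the comparison an app might make after the session starts. Both objects are hypothetical: `advertised` stands for what supportsSession() might resolve with, and `granted` for the features on the session once the user has answered any permission prompts.

```javascript
// Sketch only: lists required features that were advertised but not granted.
function missingFeatures(advertised, granted, required) {
  return required.filter((name) => advertised[name] && !granted[name]);
}

const advertised = { worldAligned: true, visionSensors: true };
const granted = { worldAligned: true, visionSensors: false }; // user said no

console.log(missingFeatures(advertised, granted, ["worldAligned", "visionSensors"]));
// ["visionSensors"]
```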

toji commented 6 years ago

I'm more concerned with underconstrained mass market apps that want to try and do something everywhere. What will their UI be expected to look like? Do they need to query every display made available by the UA for AR and VR, and then make N buttons

Arg, meant to address this too! To start, we've been talking about how to expose these things a lot at Google since we're going to be dealing with Daydream vs. ARCore. Do we expose them as two different devices or a single device where the backend that gets spun up is a function of the session options passed? It feels like the latter path is the better option for us, treating the physical phone as the singular XRDevice and allowing different sessions to expose different capability sets.

That aside, I feel like a pattern for how the page advertises its capabilities is a function of the content and not something that we can or should do much to dictate. I imagine most content will broadly fit into buckets of "Preferred method with fallbacks" and "Multiple specializations."

For "preferred method with fallbacks" let's use the example of an interior design app. They probably prefer (in a world where such devices are ubiquitous) an AR HMD, letting you design the actual space you're in in an immersive way. But hey! If that's not available, no worries! Handheld AR is pretty good at this scenario too. But if that's not available then working in a VR blank slate room that approximates the real one's dimensions isn't bad. And if all that fails then a simple 2D app is probably fine. This whole spectrum requires one button, though they may want to change the label depending on the mode you'll launch.
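A minimal sketch of that one-button fallback chain, assuming `supportsSession` resolves for supported options and rejects otherwise. The option names, labels, `pickSessionPreference`, and the mock device are all hypothetical.

```javascript
// Sketch only: session option sets in order of preference, with button labels.
const preferences = [
  { label: "Enter AR (headset)", options: { type: "ar", display: "headworn" } },
  { label: "Enter AR (handheld)", options: { type: "ar", display: "handheld" } },
  { label: "Enter VR", options: { type: "vr", display: "headworn" } },
];

// Tries each preference in order; resolves to the first supported one, or null
// when nothing is supported (the page then stays a plain 2D app).
function pickSessionPreference(device, candidates) {
  return candidates.reduce(
    (chain, candidate) => chain.then((found) => found ||
      device.supportsSession(candidate.options).then(() => candidate, () => null)),
    Promise.resolve(null));
}

// Mock phone that only supports handheld AR.
const mockPhone = {
  supportsSession: (o) => (o.type === "ar" && o.display === "handheld"
    ? Promise.resolve() : Promise.reject(new Error("unsupported"))),
};

pickSessionPreference(mockPhone, preferences).then((choice) => {
  console.log(choice ? choice.label : "2D only"); // "Enter AR (handheld)"
});
```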

The alternative is an app like A-Painter, where there's a clear set of requirements for the painting mode to be feasible, and if that's not available then showing a gallery mode is a good fallback. BUT! What if I have a fancy 6DoF setup but I want to view other people's creations anyway? In that case you can easily envision the page having "View Gallery" and "Create your own" buttons that are both available, but maybe disabled if supportsSession reveals missing requirements.

The only catch here is if you wanted a UA button that existed outside the page, and I kind of feel like if that's going to be a thing then the developer needs to pick a single set of session options and say "That's the default." This would be important for cases where the UA is still initiating the session creation but there's no opportunity for UI to be shown. (Page-to-page navigation, inserting a phone into a headset, proximity sensor triggered, etc.)

Do you have an example of a returned feature that you wouldn't see being part of the session object? I would think we'd want it to be, for the same developer confusion reason, especially if they might not get a session option that is possible.

Not right off. Also, I think your point about permissions is a good one. Should we advertise a feature as being available even if the user has to grant access to it first and hasn't done so?

speigg commented 6 years ago

Should we advertise a feature as being available even if the user has to grant access to it first and hasn't done so?

Most features are not guaranteed to be available at any given moment: world alignment may not be possible if GPS is unavailable due to bad weather or no clear view of the sky (indoors), or if the digital compass doesn’t work due to magnetic interference; 6DOF may fail if there is not enough light, not enough visible features for tracking, or the user moves too quickly. Given that, I agree that the semantics should be that certain features are advertised as being supported on a given session under ideal conditions, but not guaranteed to be available for any number of reasons, including permissions not being granted.

This also implies that apps should be reactive to these features coming and going throughout the lifetime of a session. Some of this is already taken into consideration in the current spec (e.g., 6DOF -> 3DOF due to loss of tracking), but it seems unlikely that we’ll be able to completely avoid the scenario where an app discovers (after starting the session) that one or more required features are not available.
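A minimal sketch of an app reacting to a feature coming and going during a session. The "featurechange" event, the `features` map, and `MockSession` are hypothetical; only `EventTarget` and `Event` are standard.

```javascript
// Sketch only: a mock session that announces feature availability changes.
class MockSession extends EventTarget {
  constructor() {
    super();
    this.features = { positionalTracking: true };
  }
  // Simulates the UA reporting that a feature became (un)available.
  setFeature(name, available) {
    this.features[name] = available;
    this.dispatchEvent(new Event("featurechange"));
  }
}

const session = new MockSession();
const log = [];
session.addEventListener("featurechange", () => {
  log.push(session.features.positionalTracking ? "6DOF" : "3DOF fallback");
});

session.setFeature("positionalTracking", false); // tracking lost
session.setFeature("positionalTracking", true);  // tracking regained
console.log(log); // ["3DOF fallback", "6DOF"]
```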

blairmacintyre commented 6 years ago

@toji

I'd be pretty strongly against making the options passed to supportsSession and requestSession different. I think the pattern of "If I asked for these options, what would I get?" is a powerful one, and splitting them up is asking for developer confusion.

I think I tend to agree. Just wanted to make this clear.

This is definitely more in line with what I was thinking. And here I'm not sure if the returned features should be part of the eventual session or not. Like so?

@speigg I was specifically thinking about these as high level feature IDs. So, "the session is capable of doing this and giving you this data / feature." But, that may or may not mean the feature works perfectly / smoothly through its life. I agree we need to have the ability to turn some things on/off (e.g., if we expose "world knowledge" like meshes, etc., the UA should/could provide a button to turn it on / off over the life of the app). I suspect we need to deal with features that "change" over the life of the page on a per-feature basis, or with events (e.g., in the "world knowledge" case, the app would likely get notified when it stops getting this info and starts again).

Also, the geospatial is not a great example of this kind of coming-and-going (i.e., since if there is geolocation and you have access, you will get SOMETHING even if low accuracy, AND that's probably good enough for orientation alignment).

ddorwin commented 6 years ago

Doesn't returning a set of features from supportsSession() enable fingerprinting or device exclusion just as much as if we added those attributes to XRSessionCreationOptions? It actually makes fingerprinting easier since you don't even have to query all the possible combinations. Combined with the permission question, we may want to explore other options.

Separately, I think we will want to specifically look at how deferred session requests (#256) would work. The reasonable options for such requests may be much more limited than what supportsSession() says is supported.

blairmacintyre commented 6 years ago

@ddorwin Thinking about this, and looking at #256, I guess one question is if we're willing to adopt a more asynchronous style of session creation like we did in Argon, inspired by how various desktop systems do window creation.

Specifically,

change requestSession so it's a request that does NOT return a promise, just issues a request and returns a value indicating if the request was "not invalid"
add a command or meta tag to indicate that the page supports WebXR, so the UA knows this
require the programmer to set up an event handler that handles both explicit requests from the page, and activations from the UA
have a minimal set of XRSessionCreateOptions available to supportsSession(), perhaps type (AR/VR) and exclusive (control or just "use"). This allows asking "the major question" (i.e., do I have a device that supports AR or VR).
suggest that UAs provide interfaces for letting the user "enter XR", select a device from their devices, approve/deny permissions for capabilities, etc
have the semantics that the page should support the UA changing the session at any time, not just type but also device

I would much prefer to see something like this happen, I just didn’t think folks would go for it.

It makes things much simpler for most pages, in my opinion (they simply express preferences and don't need to provide a UI for requesting if they don't want to), and gives the user much more complete control over things.

If a page requires some specific capability (only AR or VR, computer vision, geospatial, video mixed or pure see-through, ...), they adopt a style where they check the capabilities of any session they are given, and pop up a warning / explanation in the session they are given: this handles all the various cases in this way (user started on bad device, user was already on bad devices and navigated to page, etc).

If a page can't (or doesn't want to) deal with dynamic device or session changes, they can use the exact same approach / dialogs / warnings when they get a new session while already running ("Sorry, you need to reload ...")
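
The flow proposed in this comment might look roughly like the following sketch. `MockXR`, its `session` event, the `activate` helper, and the boolean return from `requestSession` are all hypothetical stand-ins for the proposed behavior, not an existing API; a real UA would deliver the event asynchronously, while the mock dispatches it immediately to keep the example short.

```javascript
// Hypothetical mock of the proposed flow; none of these names are spec'd.
class MockXR extends EventTarget {
  // Returns only whether the request was "not invalid"; no promise.
  requestSession(options) {
    const valid = Boolean(options) &&
      (options.type === "ar" || options.type === "vr");
    if (!valid) return false;
    this.activate(options, "page");
    return true;
  }

  // The UA may also call this on its own (navigation, headset donned, ...).
  activate(options, source) {
    const evt = new Event("session");
    evt.session = { type: options.type, source };
    this.dispatchEvent(evt);
  }
}

const xr = new MockXR();
const received = [];

// Single code path for page-initiated and UA-initiated sessions.
xr.addEventListener("session", (evt) => {
  const { session } = evt;
  if (session.type !== "ar") {
    // This page needs AR, so it explains the problem inside whatever
    // session it was handed instead of failing silently.
    received.push(`warn: need AR, got ${session.type} (${session.source})`);
    return;
  }
  received.push(`start AR (${session.source})`);
});

const accepted = xr.requestSession({ type: "ar" });    // page button
const rejected = xr.requestSession({ type: "bogus" }); // invalid request
xr.activate({ type: "vr" }, "ua");                     // UA switches session
```

The handler does the capability check itself, which is the style described above: the page accepts any session and warns when it cannot use the one it was given.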

speigg commented 6 years ago

@blairmacintyre funny, I was going to suggest something very similar, but I didn’t want to rock the boat too much :)

Here is what I was thinking:

a “session” event, which functions as the only API for getting new XRSession instances, whether or not those sessions are requested by the app or simply the result of user/UA actions (navigation, headset being put on, etc.)
requestSession becomes used only in response to user/UA action with “Enter XR” buttons provided by an app. This may still return a promise, but to encourage apps to handle any sessions that are provided, the sessions should only be made available in a “session” event.

This might look like this in practice:

function checkForXR() {
  navigator.xr.requestDevice().then(device => {
    onXRDevice(device);
  }).catch(err => { /* ... */ })
}

navigator.xr.addEventListener("devicechange", checkForXR)

function onXRDevice(device) {
  device.addEventListener("session", evt => onXRSession(evt.session));
  advertiseXRSupport(device)
}

function onXRSession(session) {
  if (session.type === "transparent") {
    // setup app for AR
  } else if (session.type === "opaque") {
    // setup app for VR
  }
}

function advertiseXRSupport(device) {
  let arSessionOptions = {type: "transparent", exclusive: true, outputContext: myOutputContext}
  device.supportsSession(arSessionOptions).then(features => {
    if (features.featureMyAppNeedsForAR) {
      arButton.style.display = "block"
      arButton.addEventListener("click", () => {
        device.requestSession(arSessionOptions)
      })
    }
  })
  let vrSessionOptions = {type: "opaque", exclusive: true, outputContext: myOutputContext}
  device.supportsSession(vrSessionOptions).then(features => {
    if (features.featureMyAppNeedsForVR) {
      vrButton.style.display = "block"
      vrButton.addEventListener("click", () => {
        device.requestSession(vrSessionOptions)
      })
    }
  })
}

A few issues:

the “advertise XR and request a session on user action” flow is made a bit verbose (and redundant) by having to check each “type” of session separately. Some apps may also want to start with a non-exclusive session that can be “promoted” to an exclusive session, which further complicates things.
the app now has to do more bookkeeping, including checking each session it receives, determining whether or not it can do anything with it, and (if it’s non-exclusive) figuring out how/where to display XR content within its page. This isn’t necessarily a bad thing, as it forces the app to be reactive.
it’s not clear to me if the outputContext can be assigned after a session has been created (which would be necessary for mirroring the display when an exclusive session is provided without being requested, I think?)

leweaver commented 6 years ago

I like where @blairmacintyre and @speigg are heading with moving away from promises for session creation. I am definitely in favor of a single code path for developers to use for navigation/onload, in-page click and other initiation request sources (donning headset, button in the browser frame itself, etc.). In particular, for the 'in browser frame button' use case, the thing that worries me about relying on a promise-based requestSession call in the page load event is that the developer will need to re-requestPresent whenever presentation ends; otherwise the button will only work once!

I've been toying around with lots of variations of how this flow could work to try and address some of the issues that @speigg identified - in particular the duplication of logic and fork based on session type.

One approach I'd like to put forward is the registration of sessionRequest listeners when a new device is found rather than using events. The listener method takes two parameters: XRSessionFeatures (returned from device.supportsSession) and a callback which receives the session. Callbacks feel like they flow a little better than events, since they can be registered with context.

The idea is that the page would register what kind of sessions it is interested in via device.supportsSession and then device.addSessionRequestListener prior to page load. Then, in response to either

device.requestSession via a user initiated action
Page navigation, presence sensor event, other UA driven motivation to enter VR

the UA will respond by calling an appropriate session listener callback (based on the options given). If no sessionRequestListeners match the required options, no callbacks are fired (perhaps we need an event here?)

I also make the assumption that the outputContext is NOT provided in the sessionOptions, but is set after the Session is created. This is something that we would need to figure out.

A modified version of your example...

function checkForXR() {
  navigator.xr.requestDevice().then(device => {
    onXRDevice(device);
  }).catch(err => { /* ... */ })
}

navigator.xr.addEventListener("devicechange", checkForXR)

function onXRDevice(device) {

    let vrSessionOptions = {type: "opaque", exclusive: true}
    let arSessionOptions = {type: "transparent", exclusive: true}

    // Query for session types that this app supports.
    device.supportsSession(arSessionOptions).then(features => {
        if (features.featureMyAppNeedsForAR) {

            device.addSessionRequestListener(features, session => {
                // App specific function
                configureScene(session, {/* AR app specific configuration */})
            });

            arButton.style.display = "block"
            arButton.addEventListener("click", () => {
                device.requestSession(arSessionOptions)
            })
        }
    });

    device.supportsSession(vrSessionOptions).then(features => {
        if (features.featureMyAppNeedsForVR) {

            device.addSessionRequestListener(features, session => {
                // App specific function
                configureScene(session, {/* VR app specific configuration */})
            });

            vrButton.style.display = "block"
            vrButton.addEventListener("click", () => {
                device.requestSession(vrSessionOptions)
            })
        }
    });
}

function configureScene(session, params) {
    // Optionally set the output context, AFTER session is created.
    session.setOutputContext(outputContext);

    // Set other app specific settings
    // params...

    // requestFrameOfReference, create layer, requestAnimationFrame etc.
}

I'm not entirely happy with how the above sample fits together just yet - but I think it shows the direction I am trying to portray.
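
To make the matching step concrete, here is a small sketch of how a UA might pick a registered listener. `MockDevice` and its matching rule (every requested option must equal the value the listener was registered with) are assumptions made for illustration, as is the idea that the features object echoes `type` and `exclusive`; the proposal above leaves the matching semantics open.

```javascript
// Hypothetical mock; assumes XRSessionFeatures echoes `type` and `exclusive`.
class MockDevice {
  constructor() {
    this.listeners = [];
  }

  addSessionRequestListener(features, callback) {
    this.listeners.push({ features, callback });
  }

  // Used for page requests and UA-driven activation alike.
  // Returns false when no registered listener matches the options.
  requestSession(options) {
    const match = this.listeners.find(({ features }) =>
      Object.keys(options).every((key) => features[key] === options[key])
    );
    if (!match) return false;
    match.callback({ type: options.type, exclusive: options.exclusive });
    return true;
  }
}

const device = new MockDevice();
const configured = [];

device.addSessionRequestListener(
  { type: "opaque", exclusive: true },
  (session) => configured.push(`VR scene for ${session.type}`)
);

// A matching request reaches the callback; an unmatched one does not.
const handled = device.requestSession({ type: "opaque", exclusive: true });
const unhandled = device.requestSession({ type: "transparent", exclusive: true });
```

The `false` return for the unmatched request is one possible answer to the open question of what should happen when no listener matches; an event would work equally well.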

jonobr1 commented 6 years ago

In both Chrome and Oculus Browser ( probably in Firefox too, but I haven't tested ) WebVR 1.1+ there are edge cases where a page can be loaded with a display already presenting. I don't have strong opinions about the implementation, but I like @leweaver's direction to support a way to listen when sessions are initiated. In the case I'm referring to above these are for featuring specific pieces of web content through the Daydream Home Screen and Oculus Home Application Thumbnails.

toji commented 6 years ago

Apologies for the epic comment, but there's a lot to cover here. The following is based on a variety of conversations with multiple people over several weeks, but a huge portion of the credit goes to Nell for flying to Mountain View to spend a day discussing this, and Alex for staying up with Nell and me into the wee hours of the morning at SIGGRAPH to refine the concepts further.

That being said, I'm not attempting to represent the below text as their opinions. It's really just my understanding of the conclusions we converged on.

Primary Goals

First and foremost, we want to avoid a scenario where some browsers are forced into presenting the user with a sequential series of permission dialogs for common scenarios. An example of this "forced" behavior is if the creation of a session requires a permission prompt, and then calling a method on the session also requires a permission prompt. The UA can't do anything to make the presentation less jarring because the session has to fully resolve before the next permission is even requested.
Ideally there's a way to allow the browser to bundle all the necessary permissions into a single dialog if desired/applicable.
- A desired property of this system would be also allowing for bespoke permission requests at the time they're actually necessary. (ie: Not requesting access to RGB camera data until the user clicks a "take a photo" button.)
- The "perfect" solution would not require two separate methods for handling those cases.

Proposal to get there

It should be reasonable for browsers to condense any permission-invoking API calls made in the course of a single callback into a single dialog. (Whether or not the browser chooses to do so is for the UA to decide.)

Specifically, it would mean that something like this could potentially produce a single dialog with multiple checkboxes

api.requestSensitiveServiceA().then(/*...*/);
api.requestSensitiveServiceB().then(/*...*/);

While something like the following would out of necessity produce sequential permission dialogs

api.requestSensitiveServiceA().then((svc) => {
  api.requestSensitiveServiceB().then(/*...*/);
});

Thus, if everything that potentially requires permissions can be called without blocking to wait for a previous potentially permission-dialog-producing (henceforth "PDP") call we have the opportunity to allow developers and UAs to intelligently control how and when they want to incur permission dialogs.
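
One way a UA could get that behavior is to queue permission requests and show a single dialog once the current callback has finished. The sketch below models this with a hypothetical `PermissionBatcher`; the explicit `flush()` call stands in for "the current callback has returned", the mock user grants everything, and the service names are made up.

```javascript
// Hypothetical UA-side model of permission-dialog batching.
class PermissionBatcher {
  constructor() {
    this.pending = []; // requests made since the last dialog
    this.dialogs = []; // each entry lists the permissions shown in one dialog
  }

  // Called by a PDP API; returns a promise settled when the dialog closes.
  request(name) {
    return new Promise((resolve) => {
      this.pending.push({ name, resolve });
    });
  }

  // Stand-in for "the current callback has returned".
  // This mock user grants everything that was asked for.
  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.dialogs.push(batch.map((entry) => entry.name));
    batch.forEach((entry) => entry.resolve(true));
  }
}

// Parallel requests: both land in one dialog.
const parallel = new PermissionBatcher();
parallel.request("serviceA");
parallel.request("serviceB");
parallel.flush();

// Nested requests: B cannot be queued until A's dialog has closed.
const nested = new PermissionBatcher();
nested.request("serviceA");
nested.flush();             // dialog 1: serviceA
nested.request("serviceB"); // what the .then() callback would do
nested.flush();             // dialog 2: serviceB
```

In the parallel case `dialogs` ends up with one entry holding both names, and in the nested case it ends up with two entries, which is the difference the two snippets above are meant to show.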

Given the WebXR API's current design, the primary hurdle to this appears to be that we generally want to hang PDP calls off of the XRSession object, but the act of acquiring an XRSession itself may produce permission dialogs. If some browsers choose to then ask for permissions on both XRSession creation AND subsequent feature queries we have a situation where some browsers would be forced to display at least two dialogs, like so:

xrDevice.requestSession({ immersive: true }) // May ask for general XR permission.
  .then((session) => { 
    session.requestEnvironmentMesh() // May ask for permission, but can't until session request resolves.
  });

One potential solution to this is to ensure that session creation is very lightweight, requiring minimal options and no permissions to create. Then any PDP calls are handled after the fact. This would include things like AR passthrough, which in reality can be viewed as just another data stream on top of the core tracking tech. Given that we all appear to agree that inline sessions without an AR passthrough should be allowed without permissions or user activation this seems like a tractable idea. (An "inline" session is my current term for when the primary output is the in-page element. I'm trying to get away from the term "magic window").

As an example of how I see this working out (making up feature APIs as I go):

let xrSession = await xrDevice.requestSession();

let xrEnvMesher;
let xrEnvLight;

// Required features
Promise.all([
  xrSession.requestARPassthrough(),
  xrSession.requestEnvironmentMeshing(),
]).then((values) => {
  xrEnvMesher = values[1];
  startFrameLoop();
}).catch(() => {
  // Whoops! Something we needed isn't there.
  xrSession.end();
});

// Non-required feature
xrSession.requestEnvironmentLighting().then((envLight) => {
  xrEnvLight = envLight;
}); // No catch, don't care.

I should note that the one "feature" I don't feel fits cleanly into this architecture is whether or not a session is immersive or inline. This distinction seems "special" since it determines not only where the content is displayed but also may determine what sets of features are accessible to the session itself. (For example, a Pixel phone could use AR passthrough on an inline session via ARCore, but not an immersive one because Daydream doesn't support it and the phone's cameras are obscured anyway. Thus it's helpful to distinguish between modes.)

It's tempting to allow the session's immersive state to be mutable, set after the fact with a call similar to xrSession.setImmersive(true). I'd personally shy away from that, though, since that leads to a lot of complications regarding session features that appear or disappear after the mode switch. It also means that some features become order-of-operations dependent, which is weird for situations where we're trying to make things highly asynchronous.

For example, does this work...

Promise.all([
  xrSession.setImmersive(true),
  xrSession.requestSomeImmersiveOnlyFeature(),
]);

...while this fails?

Promise.all([
  xrSession.requestSomeImmersiveOnlyFeature(),
  xrSession.setImmersive(true),
]);

That feels wrong. I'd prefer if possible for features that are requested on a session to be persistent for the duration of the session and have almost no dependence on other features unless said dependence is explicitly baked into the API surface. (That is, something like session.requestFeatureB(featureBObject)).

So this suggests to me that we probably still want to keep the same model as we have now (or at least one that's not drastically different) where the immersive state of a session is something set at creation time. The big change would be to suggest very strongly (because I doubt that we can require it) that creating an immersive session should NOT trigger permission prompts! They should still be gated on user activation, but otherwise should be allowed to be created without seeking further acceptance from the user.

(Note: Nell has already indicated to me that she doesn't feel as strongly as I do about immersive being an immutable attribute, so don't take the above as ground truth.)

Of course, that may work for some browsers, but others may want to do like Edge does currently and display a permission prompt for accessing immersive hardware features at all. This is an understandable stance and we should provide a reasonable mechanism for it. In the case that a browser wants to treat any immersive hardware access as a PDP feature AND wants to only show a single permission prompt when it can, I would say that the immersive hardware access permission should be triggered not on the initial session request but on either the first PDP feature request from that origin OR the first call to requestAnimationFrame from an immersive session. (If the permission is denied any active immersive sessions would be ended and future requests for them would be rejected. Otherwise the first rAF frame is left pending until all necessary permissions are accepted.)

This pattern should keep dialogs aligned with contextually sensible user actions, prevent the browser from being required to show stacks of permissions sequentially, still allow for just-in-time permissions for features that aren't needed right away, avoid requiring two different variants of the feature APIs, and let browsers be as light touch or as aggressive on permissions as desired while still giving developers a way to produce predictable behavior across the board.

Keep in mind that the permission is for the entire origin, and not a "I want this specific call to go through". Also, we don't need to have a "one feature, one permission" model. Requesting one feature in code may trigger a permission dialog that subsequently covers multiple other features.
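
The deferral rule described above (prompt on the first PDP feature request or the first rAF, rather than at session creation) can be sketched as a small state machine. `MockImmersiveSession`, `promptUser`, and the permission names are hypothetical. As simplifications, the mock treats each dialog as a single yes/no answer, does not remember per-feature grants, and runs the frame callback immediately instead of leaving it pending.

```javascript
// Hypothetical model: the hardware-access prompt is deferred past creation.
class MockImmersiveSession {
  constructor(promptUser) {
    this.promptUser = promptUser; // (permissions[]) => boolean, the mock "UA"
    this.hardwareGranted = null;  // null = not asked yet
    this.ended = false;
    this.prompts = [];            // one entry per dialog shown
  }

  // Shows at most one dialog, folding in hardware access if still unasked.
  ensurePermissions(extra) {
    const askHardware = this.hardwareGranted === null;
    const needed = askHardware ? ["immersive-hardware", ...extra] : [...extra];
    if (needed.length === 0) return true;
    this.prompts.push(needed);
    const granted = this.promptUser(needed);
    if (askHardware) {
      this.hardwareGranted = granted;
      if (!granted) this.ended = true; // hardware access denied: session ends
    }
    return granted;
  }

  requestFeature(name) {
    return !this.ended && this.ensurePermissions([name]);
  }

  requestAnimationFrame(callback) {
    if (this.ended || !this.ensurePermissions([])) return false;
    callback();
    return true;
  }
}

const session = new MockImmersiveSession(() => true); // user accepts everything
const promptsAtCreation = session.prompts.length;      // creation never prompts
session.requestFeature("environment-mesh"); // one dialog covering both permissions
let frames = 0;
session.requestAnimationFrame(() => { frames += 1; }); // no further prompt
```

Calling `requestAnimationFrame` first instead would produce a dialog containing only the hardware permission, which is the other trigger named in the proposal.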

Feature interaction

In discussing the above, I ended up fielding some questions about what the theoretical feature requests above would return. I feel that's worth stubbing out just to make the scenario a bit more realistic.

In some cases, it seems like the API request wouldn't have to return anything, and simply resolving or rejecting would suffice. For example, requesting that AR passthrough be enabled wouldn't have much to return because it simply activates a compositing feature.

xrSession.requestARPassthrough();

(Quick side note on that, BTW: Even though this isn't a serious API proposal I think something along these lines could be workable even on devices like HoloLens/Magic Leap, where it could functionally be a no-op that resolves immediately.)

In other cases, a feature API request could easily just return the desired value. A good example of this might be asking for camera RGB data:

arMediaStream = await xrSession.requestARCameraStream();

In this case there's a single, clear, desired value that is likely to be used immediately, so returning it immediately upon the UA permission policy being satisfied is sensible. You could also theoretically use this JUST to ensure that the correct permissions were acquired by calling the function and ignoring the returned value, which would cause it to garbage collect almost immediately. It's worth considering that doing so may be a semi-heavy operation in some browsers, however.

Finally, it feels like there are multiple APIs where the request should actually return an object that is used to then control the behavior of the feature requested. For example, with environmental meshing:

xrEnvMesher = await xrSession.requestEnvironmentMeshing();
// Some time later...
xrEnvMesher.addEventListener('meshchanged', onMeshChanged);
xrEnvMesher.start();
// Even later...
xrEnvMesher.stop();

In this case the feature is known to be heavyweight and requires some more fine-tuned control and interaction. Thus an object is returned that has all the methods needed and which can be used to actually activate the heavy lifting as needed, while any permissions necessary are taken care of at request time.

Which patterns we use for which features is definitely something that should be evaluated on a case-by-case basis.

Conclusion

I'm not convinced that anything described above is the perfect solution, there's a few unaddressed issues that are adjacent to this one (testing for support for the purpose of showing buttons, for example), and I'll admit the "wait till the first rAF to prompt" pattern feels a bit janky for those browsers that would need it. But I feel like the discussions around this have been very helpful in allowing me to really grok some of the usage patterns and challenges around this particular API. It's difficult to capture all of it without this becoming a novel, but I'm happy to field questions in the meantime!

speigg commented 6 years ago

@toji Your proposal looks very promising! No pun intended :)

xrSession.requestARPassthrough();

Small suggestion: how about something like requestEnvironmentBlending() or requestEnvironmentPassthrough() to be more consistent with the existing environment blending spec. Likewise with requestEnvironmentCameraStream().

I should note that the one "feature" I don't feel fits cleanly into this architecture is whether or not a session is immersive or inline. This distinction seems "special" since it determines not only where the content is displayed but also may determine what sets of features are accessible to the session itself. (For example, a Pixel phone could use AR passthrough on an inline session via ARCore, but not an immersive one because Daydream doesn't support it and the phone's cameras are obscured anyway. Thus it's helpful to distinguish between modes.)

I understand why making 'inline' vs 'immersive' mutable throughout the session may be problematic if the goal is to have a consistent set of features throughout a session's lifetime, however I think there is a lot to gain in embracing the dynamic availability or non-availability of features—applications should already be structuring their rendering code around the given set of XRViews, so if these XRViews were to change dynamically based on 'inline' vs 'immersive' mode, an app should be able to instantly adapt accordingly. Likewise, if features can come and go (based on changing permissions or other reasons), applications should be able to react. IMO, the only reason that an application should end an XRSession and give up, is if the one or two features that it absolutely requires are not supported at all on that platform—not if they simply aren't available right now at the moment they are requested.

In other words, I don't think applications should be relying on requests to change session state in order to determine whether or not they should end their session and give up. Rather, applications can ask the session if certain features are supported (while at the same time asking for permission to use such features)—and then fail only if those features are not supported at all (not just if the user simply denies permission to those features). One reason for this is to allow the UA/user to change the permissions dynamically without disrupting the session. If the UA/user disables a feature that is actually supported and which the application considers to be necessary, then the application should prompt the user to enable that feature.

More so, I think we may want to explicitly distinguish between APIs that request a change in session state, vs simply asking for access to certain features. For example, we may want to adopt a pattern such as "use*()" when only requesting access to certain features (and ensuring their support on the current platform):

let xrSession = await xrDevice.requestSession();

// Required features
Promise.all([
  xrSession.useEnvironmentBlending(),  
  xrSession.useEnvironmentMeshing()
]).then((values) => { 
  // If we succeed, then these features are supported,
  // and permission has been requested (not granted)
  startFrameLoop();
}).catch(() => {
  // If we fail, this means the features requested are not supported at all
  // and since this session will never support what we need, we might as well end it
  xrSession.end();
});

function onFrame(xrFrame) {
  // If we are here, it means we have a session that *potentially* supports what we need
  // ... but perhaps right now it does not 

  if (xrFrame.environmentBlendMode === 'opaque') {
    // let the user know we don't have what we need, and ask them to enable that feature
    showPromptToEnableEnvironmentBlending() 
  } 

  if (!xrFrame.environmentMesh) {
    showPromptToEnableEnvironmentMeshing()
  }

  if (!xrFrame.immersive) {
    // we may want to render inline (optional)
    renderInline()
  } else {
    renderImmersive()
  }
}

function onEnableEnvironmentBlending() {
  xrSession.requestEnvironmentBlending(true) 
  // the UA might tell the user they need to take their phone 
  // out of the enclosure to enable environment blending, 
  // or change the session mode, or whatever
}

function onEnableEnvironmentMeshing() {
  xrSession.requestEnvironmentMeshing(true) 
  // Again, UA might ask the user to confirm, 
  // and may change the session mode if necessary
}

With this kind of API, the UA/user is free to enable / disable any features as desired, and to change between 'inline' and 'immersive' modes as desired. If an application requires a certain feature (e.g., environment blending or camera stream), it would prompt the user and attempt to request that feature only if the user indicates that they want to re-enable that feature.

It's tempting to allow the session's immersive state to be mutable, set after the fact with a call similar to xrSession.setImmersive(true). I'd personally shy away from that, though, since that leads to a lot of complications regarding session features that appear or disappear after the mode switch. It also means that some features become order-of-operations dependent, which is weird for situations where we're trying to make things highly asynchronous.

The order-dependency problem here is also alleviated with the approach I have outlined, without requiring 'inline' and 'immersive' to be separate session types. For example:

Promise.all([
  xrSession.useImmersiveRendering(),
  xrSession.useSomeImmersiveOnlyFeature()
]).then(()=>{
  // If we succeed, then these features are supported,
  // and permission has been requested (not granted)
})

function onFrame(xrFrame) {
  render()

  if (!xrFrame.immersiveOnlyFeature) {
    showPromptToEnableImmersiveOnlyFeature()
  }

  if (!xrFrame.immersive) {
    // we may want to render inline (optional)
    renderInline()
  } else {
    renderImmersive()
  }
}

function onEnableImmersiveOnlyFeature() {
  xrSession.requestImmersiveOnlyFeature() 
  // UA may ask the user for confirmation here
  // Since this is an immersive-only feature, the UA should also inform the user
  // that this feature would require switching to an immersive mode
}

This way, the application doesn't even have to know that a certain feature is "immersive-only". Simply by requesting a certain feature, the UA can enable/disable other features as necessary, while the application simply reacts to whatever is available.

blairmacintyre commented 6 years ago

(your epic comment is going to be very hard to respond to, @toji !, so I'm going to just go for it and make a bunch of replies, some big, some small, I suspect)

One thing you say I want to call out:

Given that we all appear to agree that inline sessions without an AR passthrough should be allowed without permissions or user activation this seems like a tractable idea.

I think you need to define this more. IF the inline session gets any motion sensor data, this is not true. The reason the devicemotion API was never ratified, and that it's being deprecated, is that it can be accessed without permission and serves as a threat. So, if a webxr session gives anything like the device orientation or motion, it will require a permission.

If it is more about getting a control flow for rendering that matches an eventual rendering loop, but doesn't give any device sensor data, then this is probably true. But we should define that, if so.

blairmacintyre commented 6 years ago

Another. I don't know what you mean by "AR passthrough being another data stream" in this context, like here:

This would include things like AR passthrough, which in reality can be viewed as just another data stream

A WebXR session needs to know if the device is AR (there is a view of the world, either via composited video or transparency) or VR (there is nothing seen by the user except what the session renders), in order to decide on what to render (e.g., skybox?).

A WebXR session may want to know if it's video passthrough or optical see-through AR, so it can decide on some rendering approaches (since those displays show things differently).

"Requesting AR passthrough" does not give more data, it's just a feature request (as you then highlight further down).

blairmacintyre commented 6 years ago

Another. Decisions being made regarding your implementation are bleeding into examples, and make it confusing and hard to discuss. For example

I should note that the one "feature" I don't feel fits cleanly into this architecture is whether or not a session is immersive or inline. This distinction seems "special" since it determines not only where the content is displayed but also may determine what sets of features are accessible to the session itself. (For example, a Pixel phone could use AR passthrough on an inline session via ARCore, but not an immersive one because Daydream doesn't support it and the phone's cameras are obscured anyway. Thus it's helpful to distinguish between modes.)

The first half of this seems true, but the example is true if-and-only-if you assume "immersive" == "hmd". (This seems to be implied in your comments, but I don't think this has been agreed on. It may also be assumed by others, or not.)

The problem with this interpretation is that it results in the common path for developers to be that they build "HMD only" content. The current samples in the samples repo, for example, have samples that only run on HMDs (i.e., they request "immersive") for no good reason, aside from that it's easier to create samples that impose this restriction than it is to create samples that are flexible.

An alternative interpretation here:

immersive means "not inline".
A UA can implement it in many ways, but (perhaps) every UA that supports WebXR can be required to support something that is "not inline".
In your Pixel example, the "immersive AR" session is full-screen (not inline) rendering, perhaps even disallowing 2D DOM overlay.
The Pixel example could also give the USER the option to have "immersive VR" be HMD (Daydream, or Gear on a Samsung phone) or just use a full-screen handheld monoscopic view (that uses ARCore for 6D motion tracking).

blairmacintyre commented 6 years ago

Regarding whether a session's immersive state is mutable: I think it should be, especially if we think about situations like the "Article" demo Google created. It would be nice to create a session, display it inline, and then be able to toggle to/from immersive mode (akin to toggling video display fullscreen and back while the video is playing), without creating/destroying sessions (which would cause additional permission prompts).

Regarding your concern:

It's tempting to allow the session's immersive state to be mutable, set after the fact with a call similar to xrSession.setImmersive(true). I'd personally shy away from that, though, since that leads to a lot of complications regarding session features that appear or disappear after the mode switch. It also means that some features become order-of-operations dependent, which is weird for situations where we're trying to make things highly asynchronous.

An alternative view is that "permissions" are "permissions", but don't guarantee data! For example, when I added the ability to switch cameras (world to user facing) to the WebXR Viewer, I ran into the fact that ARKit's world motion tracking only works with the forward facing camera. So when the user switched to the user facing camera, those ARKit anchors break. And, similarly, face tracking only works (on ARKit) with the iPhoneX on the user facing camera.

If I give permission for "world meshes", and the user switches to the user facing camera, that permission doesn't become invalid, even though the mesh is no longer delivered and the anchors all are destroyed.

And if we had a feature request for "face tracking" in your list above, it wouldn't make sense for it to fail up front if the initial state wasn't "user facing camera on iPhoneX" ... it might fail on a different phone, but on the iPhoneX, it might succeed (yes, you can track faces) but wouldn't give any data to the app unless the user-facing camera was in use (assuming an app that wants to let the user toggle cameras, instead of forcing one).

So, in your examples, the user would be asked for permission if some configuration of the display (immersive or not, front or rear camera) supported it, but the feature wouldn't "run" (or deliver data) unless it was in the right situation.

This might require each feature to have an "active" flag, or provide some way of communicating to the programmer what is required for it to work, so the programmer could make choices or otherwise inform the user ("To put dog ears on your face, you must switch to the selfie camera.")
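A minimal sketch of what such an "active" flag could look like. Everything here (`FeatureHandle`, `requirement`, `update`, the `camera` state field) is hypothetical and invented for illustration; none of it is part of the WebXR API.

```javascript
// Hypothetical sketch: these names do not exist in the WebXR API.
class FeatureHandle {
  constructor(name, requirement, isSatisfied) {
    this.name = name;
    this.requirement = requirement;   // human-readable condition for data to flow
    this._isSatisfied = isSatisfied;  // (deviceState) => boolean
    this.active = false;              // permission granted, but no data yet
  }

  // Called by the (mock) UA whenever the device configuration changes.
  update(deviceState) {
    this.active = this._isSatisfied(deviceState);
    return this.active;
  }
}

const faceTracking = new FeatureHandle(
  "face-tracking",
  "To put dog ears on your face, you must switch to the selfie camera.",
  (state) => state.camera === "user"
);

// Permission was granted, but the world-facing camera is in use.
faceTracking.update({ camera: "world" });

// The app can show the requirement to the user instead of failing.
const hint = faceTracking.active ? null : faceTracking.requirement;
```

The point of the design is that the permission and the handle stay valid across camera switches; only `active` changes.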

blairmacintyre commented 6 years ago

When this gets written up, can you find another example besides this one:

In other cases, a feature API request could easily just return the desired value. A good example of this might be asking for camera RGB data:

arMediaStream = await xrSession.requestARCameraStream();

My current understanding is that the WebXR API will not provide access to the camera stream. I say this because I explicitly requested this be considered at the face-to-face meeting, and received considerable pushback from some folks there (i.e., with the suggestion to look at leveraging other web APIs).

I personally am open to the idea of providing camera data, but it is a major undertaking to do it right, and there has been zero discussion of it for quite some time. So, I think we should either engage more fully with it, or stop using it as an example.

blairmacintyre commented 6 years ago

@speigg wrote:

Small suggestion: how about something like requestEnvironmentBlending() or requestEnvironmentPassthrough() to be more consistent with the existing environment blending spec.

Actually, please no! Programmers should request AR, but discover what form of AR the display supports. Otherwise, programmers are encouraged to write code that only works on one sort of display.

I'm not jazzed by the name arPassthrough but it's ok if it is not assumed to be "videoPassthrough" ... I'd suggest arOverlay but there are companies that want to distinguish between "overlay" and "merging" content with the view of the world. Here we want to request "a view of the world is shown integrated with my content" with the implication being "I don't want to render a background". Not sure what terminology will best represent that, but passthrough isn't terrible.

blairmacintyre commented 6 years ago

@speigg wrote:

In other words, I don't think applications should be relying on requests to change session state in order to determine whether or not they should end their session and give up. Rather, applications can ask the session if certain features are supported (while at the same time asking for permission to use such features)—and then fail only if those features are not supported at all (not just if the user simply denies permission to those features). One reason for this is to allow the UA/user to change the permissions dynamically without disrupting the session. If the UA/user disables a feature that is actually supported and which the application considers to be necessary, then the application should prompt the user to enable that feature.

I would agree we need the ability for the UA to enable/disable features after they give permission, but I am not sure breaking the permissions apart is a good idea, nor do I think telling the application when those permissions are enabled/disabled (explicitly) is a good idea.

First, I would wonder about the fingerprinting implications of telling the app the full capabilities of the display, independent of permissions. Should the app know that "meshing is possible" but "the user denied access" vs just knowing it can't use meshing? I would favor the app not being able to distinguish between the two cases, both because it encourages app writers to deal with it, and because it dissuades them from coercion ("I know your device supports it, so even though I have the ability to offer you a downgraded experience, I'm not going to, I'm going to require the permission.").

We played with this in Argon4, where the location permission was essentially a toggle; if the app asked for geolocation, the user was presented with a permission prompt, but from then on out, they could toggle location on/off. If the user toggled it off, the app would just stop receiving position updates.

In the case of meshing, for example, the flow I would want is:

app asks for meshing feature permission
if the feature doesn't exist, or the user says no, the feature request fails
if the user says yes, the request succeeds, perhaps with the object returned to control it (as @toji suggests).
the UA can provide a way for the user to turn the permission off at any point. The app stops receiving mesh updates when the feature is turned off. The user can turn this feature back on at any time.
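The flow above can be sketched with a mock UA. This is only an illustration under stated assumptions: `requestEnvironmentMeshing` is the name used in the examples in this thread, not a shipped API, and `createMeshingSession`, `onmesh`, `userSetMeshingEnabled`, and `deliverMesh` are hypothetical helpers.

```javascript
// Hypothetical mock of a UA-side session; not a real WebXR implementation.
function createMeshingSession({ deviceSupportsMeshing, userGrants }) {
  let enabled = false;      // user-controlled toggle, owned by the UA
  const listeners = [];

  return {
    // Rejects identically for "unsupported" and "denied", so the app
    // cannot tell the two cases apart.
    requestEnvironmentMeshing() {
      if (!deviceSupportsMeshing || !userGrants) {
        return Promise.reject(new Error("meshing unavailable"));
      }
      enabled = true;
      return Promise.resolve({
        onmesh(callback) { listeners.push(callback); },
      });
    },

    // UA-side controls; the page never sees these.
    userSetMeshingEnabled(value) { enabled = value; },
    deliverMesh(mesh) {
      // While the user has the feature off, updates are silently dropped.
      if (enabled) listeners.forEach((cb) => cb(mesh));
    },
  };
}
```

From the app's side, a toggled-off feature simply looks like a period with no mesh updates. The sketch does not attempt to filter out anything the underlying platform learned while the feature was off.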

There are some tricky questions for any feature like this, though, that might dictate if it can be toggled. Specifically, can we provide the right guarantees? It's easy to see that the UA can stop sending mesh updates, but internally, the world understanding WILL be updating, and the mesh representations refined. If we start sending meshes again, will we be able to filter out the parts of the mesh that were "learned" when it was off?

Consider planes in ARKit/ARCore. When I walk through my house, the floor plane is extended, and separate smaller planes are often merged. If I turn off meshing when I go into one room, and ARKit decides to extend the floor plane into that room (along with its geometry outline), it will be VERY hard (impossible?) for the browser to remove the additional knowledge from the ARKit data if I turn meshing back on when I leave the "sensitive room", thus leaking data.

In contrast, image or face detection and tracking, where the data being sent does not have long term history implications, might be good candidates for toggling access. The application would not be able to distinguish between "there are no faces" and "the user disabled detection", which is exactly what we want.
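As a rough illustration of that property, here is a sketch in which a disabled detector and an empty scene produce the same result. `makeFaceDetector`, `getDetectedFaces`, and `userSetEnabled` are hypothetical names, not part of any real API.

```javascript
// Hypothetical sketch; `detectFaces` stands in for the platform's detector.
function makeFaceDetector(detectFaces) {
  let enabledByUser = true;

  return {
    // UA-side toggle only; not exposed to the page.
    userSetEnabled(value) { enabledByUser = value; },

    // Returns the same shape (an empty array) whether detection is
    // disabled or there are simply no faces in view.
    getDetectedFaces() {
      return enabledByUser ? detectFaces() : [];
    },
  };
}
```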

speigg commented 6 years ago

@blairmacintyre wrote:

An alternative view is that "permissions" are "permissions", but don't guarantee data!

Yes, that’s a nicer way of saying what I was trying to explain above. The key is, as you stated, that the feature request does not fail just because permissions aren’t granted. Likewise, the xrSession.use*() API pattern I was suggesting above is effectively both a feature detection and a permission request. As described, there is no way for the application to know whether or not it has permissions, only whether or not certain features are active. Explicitly checking permissions (if there is a use-case that needs it) may require a separate API.

blairmacintyre commented 6 years ago

Final comment, @toji. I like the direction of this. Modulo my continual pushback on your implied (or explicit) definition of "immersive" :smile:, I think this direction is a good one. I really like that it opens the door to us encouraging developers to write flexible apps.

I would like us to consider making demos and samples that are a bit more flexible, though.

For example, I would hope we can dramatically limit our use of the "required features" pattern, and instead set up these samples to react to what is available.

For example, instead of this:

let xrSession = await xrDevice.requestSession();

let xrEnvMesher;
let xrEnvLight;

// Required features
Promise.all([
  xrSession.requestARPassthrough(),
  xrSession.requestEnvironmentMeshing(),
]).then((values) => {
  xrEnvMesher = values[1];
  startFrameLoop();
}).catch(() => {
  // Whoops! Something we needed isn't there.
  xrSession.end();
});

// Non-required feature
xrSession.requestEnvironmentLighting().then((envLight) => {
  xrEnvLight = envLight;
}); // No catch, don't care.

I would prefer us to have our samples to do something like

let xrSession = await xrDevice.requestSession();

let xrEnvMesher;
let xrEnvLight;

// strongly preferred feature flags
let xrSky = true;      
let xrMeshControl = null;

xrSession.requestARPassthrough().then((blendmode) => {
  xrSky = false;   // don't draw skybox
}); // No catch, flag set.

xrSession.requestEnvironmentMeshing().then((meshControl) => {
  xrMeshControl = meshControl;  // we have a mesh object, will use it
}); // No catch, flag set.

// Less important, non-required feature
xrSession.requestEnvironmentLighting().then((envLight) => {
  xrEnvLight = envLight;
}); // No catch, don't care.

speigg commented 6 years ago

@blairmacintyre wrote:

Actually, please no! Programmers should request AR, but discover what form of AR the display supports. Otherwise, programmers are encouraged to write code that only works on one sort of display.

Did you mean “should not request AR”? Assuming so, I think it depends on what the semantics of this “request” are. On an XR device that supports both environment passthrough/blending (AR) and VR, I think it’s fine if the app (1) asks if this feature is available and asks for any necessary permissions (if any), and (2) under the right circumstances (in response to user input), asks the UA to toggle the environment passthrough (if possible). But certainly, in any case, I agree that the app should react to the current environment passthrough / blend mode.

blairmacintyre commented 6 years ago

@speigg I think I misunderstood you:

Did you mean “should not request AR”? Assuming so, I think it depends on what the semantics of this “request” are. On an XR device that supports both environment passthrough/blending (AR) and VR, I think it’s fine if the app (1) asks if this feature is available and asks for any necessary permissions (if any), and (2) under the right circumstances (in response to user input), asks the UA to toggle the environment passthrough (if possible). But certainly, in any case, I agree that the app should react to the current environment passthrough / blend mode.

I was interpreting the two modes as variations of AR (optical see-through vs camera overlay). Do you mean them as "AR" vs "VR"?

speigg commented 6 years ago

@blairmacintyre

In the case of meshing, for example, the flow I would want is:

app asks for meshing feature permission
if the feature doesn't exist, or the user says no, the feature request fails
if the user says yes, the request succeeds, perhaps with the object returned to control it (as @toji suggests).
the UA can provide a way for the user to turn the permission off at any point. The app stops receiving mesh updates when the feature is turned off. The user can turn this feature back on at any time.

So, the only problem with this flow is that the application does not benefit if the user grants permission after originally denying it. This is the flow I’m suggesting:

app says it wants to use a feature. This may trigger a permission request. If and only if the feature does not exist, the returned promise rejects.
if and only if the feature exists, the promise is resolved successfully, but there is no guarantee that the app has received permission or that the data or state associated with that feature will be immediately available (if ever).
UA lets the user control permissions. If the user grants a permission after originally denying it (perhaps they now see the value in granting the app access to that feature), the app can immediately start using that feature (starts getting any data associated with that feature).
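A minimal sketch of this alternative flow, using a mock session. `createLateGrantSession`, `useFeature`, `active`, and `userSetPermission` are assumed names for illustration only (`useFeature` loosely follows the xrSession.use*() pattern mentioned earlier) and are not part of the WebXR API.

```javascript
// Hypothetical mock; illustrates "reject only if the feature does not exist".
function createLateGrantSession(supportedFeatures) {
  const permissions = new Map();   // feature name -> boolean, set by the user

  return {
    useFeature(name) {
      // Rejects if and only if the feature does not exist on this device.
      if (!supportedFeatures.includes(name)) {
        return Promise.reject(new Error(`${name} is not supported`));
      }
      // Resolves without guaranteeing permission or data.
      return Promise.resolve({
        get active() { return permissions.get(name) === true; },
      });
    },

    // UA-side control: the user may grant or revoke at any time.
    userSetPermission(name, granted) { permissions.set(name, granted); },
  };
}
```

Because the handle reads the permission lazily, a grant that arrives after an initial denial takes effect without the app making a new request.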

speigg commented 6 years ago

@blairmacintyre wrote:

I was interpreting the two modes as variations of AR (optical see-through vs camera overlay). Do you mean them as "AR" vs "VR"?

Not sure what you mean. I was just suggesting a different name for the requestARPassthrough API that @toji suggested. I think this concept maps to the existing environmentBlendMode spec, where an “opaque” blend mode is effectively VR (and anything else is AR). Thus, if an app wishes to explicitly toggle between an AR mode and a VR mode, they might call xrSession.requestEnvironmentBlending(true/false).

speigg commented 6 years ago

@blairmacintyre in your example code:

xrSession.requestARPassthrough().then((blendmode) => {
xrSky = false;   // don't draw skybox
}); // No catch, flag set.

This probably isn’t the right place to be “reacting” to session state such as the blend mode. The blend mode rather should be checked every frame (XRFrame.environmentBlendMode). For VR, the environment blend mode is “opaque”. In other words:

function onFrame(frame) {
  // "opaque" means none of the real world is visible (VR), so draw a skybox.
  if (frame.environmentBlendMode === "opaque") {
    drawSkyBox = true;
  }
  ...
}

As a general rule, all session state that an app cares about should be checked for changes on every frame.

blairmacintyre commented 6 years ago

@speigg

So, the only problem with this flow is that the application does not benefit if the user grants permission after originally denying it.

Yes, that was intentional. If the user denies, it's done for that session. I went this way because I don't think this step in your flow is a good idea

app says it wants to use a feature. This may trigger a permission request. If and only if the feature does not exist, the returned promise rejects.

Specifically, this allows the app to detect all possible features of the hardware, independent of whether the user allows it. Based on the "fingerprinting by iterating through device and capability lists" concern, I don't think we will be able to allow this, practically.

I could be wrong, but that's my assumption anyway.

blairmacintyre commented 6 years ago

This probably isn’t the right place to be “reacting” to session state such as the blend mode. The blend mode rather should be checked every frame (XRFrame.environmentBlendMode), and as a general rule, all session state should be checked for changes on every frame.

In general, I would agree with you, except I have had no luck in pushing for a more reactive architecture here. For the value of frame.environmentBlendMode to change after the session has been set up, without the application knowing about the change, the UA would need to provide an interface to allow the user to make the change.

The WebXR API session flow and so forth does not support this. There are no provisions for notification when something changes, and all of the sample flows being proposed are set up assuming all changes to the session are triggered by the web app itself. Since the web app triggers all changes, it would be able to update flags like my xrSky flag when the change request resolves.

This is unfortunate, but because of it, there is no need to complicate the render loop with checking for such things.

speigg commented 6 years ago

@blairmacintyre You’re right, I thought the environmentBlendMode was already specced as a property of XRFrame, but rather it’s on XRSession. Though I do think it best belongs on XRFrame (I suppose I should open a separate issue for that). At least in this one case, I don’t think there are any drawbacks to pushing for a more reactive architecture.

toji commented 6 years ago

This issue feels like it's gotten pretty unwieldy, and the conversation and current thinking has moved back and forth enough times that it's hard to follow while reading this. I'm inclined to close it down in favor of some more granular PRs/issues.

#419 was just merged, which reconfigures session creation a bit, and I just added #423 and #424 to cover some of the other topics. Let's continue the conversation there.