immersive-web / webxr

Repository for the WebXR Device API Specification.
https://immersive-web.github.io/webxr/

session options: immersive? #388

Closed blairmacintyre closed 6 years ago

blairmacintyre commented 6 years ago

There are a few different issues that touch on session options, but I have a basic question I'd like to discuss directly.

What does immersive mean? In the current doc, I find: "A session is considered to be an immersive session if it’s output is displayed to the user in a way that makes the user feel the content is present in the same space with them, shown at the proper scale. Sessions are considered non-immersive (sometimes referred to as inline) if their output is displayed as an element in an HTML document."

But this isn't satisfactory. What does "same space" mean? What is "proper scale" and why does it have anything to do with the kind of session you are getting? And why is non-immersive "sometimes referred to as inline"?

I'm initially interested in this from the viewpoint of handheld AR: is fullscreen AR (of the sort we see on handhelds using ARKit and ARCore) immersive? Why or why not? Is it "non-immersive" only when shown "inline" (e.g., not full screen)?

My initial expectation is that full-screen video-mixed handheld AR is immersive, but I'd like clarification. Is there a difference between full-screen video-mixed handheld AR and AR or VR on HMDs, aside from mono-vs-stereo? Clearly there are different capabilities that devs will want to know about (touchscreen vs not, for example).

speigg commented 6 years ago

You’re right, the way the term “immersive” is defined and used here seems a bit fuzzy and/or contradictory. Traditionally, “immersive” might be synonymous with endocentric display (graphics that are displayed as if seen from the user’s physical eyes), which seems to be the definition we are using here... except that definition doesn’t work for the typical handheld AR usecase.

It seems like the closest concept would actually be something like “fullscreen”. If so, it might make sense to define an immersive session as a session which attempts to maximize the use of the available pixels on a given display (by controlling where the layer is rendered and how it is composited, and perhaps hiding any other non-immersive content if necessary). Or perhaps we just rename it to a “fullscreen” session?

Either way, I don’t think “same space” or “scale” should be a factor here, because (1) ensuring that the scene is truly in the “same space” on a video see-through handheld display would require a large FOV back-facing camera and the ability to track the user’s face, and (2) why shouldn’t the UA allow the user to zoom in/out (effectively scaling the space / projection matrix), if they desire to do so?

toji commented 6 years ago

Agreed upon re-reading the text that this is an insufficient and confusing definition. Let's be real here: "immersive" effectively means "in a headset", but I'm not sure how to state that in such a way that it wouldn't inadvertently exclude things like CAVE systems or similar not-quite-headset-based tech.

One term that I've latched onto recently that may prove instructive is "inline", which is effectively the opposite of an immersive session: One that is displayed as an element on a browser page.

Matt-Greenslade commented 6 years ago

Maybe 'inline' means the content is embedded in a viewer or app that is 2D and not mapped on top of the real environment and 'immersive' means the content is expanded or launched into a 3D mapping overlaying the real environment. It could be inline with the potential to be immersive on a click of course.

DRx3D commented 6 years ago

> ... "immersive" effectively means "in a headset",

> ... "inline" which is effectively the opposite of an immersive session

I am not sure any of these terms ("immersive" or "inline") really capture the right concepts.

I think most people would agree that the physical world is immersive (if you disagree please state so and why), just not computer generated. Now if I step backwards through a doorway, things are still immersive. The previous view is now constrained by the door opening; however, there are new elements in my immersive space. Perhaps even a TV playing some game. I can choose to focus on the TV, but the other elements don't go away.

Certainly this could be reproduced (without touch or smell) by a headset. But what if I were wearing AR glasses (e.g., Hololens)? I would experience everything as before, but there can be other information around the visual field that is computer generated (e.g., clock, current score, etc.). If you say that this is immersive (in the sense that Brandon describes), then how does this differ from a non-full screen page with 3D content where the stuff around the 3D space is non-3D, but coordinated with the content?

It can't be that the content must be seen in stereo, because you need to support people with disabilities. Also, caves don't need to be stereo.

Perhaps the concept that is desired is that it makes the user feel like he/she is part of the environment. There is still no touch or smell; but those can be V2+. Unfortunately, that concept depends more on art and the expertise of the content developer than on the technician.

I am not saying that the underlying concepts are incorrect, just a better means for expressing them is needed. At this point I am not sure what really distinguishes immersive from inline.

-- Leonard Daly 3D Systems & Cloud Consultant LA ACM SIGGRAPH Past Chair President, Daly Realism - /Creating the Future/

blairmacintyre commented 6 years ago

@toji if immersive means in a head mount, then it should say that and we shouldn't be using a vague term like "immersive". And this raises the very real question: why are we encouraging people to build things that ONLY will work in headsets? Isn't the point of the web to have the content work in various ways? Why would we want things to only work on a handheld vs only work on a headset?

This feels like exactly the kind of path that things like media queries were created to avoid, in the desktop-vs-handheld world. We don't want people saying "this page is for handhelds" and "this page is for desktops". They should react. Mice+keyboard is different than touch, but sites have learned to "deal" (with the help of frameworks).

Similarly, headset (head-coupled stereo + controllers) is different than full-screen handheld (flatscreen + touch), but it's pretty trivial to see how frameworks could let developers adapt to these two common scenarios.

My contention is that a site should work on a headset or on a phone with something like ARKit/ARCore.

"Inline" is fine; except I don't see any of the things we are building supporting "inline". It implies that the content is in a page, and (for example) that page is a normal page, could be scrollable, etc.

Full-screen handheld AR is not "inline" Full screen handheld AR

blairmacintyre commented 6 years ago

@DrX3D I don't think we should get hung up on this sort of pondering of what "immersive" means w.r.t. the real world. (sorry, people have been talking about "immersive" and "presence" and so on for decades, I don't see this as useful when we're talking at this level.)

I was expecting "immersive" in the API to refer to situations where the device's display was consumed by the visual experience. So, headsets are immersive, and fullscreen graphics on a 2D display are immersive. The implication, to me, is that this is more than just "div covering screen" as the platform may implement it such that it uses prediction, higher framerates, etc., to give a better experience.

I was expecting "inline" to refer to a page rendering the content in a div/canvas, and that it would be displayed in the page as the author saw fit. It could get at the tracking info, for example, but would not benefit from the platform rendering capabilities. We've seen "inline" in many WebVR demos, especially on mobile, where the polyfill used the orientation APIs to match some 3D to phone motion.

toji commented 6 years ago

There's nothing about the API shape that was intended to encourage only one type of content or another. We do, however, need a method for allowing developers to choose where and how their content is displayed. At the moment the immersive flag is the mechanism to support that choice. It's pretty trivial to write content that works well in both immersive and inline modes, since the configuration of the session is a pretty minor part of the overall code and we've done a good job at providing reasonable abstractions to the rendering and input mechanisms.
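
As a rough illustration of why the session configuration can be a small part of the code, here is a minimal TypeScript sketch. The `SessionOptions` shape is only loosely modeled on the draft being discussed (an immersive flag plus an output target for inline presentation); `PresentationMode`, `outputContextId`, `buildSessionOptions` and `describeFrame` are hypothetical names invented for this example, not the spec's API.

```typescript
// Hypothetical shapes for illustration only; not the actual WebXR IDL.
type PresentationMode = "inline" | "immersive";

interface SessionOptions {
  immersive: boolean;
  // Assumed: id of the page element that inline output is drawn into.
  outputContextId?: string;
}

// The only mode-specific code: build the options for the chosen presentation.
function buildSessionOptions(mode: PresentationMode, canvasId: string): SessionOptions {
  return mode === "immersive"
    ? { immersive: true }
    : { immersive: false, outputContextId: canvasId };
}

// Mode-independent code: the same per-frame logic runs for either kind of session.
function describeFrame(viewCount: number): string {
  return `rendering ${viewCount} view(s)`;
}
```

Under these assumptions, everything after `buildSessionOptions` is shared between the two modes.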

Also, I find there to be exactly zero difference between "fullscreen AR" and AR content shown in an element that's not big enough to cover the entire screen. Mechanically they are exactly the same, with the only difference being the size the developer chooses for the element (or maybe use of the fullscreen API). I consider them "inline" because in both cases they are presented as elements within the page. If the developer has chosen to make the AR content the only element in the page then so be it.

A final point I want to make is that simply saying "immersive means content is displayed in a headset" is not accurate, because you may be displaying inline content in a browser that's displayed in the headset, and that wouldn't be the same thing.

NellWaliczek commented 6 years ago

Yeah, this is definitely not a straightforward naming issue partially because it's the concepts themselves that are slightly unclear. Heck, originally "immersive" was "exclusive" but that wasn't the right mental model either.
The key differentiator for me is around the interaction model. Are you putting 2D buttons on a screen (or something that is vaguely screenlike) or are your UI elements 3D objects in your scene? Developers need to be able to make that distinction so they build appropriate interaction models.

DRx3D commented 6 years ago

> ... putting 2D buttons on a screen (or something that is vaguely screenlike) or are your UI elements 3D objects in your scene?

So if I have an AR lens system that shows 3D models, then it is immersive. What happens if I add 2D icons and sensors to the display? Does that make it non-immersive?

I am asking these questions because readers and future users of the document will probably not have the same level of experience and understanding as this group. Using terms that are not clear (at least in today's language) will cause problems in the future, especially as the model is extended to environments that we cannot even yet conceive.


blairmacintyre commented 6 years ago

@NellWaliczek I agree. I see this as something the web page REACTS to, not something they request. "Oh, the session I've gotten doesn't have a 2D display, so I need to put buttons in 3D" or "Oh, the session I've gotten has a 2D display, I can do buttons on the screen and/or buttons in 3D".

@toji

> There's nothing about the API shape that was intended to encourage only one type of content or another. We do, however, need a method for allowing developers to choose where and how their content is displayed. At the moment the immersive flag is the mechanism to support that choice. It's pretty trivial to write content that works well in both immersive and inline modes, since the configuration of the session is a pretty minor part of the overall code and we've done a good job at providing reasonable abstractions to the rendering and input mechanisms.

I think we may be talking past each other. Developers need to know what the capabilities of their device are (as @NellWaliczek so clearly points out) so they know what to do.

But that's not choosing the display mode they ask for in session options: if "immersive" means "headset" and "immersive" is a flag I can choose to require of my session, then haven't I limited my app to immersive?

> Also, I find there to be exactly zero difference between "fullscreen AR" and AR content shown in an element that's not big enough to cover the entire screen. Mechanically they are exactly the same, with the only difference being the size the developer chooses for the element (or maybe use of the fullscreen API). I consider them "inline" because in both cases they are presented as elements within the page. If the developer has chosen to make the AR content the only element in the page then so be it.

I think this may depend on how these things are implemented, which is probably biasing both of our interpretations of this. I expect there to be a difference.

Now, practically, in a browser on a phone, it may be that the difference is minor if the user was going to full-screen their inline DOM element.

But, if you consider the 2D-page-in-a-browser situation (e.g., Edge pages on the wall in Hololens or Windows MR; browser pages floating in space in Firefox Reality or the Oculus Browser), then the "inline" case still has the canvas displayed in the 2D panel, while the "immersive" case switches to 3D immersive mode.

To me, from a programmer perspective, there are still just 2 modes: "inline" which says "render the 3D into a canvas and display in my 2D page" and "immersive" which says "render in 3D on the display, in whatever way that means on this display".

There needs to be some way of understanding what the characteristics of the session are, but I think those should be features of the session that has gotten created, not creation options.

Here's another reason why that matters. I'm on a nice modern smart phone, that could support Daydream/GearVR and ARCore. I would love to have a web browser that, when a page I'm on says it wants to do an immersive "VR" session gives me the choice to do it in the HMD, or to do 2D "VR" (i.e., fullscreen 3D graphics, not stereo, perhaps using ARCore for full 6D motion tracking). From the viewpoint of the webpage, they will get a session back, and it will have 1 or 2 cameras, and it will support a 2D touch screen or not.

That should be up to the user (hey, maybe they have a daydream or not; maybe they have one but don't want to use it right now, etc).
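
A minimal TypeScript sketch of the "react to the session you got" idea, assuming a page can inspect a session after it is created. `SessionCapabilities`, `planUi` and `isStereo` are hypothetical names for illustration; nothing here is taken from the spec.

```typescript
// Hypothetical description of a session the page has already been given.
interface SessionCapabilities {
  viewCount: 1 | 2;        // one camera (mono, e.g. handheld) or two (stereo headset)
  hasTouchScreen: boolean; // true when there is a 2D screen the user can tap
}

type UiPlan = "screen-buttons" | "world-space-buttons";

// Pick a UI from what the session reports rather than from what was requested.
function planUi(caps: SessionCapabilities): UiPlan {
  return caps.hasTouchScreen ? "screen-buttons" : "world-space-buttons";
}

// Stereo rendering is needed only when the session exposes two views.
function isStereo(caps: SessionCapabilities): boolean {
  return caps.viewCount === 2;
}
```

With this shape, the same page code handles the headset and the fullscreen handheld case, and the choice between them stays with the user and the UA.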

blairmacintyre commented 6 years ago

@DrX3D

> So if I have an AR lens system that shows 3D models, then it is immersive. What happens if I add 2D icons and sensors to the display? Does that make it non-immersive?

I'm not sure what "AR lens system" means; HMD?

Regardless, I don't see the inclusion of 2D icons and sensors changing whether it is immersive or not, but I may not be understanding what you are asking.

> I am asking these questions because readers and future users of the document will probably not have the same level of experience and understanding as this group. Using terms that are not clear (at least in today's language) will cause problems in the future, especially as the model is extended to environments that we cannot even yet conceive.

I agree; that's why I brought this topic up, because the language in the document is unclear.

speigg commented 6 years ago

> Agreed upon re-reading the text that this is an insufficient and confusing definition. Let's be real here: "immersive" effectively means "in a headset", but I'm not sure how to state that in such a way that it wouldn't inadvertently exclude things like CAVE systems or similar not-quite-headset-based tech.

@toji I believe the terms "endocentric" (views that are in alignment with the user's natural viewing frustum, such as HMDs, CAVE, etc) vs "exocentric" (opaque handheld and/or stationary displays presenting content with viewing frustums that are external to the user) can be used to describe what you are saying here.

Nevertheless, my understanding is that 'inline' sessions and 'immersive' sessions may have different capabilities and affordances, i.e., an 'inline' session may lack support for specific features (6DOF tracking, video-see-through, object tracking, anchors, etc.), and presentation of XR content is entirely under application control. On the other hand, an 'immersive' session would support these kinds of additional features, and the presentation of XR content would be more under user/UA control (rather than application control). To me, this suggests that there is a need for defining both of these session types and interaction modes on handheld (non-stereo) XR devices.

speigg commented 6 years ago

> There needs to be some way of understanding what the characteristics of the session are, but I think those should be features of the session that has gotten created, not creation options.

@blairmacintyre Yes, there is that too. One approach is that all XR apps implicitly support an 'immersive' mode (for XR-first browsers, this means XR apps would probably launch in an 'immersive' mode). More so, it would be nice if the user could seamlessly switch between 'inline' and 'immersive' interaction/presentation modes.

Perhaps an app can simply request an "AR" vs "VR" session, and then (after receiving a session) can check how the layer will be presented: whether it should be presented 'inline' by the application, or as part of an 'immersive' interface managed by the UA. But I think it should be “easier” for an app to support an ‘immersive’ mode than an ‘inline’ mode (less setup, no need to muck around with the DOM, etc.).

blairmacintyre commented 6 years ago

Something that came up when I was trying to explain my position on this to @TrevorFSmith, which might also have to do with why @toji doesn't necessarily buy into what I'm saying.

Let's ignore the word "immersive". When I look at the API, I see two "cases":

In the latter case, this may be perceptually immersive headworn display AR/VR; or it may be magicwindow style AR/VR. While both of these demand quite different UIs, etc., my hope/expectation is that a programmer would not explicitly have to request headworn or magicwindow; rather they would say I want to take over rendering (somehow).

They would clearly need to be able to understand the context of the rendering session that results from this (touchscreen or not, interaction device capabilities, stereo or mono, etc.) in order to create the best UI.

They might even decide "screw this, I'm not going to support handheld phone AR/VR, so I'm going to pop up a bit of graphics saying 'sorry, only hmd's need apply, hit "ok" to exit session'".

But, I would prefer them to react to a situation they don't want to deal with (potentially by testing for the capabilities they need) than have to request (in turn) the sorts of situations they might handle, until they get one that's ok.
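
To make the "test for the capabilities you need" pattern concrete, here is a small TypeScript sketch. It assumes the page can read a capability report from the session it was given; `Capabilities`, `Requirements` and `unmetRequirements` are hypothetical names, not part of any proposed API.

```typescript
// Hypothetical capability report for a session that was already granted.
interface Capabilities {
  stereo: boolean;
  touchScreen: boolean;
  trackedControllers: number;
}

// What a particular app needs; every field is optional.
interface Requirements {
  stereo?: boolean;
  minTrackedControllers?: number;
}

// Returns a human-readable list of unmet needs; an empty list means "go ahead".
function unmetRequirements(caps: Capabilities, needs: Requirements): string[] {
  const missing: string[] = [];
  if (needs.stereo && !caps.stereo) {
    missing.push("stereo display");
  }
  const minControllers = needs.minTrackedControllers ?? 0;
  if (caps.trackedControllers < minControllers) {
    missing.push(`${minControllers} tracked controller(s)`);
  }
  return missing;
}
```

An app that only supports headsets could show its "sorry" message when the list is non-empty, instead of requesting one configuration after another until something succeeds.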

So, in this context, I think of "immersive" as the second situation, and "inline" as the first. Perhaps we should have stuck with "exclusive." :smile:

The discussion on the call this week makes my envisioned situation easier: if we request a session with few or no parameters, and then test for capabilities, the approach happens naturally:

TrevorFSmith commented 6 years ago

I think we're combining two dimensions of choice and it's leading to terminology confusion:

Here's how I break it down:

Display mode

Content type

Web coders need to understand what display mode they're using and what content types they should create, and those are somewhat separate concerns.

TrevorFSmith commented 6 years ago

Display modes

Three display modes

Control types

Three control types

Overlay and spatial controls in portal mode

Portal display mode

blairmacintyre commented 6 years ago

From the viewpoint of developers, especially when they consider what kind of UI to create, those differentiations are good. Clearly, developers need to understand the context.

But the tension I see is in differentiating between what the developer can/should/needs-to request, and what they discover about the device.

I see no reason for a developer to differentiate their webXR request between portal and immersive: it's unlikely that any individual display supports both, and the differences they encounter in creating UIs are something that they can deal with (or not). They need to know what situation they have. They may choose to say "I'm sorry dear user, I have not created a portal UI, only an HMD/controller one, so I'm not going to show you anything", but I think it's MUCH more likely that all the frameworks that evolve will make it easy to at least provide a trivial UI for all of these situations (even if it sucks).

In contrast, flat/page vs immersive/portal is definitely something they would want to request. On a phone, that's the difference between full-screen graphics and a possibly scrollable page (or a non-scrollable UI with the 3D scene inset in a part of the screen). On an HMD, that could differentiate between a 2D page placed in the 3D world (with 3D content embedded in it), and the full-world rendering we associate with immersive viewing.
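
The split between what is requested and what is discovered could be sketched like this in TypeScript. The three display modes use Trevor's terms; the request values `"page"` and `"takeover"`, the `DeviceInfo` shape and `resolveDisplayMode` are assumptions made up for the example.

```typescript
// What the developer asks for (two choices).
type RequestedPresentation = "page" | "takeover";

// What the developer finds out afterwards (three display modes).
type DisplayMode = "flat" | "portal" | "immersive";

// Hypothetical device description, reduced to the one fact that matters here.
interface DeviceInfo {
  headWorn: boolean;
}

// The request picks flat vs. not-flat; the device decides portal vs. immersive.
function resolveDisplayMode(requested: RequestedPresentation, device: DeviceInfo): DisplayMode {
  if (requested === "page") {
    return "flat"; // a 2D page, whether on a phone screen or placed in a 3D world
  }
  return device.headWorn ? "immersive" : "portal";
}
```

In this sketch a developer never requests "portal" or "immersive" directly; they request a takeover and then adapt to whichever of the two they get.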

The only tension I see in terms of session options is the idea of having the "immersive flag" mean your "immersive display" mode. Perhaps we should have left the flag as "exclusive". Right now, we have no way of differentiating between "flat/page" and "portal/magic window", if "immersive == immersive" 😄

blairmacintyre commented 6 years ago

Here's the analogy I use. Right now, web pages do NOT simply fail on a device that hasn't been accounted for.

I might go to a website on my phone that has no custom phone UI, and it looks terrible and might not work well. But I can zoom, or painfully get the information I want. This is increasingly rare, though, as more and more frameworks and tools provide something. My Georgia Tech lab's website has a mobile version built into the WordPress style I chose, which I didn't even know until I accidentally went to it.

We need the same for WebXR. By default, sites should work everywhere, where "work" means not having something say "this device doesn't support webxr" when it does.

The solution we are close to is:

Even if the content is nonsensical: frameworks will likely evolve to communicate what's missing, and perhaps even provide reasonable fallbacks.

TrevorFSmith commented 6 years ago

Blair wrote:

> I see no reason for a developer to differentiate their webXR request between portal and immersive

The most persuasive argument I heard from you yesterday (and please tell me if I'm misrepresenting) is that on the existing web when you use a handheld browser to visit a site that is not responsive and is designed for a desktop sized screen, the handheld browser does its best to render the site, using pinch zoom and other tricks. So, why shouldn't a handheld browser that offers only flat and portal display modes do the same thing and offer the user the best possible experience for sites that are only designed for immersive display mode?

After sleeping on it, I think that the problem is I don't believe that it's possible for browsers to automatically offer reasonable experiences across portal and immersive display modes. Because we're using the black box of a WebGL context, there's not enough information for a browser in portal mode to give the user a usable display and interaction into an experience that is designed only for immersive mode. Especially with new input methods that only work in immersive display mode (e.g., as hand gestures and eye tracking come online) there is no equivalent to pinch zooming that can give the user reasonable access.

speigg commented 6 years ago

Also, the only situation in which an outputContext should really be necessary is the “inline” display mode... so why make an app provide this if it wants to support the “portal” display mode? I’d imagine that a lot of apps won’t care at all about the scaled down “inline” use-case (what Trevor calls “flat” display mode), and would likely not bother with it at all... which seems fine to me, as long as the simplest path allows for both “portal” and “immersive” display (as Trevor defines them).

blairmacintyre commented 6 years ago

> After sleeping on it, I think that the problem is I don't believe that it's possible for browsers to automatically offer reasonable experiences across portal and immersive display modes. Because we're using the black box of a WebGL context, there's not enough information for a browser in portal mode to give the user a usable display and interaction into an experience that is designed only for immersive mode. Especially with new input methods that only work in immersive display mode (e.g., as hand gestures and eye tracking come online) there is no equivalent to pinch zooming that can give the user reasonable access.

Perhaps, perhaps not.

Will all UIs work on all devices? No. Especially not for custom UI and interaction techniques. Do all touch-enabled websites work on the desktop (where "pinch zoom" isn't supported)? Of course not. Do modern web frameworks support developers building applications that work across different devices, by providing alternatives and making it straightforward to build functional (even if not awesome) fallback UIs? Of course they do.

Is it easy to imagine different UAs evolving compatibility hooks and interactions to simulate common interaction modes they don't support? Sure. More importantly, I think there are a few basic interactions (selection, for example) that are easy to implement on both. On Hololens, Microsoft opted for a few standard gestures (bloom, airclick) that will be easy to implement everywhere, even if they don't take advantage of the device completely.

I firmly believe that a large proportion of apps (90%?) will be more than capable of running on a range of devices, with a relatively simple framework providing adaptive UIs.

Will this be possible if we encourage patterns where developers create content that refuses to run in modalities they don't support? No. Especially if the "easy" case for developers is "pick your mode and only support that", people will do that. And then there will be no pressure on framework developers to support reacting to different modes. On the other hand, if the easy case is "dedicated takeover" or "in page", and the dedicated runs on both handheld and headmount, we'll quickly have at least basic support for common UI metaphors on both.

My issue with the API surface and the patterns it encourages, or even enforces, is that we're making an up-front decision to enter an undesirable long-term situation where the majority of content works on either HMDs or handhelds, but not both.

blairmacintyre commented 6 years ago

At the end of the day, I'd much rather have a crappy phone UI, where all menus and interactors are "in 3D in the world" and I do pointing and selection by tapping on the screen (instead of waving a wand or pointing a finger), than not be able to use the "AR app designed for an HMD" at all on my phone.

It's pretty easy to imagine a phone-based UA pretending to have a wand floating a foot in front of the screen, pointing forward and slightly up (or even being "movable") so that it can work with wand-based UIs, if this becomes a problem.
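
As a rough sketch of what such a pretend wand might involve, the TypeScript below computes a pointing ray one foot in front of the device, tilted slightly upward. Everything here is an assumption for illustration: the `Vec3` and `Ray` types, the `emulatedWandRay` name, the default tilt, and the requirement that the forward and up vectors are unit length and perpendicular.

```typescript
type Vec3 = [number, number, number];

interface Ray {
  origin: Vec3;
  direction: Vec3;
}

const FOOT_IN_METERS = 0.3048;

function add(a: Vec3, b: Vec3): Vec3 {
  return [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
}

function scale(v: Vec3, s: number): Vec3 {
  return [v[0] * s, v[1] * s, v[2] * s];
}

function normalize(v: Vec3): Vec3 {
  const length = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  return scale(v, 1 / length);
}

// deviceForward and deviceUp are assumed to be unit length and perpendicular.
function emulatedWandRay(
  devicePosition: Vec3,
  deviceForward: Vec3,
  deviceUp: Vec3,
  pitchUpRadians: number = 0.2
): Ray {
  // Place the wand one foot in front of the screen.
  const origin = add(devicePosition, scale(deviceForward, FOOT_IN_METERS));
  // Rotate the forward vector toward the up vector by the pitch angle.
  const direction = normalize(
    add(
      scale(deviceForward, Math.cos(pitchUpRadians)),
      scale(deviceUp, Math.sin(pitchUpRadians))
    )
  );
  return { origin, direction };
}
```

A UA doing this would still need to decide when the emulated wand "selects" (for example on a screen tap), which this sketch leaves out.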

speigg commented 6 years ago

@TrevorFSmith wrote:

> The most persuasive argument I heard from you yesterday (and please tell me if I'm misrepresenting) is that on the existing web when you use a handheld browser to visit a site that is not responsive and is designed for a desktop sized screen, the handheld browser does its best to render the site, using pinch zoom and other tricks. So, why shouldn't a handheld browser that offers only flat and portal display modes do the same thing and offer the user the best possible experience for sites that are only designed for immersive display mode? [...] there's not enough information for a browser in portal mode to give the user a usable display and interaction into an experience that is designed only for immersive mode.

I don’t imagine “immersive” mode on a handheld device being a way that the browser deals with apps designed for HMDs. I think Blair was just using the term “immersive” inclusively (to include what you call “portal” display mode). Apps would still have to adapt their UI for either handheld or HMD display modes accordingly (using best practices for each). The issue, IMO, is not whether the UA can force an app designed for HMDs to work on a non-HMD device, but whether an app developer is forced (encouraged) to design their app to work for both HMD and non-HMD XR devices.

TrevorFSmith commented 6 years ago

Gheric wrote:

> I think Blair was just using the term “immersive” inclusively (to include what you call “portal” display mode).

Yes, I think it's not helpful to use "immersive" to include handheld displays when as far as I can tell nobody else is using it that way. As far as I'm concerned, there's no such thing as "immersive" on a handheld device because "immersive" refers to a display type and not a content type. The user is literally not "immersed" in the display, they are holding it at arm's length.

I suspect that the confusion is that you and Blair are using "immersive" to mean spatial content that is located around the user, so you two are talking past others in the group.

speigg commented 6 years ago

@TrevorFSmith one reason for using the term ‘immersive’ inclusively is so that all apps can use the “simple” path to supporting XR on different displays (not needing to provide an outputContext). As Blair said, perhaps the session creation parameter here should have remained “exclusive”.

TrevorFSmith commented 6 years ago

Blair wrote:

> At the end of the day, I'd much rather have a crappy phone UI, where all menus and interactors are "in 3D in the world" and I do pointing and selection by tapping on the screen (instead of waving a wand or pointing a finger), than not be able to use the "AR app designed for an HMD" at all on my phone.

If laser pointing were the only input type we could expect to use in immersive displays then I might agree with that path. Since we already have input devices and gestures with more complexity, and we already have immersive locomotion and display tricks that simply won't work in portal display mode, it doesn't seem possible for UAs to actually ship what you're suggesting.

It's better to admit that portal and immersive display modes are inherently different and creators must address them separately in order to be in any way usable. The fallback for sites with immersive designs isn't portal display mode, it's flat display mode.
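
One way to picture this position is a selection rule where a mode is used only if the author explicitly built for it, with flat as the last resort. The TypeScript below is a sketch under that assumption; `pickDisplayMode` and the `PREFERENCE` ordering are hypothetical and not part of any proposal in this thread.

```typescript
type DisplayMode = "flat" | "portal" | "immersive";

// Richest mode first; flat comes last because it is the common fallback.
const PREFERENCE: DisplayMode[] = ["immersive", "portal", "flat"];

// Returns the richest mode that the site was authored for AND the device offers,
// or null when there is no overlap. A mode is never substituted for another.
function pickDisplayMode(authored: DisplayMode[], offered: DisplayMode[]): DisplayMode | null {
  for (const mode of PREFERENCE) {
    if (authored.indexOf(mode) !== -1 && offered.indexOf(mode) !== -1) {
      return mode;
    }
  }
  return null;
}
```

For a site authored for immersive and flat, a handheld offering flat and portal ends up in flat, not portal, which is the fallback described above.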

speigg commented 6 years ago

Perhaps a better option is for the session creation parameter to be inverted to “inline”, with portal/immersive display modes being the default. Though more ideally the session creation options will go away entirely (based on discussion in other threads), and this issue becomes moot.

blairmacintyre commented 6 years ago

@TrevorFSmith I guess we'll agree to disagree. I created a pile of demos over the years (with Argon) that had basic immersive and touch screen UIs, and it "just wasn't that bad". So, I am more optimistic, I guess.

ddorwin commented 6 years ago

On the topic of whether "Portal Display" should be "immersive mode":

There is currently no "full-screen graphics" mode in WebXR or elsewhere. The Fullscreen API allows a subset of the page (as specified by an HTML element) to be rendered in fullscreen. The entire contents of the element might not even be displayed (e.g., try the Fullscreen button on https://permission.site on a phone). The author gets to decide whether the content occupies the full screen.

Note that both the Fullscreen API and WebXR immersive sessions require a user gesture, so applications will likely want to support "Flat Display" to provide a preview on page load. Also, because fullscreen is just changing how an HTML element is displayed, it is simple to go in and out of "Portal Display." Immersive sessions, on the other hand, may require different graphics adapters, will likely involve the user changing displays (including placing a phone in a headset), and can be rendered at the same time as "Flat Display" or "Portal Display."
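
The user-gesture constraint can be summarized in a few lines of TypeScript. This is only a sketch of the rule as stated above (flat needs no gesture; fullscreen-based portal display and immersive sessions do); `enterableModes` and `canEnter` are hypothetical helper names.

```typescript
type DisplayMode = "flat" | "portal" | "immersive";

// Flat display needs no gesture; portal (via fullscreen) and immersive sessions do.
function enterableModes(hasUserGesture: boolean): DisplayMode[] {
  return hasUserGesture ? ["flat", "portal", "immersive"] : ["flat"];
}

// Convenience check for a single mode.
function canEnter(mode: DisplayMode, hasUserGesture: boolean): boolean {
  return enterableModes(hasUserGesture).indexOf(mode) !== -1;
}
```

On page load there is no gesture yet, so under this rule only a flat preview is possible until the user taps something.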

Both "Flat Display" and "Portal Display" also allow DOM elements to be displayed to the user in addition to the "XR content." If "Portal Display" was an immersive session, this would not be the case. One implication of this is that all smartphone AR applications would need to recreate all their UI in WebGL. (The upside of that is that they would be more prepared for (edit) "Immersive Display" in AR headsets, but most probably wouldn't bother and would just use fullscreen to simulate, which is exactly what we have today.)

TrevorFSmith commented 6 years ago

@blairmacintyre My argument isn't that it's impossible for authors to create experiences that work in both portal and immersive display modes. My argument is that it's not possible for UAs to fudge portal access to immersive experiences using the equivalent of pinch-to-zoom like tricks. The author has to do the work and so they need to make different things happen for different display modes.

Immersive display and input hardware has moved beyond the Argon (or WebXR Viewer) style of handheld-only AR so authors have to explicitly and separately support portal and immersive display modes.

For example, TiltBrush written for an immersive display mode and dual tracked inputs can't be fudged by the UA to work in portal display mode. The author can write the app so that it supports both display modes, but they have to explicitly do so.

blairmacintyre commented 6 years ago

@ddorwin as you describe it, I do not see any difference between "portal" and "flat". Which may be what you expect. But, the implication is that the only way to do AR on handhelds is inside a DIV: there is no way to do the equivalent of what most native apps do, which is render the video full screen.

Is that the way you envision it?

The implication is that on an AR capable phone (not VR capable) there is only one kind of session, the inline session?

In all the discussions that have been going on (especially before we changed "exclusive" to "immersive") there was an implication that there were two kinds of sessions: ones that you could get without a user permission prompt, that you might use for rendering a non-tracked scene (like the initial 3D model in a div on your Article article), and another kind that is more "full access to underlying platform".

Have these been collapsed into one on the phone?

speigg commented 6 years ago

@TrevorFSmith wrote:

My argument is that it's not possible for UAs to fudge portal access to immersive experiences using the equivalent of pinch-to-zoom like tricks. The author has to do the work and so they need to make different things happen for different display modes.

I think everyone agrees that the app has to do the work to adapt to each display type and capabilities. I also don’t think the UA should be trying to fudge things using tricks to make content designed for immersive displays work on non-immersive displays. I could be wrong, but I don’t think Blair envisions apps working that way either.

Immersive display and input hardware has moved beyond the Argon (or WebXR Viewer) style of handheld-only AR so authors have to explicitly and separately support portal and immersive display modes.

Here is what I don’t understand... how is it that requiring applications to handle both “portal” and “immersive” sessions together (via the same session creation API call, which is essentially what this issue is arguing for), precludes authors from “explicitly and separately” supporting portal and immersive display modes?

speigg commented 6 years ago

@TrevorFSmith wouldn’t something like this allow applications to “explicitly and separately” handle both “portal” and “immersive” display modes?

// no display mode specified in session options
let xrSession = await xrDevice.requestSession();

function onFrame(xrFrame) {
  if (xrFrame.displayMode === "immersive") {
    // ...
  } else if (xrFrame.displayMode === "portal") {
    // ...
  } else if (xrFrame.displayMode === "inline") {
    // this is the only mode that requires apps to set up their own outputContext
    // ...
  }
}

blairmacintyre commented 6 years ago

@speigg, @TrevorFSmith isn't saying it can't be handled, he's simply saying that (in his view) the complexity of (and different UIs for) handling the different cases is such that it's reasonable to let the developer pick the modes they want to handle and explicitly not support other modes.

speigg commented 6 years ago

@blairmacintyre ah... got it. I suppose I disagree then. Especially with the input abstractions that are now part of the core WebXR spec, which should make it easier for applications to handle touch vs wand vs gaze input in the same way.
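As a rough illustration of that point, here is a sketch of one handler covering touch, wand, and gaze input. It assumes the shape the WebXR Device API later shipped with, where each XRInputSource carries a targetRayMode of "screen", "tracked-pointer", or "gaze" and the session fires a "select" event. The describeInput and attachSelectHandler helpers and their labels are hypothetical.

```javascript
// Hypothetical helper: map an input source to a label the app can branch on.
// Assumes XRInputSource.targetRayMode as found in the shipped WebXR API.
function describeInput(inputSource) {
  switch (inputSource.targetRayMode) {
    case "screen":
      return "touch";
    case "tracked-pointer":
      return "wand";
    case "gaze":
      return "gaze";
    default:
      return "unknown";
  }
}

// One "select" listener serves every input type; only the label differs.
function attachSelectHandler(session, onSelect) {
  session.addEventListener("select", (event) => {
    onSelect(describeInput(event.inputSource), event);
  });
}
```

The app-level onSelect callback still has to decide what each kind of input means for its UI; the abstraction only removes the need for separate event plumbing.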

blairmacintyre commented 6 years ago

@speigg ya, since it's not been built yet, we're basically all arguing opinions at this point. 😆

speigg commented 6 years ago

Anyways, if the issue is that apps should be allowed to request sessions for only certain display modes, I disagree, but for argument’s sake, it still shouldn’t be the “simplest” path. By default, the API should allow applications to receive a session that works for all display modes (or at least both “portal” and “immersive” display modes).

This should make everyone here happy:

// Alternatives: an app would pick one of these.

// I only want to support immersive display mode
let xrSession = await xrDevice.requestSession({displayModes: ["immersive"]});

// I only want to support portal display mode
let xrSession = await xrDevice.requestSession({displayModes: ["portal"]});

// I only want to support immersive and portal display modes
let xrSession = await xrDevice.requestSession({displayModes: ["immersive", "portal"]});

// I only want to support inline display mode
let xrSession = await xrDevice.requestSession({displayModes: ["inline"]}); // or "flat"

// all your display are belong to me... (I support all display modes)
let xrSession = await xrDevice.requestSession();

ddorwin commented 6 years ago

Additional thoughts on "immersive" and related aspects of session creation:

blairmacintyre commented 6 years ago

thanks @ddorwin. Some comments on your thoughts:

At the most basic level, the purpose of the "immersive" distinction is that you would render non-immersive immediately and have a button to start an immersive session. (This was much clearer with WebVR's requestPresent().)

That's a reasonable way to think about it, yes, that's what I was thinking. It holds up on the various "setups" I can imagine (desktop + external; phone; phone + headset adaptor; standalone headset with 2D-views-floating-in-3D).
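That pattern can be sketched roughly as follows. The wireImmersiveButton helper and its callbacks are hypothetical. How support is detected and how each session is created is left to the caller, since that part of the API is still under discussion in this thread.

```javascript
// Hypothetical: render inline right away, and offer immersive behind a button.
function wireImmersiveButton(button, immersiveSupported, startInline, startImmersive) {
  // Non-immersive content needs no user gesture, so start it immediately.
  startInline();
  // Hide the button where no immersive session is available.
  button.hidden = !immersiveSupported;
  if (immersiveSupported) {
    // Immersive sessions require user activation, hence the click listener.
    button.addEventListener("click", () => startImmersive());
  }
}
```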

Immersive WebXR sessions are really a parallel platform separate from the (2D) web platform.

  • Nothing rendered in the 2D web can be displayed in or over the immersive session.
  • Non-immersive/inline is just a canvas that is part of the 2D web platform.
    • Apps that rely on this will not translate well to immersive sessions.
  • I think "exclusive" is potentially a better term.
    • I think this originally meant exclusive access to the device.
    • It is really exclusive responsibility for rendering.
    • There is probably a better term along those lines.

Yes.

Because immersive sessions really are this special case, a presentation paradigm does seem to make sense.

Whether it's "presentation" or not, it is a significant action. For example, it could be UA/user driven, like we did in Argon. There hasn't been much interest in that here, primarily (I think) because we would then require all UAs to create new native interfaces to give the user the ability to switch to/from immersive mode.

Regardless of whether it's app or user driven, it does feel like an explicit action.

The only exception to this is if you follow a link while already in immersive mode; it would be reasonable for the destination page to enter immersive mode automatically (assuming some form of UA permission / control so the user knows what's happening).

At the f2f, we discussed the idea that immersive/presentation could really be an upgrade of an existing session. In most cases it will be as there is no reason to mirror. In that case, maybe we should bring back something like requestPresent() on XRSession.

Brandon's proposed session rework addresses this, I think, if we want it to; "requestPresent" could be viewed as akin to the other capability requests.

The current requestSession() design allows mirroring to work like magic window. Maybe whether to continue rendering to the canvas could be an option for requestPresent()

  • I think there may be other modes of rendering in the future.
  • Thus, a bool and decision at session creation may not be the best option, especially if we want apps to generally work across all clients.

Yes, that's actually my overall issue.

  • Other possible modes include rendering outside a floating browser window and non-headset-based augmentation.

Interesting; I hadn't considered the "rendering outside a floating browser" (by which I assume you mean the ongoing discussions elsewhere about ways to pop 3D content out of a page) as a WebXR mode. Unclear how that will work to me ... but it's worth considering.

  • I think methods for rendering in different ways might be better: i.e., requestExclusivePresentation() and requestExternalPresentation()

Perhaps. Might be worth walking through some cases.

blairmacintyre commented 6 years ago

After going through all of this, I think the issue is still open, but clearer. So let me try again.

I think that any UA, HMD or not, should be able to support something they call "immersive" mode. What it means is that the page renders 3D only, and takes over the display. No mix of DOM and non-DOM content.

"Inline" means that the content is in a canvas and that canvas can be put in the DOM and used like any other canvas, mixed with content, etc.

A phone could support both without an HMD, at the discretion of the UA.

The only issue I see here is that there is no guarantee that there is only one version of each, and that both immersive and inline support the same mix of AR and non-AR (i.e., overlaid on the world somehow, or not).

For example:

Thinking about my current project (WebXR Viewer):

toji commented 6 years ago

Woo boy. There's a lot going on in this thread since I last sat down to look at it, and while I've tried to read everything I'm sure I haven't grokked everyone's positions correctly, so forgive me if I say something that was already refuted by an earlier comment.

What I want to try and nail down is the core display modes we want to enable, since the discussion above doesn't seem to have come to a resolution. I've seen this listed multiple times as three distinct things, which @speigg referred to as immersive, inline, and portal. (And thanks @TrevorFSmith for the infographics on this subject. That was educational.)

I feel pretty strongly that we shouldn't re-invent pieces of the web platform we don't have to. Given that the Fullscreen API already exists, and that our "inline" page output is done via a canvas element, it's already possible to achieve the "portal" effect described above by simply making the canvas element fullscreen. Thus I'm heavily inclined to say we shouldn't have any notion of "portal" mode in the API. (That's not to say that it may not be useful to use that verbiage in tutorials, support libraries, or other supplementary material. It just wouldn't be a formal concept in the API.)
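For reference, the "portal" effect described here can be sketched with nothing but the Fullscreen API. The togglePortal name is hypothetical, and the document and canvas are passed in so the sketch makes no assumptions about page structure. Note that requestFullscreen() has to be called from a user gesture such as a click.

```javascript
// Hypothetical helper: toggle a canvas between inline and fullscreen ("portal").
// Uses only the standard Fullscreen API; call it from a click/tap handler.
function togglePortal(doc, canvas) {
  if (doc.fullscreenElement === canvas) {
    // Already fullscreen: drop back to the inline layout.
    return doc.exitFullscreen();
  }
  // Not fullscreen yet: ask the UA to show just this element.
  return canvas.requestFullscreen();
}
```

A page would typically call togglePortal(document, canvas) from a button's click listener; both branches return the promise the Fullscreen API hands back.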

So with that said, let me speak to @blairmacintyre's most recent post.

I think that any UA, HMD or not, should be able to support something they call "immersive" mode. What it means is that the page renders 3D only, and takes over the display.

There's room for UA choice here, but I worry that this would be surprising behavior for most people. Let's assume that no matter how an "immersive" mode manifests, it'll require user activation (which typically == a button). So if we tell the page that they can use immersive mode, they add a button which the user clicks on, and the result is that the page simply goes fullscreen. That's probably not meeting user or developer expectations. You could make a case for that being a valid interpretation, but I have a hard time seeing most browsers following it.

"Inline" means that the content is in a canvas and that canvas can be put in the DOM and used like any other canvas, mixed with content, etc.

That's exactly how I've been viewing "inline" content.

it's unclear if I can effectively support inline AR

our current "display video natively, overlay 3D" would be an "immersive" mode.

Could you expand on these? I'm beginning to think that there's an implementation limitation in another browser or library I'm not aware of that's feeding into this discussion.

speigg commented 6 years ago

@toji wrote:

Given that the Fullscreen API already exists, and that our "inline" page output is done via a canvas element, it's already possible to achieve the "portal" effect described above by simply making the canvas element fullscreen.

Possible, yes, but as currently specced this requires creating a canvas/outputContext, placing the canvas in the DOM appropriately (and perhaps calling the fullscreen API as you said), all for what should arguably be a much simpler situation than rendering to an immersive (e.g., HMD) display. The primary argument as I understand it is that the initialization of an XR session should not be fragmented between different types of XR devices, as it encourages the development of tools and applications for one kind of device, and not another (particularly if “immersive” sessions are somehow perceived as superior, more important, or more full-featured). Supporting “inline” display takes extra work, and that’s fine, but for that reason it should be an additional (optional) feature of an XRSession. Likewise, the simplest default path for creating a (“non-inline”) XRSession and rendering graphics to the display should work for any XR Device, whether the resulting display mode is “immersive” or “portal” (and if both modes are supported, it should be up to the UA/user to decide which one is appropriate).

TrevorFSmith commented 6 years ago

I'm still failing to understand the stance that portal and immersive display modes can be irrelevant to authors. Each mode requires radically different interaction and content design. In portal mode, authors will need to create overlay controls, indicators for how the user should move the handset to help SLAM, and a whole host of other features that are essentially different work than what needs to happen in immersive display mode.

toji commented 6 years ago

Oh, I forgot to mention @speigg: Thanks for introducing me to the terms "endocentric" and "exocentric"! I wasn't aware of phrases for those concepts previously. It seems to me that based on the Wikipedia entries I linked, though, that both of the terms describe a variant of what I would consider an "immersive" display for the purposes of the API. The distinction is in how they achieve it. Specifically, a CAVE is used as an example of an exocentric environment.

as currently specced this requires creating a canvas/outputContext, placing the canvas in the DOM appropriately (and perhaps calling the fullscreen API as you said), all for what should arguably be a much simpler situation than rendering to an immersive (e.g., HMD) display.

I see where you're coming from on this a bit better now, but I'm still not sure I support the idea of this as a core API concept, because it IS still just an alternate way of getting the same effect as a fullscreen inline canvas. I would be 100% fully on board with tools like A-Frame providing it as an easy-to-use display mode, though.

I'm still failing to understand the stance that portal and immersive display modes can be irrelevant to authors. Each mode requires radically different interaction and content design.

I'm with Trevor here. It would certainly be possible to build some basic UIs that were immersive-centric and which would continue to work inline due to our current input model. I have a hard time seeing many larger projects being satisfied with that, though, when there are ease of use, accessibility, and developer familiarity benefits to going with an overlay UI when showing inline content.

Which leads to another point: I think we need to be prepared for the fact that, at least initially, we're going to see a fair number of potential users that only care about phone AR. It's got both buzzword appeal and a larger potential user base than VR, plus it doesn't require the user to "mode switch" by donning a headset. I'm absolutely a believer in both AR and VR, and I want to see people make content that scales to various environments as cleanly as possible. However, if we start off saying, in effect, "You must design your app to be usable by this modest slice of VR users in order to access this much larger market of phone AR users, even though there's no technical reason for that dependency" we'll drive content that otherwise would have happily lived on the web to native apps that have no such restriction. We must allow developers to say "I only care about use case X", or we've failed at a pretty fundamental aspect of our API design.

Don't get me wrong: I feel we should definitely make it as easy as possible to create responsive XR content. (With any luck the hardware ecosystem will start to encourage that anyway.) I just don't see how we could enforce it without driving developers away.

speigg commented 6 years ago

@TrevorFSmith wrote:

I'm still failing to understand the stance that portal and immersive display modes can be irrelevant to authors. Each mode requires radically different interaction and content design. In portal mode, authors will need to create overlay controls, indicators for how the user should move the handset to help SLAM, and a whole host of other features that are essentially different work than what needs to happen in immersive display mode.

Portal vs Immersive have different design implications which can (and probably should) be handled by an application-level toolkit that helps authors do the right thing in each circumstance (much like I imagine your PotassiumES framework does), in the same way that modern UI frameworks give developers the tools needed to adapt the layout and presentation of their content for different screen sizes and input capabilities.

@toji wrote

I see where you're coming from on this a bit better now, but I'm still not sure I support the idea of this as a core API concept, because it IS still just an alternate way of getting the same effect as a fullscreen inline canvas. I would be 100% fully on board with tools like A-Frame providing it as an easy-to-use display mode, though.

I think I am still failing to communicate the problem... essentially, as a developer, I’m either in control of certain things, or the system/UA/user is in control (inversion of control). The assumption with an “inline” display mode is that the app remains in full control over how to render (including where to place the canvas on the screen), while with the “immersive” display mode the system dictates how the app must render and controls how that content is presented on the display. This distinction is important, because the semantics of “who is in control” has implications for the kinds of user interfaces the UA is able to embed the XR content within, and allows the UA to give the user direct control over how XR content is viewed (which is important for an XR-first browser). For “inline” display mode, the XR layer is already embedded and presented within the DOM, and there isn’t much the UA can do to “enforce” a different layout. With “immersive” display mode, the UA can enforce how and where content is rendered to the display. Speaking to the future, this has implications beyond what a single app (or application-level framework, like A-Frame) is able to do. The missing piece here is a display mode for handheld devices that has the same “inversion of control” semantics that the “immersive” display mode currently employs (with regard to rendering, at least).

So this is not really about the fullscreen “effect” and whether or not that is possible for the app to do on its own... rather it’s about the semantics of application controlled displays vs UA controlled displays, both extremes of which have their uses.
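To make the "who is in control" point concrete, here is a sketch of a render pass in which the UA, not the app, decides where each view lands. It assumes the API shape that later shipped (XRViewerPose.views and XRWebGLLayer.getViewport()); the drawViews function and the drawScene callback are hypothetical.

```javascript
// Hypothetical render pass: the app draws once per view the UA hands it,
// into the viewport the UA dictates. The app never picks the layout itself.
function drawViews(gl, layer, pose, drawScene) {
  for (const view of pose.views) {
    const vp = layer.getViewport(view);
    gl.viewport(vp.x, vp.y, vp.width, vp.height);
    drawScene(view);
  }
  // Number of views rendered this frame.
  return pose.views.length;
}
```

The same function works whether the UA reports one view or two, which is the sense in which the rendering layout is out of the app's hands.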

speigg commented 6 years ago

@toji wrote:

Oh, I forgot to mention @speigg: Thanks for introducing me to the terms "endocentric" and "exocentric"! I wasn't aware of phrases for those concepts previously. It seems to me that based on the Wikipedia entries I linked, though, that both of the terms describe a variant of what I would consider an "immersive" display for the purposes of the API. The distinction is in how they achieve it. Specifically, a CAVE is used as an example of an exocentric environment.

Sure! BTW, a CAVE system (or a projected AR system, like RoomAlive) can be egocentric if it tracks the user’s head (so content can be rendered from the user’s perspective). Likewise, a handheld display can be egocentric if it tracks the user’s head (and renders content with the appropriate off-axis perspective projection matrix), and even AR can be done egocentrically on a handheld device by using a “virtual transparency” technique.

Edit: I see the Wikipedia entry you are referring to, about “exocentric vs endocentric environments”. That article uses these two words to refer to the physical location of the display (essentially, on the user’s head or not). This doesn’t seem like quite as useful a distinction as egocentric vs exocentric rendering, which is what I meant. See the classic paper by Milgram on categorization of mixed reality displays for the terminology I am referring to.

speigg commented 6 years ago

@toji wrote:

We must allow developers to say "I only care about use case X", or we've failed at a pretty fundamental aspect of our API design.

That’s totally fine. Giving authors the option to explicitly exclude (or include) support for certain display modes does not preclude: 1) a display mode with “inversion of control” rendering/presentation semantics on handheld devices (same “inversion of control” rendering semantics as “immersive” display mode, but let’s call it “portal” display mode to distinguish from rendering on HMDs and such). 2) a default “path of least resistance” session initialization in which both “immersive” and “portal” display modes are implicitly considered supported by the app.
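A minimal sketch of how those two points could fit together, assuming the hypothetical displayModes option proposed earlier in this thread. Neither resolveDisplayMode nor the mode names exist in any shipped API, and the UA's (or user's) preference order is simply passed in as deviceModes.

```javascript
// Hypothetical: pick the display mode for a new session.
// deviceModes: modes the UA/device offers, in the UA's (or user's) preferred order.
// requestedModes: the app's displayModes list. When omitted, "immersive" and
// "portal" are implicitly treated as supported (the path of least resistance).
function resolveDisplayMode(deviceModes, requestedModes = ["immersive", "portal"]) {
  const match = deviceModes.find((mode) => requestedModes.includes(mode));
  // null means the app excluded every mode this device offers.
  return match === undefined ? null : match;
}
```

Under this sketch an app that says nothing gets whichever of the two UA-controlled modes the device prefers, while an app that lists only "immersive" gets null on a handheld that offers only "portal".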

blairmacintyre commented 6 years ago

Reading through the above, I think people are talking past each other.

I don't think either @speigg or I believe or have a "stance that portal and immersive display modes can be irrelevant to authors." Asserting that would be silly. I'm sorry if I'm somehow phrasing things in a way that makes it sound like that. Obviously, creating good APIs for different sorts of platforms will require non-trivial work.

Similarly, I don't think anyone asserted that "we start off saying, in effect, 'You must design your app to be usable by this modest slice of VR users in order to access this much larger market of phone AR users, even though there's no technical reason for that dependency'". Nothing I (or @speigg) have proposed would fail to "allow developers to say 'I only care about use case X'"; I completely agree that if we don't support this "we've failed at a pretty fundamental aspect of our API design".

So, let me try again.

As (@ddorwin? Someone?) observed, "immersive" is really "exclusive presentation". Changing to "immersive" might have been a mistake. The gist of what @toji says about "inline" mode I tend to agree with, in that it's the right way to add XR content/elements to an existing website. And I completely agree with @TrevorFSmith and @toji when they argue that the bulk of the UI (and, thus, aspects of the app organization, and so on) for a site that wants to support both handheld AR/VR and HMD AR/VR will need some non-trivial differences.

And I agree that even if a handheld AR/VR app chooses to implement their UI entirely in WebGL, ignoring the DOM, the 3D UI for a touch screen will need to be different than for an HMD plus 3D interaction; having built simple demos that work in all 3 ways (touchscreen + DOM, touchscreen + WebGL UI, touchscreen + HMD), I can verify it's non-trivial. But I also agree with @speigg that over time, some of the pain of this will be handled by frameworks, at least for common cases.

What I'm arguing for is something else.

The only thing I disagree with @toji on is if "immersive" (or, instead, "exclusive presentation") makes sense on handhelds. I think it does, for a few reasons.

I'm not arguing against supporting inline+fullscreen, I'm arguing against NOT SUPPORTING "dedicated display" mode. Obviously, users would need to know that they are on a handheld w/ touchscreen, not an HMD, just as they will want to know the nature of the controllers available to them on an HMD (just buttons/joysticks, 3DOF, or 6DOF, fingers or hardware device, 1 or 2 hands?)

Perhaps we need to revert from "immersive" to "exclusive" again! 😆

blairmacintyre commented 6 years ago

One more thing I wanted to call out from @toji's post above:

There's room for UA choice here, but I worry that this would be surprising behavior for most people. Let's assume that no matter how an "immersive" mode manifests, it'll require user activation (which typically == a button). So if we tell the page that they can use immersive mode, they add a button which the user clicks on, and the result is that the page simply goes fullscreen. That's probably not meeting user or developer expectations. You could make a case for that being a valid interpretation, but I have a hard time seeing most browsers following it.

I don't think there would be confusion, and we could probably talk about this at the F2F or outside this. We are clearly thinking about this differently, perhaps because of the switch to "immersive" as the language, and the fact that for you this implies "HMD". If we used the term "dedicated rendering", and provided a way for the developer to know if immersive was going to an HMD or to "full screen on the phone" (so they could, for example, create the appropriate icon, or even not provide such an option), would that ease your concern?

(It seems that no matter how many times I say "immersive doesn't mean HMD to me" and that when I talk about fullscreen immersive on handhelds I mean "dedicated non-DOM-bound rendering", it's not being heard)

speigg commented 6 years ago

@blairmacintyre wrote:

(It seems that no matter how many times I say "immersive doesn't mean HMD to me" and that when I talk about fullscreen immersive on handhelds I mean "dedicated non-DOM-bound rendering", it's not being heard)

Right. If “immersive” must mean “HMD-like”, then certainly the name of the “immersive-web” group seems to be quite limited in scope, and secondly I would have to revoke my original proposal (issue #320) to rename the “exclusive” option to “immersive” as my intent was not to limit the “exclusive”/“immersive” display mode to HMDs.