Rethink non-goals: placing DOM elements in the 3D scene

haywirez commented 4 years ago

The explainer clearly states that:

> This API is not intended to support placing DOM elements in the 3D scene. It does not address use cases such as placing labels directly on 3D objects or world features.

But as I'm trying to develop a proof-of-concept application, I'm now facing a myriad of layout & 2D UI design related issues (flexbox layouts, typography, text alignment, CSS animations etc.) that are already solved on the web. While the DOM overlay could fulfil the use cases of designing simple HUDs and such, the ability to add something like styled DOM elements to surfaces of 3D objects (maybe with a fixed depth per z-index layer) is exactly what would be needed down the line.

While clearly a major headache, I think this should be given another consideration and a graduated approach (perhaps dom-layers instead of dom-overlays), otherwise we'll have to reinvent the wheel...

klausw commented 4 years ago

I agree that 3D-placed DOM elements would indeed be very useful, but I think that should be a separate API.

Have you seen the design choices document in this repository? To be useful, 3D placed DOM would need support for multiple separately-placed DOM elements, with placement controlled by the application, but would not necessarily need complex input beyond clicking on things.

One of the issues making DOM overlay difficult was that it is intended to support all of HTML, including non-flat inputs such as select dropdowns and text input. That's a challenge for the UA, and would be much harder for arbitrarily-placed elements.

I'd suggest keeping an eye on the Layers proposal: https://github.com/immersive-web/layers . That mentions DOM elements as one of its use cases, though last I checked that wasn't fully fleshed out yet.

keverw commented 4 years ago

There are also security concerns, depending on how it's implemented. I posted a similar issue here, since I also think using HTML/CSS/JS mapped as a texture on an object could be an interesting use. My idea is to just limit it to first-party content to try to reduce the security surface.

https://github.com/immersive-web/proposals/issues/57

But I do agree: since this area is still new and groundbreaking, it seems like there's a lot of reinventing the wheel to create rich UIs.

greggman commented 4 years ago

> One of the issues making DOM overlay difficult was that it is intended to support all of HTML, including non-flat inputs such as select dropdowns and text input. That's a challenge for the UA, and would be much harder for arbitrarily-placed elements.

Why? It already works. 3D CSS already does this. The only change is rendering. In fact, you can go into, say, Oculus Home, open 5 or 10 different desktop windows, put them all at different angles, and interact with them exactly as you'd expect.

I guess I don't understand the issue with supporting 3D out of the box. It's already there, it already works, you just need to render things to each eye. No?

https://threejs.org/examples/css3d_youtube.html

arthurmougin commented 3 years ago

> Why? It already works. 3D CSS already does this. The only change is rendering. In fact, you can go into, say, Oculus Home, open 5 or 10 different desktop windows, put them all at different angles, and interact with them exactly as you'd expect.

Actually, it only works in non-immersive mode. In headset XR, the headset display only receives what the 3D engine sends to it. In the case of CSS3D, the engine sends a black or transparent hole where the element should have been. What we want from this issue is to also send the DOM to the display (DOM as a texture or as an XR layer; both options would work for me).

klausw commented 3 years ago

@greggman wrote:

> I guess I don't understand the issue with supporting 3D out of the box. It's already there, it already works, you just need to render things to each eye. No?

I think this isn't as simple as it sounds, at least based on my experience getting the current DOM overlay working in Chrome. Web browsers are complex, and there are assumptions built into current rendering that can be difficult to change. Currently the content quads generated as part of layout get rendered one time, and there's no concept of a viewer or eye position. At minimum there'd need to be a spec change to introduce the concept of an eye position to enable 3D CSS with correct stereo perspective. Doing this through CSS only would be a fairly invasive change. Alternatively, the responsibility could be handed off to JS similar to the XR render loop, but this has its own issues. Currently, DOM compositing happens independently from the XR render loop, and tightly coupling the render loops would be likely to introduce a lot of extra latency.
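To make the "eye position" point concrete, here is a minimal sketch in plain math (no WebXR or DOM APIs) of why content rendered once, with no notion of a viewer, is not enough for stereo. The pinhole projection, the function name, and the 64 mm interpupillary distance are illustrative assumptions, not anything from Chromium or a spec.

```typescript
// Hypothetical helper types for illustration; not part of any browser API.
type Vec3 = { x: number; y: number; z: number };

// Project a world-space point onto an image plane at distance `focal` in front
// of an eye located at `eye`, looking down -Z (a simple pinhole model).
function projectForEye(p: Vec3, eye: Vec3, focal: number): { u: number; v: number } {
  const dx = p.x - eye.x;
  const dy = p.y - eye.y;
  const dz = p.z - eye.z; // negative when the point is in front of the eye
  if (dz >= 0) throw new Error("point is not in front of the eye");
  return { u: (focal * dx) / -dz, v: (focal * dy) / -dz };
}

// Assumed interpupillary distance of 64 mm, eyes centered on the origin.
const ipd = 0.064;
const leftEye: Vec3 = { x: -ipd / 2, y: 0, z: 0 };
const rightEye: Vec3 = { x: ipd / 2, y: 0, z: 0 };

// One corner of a DOM quad placed 1 m in front of the viewer.
const corner: Vec3 = { x: 0.1, y: 0.05, z: -1 };

const left = projectForEye(corner, leftEye, 1);
const right = projectForEye(corner, rightEye, 1);

// The horizontal difference (disparity) is what a single, eye-agnostic
// rendering pass cannot produce.
const disparity = left.u - right.u;
console.log(left, right, disparity);
```

With the eyes 64 mm apart and the point 1 m away, the corner lands about 0.064 units apart in the two images while its vertical position is identical; a renderer without an eye position would draw it in the same place for both eyes.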

Again, I agree that 3D DOM elements would be a cool feature, but it's not just a simple matter of saying that this should "just work" as part of the WebXR DOM Overlay.

Bigger picture, the working group consensus seems to be that DOM integration in WebXR can only ever support two of the following three features for security/privacy reasons:

- arbitrary placement in 3D space including correct occlusion with drawn content
- interactive DOM content
- cross-origin content such as arbitrary iframes

The WebXR DOM Overlay module supports interactivity and cross-origin content, but not arbitrary 3D placement.

If I remember right, the WebXR Layers API supports 3D placement and interactivity for DOM content, but not cross-origin content. @cabanier, is that correct?

The third alternative would be to support only non-interactive DOM content with arbitrary 3D placement and cross-origin content, but as far as I know there's no current proposal for a spec that supports this scenario. If this were implemented, applications could still support interactivity by manually doing hit testing and generating synthetic DOM events, but the security model wouldn't allow passing such events to cross-origin content.
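The "manual hit testing" route could look roughly like the sketch below: intersect an input ray with the plane of a placed quad and convert the hit into element-local CSS pixel coordinates that the application could then use for a synthetic event. It assumes a simplified, hypothetical setup: the quad faces +Z and lies in the plane z = center.z, its physical size and CSS pixel size are known, and all names are made up for illustration.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Hypothetical description of a DOM element drawn as a texture on a quad.
interface PlacedQuad {
  center: Vec3;     // world-space center; the quad lies in the plane z = center.z
  widthM: number;   // physical width in meters
  heightM: number;  // physical height in meters
  widthPx: number;  // element width in CSS pixels
  heightPx: number; // element height in CSS pixels
}

// Intersect a ray with the quad and return element-local CSS pixel
// coordinates (origin at the top-left corner), or null on a miss.
function hitTestQuad(origin: Vec3, dir: Vec3, quad: PlacedQuad): { x: number; y: number } | null {
  if (Math.abs(dir.z) < 1e-9) return null; // ray is parallel to the quad
  const t = (quad.center.z - origin.z) / dir.z;
  if (t < 0) return null; // quad is behind the ray origin
  const hx = origin.x + t * dir.x - quad.center.x;
  const hy = origin.y + t * dir.y - quad.center.y;
  if (Math.abs(hx) > quad.widthM / 2 || Math.abs(hy) > quad.heightM / 2) return null;
  // World +Y points up, while CSS pixel Y grows downward.
  const u = hx / quad.widthM + 0.5;
  const v = 0.5 - hy / quad.heightM;
  return { x: u * quad.widthPx, y: v * quad.heightPx };
}

const panel: PlacedQuad = {
  center: { x: 0, y: 1.5, z: -2 },
  widthM: 1, heightM: 0.5, widthPx: 800, heightPx: 400,
};

// A ray from head height pointing straight ahead hits the panel center.
const hit = hitTestQuad({ x: 0, y: 1.5, z: 0 }, { x: 0, y: 0, z: -1 }, panel);
console.log(hit); // { x: 400, y: 200 }
```

With these coordinates, a same-origin page could locate the target element itself and dispatch a synthetic `MouseEvent`; as noted above, the security model would not allow such events to reach cross-origin content.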

A full XR implementation of the threejs "3D YouTube" example would need all three of those features (assuming it's not hosted by YouTube itself), so my understanding is that there's no current path to getting this working within WebXR.
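The "two of three" constraint can be written down as a tiny model to check the options discussed here against it. This is only a paraphrase of the thread's reasoning; the feature names and the check are illustrative, not an API from any WebXR specification, and the Layers entry reflects the thread's description of a proposal that is not implemented.

```typescript
// Illustrative model of the "pick at most two of three" constraint.
interface DomIntegrationFeatures {
  arbitrary3DPlacement: boolean; // placed anywhere, correctly occluded by scene content
  interactive: boolean;          // the UA delivers real input to the DOM content
  crossOrigin: boolean;          // e.g. arbitrary iframes
}

function satisfiesTwoOfThree(f: DomIntegrationFeatures): boolean {
  const enabled = [f.arbitrary3DPlacement, f.interactive, f.crossOrigin].filter((on) => on).length;
  return enabled <= 2;
}

// How the options mentioned in this thread map onto the constraint.
const options: Array<[string, DomIntegrationFeatures]> = [
  ["WebXR DOM Overlay", { arbitrary3DPlacement: false, interactive: true, crossOrigin: true }],
  ["Layers proposal (same-origin URL)", { arbitrary3DPlacement: true, interactive: true, crossOrigin: false }],
  ["Non-interactive placed DOM (no proposal)", { arbitrary3DPlacement: true, interactive: false, crossOrigin: true }],
  ["threejs 3D YouTube example in XR", { arbitrary3DPlacement: true, interactive: true, crossOrigin: true }],
];

for (const [name, features] of options) {
  console.log(`${name}: ${satisfiesTwoOfThree(features) ? "fits the model" : "needs all three"}`);
}
```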

cabanier commented 3 years ago

> If I remember right, the WebXR Layers API supports 3D placement and interactivity for DOM content, but not cross-origin content. @cabanier, is that correct?

I made a proposal to have WebXR Quad and Cylinder layers populated with a same-origin URL. This is not implemented yet. I agree that this is a complex feature to specify and implement. Having parts of your DOM appear in 3D is not possible because of the assumptions that are built into browsers (as you mention).
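The same-origin restriction in this proposal ultimately comes down to an origin comparison. A minimal sketch, assuming a hypothetical gate that a UA or polyfill might run before populating a quad or cylinder layer from a URL; the function name and policy are assumptions, and only the standard `URL` constructor and its `origin` property (scheme, host, and port) are real APIs here.

```typescript
// Hypothetical gate: accept layer content only if it is same-origin with the page.
function isSameOrigin(pageUrl: string, contentUrl: string): boolean {
  // Relative content URLs are resolved against the page URL.
  return new URL(contentUrl, pageUrl).origin === new URL(pageUrl).origin;
}

const page = "https://app.example.com/xr/index.html";
console.log(isSameOrigin(page, "/ui/menu.html"));                       // true
console.log(isSameOrigin(page, "https://app.example.com:443/ui.html")); // true (443 is the default port)
console.log(isSameOrigin(page, "https://www.youtube.com/embed/x"));     // false
console.log(isSameOrigin(page, "http://app.example.com/ui.html"));      // false (different scheme)
```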

klausw commented 3 years ago

Thanks Rik for confirming. I'll close this issue as not feasible for the DOM overlay module. (This doesn't mean that it's out of scope for WebXR in general, but it's not currently on the roadmap due to the issues mentioned.)

greggman commented 3 years ago

> @greggman wrote:
>
> > I guess I don't understand the issue with supporting 3D out of the box. It's already there, it already works, you just need to render things to each eye. No?
>
> I think this isn't as simple as it sounds, at least based on my experience getting the current DOM overlay working in Chrome. Web browsers are complex, and there are assumptions built into current rendering that can be difficult to change. Currently the content quads generated as part of layout get rendered one time, and there's no concept of a viewer or eye position. At minimum there'd need to be a spec change to introduce the concept of an eye position to enable 3D CSS with correct stereo perspective.

There is no need to change CSS to support VR, just as a Unity/Unreal app requires no changes to the scene graph to support VR. In those engines you take your non-VR content, pop in the VR component, and you get VR. No changes are needed to your app. Similarly, the browser has all the data it needs. CSS gives each plane a 3D position, same as a scene graph in a standard 3D game engine. The top-level app just sets the CSS to the world orientation of the part of the DOM it wants to see, and everything below is relative.
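The scene-graph analogy can be sketched as plain matrix composition: each nested transform maps child space to parent space, and a single top-level transform chosen by the app places the whole subtree in the world. This is a minimal sketch assuming column-major 4x4 matrices laid out like CSS `matrix3d()`; the helper names are illustrative, and it deliberately ignores the unit and `perspective()` questions raised later in the thread.

```typescript
type Mat4 = number[]; // 16 entries, column-major like CSS matrix3d()
type Vec3 = [number, number, number];

function identity(): Mat4 {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
}

function translate(x: number, y: number, z: number): Mat4 {
  const m = identity();
  m[12] = x; m[13] = y; m[14] = z;
  return m;
}

function rotateY(rad: number): Mat4 {
  const c = Math.cos(rad), s = Math.sin(rad);
  return [c, 0, -s, 0, 0, 1, 0, 0, s, 0, c, 0, 0, 0, 0, 1];
}

// Returns a * b, i.e. the transform that applies b first, then a.
function multiply(a: Mat4, b: Mat4): Mat4 {
  const out: Mat4 = [];
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[k * 4 + row] * b[col * 4 + k];
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

function transformPoint(m: Mat4, p: Vec3): Vec3 {
  const [x, y, z] = p;
  const w = m[3] * x + m[7] * y + m[11] * z + m[15];
  return [
    (m[0] * x + m[4] * y + m[8] * z + m[12]) / w,
    (m[1] * x + m[5] * y + m[9] * z + m[13]) / w,
    (m[2] * x + m[6] * y + m[10] * z + m[14]) / w,
  ];
}

// Child panel: turned 90 degrees and offset 1 unit to the side of its parent.
const rootFromPanel = multiply(translate(1, 0, 0), rotateY(Math.PI / 2));
// Top-level placement chosen by the app: the whole subtree 2 units in front.
const worldFromRoot = translate(0, 0, -2);
const worldFromPanel = multiply(worldFromRoot, rootFromPanel);

console.log(transformPoint(worldFromPanel, [1, 0, 0])); // approximately [1, 0, -3]
```

Changing only `worldFromRoot` moves everything below it, which is the property the comment relies on; whether real-world CSS uses transforms consistently enough for this to work is the point disputed in the replies.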

> Currently, DOM compositing happens independently from the XR render loop, and tightly coupling the render loops would be likely to introduce a lot of extra latency.

This is already true for non-VR. You render to a canvas and the browser composites that with the frame. Of course there is overhead, but that overhead is tiny compared to reimplementing the entire HTML stack in JS just because you want a 2D UI projected in your VR scene.

> Again, I agree that 3D DOM elements would be a cool feature, but it's not just a simple matter of saying that this should "just work" as part of the WebXR DOM Overlay.

It is

> Bigger picture, the working group consensus seems to be that DOM integration in WebXR can only ever support two of the following three features for security/privacy reasons:

Security issues are secondary IMO. You can require same-origin for all content as a start. That would still give apps access to the entire browser platform in VR. A huge win.

As for other security issues: if the browser is the one doing the compositing and reading back from the backbuffer is disabled, then there are no security issues that I know of.

klausw commented 3 years ago

@greggman

> CSS gives each plane a 3D position, same as a scene graph in a standard 3D game engine. The top-level app just sets the CSS to the world orientation of the part of the DOM it wants to see, and everything below is relative.

There isn't really a native Z position in CSS.

The CSS z-index property is just for ordering, and using it as a Z position for rendering would horribly break sites. It's often abused with values that have no relation to physical placement.

While 3D CSS transforms do establish a kind of scene graph, they tend to be used with fairly arbitrary values combined with a manually applied perspective transform with the goal of getting something that kind of looks right when drawn on a 2D screen. For example, [these samples](https://developer.mozilla.org/en-US/docs/Web/CSS/transform-function/perspective()) use arbitrary values such as 4cm for the perspective transform.

I think it wouldn't work well to have the 3D CSS transforms be used as-is as a scene graph automatically. At minimum there would need to be semantic changes such as ignoring perspective(), and it's likely that the result may not look quite right for pre-existing sites if it used arbitrary units.
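The reason `perspective()` would need to be ignored is that it bakes a viewer distance into the content itself. The CSS `perspective(d)` matrix is the identity except for m34 = -1/d, so a point at depth z (positive z toward the viewer) ends up with w = 1 - z/d and is scaled by d / (d - z). The sketch below only evaluates that formula; the 4 cm and 100 cm distances are example values, the first echoing the MDN samples linked above.

```typescript
// Scale factor CSS perspective(d) applies to a point at depth z, where
// positive z points toward the viewer: w = 1 - z/d, so x' = x * d / (d - z).
// d and z must use the same unit; the ratio itself is unit-free.
function cssPerspectiveScale(d: number, z: number): number {
  if (z >= d) throw new Error("point is at or past the implied viewpoint");
  return d / (d - z);
}

// The same 1 cm lift toward the viewer, under two authoring choices for d:
console.log(cssPerspectiveScale(4, 1));   // perspective(4cm): about 1.333, strongly exaggerated
console.log(cssPerspectiveScale(100, 1)); // perspective(100cm): about 1.0101, barely visible
console.log(cssPerspectiveScale(4, 0));   // 1 (points at z = 0 are unaffected)
```

In an XR renderer the projection would instead come from each eye's real view and projection, so keeping an author-chosen d on top of that would distort the scene.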

It sounds as if it would be possible to do an opt-in version where the site agrees to use 3D CSS transforms in a constrained way (maybe an @xr media type?), and such a site would need to take care to use units consistently so that the x/y and z coordinates produce a proper 3D result.
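One way such an opt-in mode could make units consistent is to map every CSS absolute length to meters, so that layout sizes (x/y) and `translateZ` depths (z) share one physical unit. This is a hypothetical helper: the `@xr` mode itself is only an idea from this comment, and although CSS fixes the ratios 1in = 2.54cm = 96px = 72pt, on ordinary screens `px` is not guaranteed to match physical size, so treating 1in as a real inch is an assumption such a mode would have to make.

```typescript
// Hypothetical helper for an opt-in "XR CSS" mode: convert CSS absolute
// lengths to meters, using the CSS ratios 1in = 2.54cm = 96px = 72pt = 6pc.
type CssAbsoluteUnit = "px" | "cm" | "mm" | "in" | "pt" | "pc";

const METERS_PER_UNIT: Record<CssAbsoluteUnit, number> = {
  in: 0.0254,
  cm: 0.01,
  mm: 0.001,
  px: 0.0254 / 96,
  pt: 0.0254 / 72,
  pc: 0.0254 / 6,
};

function toMeters(value: number, unit: CssAbsoluteUnit): number {
  return value * METERS_PER_UNIT[unit];
}

// A 960px-wide panel pushed 50cm back, both expressed in meters.
console.log(toMeters(960, "px")); // about 0.254
console.log(toMeters(50, "cm"));  // about 0.5
```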

However, this still leaves the issue of having to change the DOM compositor to do two output passes. This is doable, and I think it may not even be all that difficult to build a proof-of-concept that shows this behavior. However, getting this code to production quality, including ensuring there's no performance impact or undue maintenance burden for non-3D applications, is another matter entirely. I'd be happy to be proven wrong here if someone wants to take a stab at it, and I think the feature would be useful, but based on my experience working on Chromium source I think it's likely to be a substantial project, and I'm not aware of anyone currently planning to do this work.

> This is already true for non-VR. You render to a canvas and the browser composites that with the frame. Of course there is overhead, but that overhead is tiny compared to reimplementing the entire HTML stack in JS just because you want a 2D UI projected in your VR scene.

Immersive XR sessions work entirely differently, the opaque renderbuffer is independent of the canvas and has a completely different compositing pipeline. The overhead wouldn't be tiny, it could easily include a full frame of added latency.
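For scale, "a full frame of added latency" is simply one refresh interval. The sketch below computes it for a few common headset refresh rates; these rates are example values, not figures from this thread.

```typescript
// Added latency, in milliseconds, from delaying output by `frames` frames
// at a given display refresh rate.
function addedLatencyMs(refreshHz: number, frames: number = 1): number {
  return (1000 / refreshHz) * frames;
}

for (const hz of [72, 90, 120]) {
  console.log(`${hz} Hz: +${addedLatencyMs(hz).toFixed(1)} ms per frame of delay`);
}
// 72 Hz: +13.9 ms per frame of delay
// 90 Hz: +11.1 ms per frame of delay
// 120 Hz: +8.3 ms per frame of delay
```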

> Security issues are secondary IMO. You can require same-origin for all content as a start. That would still give apps access to the entire browser platform in VR. A huge win.

Addressing security is a requirement for launching features in the browser. Yes, requiring same-origin simplifies things greatly, but then we're looking at a different API variant with different tradeoffs. For example, in that case you'd likely also want options to place DOM elements in the scene at known locations, as opposed to having them placed arbitrarily by the UA as part of a self-consistent DOM object.

Again, I agree that a true 3D DOM would be a cool feature, but this is the wrong place to discuss it. The WebXR DOM Overlay chose different tradeoffs (allow cross-origin content, don't allow arbitrary 3D placements).

You're welcome to discuss a proposed spec variant or new specification with the community group if you have a concrete proposal and know an implementer willing to do the work, but just saying "it's easy, just do this please" isn't sufficient.

> As for other security issues: if the browser is the one doing the compositing and reading back from the backbuffer is disabled, then there are no security issues that I know of.

Sorry if I'm coming off as defensive or unhelpful here, but this is a rather sore subject that has been discussed ad nauseam in the working group over multiple years, with many false starts and abandoned proposals. For example, one of the concerns is that it's unsafe to allow UA-mediated interactions with cross-origin content if that content may be occluded by other scene content, or where the user may not be aware that they are interacting with such content.
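To illustrate the occlusion concern, here is a hypothetical UA-side policy that forwards real input to cross-origin DOM content only when the hit point is not hidden behind app-drawn scene content. The names and the depth comparison are assumptions for illustration, not part of any WebXR specification. A check like this is not sufficient on its own: the page controls the scene depth the UA would compare against and could still draw misleading content around the DOM layer, which is part of why the working group found this hard.

```typescript
// Hypothetical input-routing policy; not from any WebXR spec.
interface LayerHit {
  distanceM: number;    // distance along the input ray to the DOM layer hit
  crossOrigin: boolean; // whether the content under the hit is cross-origin
}

// nearestSceneHitM: distance to the closest app-drawn surface along the same
// ray, or null if the ray hits no scene content.
function mayForwardInput(hit: LayerHit, nearestSceneHitM: number | null): boolean {
  // Same-origin content: the page could synthesize these events itself anyway.
  if (!hit.crossOrigin) return true;
  const occluded = nearestSceneHitM !== null && nearestSceneHitM < hit.distanceM;
  return !occluded;
}

console.log(mayForwardInput({ distanceM: 2, crossOrigin: true }, 1));    // false: scene content in front
console.log(mayForwardInput({ distanceM: 2, crossOrigin: true }, null)); // true: nothing in front
```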

immersive-web / dom-overlays, issue #28