immersive-web / webxr-ar-module

Repository for the WebXR Augmented Reality Module
https://immersive-web.github.io/webxr-ar-module

Global environment blend mode does not work with "third eye" video capture views #53

Closed Manishearth closed 4 years ago

Manishearth commented 4 years ago

We currently have a per-session environment blend mode. However, first person observer views used for video capture on AR devices (see https://github.com/immersive-web/webxr/issues/1045 for details) might need a different blend mode from the one used for rendering to the eyes: likely "additive" for the eye views, but "alpha-blend" for the first person observer view.

Perhaps it should be per-view? (I'm not sure whether this being a breaking change is a big deal right now. Looking around, I don't see much use of this attribute yet; three.js does use it, but incorrectly.)
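
For illustration, a rough sketch of the difference: `session.environmentBlendMode` below is the existing per-session attribute, while `view.environmentBlendMode` is a purely hypothetical per-view override, not anything specced.

```js
// Existing API: one blend mode for the whole session.
const session = await navigator.xr.requestSession('immersive-ar');
console.log(session.environmentBlendMode); // e.g. "additive" on a see-through HMD

// Hypothetical per-view override: fall back to the session-wide value.
function blendModeForView(session, view) {
  return view.environmentBlendMode ?? session.environmentBlendMode;
}
```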

toji commented 4 years ago

I'm not sure how practical I think this use case is, but let's address that in a moment and instead assume we're trying to get it to work:

I'm trying to think of a scenario where a third-person observer view would be forced to use additive blending while the first person views are shown with alpha blending. If the primary display is additive, you could argue that forcing the observer display to additive as well would give you the most accurate view of what the in-XR user is seeing.

Otherwise, I wonder if the more interesting thing is to indicate which views are observers, as there's potentially other interesting alterations that you'd want to make to that view that wouldn't apply to the primary views. (Screen-space UI, displaying hidden or debug elements, etc.) I guess if we expect that these kinds of observer views are going to be more common then we'd also need to differentiate each view by interaction space, etc.

With that being said, I'm curious what sort of observer views we'd expect that aren't handled by the current "Spectator mode" technique. If the intent is to just show a third view on the primary monitor of a system the headset is wired into that would give page developers a lot more control over the content than a more general runtime tool. If the intent is to allow for other tracked screens, however (let's say a mobile AR view of the headset user's environment) then this seems like something that we should look at supporting through mechanics like cloud anchors.

blairmacintyre commented 4 years ago

Hi @toji ... @Manishearth said what it was for: doing video capture on an AR device. The video camera is not aligned with either of the see-through views. It has a different pose, and is ... video.

As for "how practical" it is: it's essential.

Manishearth commented 4 years ago

Spectator mode only works on tethered headsets right now, and also is not tracked.

The intent here is to be able to "cast" what the person is seeing through their device, which isn't possible with just the camera feed, since the camera is not aligned with the displays on most AR devices.

Manishearth commented 4 years ago

/agenda to discuss the general space here

thetuvix commented 4 years ago

As mentioned in the discussion on the last call, the Mixed Reality Capture feature on HoloLens 2, where a user captures a photo or video from their device while an app is rendering holograms, requires the app to render from the camera's pose. This is because the camera is far enough away from the displays that if you just steal one eye's image and distort it to match the camera's pose, holograms appear quite far off relative to nearby objects. In this case, the blend mode for the primary stereo view is "additive", while the blend mode for the mono observer view is "alpha-blend".

Generally, most apps do fine rendering RGBA color buffers with the same shaders for all 3 views, as the alpha will be implicitly thrown away for the "additive" primary stereo view (this is one reason WebXR defines its color buffers to be premultiplied) and the alpha will help improve video composition for the "alpha-blend" mono observer view. However, we should let apps know that the mono observer view will be "alpha-blend" in case they do wish to adjust their rendering for it.
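
A minimal render loop sketch of that pattern, assuming a WebGL context `gl`, a reference space `refSpace`, and a `drawScene` helper set up elsewhere; the same premultiplied RGBA output works for both blend modes:

```js
function onXRFrame(time, frame) {
  const session = frame.session;
  const pose = frame.getViewerPose(refSpace);
  if (!pose) { session.requestAnimationFrame(onXRFrame); return; }

  const glLayer = session.renderState.baseLayer;
  gl.bindFramebuffer(gl.FRAMEBUFFER, glLayer.framebuffer);
  gl.clearColor(0, 0, 0, 0); // transparent black (premultiplied)
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

  // Same shaders for every view: alpha is ignored for "additive" views and
  // used for compositing over camera video for "alpha-blend" views.
  for (const view of pose.views) {
    const vp = glLayer.getViewport(view);
    gl.viewport(vp.x, vp.y, vp.width, vp.height);
    drawScene(view.projectionMatrix, view.transform.inverse.matrix);
  }
  session.requestAnimationFrame(onXRFrame);
}
```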

> Otherwise, I wonder if the more interesting thing is to indicate which views are observers, as there's potentially other interesting alterations that you'd want to make to that view that wouldn't apply to the primary views. (Screen-space UI, displaying hidden or debug elements, etc.) I guess if we expect that these kinds of observer views are going to be more common then we'd also need to differentiate each view by interaction space, etc.

Agreed! Apps may wish to hide certain UI elements or add branding frames (e.g. a border with a logo) only for the observer view.
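
As a sketch of what that could look like inside an app's view loop: `view.isFirstPersonObserver` is the kind of flag being discussed here rather than a shipped attribute, and `drawHeadLockedUI` / `drawBrandingFrame` are hypothetical helpers.

```js
for (const view of pose.views) {
  const vp = glLayer.getViewport(view);
  gl.viewport(vp.x, vp.y, vp.width, vp.height);
  drawScene(view);
  if (view.isFirstPersonObserver) {
    drawBrandingFrame(view);  // e.g. a logo border only on the capture view
  } else {
    drawHeadLockedUI(view);   // screen-space HUD only for the user's own eyes
  }
}
```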

Beyond that, we also need apps to opt in explicitly, indicating that they understand and will respond correctly to requests to start rendering into that third observer view. If an app's shaders are tuned for 2 parallel stereo views, or it will only render the first 2 views, or worse, will crash if given more than 2 views, our compositor would rather bite the bullet and just distort one of the eyes: it won't give ideal results, but neither will a black recording or an app crash!

Manishearth commented 4 years ago

Oh, forgot to update this.

We discussed this on the call, and one bit of consensus, especially given @thetuvix's feedback, was that this needs to be an opt-in. Currently three.js crashes when you attempt to give it three views: it actually iterates through the views array just fine, but it matches views to a camera array that's set up beforehand, which has a length of 2.

The rough API surface that was discussed was an XR feature for the first person observer view (and in the future, maybe one for quad views). Frameworks are encouraged to request it if they are written to handle the extra view. UAs will only return this extra view in the views array when capture is happening (they might even return it only every other frame to account for different frame rates!). If this feature is enabled, we could have an overriding environment blend mode on the FPO view object, or something along those lines.
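
Roughly, in code; the 'first-person-observer' feature string and the per-view environmentBlendMode override are placeholders from this discussion rather than final spec text, and `refSpace` / `render` are assumed helpers:

```js
const session = await navigator.xr.requestSession('immersive-ar', {
  optionalFeatures: ['first-person-observer'] // hypothetical feature descriptor
});

function onXRFrame(time, frame) {
  const pose = frame.getViewerPose(refSpace);
  if (pose) {
    // Don't assume exactly two views: the UA may add the observer view only
    // while capture is active, and possibly not on every frame.
    for (const view of pose.views) {
      const blend = view.environmentBlendMode      // hypothetical per-view override
                 ?? frame.session.environmentBlendMode;
      render(view, blend);
    }
  }
  frame.session.requestAnimationFrame(onXRFrame);
}
session.requestAnimationFrame(onXRFrame);
```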