Layers on devices with views of different resolutions

asajeffrey commented 4 years ago

The layers spec doesn't currently support devices with different resolutions for different views (e.g. the recording view is different from the left and right eyes). At the moment, when textures are allocated (https://immersive-web.github.io/layers/#allocate-color-textures and ditto depth/stencil), all views are the same size.

cabanier commented 4 years ago

That is correct. I believe I brought this up before and I was told that Hololens works around this by just allocating the same size texture for the camera as for the eyes. We also don't handle things like different blend modes and frame rate.

If we're going to properly support a camera view, we should add explicit support in the WebXR spec.

asajeffrey commented 4 years ago

@Manishearth does the hololens camera view have the same resolution as the eye displays?

Blend mode is definitely an issue, c.f. https://github.com/immersive-web/webxr-ar-module/issues/53

Frame rate is a tricky one, I'd expect lots of content assumes there's only one rAF cadence, and will be quite surprised if different rAF callbacks have different views.

Manishearth commented 4 years ago

I believe the hololens matches frame rates (either by reducing eye framerate when recording or by ignoring every other camera frame).

The resolutions are indeed different, but this isn't a big deal, views can have different sizes.

cabanier commented 4 years ago

I believe the hololens matches frame rates (either by reducing eye framerate when recording or by ignoring every other camera frame).

The resolutions are indeed different, but this isn't a big deal, views can have different sizes.

This means that texture arrays are not going to work in this workflow. This is another indication that we should treat the observer view differently from the regular views.

Manishearth commented 4 years ago

To me that seems like it's an indication that the texture array approach could be improved :smile: . Observer views aren't the only such example, the Varjo quad display is also one, as are potential CAVE systems. The observer view is just the one I've actually been working with.

cabanier commented 4 years ago

To me that seems like it's an indication that the texture array approach could be improved 😄 .

WebGL defines that all texture in a texture array are the same size. If different views have different sizes, we need to specifiy that the UA has to reject texture arrays for projection layers. (I will add some spec text to clarify this)

A lot of experiences are going to break if the recommended workflow isn't working anymore. As @thetuvix mentioned, a UA might have to work around this by calculating the observer view itself or letting the experience set up a different rendering pipeline for the observer view.

Manishearth commented 4 years ago

Right, so perhaps the solution there is to allow multiple projection layers, or for a projection layer to have multiple texture arrays. "Letting the experience set up a different rendering pipeline" is exactly the issue here :smile:

cabanier commented 4 years ago

I think there should be a new session for the observer with its own frame rate, blend mode and views array. Tinkering with the textures will result in confusing logic in the spec and eventually author code.

asajeffrey commented 4 years ago

Hmm, that's an interesting idea, having more than one session per experience. At the moment, sessions are initiated by the content, how would it work if the device wanted to use more than one session? I suspect a lot of code will break if there's more than one session object.

Manishearth commented 4 years ago

Right, it feels like the openxr model for this: you have a single session with multiple "view configurations" that can be addressed independently is the right one.

I think multiple sessions will be way more confusing to deal with both in spec and author code -- there's so much cross-synchronization that would need to be done.

asajeffrey commented 4 years ago

In terms of different frame rates... are there devices with different frame rates where there's not a single main frame rate we can use? E.g. for devices with a camera, the frame rate of the headset display, not the camera.

cabanier commented 4 years ago

In terms of different frame rates... are there devices with different frame rates where there's not a single main frame rate we can use? E.g. for devices with a camera, the frame rate of the headset display, not the camera.

Magic Leap's display runs at 120fps. I'm unsure what the frame rate of its camera is but it's surely much lower. @thetuvix said that Hololens renders all frames for the camera but ends up throwing most of them away which is not ideal.

Moreover, it will be hard to match up predicted camera poses with predicted viewer ones which will make it hard to avoid jittering.

Manishearth commented 4 years ago

@thetuvix said that Hololens renders all frames for the camera but ends up throwing most of them away which is not ideal.

Right, but that's already a choice made by Hololens, we're not making that choice for them.

cabanier commented 4 years ago

I think multiple sessions will be way more confusing to deal with both in spec and author code -- there's so much cross-synchronization that would need to be done.

No, I think this will be far less confusing because it allows you to break up your logic at a very high level. Game logic + your existing stereo renderer + your existing mono renderer vs Game logic + a hybrid mono/stereo renderer filled with if/else blocks

cabanier commented 4 years ago

@thetuvix said that Hololens renders all frames for the camera but ends up throwing most of them away which is not ideal.

Right, but that's already a choice made by Hololens, we're not making that choice for them.

Can you elaborate? Are you saying we should also drop frames?

asajeffrey commented 4 years ago

We could provide that info to content providers, e.g. when content requests the subimage for a view, we could provide a flag saying "this subimage will be thrown away". If we wanted to be more agressive about it, we could return a null subimage, though I suspect this would result in a lot of content throwing exceptions.

asajeffrey commented 4 years ago

I should probably give an example of what I meant by having a main frame rate...

Imagine a headset that runs at 120fps, with a camera running at 25fps. If we tried matching both frame rates, we'd end up running (if I've done the math correctly) 140fps, with uneven gaps between the frames. But... I'm not sure such devices exist! I suspect that most devices have the secondary display running at a fraction of the primary (e.g. 30fps rather than 25fps).

cabanier commented 4 years ago

I think we should discuss this in a call. I'm wary to add a lot of special case code if we can solve it in a cleaner way.

cabanier commented 4 years ago

/agenda how should we handle views from different devices (ie eye displays + observer)

asajeffrey commented 4 years ago

I'm not sure I'd word it that way, for the case of HoloLens, there's only one device but it's got more than two views, and they have different properties (e.g. resolution, alpha blend, framerate,...).

cabanier commented 4 years ago

I'm not sure I'd word it that way, for the case of HoloLens, there's only one device but it's got more than two views, and they have different properties (e.g. resolution, alpha blend, framerate,...).

OK :-) More than one type of display per session.

Maybe I should open an issue on the WebXR spec and propose a "observer" session than can run concurrently with an immersive one.

Manishearth commented 4 years ago

I really think that an observer session would be super heavyweight.

Furthermore, as mentioned before, we need to deal with this for quad views and CAVE anyway. My understanding is that the webxr spec was intentionally designed with an uncapped and unconstrained number of views. We should try and handle this without recapping that number to two.

cabanier commented 4 years ago

I really think that an observer session would be super heavyweight.

Furthermore, as mentioned before, we need to deal with this for quad views and CAVE anyway. My understanding is that the webxr spec was intentionally designed with an uncapped and unconstrained number of views.

I think we can deal with quad views and CAVE since those systems all render at the same framerate, blend mode, time warp, etc. Observer views are different and yes, they are heavy weight. There is no way around it. Simply adding a view and expecting it to render correctly won't work (as @thetuvix also mentioned).

Manishearth commented 4 years ago

I think we can deal with quad views and CAVE since those systems all render at the same framerate

quad views also have differing resolutions (not frame rates), which is the issue in question here. The framerate thing is a separate issue.

The point is, things that work based off a texture array will need tweaking if we want them to work on systems with different sizes of view. Probably by accepting multiple texture arrays, though we can also declare we want to defer solving this problem and spec it to error out or only work with the primary views and expect content to fall back to regular textures.

Simply adding a view and expecting it to render correctly won't work (as @thetuvix also mentioned).

Works fine on hololens' openxr implementation. The dropping of frames is suboptimal but that's a choice made by the system.

Fwiw you actually can handle multi-framerate views by sending down different per-frame view arrays based on whether or not the observer view needs to be rendered. Hololens doesn't seem to expose the bit of "is this frame going to be thrown away", so we didn't implement it that way, but a device with a different observer framerate that wishes to make this optimization totally can do that.

cabanier commented 4 years ago

though we can also declare we want to defer solving this problem and spec it to error out or only work with the primary views and expect content to fall back to regular textures.

Yes, I'm ok with deferring. PR #159 allows different texture size so that should address some of the concerns

Simply adding a view and expecting it to render correctly won't work (as @thetuvix also mentioned).

Works fine on hololens' openxr implementation.

No. @thetuvix explicitly said that that scenario didn't work and that they ended up assembling the camera view themselves or let the application explicitly code for it.

thetuvix commented 4 years ago

Agreed that we should have a call here!

Some quick notes on how HoloLens 2 works:

The only efficient path for the HoloLens 2 compositor is for the app to submit the primary stereo views as a 2-element texture array. That is a key feature we are looking for from the layers API, so let's not lose that :smile:
The HoloLens does not allocate the same texture size for the mono observer view. That size will depend on whether a photo or video is being captured, among other factors. This does mean that the third observer view cannot participate in the same texture array as the primary stereo views unless the app is willing to waste some pixels. That's not something that we currently recommend.
The blend mode is different between the primary stereo views and the mono observer view - as discussed in https://github.com/immersive-web/webxr-ar-module/issues/53#issuecomment-641673302, the blend mode for the primary stereo views is "additive", while the blend mode for the mono observer view is "alpha-blend".
HoloLens 2 has a steady state app render rate of 60fps. When Mixed Reality Capture is active, the app is dropped down to 30fps, to free up enough headroom on the device for the third render and the MRC overhead. This matches the 30fps rate of the camera, and so there are no dropped app frames.
Specifically, the app is asked to render that third observer view on the same 30fps frame cadence and phase as the primary stereo views. It's the platform's job then to reproject that app observer view render to the nearest captured camera frame. During this time of heavy device load, we would not ask the app to run an independent out-of-phase update loop, as the app would then have to update 60 times a second with jittering frame cadences and would lose the ability to do efficient rendering across the three views. This would also result in lots more moving parts for app developers and would fall outside the architecture of most existing render engines.
Our previous native API did follow the "bag of views" approach, giving you a HolographicFramePrediction each frame with an array of HolographicCameraPoses, and it's your job as an engine to suck it up and just render for whatever poses happen to show up each frame. However, we found when tiptoeing in that direction on HoloLens 1 that if we ever did give an app a third view, it would usually render incorrectly, render black or crash if it hadn't actually hardened for that situation. On HoloLens 2, where Mixed Reality Capture does encourage rendering from a third perspective, we made the PhotoVideo view configuration explicitly opt-in. That way, if the app hasn't tested for this case and doesn't opt-in, we can still do the least-bad "distort one of the eyes" approach, which is at least better than incorrect rendering, black rendering or a crash. That same incremental approach could work for WebXR - keep the existing "bag of views", but only enumerate optional secondary views when an app does the appropriate hardening and then uses a module-defined API to opt in. When the app does opt-in, add an attribute to each view (defined in that module) to let the app know which is which each frame.
In OpenXR, when we introduced our "secondary view configuration" vendor extension, we discussed adding an additional per-view shouldRender bit each frame to let the app know whether pixels it produces for that view will be used or not. This would allow a system with a 90fps display and a 30fps observer camera to tell the app which 2 of each 3 frames don't require an observer view. We skipped that for now in our vendor extension since we'd always return true on HoloLens 2, but that sounds like a fine addition in a cross-vendor layers module here. I would not recommend beyond that to have truly independent frame loops - as discussed above, that is a far heavier lift for apps and engines from a performance, architecture and confusion perspective.

Manishearth commented 4 years ago

@cabanier the thing that didn't work is "just giving the application an extra view", which apps are typically not prepared for. the current plan for WebXR is to make this an opt-in via a feature instead. However, it will still be "just another view".

The concern raised by Alex is already resolved if we go with the feature route.

Manishearth commented 4 years ago

I should make the PR for that extra feature so we aren't talking about hypotheticals :smile:

cabanier commented 4 years ago

@cabanier the thing that didn't work is "just giving the application an extra view", which apps are typically not prepared for. the current plan for WebXR is to make this an opt-in via a feature instead. However, it will still be "just another view".

I'm unconvinced that adding another view and reusing the same rendering path is a good solution. For one, maybe HL could get by at rendering the scene at 30fps but that it is not a good solution as such a low framerate will cause swim and user discomfort (especially if they are used to high framerates such as 120fps)

A new session will give us everything we need and it can be easily defined. It's true that the drawback is that this introduces 2 render loop but I suspect that each will be simpler.

Manishearth commented 4 years ago

The framerate issue has a solution: send down the third view for only some of the raf frames. There's nothing that prevents this, and when I write the text for the first person observer view I hope to call it out. Hololens has made a choice in which this isn't necessary because it's always pegged to a lower fps, but other devices can choose otherwise.

Multiple sessions is not an easy thing to define. We have a concept of exclusive access to the device, and a lot of spec text is written assuming this, especially given that most backends have a concept of exclusive access. Multiple sessions will complicate the spec and complicate implementations. Multiple views is already supported by the core spec, and while we need an opt in so people don't shoot themselves in the foot, I strongly feel this is the path we should be taking.

I would rather not make such a drastic change to the core spec just to avoid doing some work on the layers spec. Supporting multiple view configurations with texture array projection layers is not impossible! It needs an API to be designed, but it can be designed, and that can be done in parallel with all the other work.

thetuvix commented 4 years ago

As discussed above, fractional frame cadences for secondary views seem fine to me. If HoloLens had the headroom to keep rendering the primary views at 60fps, we would do as @Manishearth is suggesting and still render the observer view at 30fps by just excluding it from rendering every other frame. That way, the app can continue running its update loop at 60Hz and render at a steady cadence.

Running two independent render loops on arbitrary out-of-phase cadences also requires the simulation update for your scene to tick at those arbitrary times as well. Since you must serialize simulation in most engines, you can't just run two independent loops as you might try for rendering. This means your engine would end up with one unified update loop anyway, except now it jitters between phase A and phase B. I expect it's that impact on update rather than render where independent frame loops would break the architecture of most render engines.

raviramachandra commented 4 years ago

I'm unconvinced that adding another view and reusing the same rendering path is a good solution. For one, maybe HL could get by at rendering the scene at 30fps but that it is not a good solution as such a low framerate will cause swim and user discomfort (especially if they are used to high framerates such as 120fps)

@cabanier I am too unsure of rendering stereo at 30 fps, let me check internally and get back.

cabanier commented 4 years ago

OK. I can see that there is not much support for 2 different render loop. Thinking about it more, different loops would also require new layers for the extra session which would be a bit annoying.

I'm still a bit hesitant for cases where the camera is not a fractional cadence of the display but maybe we can make it that we lower the device or camera framerate so it can match. @Manishearth @asajeffrey PR #160 should address cases where views have different resolutions. Can you take a look?

immersive-web / layers

Layers on devices with views of different resolutions #158