immersive-web / depth-sensing

Specification: https://immersive-web.github.io/depth-sensing/
Explainer: https://github.com/immersive-web/depth-sensing/blob/main/explainer.md

`normDepthBufferFromNormView` in XRDepthInformation is not a rigid transform #45

Open · bialpio opened 5 months ago

bialpio commented 5 months ago

We should look into changing the type of `normDepthBufferFromNormView`, since it can be a non-rigid transform: rigid transforms cannot apply scaling, and scaling is exactly what this transformation may need to perform.

What we'd need is a general-purpose matrix type in the IDL. There is DOMMatrix, which I think could work here, but this is unfortunately going to be a breaking change. 😢
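For illustration, a minimal consumption-side sketch (TypeScript, assuming the WebXR typings; the uniform name is made up): whichever type the spec lands on, a page can normalize it to a column-major `Float32Array` for a shader.

```ts
// Sketch only: accept either today's rigid transform or a general DOMMatrixReadOnly
// and hand the shader a column-major 4x4 matrix either way.
function depthUvTransform(t: XRRigidTransform | DOMMatrixReadOnly): Float32Array {
  if ("matrix" in t) {
    return t.matrix; // XRRigidTransform exposes its column-major matrix directly
  }
  return t.toFloat32Array(); // DOMMatrixReadOnly serializes column-major too (m11, m12, ...)
}
// e.g. gl.uniformMatrix4fv(uDepthUvTransform, false,
//   depthUvTransform(depthInfo.normDepthBufferFromNormView));
```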

cabanier commented 1 week ago

As it turns out, this is something that is also needed on the Quest side. The Quest OpenXR depth extension exposes a pose and an fov that should be surfaced to the author as a projection matrix. The Quest browser currently assumes that the frame and depth matrices match, but that is not always true, and I've been told that upcoming changes will make it even less true.

We should indeed extend the interface and provide a matrix. The only other way would be to "shift" the depth pixels on the fly on the user agent side and that would be very expensive. cc @alcooper91 @toji
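As a hypothetical sketch of what that could look like to an author (none of these attributes exist in the spec yet; the shape just mirrors XRView's `transform`/`projectionMatrix`), the depth camera's pose and fov would let the page build a view-projection matrix and sample the depth texture like a shadow map:

```ts
// Hypothetical shape only: mirrors XRView's transform/projectionMatrix. Nothing
// like this exists in the depth-sensing spec today.
interface DepthInfoWithPose {
  readonly projectionMatrix: Float32Array; // built from the depth sensor's fov
  readonly transform: XRRigidTransform;    // the depth sensor's pose
}

// Column-major 4x4 multiply (out = a * b), the layout WebGL expects.
function mul4(a: Float32Array, b: Float32Array): Float32Array {
  const out = new Float32Array(16);
  for (let c = 0; c < 4; c++)
    for (let r = 0; r < 4; r++)
      for (let k = 0; k < 4; k++)
        out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
  return out;
}

// World space -> depth-camera clip space, analogous to a shadow map's light matrix.
function depthViewProjection(d: DepthInfoWithPose): Float32Array {
  return mul4(d.projectionMatrix, d.transform.inverse.matrix);
}
```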

cabanier commented 1 week ago

After talking to our people working on depth sensing, we potentially need another change to the interface, and we might need an update to how occlusion is implemented. I've invited one of the engineers to come talk to us during a weekly call.

/agenda discuss needed changes to depth sensing spec + implementation

AdaRoseCannon commented 1 week ago

I know I keep mentioning it, but is it possible to have a simpler, dedicated interface for occlusion, since that's what most people will use this for, and another, more generic API for getting the depth map for other purposes like physics? Having the details hidden a bit might help with making it generic.
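Purely to illustrate that split (nothing here is proposed spec text; the feature name and render-state member are made up), the dedicated path could be as small as an opt-in that the UA implements however it likes, while the raw depth map stays behind the existing depth-sensing API:

```ts
// Hypothetical shape only, to illustrate the "simple occlusion vs. raw depth" split.
// Neither an "occlusion" feature descriptor nor an occlusion render-state member
// exists today; the names are made up.
const session = await navigator.xr!.requestSession("immersive-ar", {
  optionalFeatures: ["occlusion"], // hypothetical: UA-managed occlusion
  // apps that need the raw depth map (physics, meshing) would still request
  // "depth-sensing" as they do today
});
// session.updateRenderState({ occlusion: "enabled" }); // hypothetical toggle
```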

cabanier commented 1 week ago

> I know I keep mentioning it, but is it possible to have a simpler, dedicated interface for occlusion, since that's what most people will use this for, and another, more generic API for getting the depth map for other purposes like physics? Having the details hidden a bit might help with making it generic.

Unfortunately, I was told that it's a tradeoff: we can make the API simpler and easier to use, but then it will be too slow to be useful.

alcooper91 commented 1 week ago

I believe Piotr's original intent in relaxing this wasn't to allow depth information that doesn't match the XRView it is attached to, but rather to allow mirroring: the transform was originally meant simply to account for 2D device orientation (e.g. portrait vs. landscape, or a y-flip).

While there is an additional cost to be paid to shift the pixels, either the UA or the site is ultimately going to have to pay it. Allowing a full transform, which essentially lets the depth buffer not be aligned with the XRView, makes it feel wrong to me that the depth information is attached to the XRView at all, and that would be a much larger breaking change. I believe we could normatively specify that any transformation that needs to happen be applied lazily, which would still give developers control over the performance without having to alter the API shape. The other alternative is that we could leverage secondary-views to expose views that are aligned with the depth cameras and that should not be rendered to.

Given that Chrome has shipped the Depth API and does see external usage of it, I'm hesitant about such a large breaking change when alternatives exist.

I'd also be open to a simpler "turn on occlusion" API, and while I agree there is a tradeoff, you could imagine there being some system setting to do this that maybe isn't too slow to be useful (I'd guess Apple has this), and then it becomes a developer choice of which path you want to take.

cabanier commented 1 week ago

> While there is an additional cost to be paid to shift the pixels, either the UA or the site is ultimately going to have to pay it. Allowing a full transform, which essentially lets the depth buffer not be aligned with the XRView, makes it feel wrong to me that the depth information is attached to the XRView at all, and that would be a much larger breaking change. I believe we could normatively specify that any transformation that needs to happen be applied lazily, which would still give developers control over the performance without having to alter the API shape. The other alternative is that we could leverage secondary-views to expose views that are aligned with the depth cameras and that should not be rendered to.

Apparently we tried implementing your proposal (= shift the pixels) but found that it was not performant enough: experiences couldn't reach the needed framerate or level of detail. However, if a site wants to shift the pixels, it could choose to implement that itself. It would likely be faster than having the UA do it.

Meta now recommends that the depth buffer be rendered into a shadow map using the fov and pose. The texture, pose, and fov will NOT update every frame, which allows the shadow map to be reused. This is another attribute that needs to be exposed.
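A minimal sketch of that reuse pattern, assuming some hypothetical change signal (`depthDataGeneration` is invented here; exposing a real equivalent is the point being made):

```ts
// Hedged sketch: re-render the cached occlusion depth map only when the depth data
// actually changes. `depthDataGeneration` is hypothetical; the real signal would be
// whatever attribute the spec ends up exposing for this.
interface DepthSnapshot {
  texture: WebGLTexture;       // as on XRWebGLDepthInformation today
  depthDataGeneration: number; // hypothetical "did anything change?" counter
}

let lastGeneration = -1;

function updateOcclusionDepthMap(depth: DepthSnapshot): void {
  if (depth.depthDataGeneration === lastGeneration) {
    return; // texture, pose and fov unchanged: reuse last frame's shadow map
  }
  lastGeneration = depth.depthDataGeneration;
  // ...re-render the shadow-map-style occlusion texture from depth.texture here...
}
```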

I'm just relaying what the team that works on depth sensing told me. I invited one of their engineers to the next meeting to explain this more in depth.

> Given that Chrome has shipped the Depth API and does see external usage of it, I'm hesitant about such a large breaking change when alternatives exist.

I believe Chrome only shipped a CPU implementation, so it's using a different API. Maybe we could update only the GPU depth sensing path?
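For context, the two paths already diverge at session setup and per frame (standard depth-sensing API, TypeScript-flavoured; typings for the depth-sensing module are assumed), which is why a matrix/pose change could in principle be scoped to the GPU path:

```ts
// Existing API shapes (per the published spec): the CPU and GPU paths are
// requested together but consumed differently per frame.
const session = await navigator.xr!.requestSession("immersive-ar", {
  requiredFeatures: ["depth-sensing"],
  depthSensing: {
    usagePreference: ["gpu-optimized", "cpu-optimized"],
    dataFormatPreference: ["luminance-alpha", "float32"],
  },
});

function readDepth(frame: XRFrame, view: XRView, glBinding: XRWebGLBinding) {
  if (session.depthUsage === "gpu-optimized") {
    // GPU path: a texture to be sampled in a shader.
    return glBinding.getDepthInformation(view)?.texture ?? null;
  }
  // CPU path: query depth at normalized view coordinates.
  return frame.getDepthInformation(view)?.getDepthInMeters(0.5, 0.5) ?? null;
}
```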

> I'd also be open to a simpler "turn on occlusion" API, and while I agree there is a tradeoff, you could imagine there being some system setting to do this that maybe isn't too slow to be useful (I'd guess Apple has this), and then it becomes a developer choice of which path you want to take.

AVP is running on a desktop-class GPU, and we also don't know how complex their solution is. Given what I was told, I don't want a simple API that solves all problems but is unusable in real-world experiences. I'd rather expose the platform primitives from OpenXR and tell authors how to use them.

alcooper91 commented 1 week ago

> Apparently we tried implementing your proposal (= shift the pixels) but found that it was not performant enough: experiences couldn't reach the needed framerate or level of detail. However, if a site wants to shift the pixels, it could choose to implement that itself. It would likely be faster than having the UA do it.

> Meta now recommends that the depth buffer be rendered into a shadow map using the fov and pose. The texture, pose, and fov will NOT update every frame, which allows the shadow map to be reused. This is another attribute that needs to be exposed.

> I'm just relaying what the team that works on depth sensing told me. I invited one of their engineers to the next meeting to explain this more in depth.

I think all of this would also be possible through the use of a secondary view, which IIRC already has its own fov and pose, and then this becomes less of a breaking change? UAs could decide whether they want to support depth on the primary views or not.

> AVP is running on a desktop-class GPU, and we also don't know how complex their solution is. Given what I was told, I don't want a simple API that solves all problems but is unusable in real-world experiences. I'd rather expose the platform primitives from OpenXR and tell authors how to use them.

Ack, but I think the request from Ada was that this would be a separate API? A UA wouldn't have to implement it/grant it to sessions if performance was bad.

cabanier commented 1 week ago

> I think all of this would also be possible through the use of a secondary view, which IIRC already has its own fov and pose, and then this becomes less of a breaking change? UAs could decide whether they want to support depth on the primary views or not.

Can you explain how secondary views would work? I think those are designed as additional views that you can render to, not views that you can render from.

> AVP is running on a desktop-class GPU, and we also don't know how complex their solution is. Given what I was told, I don't want a simple API that solves all problems but is unusable in real-world experiences. I'd rather expose the platform primitives from OpenXR and tell authors how to use them.

> Ack, but I think the request from Ada was that this would be a separate API? A UA wouldn't have to implement it/grant it to sessions if performance was bad.

Wouldn't it be possible to do the same in JS/WebGL/WebGPU using a framework? If so, and this approach is better for you, we could design that API. On Quest, this API would have to be polyfilled.

alcooper91 commented 1 week ago

Hmm, it does look like secondary views are intended as an optional space for you to render to: https://www.w3.org/TR/webxr/#primary-and-secondary-views. I thought there was already some precedent for using them to get additional data out to the page. Regardless, there may be a way to tweak this into something that is less of a breaking change, if we truly feel this is a route we have to go down.
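For reference, requesting them is already just an optional feature; the open question is only whether a UA could attach depth to such a view and signal that it should not be rendered to:

```ts
// Sketch: "secondary-views" is an existing optional feature. The open question is
// whether a UA could surface a depth-camera-aligned secondary view through it,
// one the page reads depth from but never renders to.
const init: XRSessionInit = { optionalFeatures: ["secondary-views"] };
// If granted, pose.views contains the extra view(s), each with its own transform
// and projectionMatrix that the depth information could be attached to.
```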

FWIW, the other change I suggested was to lazily shift the pixels, which would give the page a bit more control over the performance. Though I obviously haven't run numbers, my strong intuition is that running such a shift in native code will be faster than running it in JavaScript (maybe WebGPU mitigates the impact, but for the CPU path I cannot see the page being able to do it faster).
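A tiny sketch of what "lazily" could mean in practice (the `reproject` callback stands in for whatever UA-/native-side work is actually required):

```ts
// Hedged sketch of "apply the transform lazily": keep the raw buffer as delivered
// and pay the reprojection cost only if and when the page first touches the data.
class LazyDepthBuffer {
  #aligned: Float32Array | null = null;
  constructor(
    private raw: Float32Array,
    private reproject: (raw: Float32Array) => Float32Array,
  ) {}
  get data(): Float32Array {
    // First access triggers the reprojection; later accesses reuse the result.
    return (this.#aligned ??= this.reproject(this.raw));
  }
}
```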

> Wouldn't it be possible to do the same in JS/WebGL/WebGPU using a framework? If so, and this approach is better for you, we could design that API. On Quest, this API would have to be polyfilled.

I'm not sure I follow.

> Meta now recommends that the depth buffer be rendered into a shadow map using the fov and pose. The texture, pose, and fov will NOT update every frame, which allows the shadow map to be reused. This is another attribute that needs to be exposed.

Do you have a link to this recommendation? Presumably it's surfaced as a demo/blog post for developers?

cabanier commented 1 week ago

> FWIW, the other change I suggested was to lazily shift the pixels, which would give the page a bit more control over the performance. Though I obviously haven't run numbers, my strong intuition is that running such a shift in native code will be faster than running it in JavaScript (maybe WebGPU mitigates the impact, but for the CPU path I cannot see the page being able to do it faster).

It's not just shifting the pixels; they have to be reprojected. (Maybe this is what you meant?)

> Wouldn't it be possible to do the same in JS/WebGL/WebGPU using a framework? If so, and this approach is better for you, we could design that API. On Quest, this API would have to be polyfilled.

> I'm not sure I follow.

If your implementation would benefit from such an API but Quest can emulate it, that seems acceptable.

> Meta now recommends that the depth buffer be rendered into a shadow map using the fov and pose. The texture, pose, and fov will NOT update every frame, which allows the shadow map to be reused. This is another attribute that needs to be exposed.

> Do you have a link to this recommendation? Presumably it's surfaced as a demo/blog post for developers?

I couldn't find any docs but here is a link to our sample that uses this technique: https://github.com/meta-quest/Meta_OpenXR_SDK/tree/1c060299abe66e325dec6f36a2d506fda52f31cc/Samples/XrSamples/XrPassthroughOcclusion

cabanier commented 1 week ago

Documentation on the OpenXR depth extension: https://developer.oculus.com/documentation/native/android/mobile-depth/

alcooper91 commented 1 week ago

> It's not just shifting the pixels; they have to be reprojected. (Maybe this is what you meant?)

Yeah, reproject.

> Documentation on the OpenXR depth extension: https://developer.oculus.com/documentation/native/android/mobile-depth/

Thanks! I'll take a look. FYI, the other link you sent 404s for me (not sure how GitHub handles private repos; I guess a 401 could leak information).

cabanier commented 1 week ago

> Documentation on the OpenXR depth extension: https://developer.oculus.com/documentation/native/android/mobile-depth/

> Thanks! I'll take a look. FYI, the other link you sent 404s for me (not sure how GitHub handles private repos; I guess a 401 could leak information).

Sorry about that! I didn't notice that it was a private repo. If you install the Meta OpenXR SDK, it should be part of the samples.