immersive-web / raw-camera-access

Spec draft: https://immersive-web.github.io/raw-camera-access/. Repository for experimentation around exposing raw camera access through WebXR Device API. Feature leads: Piotr Bialecki, Alex Turner, Nicholas Butko

Interoperability and compatibility - web developers support #13

Open bialpio opened 1 year ago

bialpio commented 1 year ago

I'm preparing to kick off the Blink launch process for the Raw Camera Access API, and one part of the Intent to Ship asks for signals from web developers - please take a look and let me know what you think! If my understanding is correct, we're looking for feedback along the lines of "the API [does / does not] solve my use case with [no / some / major] issues/workarounds needed", but don't feel limited by this formula!

So far, I'm aware of one issue around API ergonomics (see comment) - I think this can be tackled at a later stage; let's get the API out the door first and see what the main pain points are.

Pinging folks that may have some thoughts here: @nbutko, @tangobravo, @mrdoob, @elalish.

elalish commented 1 year ago

Regarding getting the image back on the CPU, it seems like readPixels should be perfectly sufficient. However, it does seem to be calling out for an XR sample.
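A minimal sketch of the readPixels approach, assuming the camera texture has already been attached to the currently bound framebuffer (the function name and setup are illustrative, not part of the spec):

```javascript
// Sketch: read an RGBA framebuffer back to the CPU with readPixels.
// Assumes `gl` is a WebGL/WebGL2 context and the camera texture is
// attached to the currently bound framebuffer.
function readCameraPixels(gl, width, height) {
  const pixels = new Uint8Array(width * height * 4); // RGBA, 8 bits per channel
  gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  return pixels;
}
```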

tangobravo commented 1 year ago

This seems like a reasonable, low-overhead way to expose access to the underlying camera pixels for rendering full-screen effects.

It's certainly a little odd for computer-vision use cases when compared to getUserMedia - the frame is at screen resolution rather than a selectable camera resolution, it is cropped to the screen size, and the camera model would need to be reverse-engineered from the view matrices.

As @elalish mentions, readPixels is a reasonable way to get at the data on the CPU, and it would be possible to render to a smaller framebuffer and call readPixels on that to reduce the readback overhead. So CV use cases would be "somewhat supported" by this, but a more async API built around camera frames (perhaps just based on getUserMedia - https://github.com/immersive-web/webxr/issues/1295) would be a better fit.
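For the "reverse-engineering the camera model" point, a rough sketch of recovering pinhole intrinsics from a WebGL-style column-major projection matrix plus the viewport (the function name is illustrative, and this assumes the standard perspective-projection layout with no skew):

```javascript
// Sketch: approximate pinhole intrinsics (focal lengths and principal
// point, in pixels) from a column-major projection matrix `p` and a
// viewport {x, y, width, height}. Assumes no axis skew.
function intrinsicsFromProjection(p, viewport) {
  return {
    fx: (viewport.width / 2) * p[0],
    fy: (viewport.height / 2) * p[5],
    // p[8] and p[9] hold the off-center projection terms.
    cx: ((1 - p[8]) * viewport.width) / 2 + viewport.x,
    cy: ((1 - p[9]) * viewport.height) / 2 + viewport.y,
  };
}
```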

The main question for me when considering whether to declare "public support" for this API addition is whether it would still provide value if a gUM-based proposal were also implemented.

The restriction to view-aligned cameras and synchronous access does seem likely to permit a more efficient implementation than one using MediaStreams. Therefore I'm happy to declare support for this API for full-screen rendering effect use cases, with the understanding that CV use cases are also possible but would likely transition to a gUM-based API if one were implemented.

For fear of sounding like a broken record, this is unlikely to lead to us using WebXR in practice on mobile, as it would still be subject to the issues we have with the implementation of the immersive-ar session on mobile.

shanumante-sc commented 1 year ago

This API solves the following limitations we have with WebXR:

  1. We can now run internal CV algorithms on top of the camera feed.
  2. We can provide the camera feed as a background for our rendering engine to render into (for cases where we might want to show a cropped feed to the user). We still need to clear the default WebXR framebuffer with some default value like (1,1,1,1) so that the camera feed rendered into the default framebuffer is hidden.
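The second point above might look roughly like the following per-frame callback: grab the camera texture via `XRWebGLBinding.getCameraImage()`, hand it to the engine, then clear the default framebuffer to opaque white to hide the built-in camera background. This is a sketch with illustrative parameter names; `binding` is assumed to be an `XRWebGLBinding` for the session.

```javascript
// Sketch: per-frame callback that extracts the camera texture and then
// clears the default WebXR framebuffer so the camera feed rendered into
// it is hidden, as described above.
function onXRFrame(time, frame, gl, refSpace, binding, processCameraTexture) {
  const pose = frame.getViewerPose(refSpace);
  if (!pose) return;
  for (const view of pose.views) {
    if (view.camera) {
      // Opaque texture: only valid inside this rAF callback.
      const cameraTexture = binding.getCameraImage(view.camera);
      processCameraTexture(cameraTexture, view.camera.width, view.camera.height);
    }
  }
  // Hide the camera feed in the default framebuffer: clear to (1,1,1,1).
  gl.clearColor(1, 1, 1, 1);
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
}
```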

Like others have mentioned, a getUserMedia route would have been preferable, so that nothing needs to be changed in our existing pipeline to produce inputs for our AR engine. However, this API at least helps us get started.

And I would like to reiterate the point made above that we will still have some challenges with regard to using WebXR on mobile with the restriction of using "immersive-ar" mode.

In a hopeful future where the "immersive-ar" restriction is lifted to give us 6DoF poses in an inline session, do you expect this camera access API would also be made available in the inline session? Or can we just use gUM in that case?

nbutko commented 1 year ago

One issue that we ran into recently was not being able to access the texture in a supplied (non-webxr) rendering context. According to the engineer who worked on this, the spec allows for it but regardless of the context passed in, the base layer context is used. Does that sound expected?

Ideally we'd be able to access the texture in an offscreen context so that shaders that process the camera feed don't interfere with the rendering engine.

bialpio commented 1 year ago

Thanks for the responses, everyone!

@nbutko - can you shed some more light on what you're trying to do? We don't block offscreen contexts explicitly in our Chrome implementation, so I'd expect this to work. You mention a "non-webxr rendering context" - can you elaborate on this? You shouldn't be able to create an XRWebGLBinding instance using a non-XR-compatible context. Additionally, access to the texture has to happen [within a request animation frame callback](https://immersive-web.github.io/layers/#:~:text=an%20opaque%20texture%20is%20considered%20invalid%20outside%20of%20a%20requestanimationframe()%20callback%20for%20its%20session.) - if you need to access it outside of the rAF loop, you'll need to copy the texture. If you still have trouble with it, can you file a Chrome bug at http://crbug.com? It seems that what you're experiencing may be an implementation issue in Chrome, not a spec issue.
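One way to do the "copy the texture" step is to snapshot the opaque camera texture into a texture you own while still inside the rAF callback, e.g. by attaching it to a framebuffer and using `copyTexImage2D`. A sketch under that assumption (all names are illustrative; the caller is expected to have created `dstTexture` and `fbo` beforehand):

```javascript
// Sketch: copy the opaque camera texture into a caller-owned texture
// during the rAF callback, so the copy can be used after the callback
// returns (e.g. in an offscreen processing pass).
function copyCameraTexture(gl, cameraTexture, width, height, dstTexture, fbo) {
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
  gl.framebufferTexture2D(
    gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, cameraTexture, 0);
  gl.bindTexture(gl.TEXTURE_2D, dstTexture);
  // Copy the framebuffer contents into the destination texture.
  gl.copyTexImage2D(gl.TEXTURE_2D, 0, gl.RGBA, 0, 0, width, height, 0);
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
}
```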