immersive-web / proposals

Initial proposals for future Immersive Web work (see README)

Capturing what's on the display #36

Closed blairmacintyre closed 5 years ago

blairmacintyre commented 6 years ago

I'm wondering if we should consider having some methods for capturing what's on the display using the appropriate native APIs.

On HoloLens, for example, it's possible to trigger a capture of the view (composited with the camera). On Android and iOS, it's possible (in native apps) to use low-level APIs to efficiently record what's on the display. The same is true for Windows and macOS.

There is the HTML Media Capture API for capturing an image from a camera, and it's possible to capture images from video if you have access via WebRTC.

But, in a discussion with some developers working with the various AR APIs (Google's WebAR and Mozilla's WebXR), as well as past discussions with folks using WebVR, it seems like it would be useful to have a way to capture an image or video of the view on the "current" display (compositing all layers and things like video that might not be available to the page).

One could imagine those being saved and edited using native APIs (as is done in native apps), being made available to application-level "share sheets", and (for images at least) being made available to the JavaScript context (e.g., for use in the app, like uploading a thumbnail to an application server during a save).

Obviously, such a capability would need to trigger a user-permission request.

I'm thinking about this here because we really want to leverage the native APIs tied to the AR/VR hardware.

Utopiah commented 6 years ago

Yes, this could be quite useful for at least two usages.

See also current limitations https://github.com/google-ar/WebARonARCore/issues/36

tbalouet commented 6 years ago

Definitely needed for experience sharing as well; I think people will really want to share their XR experiences in the future. On the same note, the API should allow screenshotting/recording only selected layers of the app: the video feed, the 3D scene, and the 2D DOM (in case the developer wants users to record only 3D + video without the DOM, for example).

cvan commented 6 years ago

Thanks for filing this. This topic comes up often and, IMO, needs proper support, or at least documentation on MDN on how to capture WebXR experiences.

There's the MediaRecorder API, which ships in release-channel Firefox, Chrome, and Chrome for Android (not yet in Edge or Safari). If you have a mirrored canvas, it works quite well for capturing what the user is seeing in the HMD. I've used it to capture A-Frame / three.js WebVR scenes without noticeable perf degradation.

Theoretically, you can capture an OffscreenCanvas too. I realise you're probably more curious about what the headset is rendering, not necessarily the mirrored canvas. For that, I know there are native media-capture APIs exposed in the SDKs for OpenVR and Samsung Gear VR (not sure about Rift and Daydream).
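For reference, a minimal sketch of the mirrored-canvas pattern with MediaRecorder (the element id, frame rate, and MIME type below are illustrative choices, not requirements):

```ts
// Minimal sketch: record a mirrored WebGL canvas with MediaRecorder.
// Assumes the page mirrors the HMD view into <canvas id="mirror">.
const canvas = document.getElementById('mirror') as HTMLCanvasElement;
const stream = canvas.captureStream(60); // capture at up to 60 fps
const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });
const chunks: Blob[] = [];

recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.onstop = () => {
  const blob = new Blob(chunks, { type: 'video/webm' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = 'capture.webm';
  a.click(); // offer the recording as a download
};

recorder.start();
// ...later, when the user ends the capture:
// recorder.stop();
```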

Let me know if you have questions about using this API. (I wrote a WebExtension for selecting canvases on a page, capturing them to videos, and providing downloadable videos.)

P.S. This issue has come up several times in Issues and discussions on the mailing list. May want to resolve this issue or reverse-"mark as duplicate" the other similar issues.

kearwood commented 6 years ago

I would add live streaming to these use cases. Perhaps some incantation of getUserMedia could be used to capture the composited WebVR output?
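Today the closest approximation is feeding a mirrored canvas's capture stream into WebRTC; a hedged sketch (this streams the page's 2D mirror, not the true composited output, and the signaling channel is assumed):

```ts
// Hedged sketch: stream a mirrored canvas over WebRTC. This captures
// the page's 2D mirror canvas, not the composited HMD output this
// thread is asking for.
const mirrorCanvas = document.querySelector('canvas')!;
const stream = mirrorCanvas.captureStream(30);
const pc = new RTCPeerConnection();
for (const track of stream.getTracks()) {
  pc.addTrack(track, stream);
}
// ...create an offer and exchange it over the app's own signaling channel...
```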

I can imagine three variations of the capture:

1. Create a stream from just one eye that is already rendered.
2. Create a stream from an additional "camera" between the two eyes.
3. Create a stream from a third-person perspective.

Unless every page includes its own capture functionality, the WebXR API would need to express the intent of the user and where the capture should come from.

For #1, this could be done by the UA transparently to the content, and perhaps exposed with a WebExtension API or used by the page itself. For #2 and #3, we would need to expose an additional, third "eye", or at least expose the transform of a virtual camera that content should submit frames to.

For the #3 scenario, content may wish to interact with the virtual camera and allow it to be moved around. A typical use case would be to allow users outside of VR to join in a teleconference.
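Until such an API exists, content can approximate #3 itself; a minimal three.js sketch (all names here are illustrative, not a proposed API):

```ts
// Hedged sketch of variation 3 in content: render the same scene a
// second time from a movable "spectator" camera into a 2D canvas,
// which can then be captured or streamed.
import * as THREE from 'three';

const spectatorCanvas = document.createElement('canvas');
document.body.appendChild(spectatorCanvas);

const spectatorRenderer = new THREE.WebGLRenderer({ canvas: spectatorCanvas });
spectatorRenderer.setSize(1280, 720);

const spectatorCamera = new THREE.PerspectiveCamera(60, 16 / 9, 0.1, 100);
spectatorCamera.position.set(0, 1.6, 2); // a third-person vantage point

function renderSpectatorView(scene: THREE.Scene) {
  // Call once per frame, after the XR views have been rendered.
  spectatorRenderer.render(scene, spectatorCamera);
}
```

This is exactly the "every page includes its own capture functionality" burden noted above, which a UA-level API would remove.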

I would suggest that any API that exposes composited output be explicitly defined with regard to UI and other private information that the browser composites on top of the canvas. E.g., the capture should exclude details composited by the browser itself (i.e., the user's bookmarks and history displays) and exclude content from other origins in a multitasking scenario. For many browser implementations, this would translate to generating the feed from the browser's submitted frames and layers rather than asking the lower-level VR/AR API to return a fully composited 2D preview.

If a fully composited 2D preview is necessary, perhaps we could explore specialized WebExtension APIs that are used at a higher privilege level.

toji commented 6 years ago

One thing to point out related to this issue: I recently asked on Twitter if devs were using preserveDrawingBuffer: true with VR content and why. Of the people who were using it (only 9% of respondents), the overwhelming majority indicated that they used it for taking screenshots. As a result, having an explicit capture mechanism would help us cover the last big use of preserveDrawingBuffer: true, which XRWebGLLayer won't support.
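For context, a hedged sketch of the screenshot pattern those respondents likely meant; reading the canvas back as an image only works reliably when the context keeps its drawing buffer:

```ts
// Hedged sketch: reading a WebGL canvas back as an image. Without
// preserveDrawingBuffer: true the buffer may already be cleared by the
// time toBlob runs, which is why devs in the poll were opting into it.
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl', { preserveDrawingBuffer: true });

// ...render a frame with gl...

canvas.toBlob((blob) => {
  if (blob) {
    console.log('screenshot captured:', blob.size, 'bytes');
  }
}, 'image/png');
```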

NellWaliczek commented 6 years ago

I believe this was the intent of our adding XRPresentationContext, though it hadn't originally been envisioned with a mechanism to turn on/off layers (mostly because we still only have one at a time!). When we start the multilayer discussion we should keep in mind that the XRPresentationContext may also need some attention.

I am curious, though, whether there is something folks feel would need to be added immediately to XRPresentationContext, or if these requests are for the next round of spec work?

Utopiah commented 6 years ago

Somehow this doesn't seem to be mentioned here or in #254, but another use of the camera is shape recognition (eventually via https://wicg.github.io/shape-detection-api/ if it gains traction) or, more generally, typical machine-learning techniques (e.g. via https://github.com/PAIR-code/deeplearnjs). This might seem too demanding with one camera or given current resources, but since the trend for both is to improve fast, it could be important to handle this use case. Related discussion: https://twitter.com/braddwyer/status/957986699857948677
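For illustration, a hedged sketch against the draft Shape Detection API (still experimental and not shipped broadly, hence the hand-written declaration; `frame` is an assumed capture of the composited view):

```ts
// Hedged sketch using the draft Shape Detection API. BarcodeDetector is
// experimental, so a minimal hand-written declaration is included here.
declare class BarcodeDetector {
  constructor(options?: { formats?: string[] });
  detect(image: ImageBitmapSource): Promise<
    Array<{ rawValue: string; boundingBox: DOMRectReadOnly }>
  >;
}

async function findQRCodes(frame: ImageBitmap): Promise<void> {
  const detector = new BarcodeDetector({ formats: ['qr_code'] });
  for (const code of await detector.detect(frame)) {
    console.log(code.rawValue, code.boundingBox);
  }
}
```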

Actually partly covered in https://github.com/immersive-web/ideas/issues/4

blairmacintyre commented 5 years ago

@TrevorFSmith can we move this to the proposals repo, per the F2F discussion?

thetuvix commented 5 years ago

There are a few key benefits to having the UA handle mixed reality capture explicitly.

TrevorFSmith commented 5 years ago

It seems like forward motion on this Issue has stalled so I'm going to close it. If you have a plan for how to make progress then ping me and we can re-open it.

ddorwin commented 5 years ago

See also the use cases in https://github.com/immersive-web/webxr/issues/694#issuecomment-501876257. There might also be some approaches related to Web Share for those particular use cases.
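A hedged sketch of that Web Share angle, assuming `blob` is a PNG capture already in hand (file sharing requires Web Share Level 2, which isn't available everywhere):

```ts
// Hedged sketch: hand a captured image to the platform share sheet.
// Requires Web Share Level 2 (file sharing); `blob` is an assumed
// PNG capture produced elsewhere.
async function shareCapture(blob: Blob): Promise<void> {
  const file = new File([blob], 'capture.png', { type: 'image/png' });
  if (navigator.canShare?.({ files: [file] })) {
    await navigator.share({ files: [file], title: 'My WebXR capture' });
  }
}
```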

sjcobb commented 3 years ago

I'm sad to see that discussion has stalled in this area (am I missing something?). This is the most important use case for me currently; I'm working on a WebXR version of this prototype. I can build it the traditional way I've been doing it, using desktop software like DaVinci Resolve to composite the video and the Three.js scene together, but I hate camera tracking and was hoping to capture everything in real time in AR.

I know there are platform-specific ways to accomplish this goal, but I had frankly taken it for granted that this would be supported. Would building something at the library level be a good approach (similar to this j360 demo)? I'm thinking of a module to use with Three.js when the XR renderer is enabled that lets you record the full composited video or export individual layers separately (if you want to apply different effects / color grading after the fact).

cabanier commented 3 years ago

The WebXR spec added support for secondary views, which are designed for capturing output. The idea is that the author can request them and, if they're available, render the scene to them, but without controllers or other elements that are meant only for the primary viewer.
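A rough sketch of that shape, using the spec's optional `secondary-views` feature descriptor (WebXR typings such as @types/webxr are assumed, and `refSpace` / `renderScene` are assumed app helpers):

```ts
// Hedged sketch: request secondary views for capture. When granted,
// the viewer pose can include extra XRViews beyond the two eyes;
// view.isFirstPersonObserver marks a capture-oriented view.
declare const refSpace: XRReferenceSpace;
declare function renderScene(view: XRView, drawControllers: boolean): void;

const session = await navigator.xr!.requestSession('immersive-vr', {
  optionalFeatures: ['secondary-views'],
});

function onXRFrame(time: DOMHighResTimeStamp, frame: XRFrame): void {
  session.requestAnimationFrame(onXRFrame);
  const pose = frame.getViewerPose(refSpace);
  if (!pose) return;
  for (const view of pose.views) {
    // Render the observer view without controllers or other
    // elements meant only for the primary viewer.
    renderScene(view, /* drawControllers */ !view.isFirstPersonObserver);
  }
}
session.requestAnimationFrame(onXRFrame);
```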