w3c / mediacapture-transform

MediaStreamTrack Insertable Media Processing using Streams
https://w3c.github.io/mediacapture-transform/

Video data rotation should be explicit #65

Open tangobravo opened 2 years ago

tangobravo commented 2 years ago

When obtaining live camera data in native apps, the data is provided in the sensor-native orientation. There's usually an API to determine the orientation of the sensor relative to the device's natural orientation, and another to get the current screen rotation vs natural orientation. The combination of all of that allows developers to determine the required rotation of the frames for display so they appear the right way up.
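
As a rough illustration of that combination, here is a minimal sketch. It follows the convention I recall from the sample in Android's legacy Camera documentation; the function name, the `Facing` type and the sign conventions are assumptions for illustration, and other platforms may define the angles differently.

```typescript
// Hypothetical helper; angle conventions are assumed, not taken from any web spec.
type Facing = "front" | "back";

/**
 * sensorOrientation: clockwise angle (0, 90, 180, 270) of the sensor relative
 *                    to the device's natural orientation.
 * screenRotation:    current rotation (0, 90, 180, 270) of the screen relative
 *                    to the device's natural orientation.
 * Returns the clockwise rotation to apply to a frame so it displays upright.
 */
function displayRotation(
  sensorOrientation: number,
  screenRotation: number,
  facing: Facing,
): number {
  if (facing === "front") {
    // Front cameras are usually previewed mirrored, so the sum is flipped.
    const combined = (sensorOrientation + screenRotation) % 360;
    return (360 - combined) % 360;
  }
  return (sensorOrientation - screenRotation + 360) % 360;
}
```

With a back sensor mounted at 90 degrees, `displayRotation(90, 0, "back")` gives 90 and `displayRotation(90, 90, "back")` gives 0, so the same frame needs a different transform once the screen rotates.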

This feels correct to me, as the rotation of the frames is fundamentally a display transform - if the screen rotation occurs between camera frames then the same frame will need to be rendered with a different rotation transform for the different screen orientations.

"Rotating to be the right way up" is implicit on the web - setting srcObject of a <video> to a MediaStream just automatically applies the required rotations so things appear the right way up. Before this specification, access to the pixel values in the video involved rendering the <video> node to a canvas and reading back the data - so drawImage or texImage2D would need to be called on the main thread, and screen.orientation could be relied on at that point to determine the orientation of the frame.

MediaStreamTrackProcessor supports access to frames through a stream interface, often on a Worker - decoupled from the main thread and where screen.orientation is unavailable.

My preference would be that the data in the VideoFrame is never rotated and is always in a consistent orientation, along with some way of finding out how that image orientation relates to the device coordinate system from https://w3c.github.io/deviceorientation/#deviceorientation.

I think we at least need an orientation attribute in VideoFrame to make this important data available. Ideally I'd add a constructor option to MediaStreamTrackProcessor to request unrotated frames too. I'm doing frame-to-frame comparisons so have to undo any rotation on the data for that to work well.
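
To show what "undoing the rotation" costs today, here is a minimal sketch that rotates a pixel buffer by a multiple of 90 degrees on the CPU. It assumes the bytes have already been copied out of the frame as tightly packed RGBA and that the applied rotation is known from somewhere; the `Image` type and both function names are hypothetical, not part of any spec.

```typescript
type Quarter = 0 | 90 | 180 | 270;

// Assumed layout: tightly packed RGBA, 4 bytes per pixel, no row padding.
interface Image {
  width: number;
  height: number;
  data: Uint8Array;
}

// Rotate the image clockwise by a multiple of 90 degrees.
function rotateClockwise(src: Image, angle: Quarter): Image {
  const swap = angle === 90 || angle === 270;
  const width = swap ? src.height : src.width;
  const height = swap ? src.width : src.height;
  const data = new Uint8Array(src.data.length);
  for (let y = 0; y < src.height; y++) {
    for (let x = 0; x < src.width; x++) {
      let dx: number;
      let dy: number;
      switch (angle) {
        case 0: dx = x; dy = y; break;
        case 90: dx = src.height - 1 - y; dy = x; break;
        case 180: dx = src.width - 1 - x; dy = src.height - 1 - y; break;
        default: dx = y; dy = src.width - 1 - x; break; // 270
      }
      const s = (y * src.width + x) * 4;
      const d = (dy * width + dx) * 4;
      data.set(src.data.subarray(s, s + 4), d);
    }
  }
  return { width, height, data };
}

// Undo a clockwise rotation that was applied upstream by rotating the rest of the way round.
function undoRotation(frame: Image, appliedClockwise: Quarter): Image {
  return rotateClockwise(frame, ((360 - appliedClockwise) % 360) as Quarter);
}
```

This is an extra full pass over every frame, which is why I'd rather be able to ask for unrotated data in the first place.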

I suspect people generating synthetic streams with MSTG would also prefer to do so in a fixed orientation without needing to respond to screen orientation updates. AFAIK videos are generally encoded in sensor-native orientation with metadata describing rotation, so there are probably generally applicable issues here for WebCodecs too.

tangobravo commented 2 years ago

https://github.com/w3c/webcodecs/issues/351 is the issue in WebCodecs discussing orientation metadata for VideoFrame.

tangobravo commented 2 years ago

I think the orientation for VideoFrame is not quite what is needed in this case actually.

In a camera app you'd always save out the image in native sensor orientation, and then set the frame's orientation metadata based on the device orientation at the point of capture - so no relation to the screen orientation at that point, and it's not clearly defined what the correct option is if the device is at a 45 degree angle or horizontal.

Media capture is a bit different - on a mobile device at least, the sensor and the screen are at fixed relative orientations, so to display it the right way up you need to align the appropriate edge of the sensor with the same edge of the screen - i.e. there's always a well-defined rotation that's a multiple of 90 degrees, and it only depends on the screen orientation at the time of display.

In the current Chrome implementation, the actual data provided by copyTo has been rotated so it's the right way up for a particular screen.orientation (generally the current one, but it seems undefined exactly when in the pipeline that would be checked).

I think VideoFrame needs some metadata screenOrientation or suchlike so we know the screen.orientation that was assumed when the data was rotated. Then on rendering at any point in the future an appropriate transform can be computed for that frame.
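
A minimal sketch of how such metadata could be used at render time. Both the `screenOrientation` value on the frame and the helper name are hypothetical; nothing like this exists on VideoFrame today, and whether the result should be applied clockwise or counter-clockwise depends on an angle convention that would need to be specified.

```typescript
// assumedAngle: the screen.orientation angle that was assumed when the frame data
//               was rotated (the hypothetical screenOrientation metadata).
// currentAngle: the screen.orientation angle at the time of rendering.
// Returns the remaining rotation in degrees, normalised to 0, 90, 180 or 270.
function residualRotation(assumedAngle: number, currentAngle: number): number {
  return (((currentAngle - assumedAngle) % 360) + 360) % 360;
}
```

If the two angles match, the result is 0 and the frame can be drawn as-is; otherwise the renderer applies the returned multiple of 90 degrees instead of guessing what the pipeline assumed.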

alvestrand commented 2 years ago

This would seem to be a general MediaStreamTrack issue rather than a MediaStreamTrackProcessor issue. When we produce a VideoFrame from a MediaStreamTrack, the resulting orientation metadata should be accurate; orientation is also a required function when sending a MediaStreamTrack over a PeerConnection, so it's not a new thing.