w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group
https://w3c.github.io/mediacapture-extensions/

Add powerEfficientPixelFormat #59

Closed henbos closed 2 years ago

henbos commented 2 years ago

Fixes #13.

Add a constraint to allow the application to influence the performance/quality tradeoff. For example, we can require that the pixel format is efficient. In practise this could be used to avoid the MJPEG decoding load of high resolution USB 2.0 cameras while still allowing high resolution uncompressed NV12 on USB 3.0. The application does not have to care about specific pixel formats, but it can constrain on "efficiency".



henbos commented 2 years ago

@handellm

jan-ivar commented 2 years ago

For example, we can require that the pixel format is efficient.

When you say "require", I'm not a fan of sites restricting what cameras can be used, as this limits user choice.

But luckily, this PR doesn't do that, since you've not modified the allowed required constraints for device selection.

In practise this could be used to avoid MJPEG decoding load of high resolution USB 2.0 cameras while still allowing high resolution uncompressed NV12 on USB 3.0.

So:

Avoiding power-inefficient modes of a user-chosen camera seems desirable. But why would a user agent need app input to do this? If it has this ability and knowledge, shouldn't it apply it by default? And if it's the default, isn't it better for the app to constrain on what it wants, instead of what it doesn't want? E.g. {height: {min: 1080}} for quality over performance? That way the browser can do the power-efficient thing when it can, and only resort to other modes when it has to in order to meet required app constraints.

The answer to this question may be that it's not web compatible at this point. Just wanted to raise it.

youennf commented 2 years ago

powerEfficient seems to be something difficult to define so I am not sure I like it. For instance {height: 1080, powerEfficient: true} could lead to choosing a frame rate of 1 frame per second?

I would also echo Jan-Ivar's question about the UA trying to choose the most optimised pixel format it can by default. The question is: are some web apps interested in selecting a non-optimised format (MJPEG, say, might be handy in some cases)?

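If so, an API that may be useful would be something like:

  • Open the camera, apply some settings.
  • Expose some additional settings that can be set, including available native pixel formats.
  • Let the web application decide which native pixel format it wants (the default being the UA preferred one).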

henbos commented 2 years ago

Reply to @jan-ivar

Avoiding power-inefficient modes of a user-chosen camera seems desirable. But why would a user agent need app input to do this? If it has this ability and knowledge, shouldn't it apply it by default? And if it's the default, isn't it better for the app to constrain on what it wants, instead of what it doesn't want?

The rationale for adding an additional constraint was this: "How can the user agent know what to prefer (quality or performance) if the app does not say? And how can the app say what it wants, if it cannot say what to prefer (quality or performance)?"

But it is true, the user agent could default to preferring performance such that if you ask for ideal:1080p you'd get 480p or 720p based on efficiency. In practise 720p would also be an inefficient format on USB 2.0, but limiting the app to 480p seems a bit excessive for someone asking for 1080p!

The answer to this question may be that it's not web compatible at this point. Just wanted to raise it.

It's hard to imagine not causing web compatibility issues without a new constraint. And the fitness distance algorithm would become less relevant if user agents started to add their own fitness distance penalty based on performance into the mix. We might be able to solve it without a constraint like you say, but I'm not sure it's the path we want to go.
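
To make the trade-off concrete, here is a minimal sketch of the fitness distance for an ideal numeric constraint, as defined in Media Capture and Streams, combined with a purely hypothetical performance penalty of the kind discussed above. The `efficiencyPenalty` weight, the `efficient` flag and the list of camera modes are assumptions made up for illustration; no user agent is known to work this way.

```javascript
// Fitness distance for a numeric constraint with an ideal value:
// 0 when equal, otherwise |actual - ideal| / max(|actual|, |ideal|).
function idealDistance(actual, ideal) {
  if (actual === ideal) return 0;
  return Math.abs(actual - ideal) / Math.max(Math.abs(actual), Math.abs(ideal));
}

// Hypothetical: the user agent adds its own penalty for inefficient formats
// on top of the height distance and picks the lowest total score.
function pickMode(modes, idealHeight, efficiencyPenalty) {
  let best = null;
  let bestScore = Infinity;
  for (const mode of modes) {
    const score = idealDistance(mode.height, idealHeight) +
        (mode.efficient ? 0 : efficiencyPenalty);
    if (score < bestScore) {
      best = mode;
      bestScore = score;
    }
  }
  return best;
}

// Made-up modes for a USB 2.0 camera where only 480p avoids MJPEG.
const usb2Modes = [
  {height: 480, format: 'YUY2', efficient: true},
  {height: 720, format: 'MJPEG', efficient: false},
  {height: 1080, format: 'MJPEG', efficient: false},
];
```

With a penalty of 0, `pickMode(usb2Modes, 1080, 0)` selects the 1080p mode; with a penalty of 1 it drops to 480p, which is the outcome described above as excessive for an app that asked for 1080p.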

So:

  • {height: 1080, powerEfficient: true} might yield <1080p if it avoids MJPEG over USB 2.0?

Wouldn't {height:1080} ensure exact 1080 or fail? But ideal height and exact powerEfficient might yield <1080p, yes. However to avoid 480p you might want to use advanced constraints to ensure powerEfficient only plays a role in Full HD. (That or multiple GUM calls?)
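
One possible reading of the advanced-constraints idea is sketched below. `powerEfficient` is the constraint proposed in this thread, not a shipped one, and the function name is invented for the example. Bare values inside an `advanced` set are treated as exact, and a set that cannot be satisfied is skipped.

```javascript
// Ask for efficient Full HD first; if that exact combination is not
// available, the advanced set is skipped and only the ideal height remains.
function fullHdEfficientIfPossible() {
  return {
    height: {ideal: 1080},
    advanced: [
      // Tried first: exactly 1080p in an efficient format (proposed constraint).
      {height: 1080, powerEfficient: true},
    ],
  };
}

// In a browser, inside an async function:
//   const stream = await navigator.mediaDevices.getUserMedia(
//       {video: fullHdEfficientIfPossible()});
```

Under this reading, a camera without an efficient 1080p mode would still be opened near 1080p rather than dropping to 480p for the sake of efficiency.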

henbos commented 2 years ago

Reply to @youennf

For instance {height: 1080, powerEfficient: true} could lead to choosing a frame rate of 1 frame per second?

Yes. You would have to be careful with powerEfficient since high resolution efficient formats may have worse fps. To avoid frame rate degradation, it would be wise to specify a min frameRate as well.
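
As a minimal sketch of that advice, the helper below pairs the proposed `powerEfficient` constraint with a frame rate floor. The helper name and the example values are assumptions for illustration only.

```javascript
// Build video constraints that ask for an efficient format but reject
// modes slower than minFps. `powerEfficient` is the constraint proposed
// in this thread, not a shipped one.
function efficientVideoConstraints(height, minFps) {
  return {
    height: {ideal: height},
    frameRate: {min: minFps},
    powerEfficient: true,
  };
}

// In a browser, inside an async function:
//   const stream = await navigator.mediaDevices.getUserMedia(
//       {video: efficientVideoConstraints(1080, 15)});
```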

If so, an API that may be useful would be something like:

  • Open the camera, apply some settings.
  • Expose some additional settings that can be set, including available native pixel formats.
  • Let the web application decide which native pixel format it wants (the default being the UA preferred one).

If we gave the app full control of camera capture formats then we might want to go down this route. But I think that is a separate issue.

But for now, I don't think the app would or should care about specific pixel formats. It should only care if it's powerEfficient or not. Even the big 'baddie' MJPEG could be powerEfficient if there is a hardware decoder available, and I don't think we need to expose this level of detail to the app for what I am trying to solve here.

henbos commented 2 years ago

powerEfficient seems to be something difficult to define so I am not sure I like it.

Efficiency is hard to define, but that doesn't mean it's not useful (the other powerEfficient being a prime example of this). I still think that a separate constraint is easier to define than fitness distance penalties.

henbos commented 2 years ago

Show of hands?

Proposal A:

Proposal B:

jan-ivar commented 2 years ago

I think I prefer A, because it's declarative and satisfies web compat: apps have come to expect 1080p when they ask for height: 1080 without other constraints. Breaking this and pushing apps back into crafting multiple complex constraints queries seems like a step backwards.

With fitness distance, we worked hard to walk back the constraints language from something extremely complicated (min/max/exact/advanced) to something more declarative.

youennf commented 2 years ago

With fitness distance, we worked hard to walk back the constraints language from something extremely complicated (min/max/exact/advanced) to something more declarative.

Adding a powerEfficient constraint is extremely complex and does not scale well. First, the more constraints you add, the less predictable and meaningful the fitness distance becomes (adding carrots and leeks...). Second, powerEfficient is a very loose term (at least in MC, we have a common understanding about HW support). I do not expect A to be a viable interoperable solution.

There are alternatives we could consider by splitting powerEfficient in two:

  1. Expose available natively supported pixel formats. The good news is that we already have a definition of pixel formats at https://www.w3.org/TR/webcodecs/#pixel-format. It might be ok to expose it as a constraint (sigh... mainly for applyConstraints use though).

  2. Allow the web application to select a set of constraint values that perfectly matches a natively supported camera configuration. To mimic what OS APIs are doing (a good thing in general), we can expose presets like discussed in https://github.com/w3c/mediacapture-extensions/issues/12 (which does mention exposing pixel format). Presets would include available camera native pixel formats at native resolution/frame rates. 'powerEfficient' is then defined by web applications according to what the camera can offer and what the web application needs, not by user agents.
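
A rough sketch of how an application might choose among presets follows. The preset shape, the `choosePreset` helper and the example list are all hypothetical; no preset API is specified, and the 'MJPEG' label is invented here since it is not a WebCodecs pixel format value.

```javascript
// Hypothetical preset shape: {width, height, frameRate, pixelFormat}.
// The application, not the user agent, decides which formats it accepts.
function choosePreset(presets, wantedHeight, acceptableFormats) {
  const usable = presets.filter(p => acceptableFormats.includes(p.pixelFormat));
  // Prefer the preset closest to the wanted height, then the higher frame rate.
  usable.sort((a, b) =>
      Math.abs(a.height - wantedHeight) - Math.abs(b.height - wantedHeight) ||
      b.frameRate - a.frameRate);
  return usable.length > 0 ? usable[0] : null;
}

// Made-up presets for a single camera.
const examplePresets = [
  {width: 640, height: 480, frameRate: 30, pixelFormat: 'NV12'},
  {width: 1920, height: 1080, frameRate: 30, pixelFormat: 'MJPEG'},
  {width: 1920, height: 1080, frameRate: 5, pixelFormat: 'NV12'},
];
```

Calling `choosePreset(examplePresets, 1080, ['NV12', 'I420'])` returns the 1080p NV12 preset at 5 fps, which shows the frame rate trade-off raised earlier in the thread; an application could add its own minimum frame rate filter before sorting.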

henbos commented 2 years ago

That would be Proposal C: Expose camera presets to web pages.

The app would not care about VideoPixelFormat per se, only if it is or isn't avoiding software decoding in the conversion between native capture and whatever the rest of the UA pipeline is using. So I would suspect that even in Proposal C, you'd end up with something like powerEfficient in the presets anyway. In which case "presets versus constraints" becomes a tangential discussion to whether or not powerEfficient is a good idea.

Second, powerEfficient is a very loose term (at least in MC, we have a common understanding about HW support).

Would you still oppose this if the definition was as well defined as MC's powerEfficient? I'd be happy with powerEfficient: This capture format avoids software decoding of MJPEG.

youennf commented 2 years ago

Would you still oppose this if the definition was as well defined as MC's powerEfficient?

I do not think this is possible. powerEfficient for MC is restricted to the encoder or decoder. That would mirror here to scoping the definition to cameras. Cameras that are outputting MJPEG can be considered power efficient if they are doing this natively, in HW say.

Your definition of powerEfficient encompasses the whole video pipeline, not just the camera. If an application does media recording in JS using MJPEG directly, this qualifies as power efficient.

I'd be happy with powerEfficient: This capture format avoids software decoding of MJPEG.

If what you are after is avoiding MJPEG video frames, a dedicated API (do not use MJPEG) can be built for that. This seems like an edge case that web developers will not be aware of and where the UA default is of prime importance. I would not use the term powerEfficient for that API.

Exposing pixel format selection instead seems to fix the issue and would be useful outside of this edge case. Now that we are exposing camera pixel format with media capture transform to JS, the actual pixel format being used by cameras might become an interop issue.

So I would suspect that even in Proposal C, you'd end up with something like powerEfficient in the presets anyway

Not really. If a preset is used, it will be powerEfficient with regards to the camera. It is then up to the web application, that is designing the video pipeline, to see whether the camera preset output matches the rest of the pipeline. If the pipeline is RTCPeerConnection, YUV is probably the best choice. If the pipeline is WebCodecs, MC will tell what pixel format seems best. If the pipeline is video analytics or canvas, RGB might be a better choice.

henbos commented 2 years ago

I'm not trying to create a pixel format picking API, I'm only trying to avoid software MJPEG decoding.

henbos commented 2 years ago

Exposing pixel format selection instead seems to fix the issue and would be useful outside of this edge case.

I believe "SW MJPEG or not?" is the main case, not the edge case. That's when there's a big performance hit.

In theory, you could have different formats, including RGB. But in practise, besides MJPEG, NV12 and YUY2 are the formats exposed by popular webcams and as far as I know NV12 is always better, so the browser can decide.

I am operating under the assumption that pixel format selection is not a significant concern today (@handellm please sanity check, maybe RGB cameras are more popular than I think), but if that is incorrect, my arguments may fall.

I would not use the term powerEfficient for that API.

Okay, can you come up with a better name? Or are you saying I should abandon this PR?

youennf commented 2 years ago

I can see a few choices:

  1. A pixelCapturingFormat constraint with values like nv12, I420, mjpeg...
  2. A useMJPEG or disallowMJPEG boolean constraint

Both are fine to me. I tend to prefer the first one as it might be useful to applications using media capture transform. If we ship 2 and later on want to do 1, we might have to expose two constraints for the same parameter.

Note that in both cases, web applications will need to check the result of getUserMedia and might sometimes have to call applyConstraints to actually opt out of MJPEG. That is where exposing presets will make web applications' lives so much easier.
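
The check-then-reapply flow described above might look roughly like the sketch below. It assumes that `getSettings()` would report a `pixelFormat` member and that `applyConstraints()` would understand an `avoidMJPEG` constraint; neither exists in a shipped specification, and the helper names are invented for the example.

```javascript
// Hypothetical: decide from the reported settings whether MJPEG is in use.
function needsMjpegOptOut(settings) {
  return typeof settings.pixelFormat === 'string' &&
      settings.pixelFormat.toLowerCase() === 'mjpeg';
}

// `mediaDevices` is passed in so the function can be exercised with a stub;
// in a browser it would be navigator.mediaDevices.
async function openCameraAvoidingMjpeg(mediaDevices, height) {
  const stream = await mediaDevices.getUserMedia(
      {video: {height: {ideal: height}}});
  const [track] = stream.getVideoTracks();
  if (needsMjpegOptOut(track.getSettings())) {
    // Second round trip: re-constrain the already opened track.
    await track.applyConstraints({height: {ideal: height}, avoidMJPEG: true});
  }
  return stream;
}
```

The extra round trip is the cost being pointed out here: the application only learns which format was chosen after the camera has already been opened.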

henbos commented 2 years ago

I’d prefer to disallow MJPEG (what not to do) than to constrain on a specific format (what to do) since I’m happy to let the browser decide and I don’t think constraints can easily express ”anything but”, so I like disallowMJPEG.

But one thing: if HW decoding is available there’s no reason to avoid it.

henbos commented 2 years ago

Alternatively, if it was possible to check HW decoder availability with a different API, one could constrain disallowMJPEG with !hasHW. But disallowMJPEG would be good enough in the meantime.

youennf commented 2 years ago

HW decoding is available there’s no reason to avoid it.

HW decoding is still decoding, and is tied to using video frames, not the camera. Media Capabilities might anyway provide you this information so that the web application can decide what to do.

henbos commented 2 years ago

👍

jan-ivar commented 2 years ago

"disallow" implies exact. How about avoidSoftwareMJPEG (or avoidMJPEG)?

jan-ivar commented 2 years ago

Also, while I like to pick on software MJPEG as much as the next person, I confess I don't follow the arguments for not just calling this powerEfficient, as a more abstract expression of the app's concern here. The scope here is camera settings.

For instance {height: 1080, powerEfficient: true} could lead to choosing a frame rate of 1 frame per second?

Yes it could, because in theory {height: 1080} could also lead to choosing a frame rate of 1. But is it a realistic concern?

henbos commented 2 years ago

I confess I don't follow the arguments for not just calling it powerEfficient.

I think the argument is to separate ”how to open the camera” from ”how to use the frames” which would in theory be a later step in the pipeline.

In Chrome, decoding happens either inside the capture process or even earlier by the OS, but in theory the implementation could perform decoding much later depending on what API is consuming the frames.

henbos commented 2 years ago

In macOS 12 for instance, MJPEG decoding is performed by the OS before the frame even reaches Chrome, where it arrives as NV12. This causes performance issues that we’d like to avoid using this constraint. In this case the decoding step and capture step are not entirely separate.

jan-ivar commented 2 years ago

In this case the decoding step and capture step are not entirely separate

Ok, so wouldn't one option be to scope this efficiency camera constraint to those cases?

HW decoding is still decoding, and is tied to using video frames, not the camera. Media Capabilities might anyway provide you this information so that the web application can decide what to do.

@youennf Ok, so instead of

  const stream = await navigator.mediaDevices.getUserMedia({video: {height: 1080, powerEfficient: true}});

...you're proposing apps write the following?

  const isHW = (await navigator.mediaCapabilities.decodingInfo({
    type: 'webrtc',
    video: {
      contentType: 'video/x-motion-jpeg', // or 'video/x-jpeg'?
      width: 1920,
      height: 1080,
      bitrate: 6000000, //?
      framerate: 30
    }
  })).powerEfficient;
  const stream = await navigator.mediaDevices.getUserMedia({video: {height: 1080, avoidMJPEG: !isHW}});

From my reading of the spec, that's querying "WebRTC receive capabilities" (vs. "WebRTC send capabilities" for encodingInfo) which sounds like an RTCPeerConnection source (a transceiver.receiver.track), not a local camera.

henbos commented 2 years ago

A constraint that tells getUserMedia() to avoid MJPEG decoding would only sensibly be used by an app that does not have an MJPEG-only pipeline. I think this makes it OK not to entirely separate capture from decoding (argument in favor of powerEfficient). But if we can’t do that, avoidMJPEG still solves many issues.

But wrt Jan-Ivar’s comment: I don’t think there currently exists an API for isHW that maps 1:1 to this, because the MJPEG decoder involved here is not from the same set of decoders as MediaCapabilities queries (e.g. if the OS does it for us prior to frames entering the browser this could be separate from the decoders the browser has implemented for other APIs in a pipeline).

henbos commented 2 years ago

Show of hands?

Proposal A:

Proposal B:

Proposal C

alvestrand commented 2 years ago

Proposal B gets my vote. I don't care much about names, as long as the description is clear, but avoidInefficientMjpeg has the advantage of being explicit about what it's doing.

youennf commented 2 years ago

Oh, so I misunderstood the situation: the MJPEG decoding happens very close to camera level, not downstream and MJPEG frames will never be flowing to the video pipeline.

Also WebCodecs does not have a MJPEG frame, so as soon as we would plug a track transform, the UA would have to convert MJPEG frames to whatever is most appropriate.

I would be tempted to name the constraint avoidPixelFormatConversion or avoidCompressedPixelFormat, since the problem is related to any compressed frame format, not to MJPEG specifically.

henbos commented 2 years ago

Oh, so I misunderstood the situation: the MJPEG decoding happens very close to camera level, not downstream and MJPEG frames will never be flowing to the video pipeline.

That's correct. MJPEG gets decoded regardless.

I would be tempted to name the constraint avoidPixelFormatConversion or avoidCompressedPixelFormat, since the problem is related to any compressed frame format, not to MJPEG specifically.

Conversions may occur somewhere either at capture or later in the pipeline (NV12 to I420 or RGB or YUY2 to NV12), so avoidPixelFormatConversion doesn't quite describe this. But avoidCompressedPixelFormat works, since that's what this is about.

@youennf Regarding whether or not "isHW" should be baked into this or a separate API... should we go with avoidCompressedPixelFormat or avoidInefficientCompressedPixelFormat for now?

handellm commented 2 years ago

Going back to the original problem (increased power consumption on MJPEG), I think some variant of avoidInefficientCompressedPixelFormat, powerEfficient, "isHW" makes sense. This is only a problem when there is no hardware support for MJPEG decode.

That said, avoidCompressedPixelFormat is also useful as a constraint for quality reasons (if compressed means lossy in this context).

handellm commented 2 years ago

Elaborating a little more on the availability of hardware MJPEG decode - in Chrome this is deployed and used on Windows MediaFoundation and on ChromeOS devices.

youennf commented 2 years ago

This is only a problem when there is no hardware support for MJPEG decode.

Do we have data supporting this? I would guess that an additional decoding step is always more expensive, not sure by how much, but still.

in Chrome this is deployed and used on Windows MediaFoundation and on ChromeOS devices.

Does this mean that Chrome would prefer selecting MJPEG pixel format on those platforms to match web app resolution constraints but not on other platforms lacking MJPEG HW decoders?

If so, avoidInefficientCompressedPixelFormat makes some sense.

henbos commented 2 years ago

Yes, avoiding MJPEG only when there is no HW support is the use case we have in mind.

I updated the pull request to avoidInefficientCompressedPixelFormat. Please take a look.
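
For illustration, an application could feature-detect the renamed constraint before using it, along the lines of the sketch below. `getSupportedConstraints()` is an existing API; `avoidInefficientCompressedPixelFormat` is only the name proposed at this point in the thread, and the helper name is made up.

```javascript
// `supported` is the dictionary returned by
// navigator.mediaDevices.getSupportedConstraints().
function buildVideoConstraints(supported, height) {
  const video = {height: {ideal: height}};
  // Only add the proposed constraint when the user agent reports support.
  if (supported.avoidInefficientCompressedPixelFormat === true) {
    video.avoidInefficientCompressedPixelFormat = true;
  }
  return {video};
}

// In a browser, inside an async function:
//   const supported = navigator.mediaDevices.getSupportedConstraints();
//   const stream = await navigator.mediaDevices.getUserMedia(
//       buildVideoConstraints(supported, 1080));
```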

handellm commented 2 years ago

This is only a problem when there is no hardware support for MJPEG decode. Do we have data supporting this?

We have internal data that supports this, on Mac, Windows and ChromeOS platforms.

henbos commented 2 years ago

Ping. What is the next step? Review and merge or is there still no consensus here?

jan-ivar commented 2 years ago

Sorry I dropped the conversation here.

Oh, so I misunderstood the situation ...

@youennf were your objections to powerEfficient based on that misunderstanding? This still seems like the better name to me now that we're clear this is about inefficiencies in camera decoding. I also have some review comments on the naming since this is a constraint.

youennf commented 2 years ago

Other PRs are adding new constraints in their own section which seems more tractable.