w3c / media-capabilities

Media Capabilities API
https://w3c.github.io/media-capabilities/

Retrieving RTCRtpCodecCapability from MediaCapabilities when queried for webrtc #185

Open youennf opened 2 years ago

youennf commented 2 years ago

As discussed in https://github.com/w3c/webrtc-svc/issues/49, while Media Capabilities has a webrtc mode, it can be difficult to actually call https://w3c.github.io/webrtc-pc/#dom-rtcrtptransceiver-setcodecpreferences with Media Capabilities results. The issue is that setCodecPreferences takes an RTCRtpCodecCapability value, which the web application currently has to either query from the WebRTC API after calling MC or rebuild itself.
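
The workaround described above could look roughly like the sketch below: once Media Capabilities reports a configuration as supported, the page looks up the matching entry in the list returned by RTCRtpReceiver.getCapabilities(). The helper name findCodecCapability and its matching rules (case-insensitive mimeType, exact sdpFmtpLine only when one is requested) are assumptions for illustration, not spec behavior.

```javascript
// Hypothetical helper: find the RTCRtpCodecCapability that corresponds to a
// configuration already validated through Media Capabilities.
// `codecs` is the `codecs` array from RTCRtpReceiver.getCapabilities(kind).
function findCodecCapability(codecs, mimeType, sdpFmtpLine) {
  const wanted = mimeType.toLowerCase();
  return codecs.find(
    (codec) =>
      codec.mimeType.toLowerCase() === wanted &&
      // Only compare fmtp lines when the caller asked for a specific one.
      (sdpFmtpLine === undefined || codec.sdpFmtpLine === sdpFmtpLine)
  );
}

// In a browser (not run here):
//   const { codecs } = RTCRtpReceiver.getCapabilities('video');
//   const codec = findCodecCapability(codecs, 'video/VP9', 'profile-id=1');
//   if (codec) transceiver.setCodecPreferences([codec]);
```

This extra lookup is the step the proposal would make unnecessary.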

To improve this, a straightforward approach would be for MediaCapabilities to provide an RTCRtpCodecCapability field, say inside MediaCapabilitiesInfo, when the configuration type is webrtc.

youennf commented 2 years ago

@chcunningham , you seemed ok with this approach in the webrtc thread. Do you have any additional feedback? Should we loop in additional people or is it reasonable to start writing a PR?

drkron commented 2 years ago

I'm happy to see that there's an interest in using MediaCapabilities API for WebRTC.

I think that an example is helpful to understand how the API would be used. So the API is called something like this

let mediaConfig = {
  type: 'webrtc',
  audio: {
    contentType: 'audio/opus',
    channels: '2',
    bitrate: 132266,
    samplerate: 48000
  },
  video: {
    contentType: 'video/VP9; profile-id=1',
    width: 1280,
    height: 720,
    bitrate: 1234567,
    framerate: '25'
  }
};

result = await navigator.mediaCapabilities.decodingInfo(mediaConfig);

and given the proposed PR, result would be something like this:

result = {
  supported: true,
  smooth: true,
  powerEfficient: false,
  webrtcCodec: {
    clockRate: 90000,
    mimeType: 'video/VP9',
    sdpFmtpLine: 'profile-id=1'
  }
}

result.webrtcCodec could next be used as input to setCodecPreferences() to select this as the preferred video codec for this transceiver.
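
A sketch of that last step, assuming the proposed webrtcCodec member exists on the result (it is only a proposal in this thread) and that the application wants to keep the remaining codecs as fallbacks. preferCodecFirst is a hypothetical helper name.

```javascript
// Hypothetical helper: move the codec reported by Media Capabilities to the
// front of the list, keeping every other codec as a fallback in its
// original order.
function preferCodecFirst(allCodecs, preferred) {
  const isPreferred = (codec) =>
    codec.mimeType.toLowerCase() === preferred.mimeType.toLowerCase() &&
    codec.clockRate === preferred.clockRate &&
    (codec.sdpFmtpLine || '') === (preferred.sdpFmtpLine || '');
  return [
    ...allCodecs.filter(isPreferred),
    ...allCodecs.filter((codec) => !isPreferred(codec)),
  ];
}

// In a browser, with the proposed (not shipped) `webrtcCodec` member:
//   const { codecs } = RTCRtpReceiver.getCapabilities('video');
//   transceiver.setCodecPreferences(preferCodecFirst(codecs, result.webrtcCodec));
```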

Is this a correct understanding?

Do we need a corresponding entry for the audio codec?

I think that a drawback of the MediaCapabilities API in this context is that the user needs to know the available codecs since there's no way to get a list of all codecs. This may not be a problem in practice though since there are only a few codecs to choose from.

youennf commented 2 years ago

Is this a correct understanding?

Yes

Do we need a corresponding entry for the audio codec?

Oh right, I forgot about a combined query. I guess we could go with webrtcCodec.audio/webrtcCodec.video or webrtcAudioCodec/webrtcVideoCodec. Wdyt?

I think that a drawback of the MediaCapabilities API in this context is that the user needs to know the available codec

The use case here is mainly to select one particular codec or a small list of preferred codecs. Agreed this does not work super well for the case of reordering all webrtc codecs but I am not sure how used/useful that is.

drkron commented 2 years ago

Oh right, I forgot about a combined query. I guess we could go with webrtcCodec.audio/webrtcCodec.video or webrtcAudioCodec/webrtcVideoCodec. Wdyt?

I have a slight preference for webrtcCodec.audio/webrtcCodec.video since it more clearly groups the WebRTC specifics, but I don't have a strong opinion.

chcunningham commented 2 years ago

Should we loop in additional people or is it reasonable to start writing a PR?

Looking at the PR now

I have a slight preference for webrtcCodec.audio/webrtcCodec.video since it more clearly groups the WebRTC specifics, but I don't have a strong opinion.

+1

chcunningham commented 2 years ago

PR generally looks good, but I have one concern: how should we set clockrate if the system / codec supports multiple rates?

For example, I see getCapabilities() returns two entries with mimeType: "audio/ISAC". The first has clockRate: 16000 the second has clockRate: 32000.

The other members of RTCRtpCodecCapability can all be derived from the input configuration.

Should we follow that model for clockRate, adding it to the input dictionary? Would it be reasonable to instead always take the top clock rate? Open to alternatives... my WebRTC familiarity isn't strong enough for me to make a firm suggestion.

youennf commented 2 years ago

how should we set clockrate if the system / codec supports multiple rates?

Looking specifically at ISAC, it has wideband (16 kHz sample rate, 16 kHz clock rate) and super-wideband (32 kHz sample rate, 32 kHz clock rate). I believe https://www.w3.org/TR/media-capabilities/#dom-audioconfiguration-samplerate could be used for selection.

If there are several matching potential codec configurations, the default one (in terms of getCapabilities list order) should probably be selected. In the ISAC case, that would mean wideband for Chrome.
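
As a rough sketch of that selection rule: filter by MIME type, use samplerate when it is given, and otherwise fall back to the first match in getCapabilities list order. The function name selectAudioCodec is hypothetical, and the sketch assumes the clock rate equals the sample rate, which holds for the ISAC example but is not guaranteed in general.

```javascript
// Hypothetical selection rule for audio codecs.
// `codecs` is the `codecs` array from RTCRtpReceiver.getCapabilities('audio').
function selectAudioCodec(codecs, mimeType, samplerate) {
  const matches = codecs.filter(
    (codec) => codec.mimeType.toLowerCase() === mimeType.toLowerCase()
  );
  // No sample rate given: default to the first entry in list order.
  if (samplerate === undefined) return matches[0];
  // Assumption: clockRate equals the sample rate (true for ISAC).
  return matches.find((codec) => codec.clockRate === samplerate);
}
```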

Should we follow that model for clockRate, adding it to the input dictionary?

That could be useful if we want to tackle the case of a codec with a sample rate but several clock rates. I am not sure how useful that is in practice and would treat it as a separate issue.

@aboba, @alvestrand, thoughts?

chcunningham commented 2 years ago

@aboba @alvestrand friendly ping for thoughts.

I follow the idea. I'm not RTC savvy enough to say whether sample rate and clock rate can/should be tied in this way. Also, sample rate is not currently required for MC, so the defaulting scenario is potentially real. We could alternatively make sample rate required (just for RTC) if that is desirable.

mimeType and sdpFmtpLine are found in our contentType string

I want to revisit this. I note that the sdpFmtpLine can be pretty long in some cases... for ex: "mimeType": "video/H264", "sdpFmtpLine": "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f"

@drkron I know profile-level-id is covered by contentType. What about level-asymmetry-allowed and packetization-mode?

drkron commented 2 years ago

The sdpFmtpLine is more or less the same as the parameters of the mime/media type so they are covered by contentType as well. Here's the full list of H264 parameters, https://datatracker.ietf.org/doc/html/rfc6184#section-8.1 but I think that it's only level-asymmetry-allowed, packetization-mode, and profile-level-id that are used in WebRTC at the moment.
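
To make the correspondence concrete, here is a small sketch that splits an sdpFmtpLine like the H264 one quoted above into named parameters. parseFmtpLine is a hypothetical helper and assumes the simple semicolon-separated key=value form; payload formats whose fmtp line uses a different syntax are out of scope.

```javascript
// Hypothetical helper: split a "key=value;key=value" sdpFmtpLine into an object.
function parseFmtpLine(sdpFmtpLine) {
  const params = {};
  for (const part of sdpFmtpLine.split(';')) {
    const trimmed = part.trim();
    if (trimmed === '') continue; // tolerate trailing or doubled semicolons
    const eq = trimmed.indexOf('=');
    if (eq === -1) {
      params[trimmed] = ''; // bare flag without a value
    } else {
      params[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
    }
  }
  return params;
}

const h264 = parseFmtpLine(
  'level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f'
);
// h264['profile-level-id'] is the part that contentType is known to carry.
```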

chcunningham commented 2 years ago

Thanks @drkron. Any thoughts on the clock rate question?

alvestrand commented 2 years ago

There's a 1-to-1 relationship between the triplet of "m=" line, the "a=rtpmap" line and the "a=fmtp" line and the MIME type (with parameters).

The "video" part of the MIME type lives on the m= line, the "h264" or "vp8" part of the MIME type lives on the a=rtpmap line, and the rest of the parameters live on the a=fmtp line. This is defined in the registration rules for RTP mime types (forgive me for not having the RFC number handy).

In addition, the clock rate and the number of channels (for audio) are supposed to be represented in the MIME type format as parameters.

For audio, the clock rate is an important parameter that the user needs to select for, so it needs to be part of the input parameters. For video, it's always 90000, so nobody cares about it.

chcunningham commented 2 years ago

Thanks @alvestrand. Let me try to combine your insights w/ my ISAC example. Relevant lines from my createOffer() call are

  • m=audio 9 UDP/TLS/RTP/SAVPF 111 63 103 104 9 0 8 110 112 113 126
  • a=rtpmap:103 ISAC/16000
  • a=rtpmap:104 ISAC/32000

so we have "audio/ISAC ...", but I'm not sure how best to include the clockrate since it is not part of the a=fmtp line (no such line for this codec). Would it be correct to pass "audio/ISAC/16000" as the mime type for MediaCapabilities? Note that this has implications for other codecs; opus would then become "audio/opus/48000/2".

alvestrand commented 2 years ago

The critical RFC is RFC 3555 section 3 - a=rtpmap:103 ISAC/16000 should result in a MIME type of audio/isac; rate=16000
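
A minimal sketch of that conversion, assuming the media kind comes from the m= line and the optional third rtpmap field is the channel count. rtpmapToMimeType is a hypothetical name, and lowercasing the subtype simply mirrors the "audio/isac" spelling used above.

```javascript
// Hypothetical helper: turn an m= media kind plus an a=rtpmap line into a
// MIME type string with a "rate" parameter (and "channels" when present).
function rtpmapToMimeType(kind, rtpmapLine) {
  const match = /^a=rtpmap:\d+ ([^\/\s]+)\/(\d+)(?:\/(\d+))?$/.exec(rtpmapLine.trim());
  if (!match) throw new Error(`Not an rtpmap line: ${rtpmapLine}`);
  const [, name, rate, channels] = match;
  let mime = `${kind}/${name.toLowerCase()}; rate=${rate}`;
  if (channels !== undefined) mime += `; channels=${channels}`;
  return mime;
}
```

With this, 'a=rtpmap:103 ISAC/16000' on an audio m= line yields 'audio/isac; rate=16000'.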

On Thu, Jan 13, 2022 at 9:02 PM chcunningham @.***> wrote:

Thanks @alvestrand https://github.com/alvestrand. Let me try to combine your insights w/ my ISAC example. Relevant lines from my createOffer() call are

  • m=audio 9 UDP/TLS/RTP/SAVPF 111 63 103 104 9 0 8 110 112 113 126
  • a=rtpmap:103 ISAC/16000
  • a=rtpmap:104 ISAC/32000

so we have "audio/ISAC ...", but I'm not sure how best to include the clockrate since it is not part of the a=fmtp line (no such line for this codec).


chcunningham commented 2 years ago

Thanks @alvestrand. Do you think rate should be a strictly required part of the mime (at least for audio)? Earlier in the thread we talked about maybe defaulting to whatever would otherwise come first in terms of getCapabilities list order. If we return an RTCRtpCodecCapability as part of the output MediaCapabilitiesInfo (per Youenn's PR), callers could inspect the return to know what we defaulted to.

alvestrand commented 2 years ago

RFC 3555 says:

  Required parameters
     If the payload format does not have a fixed RTP timestamp clock
     rate, then a "rate" parameter is required to specify the RTP
     timestamp clock rate.  A particular payload format may have
     additional required parameters.

There's no sense in having a rate on video, because it's always 90000. A default clock rate makes sense for audio/PCMA and audio/PCMU, because it's nearly always 8000. For nearly all other payloads I can think of offhand, the rate should be a required parameter.
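
Those rules can be sketched as a small function. resolveClockRate is a hypothetical name, and treating a missing rate as an error for all other audio payloads is one reading of "should be a required parameter", not settled behavior.

```javascript
// Hypothetical rule set following the comment above:
//  - video always uses a 90000 Hz RTP clock;
//  - audio/PCMA and audio/PCMU default to 8000 when no rate is given;
//  - any other audio payload must state its rate explicitly.
function resolveClockRate(mimeType, rate) {
  const type = mimeType.toLowerCase();
  if (type.startsWith('video/')) return 90000;
  if (rate !== undefined) return rate;
  if (type === 'audio/pcma' || type === 'audio/pcmu') return 8000;
  throw new TypeError(`A rate is required for ${mimeType}`);
}
```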

On Fri, Jan 14, 2022 at 6:29 PM chcunningham @.***> wrote:

Thanks @alvestrand https://github.com/alvestrand. Do you think rate should be a strictly required part of the mime (at least for audio)? Earlier in the thread we talked about maybe defaulting to whatever would otherwise come first in terms of getCapabilities list order. If we return an RTCRtpCodecCapability as part of the output MediaCapabilitiesInfo (per Youenn's PR), callers could inspect the return to know what we defaulted to.


chcunningham commented 2 years ago

@drkron thoughts on the above? We currently aren't enforcing any requirements on having a rate. For example, one of the wpt tests expects supported=true for audio/ISAC

drkron commented 2 years ago

My thought on this is that it seems most in line with what's returned from RTCRtpReceiver.getCapabilities("audio") to specify channels and samplerate as explicit dictionary members of AudioConfiguration instead of specifying them as MIME type parameters (although this is an option according to specs). The clockRate can probably be deduced from the MIME type and samplerate? If channels/samplerate have not been specified, it sounds good to me to use whatever comes first in terms of getCapabilities as the default value.

However, I think that @alvestrand and @youennf are the experts here so I wouldn't argue if they have a different opinion.

chcunningham commented 2 years ago

I follow. My leaning is to do whatever is easiest for API users. If RTC APIs typically break out components of samplerate etc, I agree it makes a compelling case for MC to do the same. I defer to RTC folks to build consensus on that. My main priorities are to ensure MC requires enough that inputs are clearly defined and meaning is always unambiguous.

alvestrand commented 2 years ago

Either representing as part of the MIME parameters or as separate attributes works, technically. I have a weak preference for separate attributes.

chcunningham commented 2 years ago

I think we're mostly converged. Let's wrap this up by planning out how to amend @youennf's PR (#186).

Currently the PR has a few sentences like:

... set webrtc’s audio to a RTCRtpCodecCapability dictionary representing the supported audio configuration.

We should add some steps here to clarify how to construct the RTCRtpCodecCapability. The dictionary consists of

dictionary RTCRtpCodecCapability {
  required DOMString mimeType;
  required unsigned long clockRate;
  unsigned short channels;
  DOMString sdpFmtpLine;
};

Using our discussion above to map that from MediaCapabilities inputs, we have

@drkron @alvestrand does this match your expectations? @youennf thoughts?
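
The mapping list from this comment is not reproduced above. As a rough illustration of the direction the thread had converged on (MIME type and fmtp parameters from contentType, channels and samplerate as separate members, a fixed 90000 clock for video), one possible construction is sketched below. toCodecCapability is a hypothetical name, the clockRate-from-samplerate rule is an assumption, and this is not the algorithm from the PR.

```javascript
// Hypothetical construction of an RTCRtpCodecCapability-shaped object from
// one Media Capabilities audio or video configuration.
function toCodecCapability(config) {
  // contentType is "<type>/<subtype>[; param=value ...]".
  const [mimeType, ...params] = config.contentType
    .split(';')
    .map((part) => part.trim());
  const capability = { mimeType };
  if (mimeType.toLowerCase().startsWith('video/')) {
    capability.clockRate = 90000; // fixed RTP clock for video
  } else {
    // Assumption: the clock rate follows the sample rate. When samplerate is
    // absent, a default would have to come from getCapabilities() instead.
    if (config.samplerate !== undefined) capability.clockRate = config.samplerate;
    if (config.channels !== undefined) capability.channels = Number(config.channels);
  }
  const fmtp = params.filter((part) => part !== '').join(';');
  if (fmtp !== '') capability.sdpFmtpLine = fmtp;
  return capability;
}
```

Applied to the video configuration from the earlier example, this gives mimeType 'video/VP9', clockRate 90000 and sdpFmtpLine 'profile-id=1'.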

aboba commented 2 years ago

If the goal is to replace getCapabilities() entirely, you'd also need to return info on "codecs" like rtx, ulpfec, red, flexfec, etc. To see what would be returned, look here.

chcunningham commented 2 years ago

@alvestrand @youennf - thoughts on the last 2 comments?

youennf commented 2 years ago

If the goal is to replace getCapabilities() entirely, you'd also need to return info on "codecs" like rtx, ulpfec, red, flexfec, etc.

As discussed during the last WebRTC WG meeting, the goal is not to replace getCapabilities() entirely, just to cover the real media codecs that, say, WebTransport+WebCodecs applications could be interested in.

We should add some steps here to clarify how to construct the RTCRtpCodecCapability.

Sounds good.

@drkron @alvestrand does this match your expectations? @youennf thoughts?

Overall, this looks good. @alvestrand mentions that clockRate would be a required member and I wonder whether we could have it as an optional parameter (if not provided, use default values as currently being done by getCapabilities). Maybe we could leave clockRate to a follow-up PR?

I am also still unclear about whether we want defaulting rules or not. Say I do not provide channels, or I provide just 'video/H264' or just 'video/H264'+profile-level-id: can I still get a valid capability dictionary? Having default rules might be more web dev friendly, so it is appealing to me.

If we anticipate WebTransport+WebCodecs RTC applications to use the 'webrtc' MC type (are we?), it seems we should not require these applications to pass parameters that would be RTP/SDP specific.

alvestrand commented 2 years ago

I'm happy with chcunningham's proposed algorithm for constructing an RTCRtpCodecCapability. I don't quite understand "webrtcCodec.audio.mimeType = the audio/* part of mcInput.audio.contentType" - could you give an example of the strings that would be used?

RTCRtpReceiver.getCapabilities('audio') currently returns (example):

mimeType: "audio/opus"; sdpFmtpLine: "minptime=10;useinbandfec=1"

What's passed in as mcInput.audio.contentType?

aboba commented 2 years ago

Discussion at April 2022 WEBRTC WG meeting is here. Summary is that there is little interest in having Media Capabilities return info on "fake" codecs (e.g. telephone-event, CN, FEC, RTX, RED).
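
For applications that still start from getCapabilities(), separating real media codecs from these entries could look like the sketch below. The subtype list is taken from the names mentioned in this thread; realMediaCodecs is a hypothetical helper, and the prefix match for versioned names is an assumption.

```javascript
// Subtypes the thread refers to as "fake" codecs.
const NON_MEDIA_SUBTYPES = ['rtx', 'ulpfec', 'red', 'flexfec', 'telephone-event', 'cn'];

// Hypothetical helper: keep only entries that describe real media codecs.
function realMediaCodecs(codecs) {
  return codecs.filter((codec) => {
    const subtype = (codec.mimeType.split('/')[1] || '').toLowerCase();
    // Prefix match so versioned names such as "flexfec-03" are also dropped.
    return !NON_MEDIA_SUBTYPES.some(
      (name) => subtype === name || subtype.startsWith(`${name}-`)
    );
  });
}
```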

chrisn commented 1 year ago

Minutes from April 12 2022 Media WG meeting: https://www.w3.org/2022/04/12-mediawg-minutes.html (preceded the WebRTC meeting)