w3c / encrypted-media

Encrypted Media Extensions
https://w3c.github.io/encrypted-media/

Discussion of MPEG proposal to include encryption format in the codec string #400

Closed ddorwin closed 4 years ago

ddorwin commented 7 years ago

This issue is not directly related to EME, but I think it warrants public discussion since it could affect future EME-using apps and potentially other web APIs. The discussion started in #391.

In https://github.com/w3c/encrypted-media/issues/391#issuecomment-299059957, @dwsinger wrote:

  1. we have a proposal to define the sub-parameters of e.g. encv as the 4CC of the scheme, followed by the 4CC of the underlying format and its sub-parameters. Then the codecs string would read codecs="encv.cbcs.avc1.402567" (I made up the last bit).

In https://github.com/w3c/encrypted-media/issues/391#issuecomment-299064862, @dwsinger wrote:

The proposal #2 is on the table in the MPEG file format group. Comments/support/opposition would be welcome.

Note that "encv.cenc.avc1.402567" would also likely be valid and the same as how just "avc1.402567" is used today.
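To make the proposed layout concrete, here is a minimal sketch of how a client might split such a string. It assumes the `<sample entry>.<scheme>.<codec...>` ordering described in the quote and handles only the form that carries a scheme 4CC; the function name `parseProposedCodec` and the set of encrypted sample-entry 4CCs are illustrative assumptions, not part of any published syntax.

```javascript
// Sketch only: the "encv.<scheme>.<codec>" layout is the proposal under
// discussion here, not a syntax confirmed by a published specification.
// Assumed set of sample-entry 4CCs that mark an encrypted track.
const ENCRYPTED_SAMPLE_ENTRIES = new Set(['encv', 'enca']);

function parseProposedCodec(entry) {
  const parts = entry.trim().split('.');
  if (!ENCRYPTED_SAMPLE_ENTRIES.has(parts[0])) {
    // Plain form used today, e.g. "avc1.402567".
    return { encrypted: false, scheme: null, codec: parts.join('.') };
  }
  if (parts.length < 3) {
    throw new Error(`Expected <entry>.<scheme>.<codec>, got "${entry}"`);
  }
  const [sampleEntry, scheme, ...codecParts] = parts;
  return { encrypted: true, sampleEntry, scheme, codec: codecParts.join('.') };
}
```

Under these assumptions, `parseProposedCodec('encv.cbcs.avc1.402567')` reports scheme `cbcs` and codec `avc1.402567`, while a plain `avc1.402567` passes through as unencrypted.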

ddorwin commented 7 years ago

I have the following concerns about this proposal:

Since this is really a property of the container / file format, why not use "video/cbcs"? (It's perhaps unfortunate that "video/cenc" was not defined rather than using "video/mp4".)

joeyparrish commented 7 years ago

Adding this info to mime types sounds better than codecs; however, I'm against video/cbcs because it loses container information. Is it mp4? What about TS? WebM?

Why not:

dwsinger commented 7 years ago

we could certainly introduce Yet Another Parameter to indicate encryption; that might be more backward compatible, indeed, as I think people are signalling encrypted content by using a plain codecs string and telling you some other way (e.g. DASH attributes) about encryption.

cconcolato commented 7 years ago

The idea behind using encv.avc1.402567 or encv.cenc.avc1.402567 was that implementations that are not capable of decrypting would not try to get the file because they would not understand the codecs value. If you use another sub-parameter, old implementations would ignore it, interpret the codecs value and try to download the file. Ideally, it would have been better with a different sub-parameter, but I think it's probably too late.
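A rough sketch of the behaviour being described, assuming a hypothetical legacy client that inspects only the `codecs` parameter against a fixed list of codec prefixes and has no decryption support. The `scheme` parameter name, the known-codec list, and both function names are illustrative assumptions.

```javascript
// Hypothetical legacy client: knows only these codec prefixes and never
// looks at any MIME parameter other than `codecs`.
const LEGACY_KNOWN_CODECS = ['avc1', 'mp4a'];

// Naive parameter split; does not handle ';' inside quoted values.
function parseContentType(contentType) {
  const [type, ...rawParams] = contentType.split(';').map((s) => s.trim());
  const params = {};
  for (const raw of rawParams) {
    const eq = raw.indexOf('=');
    if (eq === -1) continue;
    const name = raw.slice(0, eq).trim().toLowerCase();
    params[name] = raw.slice(eq + 1).trim().replace(/^"|"$/g, '');
  }
  return { type, params };
}

function legacyClientWouldFetch(contentType) {
  const { params } = parseContentType(contentType);
  const codecs = (params.codecs || '')
    .split(',')
    .map((c) => c.trim())
    .filter(Boolean);
  // Unknown parameters (such as `scheme`) are never consulted.
  return codecs.every((c) => LEGACY_KNOWN_CODECS.includes(c.split('.')[0]));
}
```

With these assumptions, `video/mp4; codecs="avc1.402567"; scheme=cbcs` is fetched, because the extra parameter is ignored, while `video/mp4; codecs="encv.cbcs.avc1.402567"` is skipped, because `encv` is not a recognised codec.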

Regarding exposing other boxes in the MIME type, as Dave said, there is a discussion going on in MPEG which discusses even including some SEI messages (frame packing, 360 projection, ...) in the codecs. I've put a PDF version here. Feel free to comment on MPEG's mailing list or by email to me.

It might be a good idea to make these additional "codecs" elements both container-independent (MP4, WebM) but also codec-independent (video projection). I don't think we should interpret "encv", "cenc" ... as being container specific.

acbegen commented 7 years ago

I understand your motivation, Cyril; however, overloading the codecs value may lead us to more problems in the future. We had better go with a cleaner approach and push additional values into MIME sub-parameters. Yes, old clients will not parse them and will ignore them by design, but I think we need to worry more about the future clients who will have the intelligence to parse a variety of values from the sub-parameters.

cconcolato commented 7 years ago

Some further comments on the points raised:

The protection scheme - and really all boxes - are more related to the container than the codec.

The protection scheme is here to tell you that the stream is protected and which encryption mode was used (full sample encryption, partial, CBC, CTR ...). The fact that in ISOBMFF it uses boxes to carry encryption information is not part of the scheme. I don't see how using encv.cbcs is more container-specific than encscheme=cbcs. Indeed 'encv' is an ISOBMFF 4CC, but so is 'avc1'. I don't see why other MIME types could not reuse the codecs parameter with encv.cbcs ...

Codec strings are currently container-independent, and the same string can be used with multiple containers.

It's a nice thing that it's possible and I don't see how using encv.cbcs makes it non-usable with other containers.

From a web platform perspective, if a decoder API is added in the future, it would likely accept container-independent blocks and codec strings. Adding container-specific information like this would force authors to use different strings for different APIs.

Again, I don't see the string as container-specific. 'encv' says this is encrypted video. 'cbcs' says AES-CBC mode partial video NAL pattern encryption.

Codec strings are much more likely to be passed around into deeper layers of a system than container. Implementations would need to strip this prefix before doing so.

The deeper layer (i.e. media layer) needs to know the data is encrypted. Carrying the information in a single string is easier than having 2 separate strings. But I think we are splitting hairs here.

Strings for some codecs, such as VP9 and HEVC, are already very long.

Indeed, but the problem comes from the quantity of information that has to be carried, and this does not change with one or the other syntax.

why not use "video/cbcs"

Because the file may contain other streams that are not protected (e.g. audio) and you would prohibit players from understanding that.

video/mp4; scheme=cbcs

How do you know to which track it applies?

Yes, old clients will not parse them and will ignore them by design, but I think we need to worry more about the future clients who will have the intelligence to parse a variety of values from the sub-parameters.

I disagree. I don't think it is a good design to discard old players.

mstattma commented 7 years ago

I don't understand the argument around old players. Wouldn't they bail on the new codec strings, while continuing to do what they do today under the approach with a MIME sub-parameter?

I also don't think it's practically relevant to be able to differentiate encryption information of individual tracks in muxed representations; it would be awkward to have e.g. cenc and cbcs in one file and force everyone to download both. And no harm is done if some tracks are unencrypted alongside encrypted tracks following the same scheme.

Last but not least: deeper layers need more information than what's conveyed in the codec string (e.g. IVs) and thus need a different signalling anyways (which is what we have today).

dwsinger commented 7 years ago

I think the point is that a player that only intends to play the audio, which is unencrypted, and will ignore the video (perhaps we have a seeing-impaired user), would benefit from knowing that only the video is encrypted.

codecs=avc1,mp4a;scheme=cbcs doesn't tell me which of the two tracks are encrypted. codecs=encv.cbcs.avc1,mp4a is fairly clear it's the video.

The only value I see in the separate parameter is that it allows sub-parameters of the encryption scheme; otherwise it decouples encryption from coding as I say here, and overall it makes the set of parameter strings longer (by the new parameter name).

The route of exposing more and more of the details of the internals of files, in MIME parameters, is a bottomless pit. I am not sure I like the codecs parameter, for this reason (I know I authored the first RFC). Overall, readers want to answer "can I process this file?" and the profiles parameter, where we reflect the ftyp brands, should answer that in a 'packaged' way.
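The per-track ambiguity can be sketched as follows. This is only an illustration under stated assumptions: `trackEncryption` is a hypothetical helper, the prefixed entries follow the proposed `encv.<scheme>.<codec>` layout, and `scheme` stands for the separate file-level parameter discussed in this thread.

```javascript
// For each entry in a codecs value, report what the signalling says about
// encryption: true, false, or null when it cannot be determined.
function trackEncryption(codecsValue, schemeParam) {
  return codecsValue.split(',').map((entry) => {
    const parts = entry.trim().split('.');
    if (parts[0] === 'encv' || parts[0] === 'enca') {
      // Prefixed form: the scheme is attached to this specific track.
      return { codec: parts.slice(2).join('.'), encrypted: true, scheme: parts[1] };
    }
    // With a file-level parameter the scheme is known, but not which
    // track (or tracks) it covers.
    return {
      codec: parts.join('.'),
      encrypted: schemeParam ? null : false,
      scheme: schemeParam || null,
    };
  });
}
```

Calling `trackEncryption('encv.cbcs.avc1,mp4a')` marks only the video entry as encrypted, whereas `trackEncryption('avc1,mp4a', 'cbcs')` leaves both entries undetermined.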

joeyparrish commented 7 years ago

The example above (in which audio and video appear in the same container and with only video encrypted) seems unrealistic to me, so I don't find that a convincing argument against parameters.

dwsinger commented 7 years ago

Hi Joey

part of the problem is that MIME parameters are general, not used just in adaptive streaming (where indeed, multiplexing is less common). Simple download, for example (imagine a video element in HTML with multiple multiplexed choices all loaded over http).

acbegen commented 7 years ago

Using muxed streams is so old fashioned. Nobody wants to do it, and the devices will not likely do it in the near future. As for your use case, Dave, a blind person just downloading the audio and skipping the encrypted video is IMO quite a stretch. I wonder how much such muxed content is available on the web to download. If a content provider wants to offer that service, they can well offer an individual audio track for such customers (and that audio will likely be descriptive or narrative - different from the normal audio that would go with the video).

acbegen commented 7 years ago

codecs=avc1,mp4a;scheme=cbcs doesn't tell me which of the two tracks are encrypted. codecs=encv.cbcs.avc1,mp4a is fairly clear it's the video.

I think we can come up with a way to say that the encryption mode applies to video, audio or both. I have seen weirder things being used in MIME parameters. But honestly, the muxed scenario is not that interesting to most of us.

The route of exposing more and more of the details of the internals of files, in MIME parameters, is a bottomless pit. I am not sure I like the codecs parameter, for this reason (I know I authored the first RFC). Overall, readers want to answer "can I process this file?" and the profiles parameter, where we reflect the ftyp brands, should answer that in a 'packaged' way.

Today you want to put the encryption mode in the codecs parameter; who knows whether tomorrow someone else will try to indicate some other feature of the media in the codecs parameter (like HDR vs. non-HDR)? We should not try to get away with the easier approach now, knowing that we will have more issues in the near future.

acbegen commented 7 years ago

video/mp4; scheme=cbcs

How do you know to which track it applies?

See my earlier comment.

Yes, old clients will not parse them and will ignore them by design, but I think we need to worry more about the future clients who will have the intelligence to parse a variety of values from the sub-parameters.

I disagree. I don't think it is a good design to discard old players.

We are not discarding them. What I am saying is that it is not the end of the world if an old client downloads something it won't be able to render. The share of such clients will diminish rather quickly in today's OTT world. We cannot give up doing the right thing for the sake of some old clients.

dwsinger commented 7 years ago

On Aug 2, 2017, at 4:37, Ali C. Begen notifications@github.com wrote:

Using muxed streams is so old fashioned. Nobody wants to do it, and the devices will not likely do it in the near future. As for your use case, Dave, a blind person just downloading the audio and skipping the encrypted video is IMO quite a stretch. I wonder how much such muxed content is available on the web to download.

the entire iTunes library?

not all the world is streaming.

If a content provider wants to offer that service, they can well offer an individual audio track for such customers (and that audio will likely be descriptive or narrative - different from the normal audio that would go with the video).

saying that other use-cases are not interesting doesn’t really cut it for me; we can’t do a design that ignores them.

I don’t want to put the encryption mode in the codecs parameter; I want to stop exposing details of the inside of the file format, and rely instead on the profiles parameter. That tells you the envelope e.g. “I am a compliant CMAF file to profile” which should be all you need to know.

David Singer Manager, Software Standards, Apple Inc.

ZmGorynych commented 7 years ago

  1. Regarding multiplexed content, is there a use case where two entirely different encryption modes are used in the same stream (i.e., independent combinations of modes)? For example, can you see a legitimate use case where you have cbcs-encrypted audio and cenc-encrypted video in a multiplexed ISO-BMFF?
  2. I think mixing (muxing?) bitstream characteristics (AVC/HEVC/H.266/AV1/VP9 et al) and cipher characteristics is a far more dangerous path. You have far more than two possible cipher modes (CENC has 4, at some point people may start using authenticated encryption like GCM, etc.). In this case, aren't you reaching a point where you have a combinatorial explosion of codec strings?

dwsinger commented 7 years ago

There are use cases where end-systems can handle, for example, encrypted media of one kind but not of another, so they need to know which media is encrypted.

But fundamentally we shouldn't do a design which leaves a gaping ambiguity. And exposing all the details, parameters, and so on, is, as I say, a bottomless pit; summarize into a profile.

ZmGorynych commented 7 years ago

"Bottomless pit" is a linear increase in the number of MIME parameters: we have only two and are considering establishing a third. "Profile" is an exponential explosion of profiles in a single parameter. My preference would be a linearly deepening pit of parameters, not an exponentially exploding single profile parameter.

dwsinger commented 7 years ago

I am not sure I understand Alexis' pushback here; by bottomless pit I mean we can get endless requests:

These, and what codecs are in use, and so on, are all summarized in operational profiles. It's not an explosion of profiles to ask file writers to pick a profile that the file conforms to, and for implementers to support all the requirements of a profile. Then a simple profile check suffices.

joeyparrish commented 6 years ago

What if we added one or more fields to the capabilities object in requestMediaKeySystemAccess? For example:

videoCapabilities: {
  contentType: 'video/webm; codecs=vp9',
  robustness: 'HW_SECURE_ALL',
  encryptionScheme: 'cbcs',
  fullSampleEncryption: 'required',
},

This would be an explicit way to query EME for encryption-related features from the platform, without overloading or complicating existing strings such as robustness, content type, or codec. If this approach is generally agreeable, we could then debate the specific fields we would like to add, semantics, etc.

mrstux commented 6 years ago

@joeyparrish this would neatly solve a thorny problem we expect to see as soon as a user agent (e.g. Chrome) supports both CBCS and CTR.

As an implementer of an MSE/EME based player, we need some way to detect if the current user agent supports CTR or CBCS with a given CDM, so that we can provide the correct version of the content which will work, which on some devices would be CTR and on others CBCS.

At the moment we are limited to browser version identification, or suck it and see testing.
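For illustration, here is a minimal sketch of what such detection might look like, assuming an `encryptionScheme` capability field along the lines proposed above. The helper names, the default content type, and the injected `requestAccess` callback are all assumptions made for the sketch; injecting the callback keeps the selection logic testable outside a browser.

```javascript
// Build one configuration for a candidate scheme. The default content type
// is an illustrative assumption.
function buildConfig(scheme, contentType = 'video/mp4; codecs="avc1.42E01E"') {
  return {
    initDataTypes: ['cenc'],
    videoCapabilities: [{ contentType, encryptionScheme: scheme }],
  };
}

// A user agent that does not know `encryptionScheme` may silently drop it,
// so confirm the field is echoed back in the resolved configuration.
function schemeWasHonored(resolvedConfig, scheme) {
  const caps = resolvedConfig.videoCapabilities || [];
  return caps.length > 0 && caps.every((c) => c.encryptionScheme === scheme);
}

// Try each scheme in order of preference; return the first one the
// platform both accepts and acknowledges, or null if none qualifies.
async function firstSupportedScheme(keySystem, schemes, requestAccess) {
  for (const scheme of schemes) {
    try {
      const access = await requestAccess(keySystem, [buildConfig(scheme)]);
      if (schemeWasHonored(access.getConfiguration(), scheme)) return scheme;
    } catch (e) {
      // Rejected: this key system / scheme combination is not supported.
    }
  }
  return null;
}
```

In a browser, `requestAccess` would typically be `(ks, cfgs) => navigator.requestMediaKeySystemAccess(ks, cfgs)`, and the schemes list might be `['cbcs', 'cenc']`, where `cenc` is the AES-CTR scheme. A `null` result means the player still has to fall back to other heuristics.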

joeyparrish commented 6 years ago

@mrstux, I'm working on a more formal proposal based on my comment above. I will be publishing it soon.

joeyparrish commented 6 years ago

Here's my proposal:

https://github.com/WICG/encrypted-media-encryption-scheme/blob/master/explainer.md

Please read and file issues for feedback. Thanks!

joeyparrish commented 4 years ago

Given that the encryption scheme query API has landed, I believe we can now close this issue.