w3c / encrypted-media

Encrypted Media Extensions
https://w3c.github.io/encrypted-media/

initDataType matching needs to be more explicit to provide for future evolution #149

Open jdsmith3000 opened 8 years ago

jdsmith3000 commented 8 years ago

The current EME and EME Registry specs allow apps to match initDataType support between CDMs and content. This matching works today, but is limited to a current snapshot of initDataType capabilities. These are very likely to evolve, and to present changes to EME that will not be detectable with our current initDataType matching. We need a defined way to handle changes, so that existing content continues to play on current implementations, AND new content can be detected by current implementations and played whenever possible.

For CENC, we specifically propose that the ‘pssh’ box version (e.g. ‘pssh’ version 0 now) be included in the ISO Common Encryption EME Stream Format and Initialization Data spec as a condition for the ‘cenc’ initDataType. This change would make ‘cenc’ initDataType matching more explicit, and would provide a mechanism for future content evolution.
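For illustration, here is a minimal sketch of how the 'pssh' version could be read out of "cenc" initData. It assumes the buffer holds only concatenated 'pssh' boxes with 32-bit sizes; the function name `psshVersions` is hypothetical and is not part of EME or of any library.

```typescript
// Sketch only: list the version of every 'pssh' box in a "cenc" initData buffer.
// Assumption: the buffer contains nothing but concatenated 'pssh' boxes with 32-bit sizes.
function psshVersions(initData: Uint8Array): number[] {
  const view = new DataView(initData.buffer, initData.byteOffset, initData.byteLength);
  const versions: number[] = [];
  let offset = 0;
  while (offset < initData.byteLength) {
    // Smallest possible 'pssh': 8 (header) + 4 (version/flags) + 16 (SystemID) + 4 (DataSize).
    if (offset + 32 > initData.byteLength) {
      throw new Error(`truncated box at offset ${offset}`);
    }
    const size = view.getUint32(offset); // big-endian box size
    const isPssh = view.getUint32(offset + 4) === 0x70737368; // ASCII "pssh"
    if (!isPssh || size < 32 || offset + size > initData.byteLength) {
      throw new Error(`unexpected box at offset ${offset}`);
    }
    versions.push(view.getUint8(offset + 8)); // FullBox version field
    offset += size;
  }
  return versions;
}
```

An initData buffer that carries both a version 0 and a version 1 box would yield both values, which is the mixed-version case raised later in the thread.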

ddorwin commented 8 years ago

A few questions to make sure I understand the proposal.

Some initial concerns:

jdsmith3000 commented 8 years ago

The concern is broader than just the initData format. Content will need to evolve, and we need a defined mechanism to support that. In the case of CENC, the pssh box version represents a useful indicator of broader content changes. The difference between 'cenc' and 'cenc1' would be meaningful to the implementation.

I'm not sure offhand that we need to support mixed 'pssh' versions. I'd be interested in what others think.

We do believe a change like this is necessary. We believe CENC will evolve, and we don't have a mechanism now to detect changes.

ddorwin commented 8 years ago

I thought that might be the case. However, initDataType is not the appropriate place to communicate these differences in the stream. (See https://github.com/w3c/encrypted-media/issues/105#issuecomment-189072229 .)

Differences in content need to be communicated via the contentType. I don't have a good solution for this at the moment. Perhaps we should change the title of this issue or open a new one to continue this discussion.

> I'm not sure offhand that we need to support mixed 'pssh' versions. I'd be interested in what others think.

I do think this is necessary. Implementations and key systems will adopt new versions on different schedules. For example, the Common System requires version 1, but I suspect other systems are fine with version 0 in most cases.

jdsmith3000 commented 8 years ago

I'm not proposing this for contentType changes, but for CENC ones. initData and what it represents will change, and that change cannot currently be expressed to apps.

ddorwin commented 8 years ago

From the spec POV, Initialization Data and initDataType are (mostly) independent of the contentType or media data. As you noted, "The concern is broader than just the initData format."

You may be conflating a file format specification (ISO BMFF and CENC) with a mostly orthogonal data format (identified by "cenc"). It just so happens that the file format can contain a specific data format, but this is not strictly necessary. It may help to mentally replace "cenc" with "psshboxes" (and similarly, "webm" with "keyid"). The current names are historical accidents/mistakes.


I understand that we may need an additional way to describe the content, but initDataType is not it (even though it may initially seem related). I believe this probably needs to be handled in the MIME type somehow. It should be possible to describe the content in a MIME type independent of EME. To that point, it is odd that:

  1. There is no MIME type distinction between clear ISO BMFF and encrypted ISO BMFF (both use "video/mp4").
  2. ISO BMFF supports multiple protection schemes, but all use the same MIME type ("video/mp4").
  3. This group has decided that all encrypted "video/mp4" uses the CENC protection scheme.
  4. No matter which cipher mode is used, all CENC content uses the same MIME type ("video/mp4").

Other than item 3, these are independent of EME. All are independent of initDataType.

tinskip commented 8 years ago

I believe the best way to signal the protection scheme is via the interface between the media stack and the content decryption module, as the signaling information is generally part of the media container, which the media stack has access to. It is also consistent with the handling of other encryption-related data, namely IVs and subsample encryption information.

ddorwin commented 8 years ago

> 1. There is no MIME type distinction between clear ISO BMFF and encrypted ISO BMFF (both use "video/mp4").

This problem also exists for "video/webm", though items 2-4 do not.

We should also consider these issues when defining support for MPEG-2 TS CENC (#106).

ddorwin commented 8 years ago

Triaging per https://lists.w3.org/Archives/Public/public-html-media/2016Mar/0003.html. This is important, and we should continue discussing, but I don't believe it affects the spec.

joeyparrish commented 8 years ago

I would be against splitting the 'cenc' initDataType or extending it with version information. This would really complicate things for me in a DASH client, and I don't see any benefit to either the application or to the CDM/browser.

jdsmith3000 commented 8 years ago

The point wouldn't be to version initDataTypes arbitrarily, but to do so on meaningful changes that might affect CDM support. One way to do that is to tighten the definition of what we consider cenc now, so that if (or when) it changes in the future, apps have tools to detect the difference and gracefully handle the content.

We believe this issue should be considered V1 because it would better position EME for future change.

ddorwin commented 8 years ago

Can you give specific examples of such changes? How do you propose to tighten cenc? Also, my initial concerns remain unaddressed.

jdsmith3000 commented 8 years ago

Some of your concerns discuss imprecisions in our use of MIME types, and aren't specifically issues that I was trying to resolve. I opened this issue to specifically discuss the evolution of cenc and compatibility over time with EME. If pssh box versions rev (as they did from V0 to V1), that should identify a meaningful difference that EME should recognize, and handle by checking the initDataType support from the CDM. That suggests we add identifying criteria to the registry for cenc beyond just the presence of pssh boxes. I've proposed trying to limit the pssh version to v0, and still think that is desirable. If it's not feasible, then recognizing either v0 or v1 would at least anchor the initDataType to a version, and provide a path for it to change.

joeyparrish commented 8 years ago

I negotiate with MediaKeys before I append an init segment, and therefore before I know what initDataType I'm going to get from the encrypted event.
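The ordering described here can be sketched as follows. This is an illustration under stated assumptions, not the code of any particular DASH client: the local interfaces are simplified stand-ins for `MediaKeySystemConfiguration` and `MediaKeySystemAccess` so the sketch compiles without DOM typings, and `negotiate`, `canHandle`, and `RequestAccess` are hypothetical names.

```typescript
// Simplified local stand-ins for the EME shapes (assumption: only the fields used here).
interface KeySystemConfig {
  initDataTypes: string[];
  videoCapabilities: { contentType: string }[];
}
interface KeySystemAccess {
  getConfiguration(): KeySystemConfig;
}
type RequestAccess = (
  keySystem: string,
  configs: KeySystemConfig[],
) => Promise<KeySystemAccess>;

// Step 1: negotiate before any media is appended, so the initDataTypes
// have to be declared up front, before any "encrypted" event has been seen.
async function negotiate(requestAccess: RequestAccess, keySystem: string): Promise<Set<string>> {
  const access = await requestAccess(keySystem, [
    {
      initDataTypes: ["cenc"],
      videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
    },
  ]);
  return new Set(access.getConfiguration().initDataTypes);
}

// Step 2: later, when an "encrypted" event fires, compare its initDataType
// with what was negotiated earlier.
function canHandle(negotiated: Set<string>, eventInitDataType: string): boolean {
  return negotiated.has(eventInitDataType);
}
```

In a browser, the first argument would be a thin wrapper such as `(ks, c) => navigator.requestMediaKeySystemAccess(ks, c)`. If the event later reported a versioned string that was not in the negotiated set, the application would only find out after the init segment had been appended.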

jdsmith3000 commented 8 years ago

That means you know the possible initDataTypes in advance. You presumably control your content and what initDataType it implements. That wouldn't need to change, would it?

ddorwin commented 8 years ago

If we can limit this discussion to the Initialization Data format (i.e. PSSH box format), great. However, I'm concerned there may be other assumed implications - now or later - about the meaning and/or how it might be used. For example, [1]. Separately, I do think there are MIME type issues that need to be solved - perhaps we should file a separate issue or raise them in an appropriate forum.

As for the specific issue of "checking the initDataType support," I can see the theoretical potential usefulness of detecting PSSH version capabilities in requestMediaKeySystemAccess() to determine whether the client can support the media streams and/or specific behaviors. (I wonder whether it would really be useful since a content provider is likely to have fallback PSSH boxes to support older clients.) However, I'm not sure it makes sense in the other places where initDataType is used - MediaEncryptedEvent and generateRequest(). This is also where multiple PSSH box versions become an issue.

Thus, I think we have three questions:

  1. Do we agree that the use case we are trying to solve is limited to feature detection via requestMediaKeySystemAccess()?
  2. If so, is this really something that needs to be solved/detectable?
  3. If so, how can we provide this capability without affecting most applications (via MediaEncryptedEvent and generateRequest())?

[1] "The concern is broader than just the initData format. Content will need to evolve, and we need a defined mechanism to support that." -- https://github.com/w3c/encrypted-media/issues/149#issuecomment-189451188
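To make the feature-detection use case concrete, here is a hedged sketch of how an application might choose between packagings once it knows which initDataType strings the CDM accepted. The string "cenc1" is hypothetical (it is only floated in this thread and is not a registered initDataType), and `Packaging`, `pickPackaging`, and the URLs are invented for illustration. The sketch assumes the accepted list comes from the initDataTypes read back via `MediaKeySystemAccess.getConfiguration()`.

```typescript
// Hypothetical description of one way the same content was packaged.
interface Packaging {
  initDataType: string; // e.g. "cenc", or the hypothetical "cenc1"
  url: string;          // placeholder manifest location
}

// Pick the first packaging (listed newest first) whose initDataType the CDM accepted.
function pickPackaging(
  acceptedByCdm: readonly string[],
  packagings: readonly Packaging[],
): Packaging | undefined {
  for (const p of packagings) {
    if (acceptedByCdm.indexOf(p.initDataType) !== -1) {
      return p;
    }
  }
  return undefined;
}

// Usage: a provider that keeps a fallback packaging for older clients.
const packagings: Packaging[] = [
  { initDataType: "cenc1", url: "https://example.com/new.mpd" }, // hypothetical
  { initDataType: "cenc", url: "https://example.com/fallback.mpd" },
];
const chosen = pickPackaging(["cenc"], packagings);
```

If a provider always ships the fallback, the probe changes little, which is the doubt raised in question 2.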

jdsmith3000 commented 8 years ago

I don't see the specific issues with using specific initDataTypes in MediaEncryptedEvent and generateRequest(). Please elaborate.

My primary concerns are with respect to initData format and how initData is used in content.

ddorwin commented 8 years ago

> I don't see the specific issues with using specific initDataTypes in MediaEncryptedEvent and generateRequest(). Please elaborate.

These have been discussed above. For example, forcing applications to handle "cenc", "cenc2", etc. and the fact that the initData may contain various versions of PSSH boxes.

> My primary concerns are with respect to initData format and how initData is used in content.

I'm not sure what you mean here, but this seems more about the content than the initData format. If a media format and/or initData format is changed such that the initData is used differently in that content, that seems like we need to change how the content is described more than the initData.

Again, it would really help this discussion to have specific examples, even hypothetical.

jdsmith3000 commented 8 years ago

The hypothetical situation is this: if cenc changes in a way that requires CDM changes to work, how will that be detected and supported? Content standardization work is ongoing, and it is reasonable to expect that changes like this will happen. The MSE type accepts codec string versions, but we've not done anything to allow similar contentType evolution in EME.

I don't believe apps would be forced to deal with updated contentType strings. As a general rule, we would not expect these to rev unless they carried a necessary distinction for CDM support. Applications would pick up changes only if they cared about them.

Versioning the pssh box is an option we could check now that we believe would satisfy this need going forward. Alternatively, new CDM capabilities could be expressed through keySystem strings, though that is indirect and would require implicit app knowledge of content decryption requirements.

jdsmith3000 commented 8 years ago

Correction: I meant the initDataType for the cenc versioning discussion above. contentType uses the MIME type and codec strings.

jdsmith3000 commented 8 years ago

I propose that we be more specific about what we assign initDataType = “cenc” to. I’m not proposing that we create any new initDataType strings now, only that we create the potential to use them in the future.

Options are:

ddorwin commented 8 years ago

The first option sounds fine to me. Since v1 already exists, and we use it for the common format, I think we should include it. I think we could say that this currently includes v0 and v1. If/when there is a v2, we would still have the option to use this same type if appropriate, especially if the format is backwards compatible.
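As a rough sketch of what "currently includes v0 and v1" could mean for a parser, the following reads a single 'pssh' box and rejects any other version. The layout (FullBox header, 16-byte SystemID, a KID list only when version > 0, then DataSize and Data) follows the 'pssh' definition in ISO Common Encryption. `parsePssh`, `PsshBox`, and `ACCEPTED_VERSIONS` are hypothetical names, and 64-bit box sizes are not handled.

```typescript
interface PsshBox {
  version: number;
  systemId: string;  // hex
  keyIds: string[];  // hex, only present for version 1
  data: Uint8Array;
}

// Assumption taken from the comment above: "cenc" covers versions 0 and 1.
const ACCEPTED_VERSIONS = [0, 1];

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => (b < 16 ? "0" : "") + b.toString(16)).join("");
}

function parsePssh(box: Uint8Array): PsshBox {
  const view = new DataView(box.buffer, box.byteOffset, box.byteLength);
  if (
    box.byteLength < 32 ||
    view.getUint32(0) !== box.byteLength ||
    view.getUint32(4) !== 0x70737368 // ASCII "pssh"
  ) {
    throw new Error("not a single, complete 'pssh' box");
  }
  const version = view.getUint8(8);
  if (ACCEPTED_VERSIONS.indexOf(version) === -1) {
    throw new Error(`unsupported 'pssh' version ${version}`);
  }
  let offset = 12; // skip size, type, version, and flags
  const systemId = toHex(box.subarray(offset, offset + 16));
  offset += 16;
  const keyIds: string[] = [];
  if (version > 0) {
    // Version 1 adds KID_count followed by that many 16-byte key IDs.
    const count = view.getUint32(offset);
    offset += 4;
    for (let i = 0; i < count; i++) {
      if (offset + 16 > box.byteLength) throw new Error("truncated KID list");
      keyIds.push(toHex(box.subarray(offset, offset + 16)));
      offset += 16;
    }
  }
  if (offset + 4 > box.byteLength) throw new Error("missing DataSize");
  const dataSize = view.getUint32(offset);
  offset += 4;
  if (offset + dataSize > box.byteLength) throw new Error("truncated Data");
  return { version, systemId, keyIds, data: box.subarray(offset, offset + dataSize) };
}
```

Whether a future v2 could reuse the same type would then come down to whether a parser like this can skip what it does not understand, which matches the backwards-compatibility condition in the comment.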


Regarding the second option, the location is not really a property of the type or format. For example, an application can derive the correctly-formatted data from anywhere and pass it to generateRequest(). However, we can specify under what conditions a user agent should fire an "encrypted" event and with what initDataType value. (These are currently combined in one section, but #105 should make this clearer.)

We might say that such events should only be fired when a 'pssh' is encountered within 'moov' and 'moof' if those are the only currently supported locations. We could add it to the existing sentence that starts "Each time one or more 'pssh' boxes are encountered..."

As for limiting the events to 'moov', I'd like to hear from others that are familiar with how VOD content is currently packaged and what applications currently expect. It's possible that some limitation would reduce the number of "encrypted" events when adapting, but I don't know whether that's practical (i.e. if that's the only time we'd get such events in some cases).

However, we should also consider that any limit on when to fire an event, or any variation of the initDataType based on location/box hierarchy, could complicate implementations (e.g. those that simply walk the boxes).
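A location-aware walker along the lines discussed above might look like the sketch below. It only reports 'pssh' boxes that are direct children of 'moov' or 'moof' and ignores any others. It is an illustration rather than how any user agent is known to work: `findPssh`, `listBoxes`, and the interfaces are hypothetical, and box sizes of 0 (to end of file) and 1 (64-bit largesize) are deliberately not handled.

```typescript
interface Box {
  type: string;
  start: number;
  end: number;
}
interface PsshLocation {
  parent: string; // "moov" or "moof"
  offset: number; // byte offset of the 'pssh' box in the segment
  version: number;
}

function boxType(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset + 4),
    view.getUint8(offset + 5),
    view.getUint8(offset + 6),
    view.getUint8(offset + 7),
  );
}

// List the sibling boxes found in [start, end). 32-bit sizes only.
function listBoxes(view: DataView, start: number, end: number): Box[] {
  const out: Box[] = [];
  let offset = start;
  while (offset + 8 <= end) {
    const size = view.getUint32(offset);
    if (size < 8 || offset + size > end) {
      throw new Error(`bad box size at offset ${offset}`);
    }
    out.push({ type: boxType(view, offset), start: offset, end: offset + size });
    offset += size;
  }
  return out;
}

function findPssh(segment: Uint8Array): PsshLocation[] {
  const view = new DataView(segment.buffer, segment.byteOffset, segment.byteLength);
  const found: PsshLocation[] = [];
  for (const top of listBoxes(view, 0, segment.byteLength)) {
    if (top.type !== "moov" && top.type !== "moof") continue;
    // Children start right after the 8-byte header of the container box.
    for (const child of listBoxes(view, top.start + 8, top.end)) {
      if (child.type === "pssh" && child.end - child.start >= 12) {
        found.push({
          parent: top.type,
          offset: child.start,
          version: view.getUint8(child.start + 8),
        });
      }
    }
  }
  return found;
}
```

Compared with a walker that reports every 'pssh' it meets, the extra parent check is small here, but it grows if the initDataType also has to vary with the location.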

ddorwin commented 7 years ago

@jdsmith3000, what do you want to do with this issue? Since it doesn't affect the main spec and we are unlikely to get to it this week, I'm going to move it to VNext. We can address it at any time, though.