Open steelejoe opened 9 years ago
Before we get into specific text changes, can this issue be rephrased in the form of a feature or use case you would like to be supported? I assume it is related to one or more of the cases in https://www.w3.org/Bugs/Public/show_bug.cgi?id=27093#c11. Perhaps most importantly, where would the "other keys it already has" come from?
Exactly. This is related to supporting the three use cases I outlined in that bug. To paraphrase that bug, the content key would be contained in the PSSH in encrypted form and another key already held by the CDM is used to decrypt the content key. Those other keys can come from many places, but a concrete example would be a key embedded in either the CDM or in some OS component the CDM has access to. Another example would be a long-lived intermediate or "master" key which the CDM acquires once and uses to decrypt a set of titles, along the lines of the "Multiple License Sessions" use case on our wiki.
There appear to be several different use cases here. I think it makes sense to explicitly define each and address them individually (but with awareness of all). While they may share the presence of keys in the PSSH box, it appears each may have different implications for the spec.
Updated with new title to reflect discussions at the March 2015 F2F. Original title was "generateRequest may result in keys being usable when no key request needs to be sent". This bug supersedes the older bug (27093) which has been closed. I will be filing an additional bug to capture the use cases individually.
This issue should be blocked on #52, not the other way around.
I was asked to provide more specific text proposals for the algorithm changes.
6.2 Methods
generateRequest 9.7.4: "Let message be a license request for the requested license type generated based on the sanitized init data, which is interpreted per initDataType."
should change to
generateRequest 9.7.4: "If the initData contains information that would allow the CDM to generate content key(s), then generate those keys and let message be null. Else let message be a license request for the requested license type generated based on the sanitized init data, which is interpreted per initDataType."
9.11. "Run the queue a "message" event algorithm on the session, providing "license-request" and message."
should change to
9.11. "If message is not null, run the queue a "message" event algorithm on the session, providing "license-request" and message. Otherwise run the update key statuses algorithm."
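To make the proposed wording for steps 9.7.4 and 9.11 easier to discuss, here is a rough sketch of the branch it describes. This is a toy model, not CDM code: `sketchGenerateRequest`, `recoverableKeyIds`, and the string payload are all hypothetical names invented for illustration, and a real CDM would do this asynchronously and opaquely.

```typescript
// Toy model of the proposed generateRequest() branch; not a real CDM.
type SketchKeyStatus = "usable" | "status-pending";

interface SketchSessionEvents {
  messages: { type: string; payload: string }[];
  keyStatuses: Map<string, SketchKeyStatus>;
}

// recoverableKeyIds is a hypothetical stand-in for "content keys the CDM can
// generate from the init data using a key it already holds".
function sketchGenerateRequest(
  initDataKeyIds: string[],
  recoverableKeyIds: Set<string>,
): SketchSessionEvents {
  const events: SketchSessionEvents = { messages: [], keyStatuses: new Map() };

  // Proposed 9.7.4: if the init data lets the CDM generate the content keys,
  // let message be null; otherwise build a license request.
  const allRecoverable =
    initDataKeyIds.length > 0 &&
    initDataKeyIds.every((id) => recoverableKeyIds.has(id));
  const message: string | null = allRecoverable
    ? null
    : JSON.stringify({ kids: initDataKeyIds });

  // Proposed 9.11: queue a "message" event only when there is a message;
  // otherwise run the update key statuses algorithm.
  if (message !== null) {
    events.messages.push({ type: "license-request", payload: message });
  } else {
    for (const id of initDataKeyIds) {
      events.keyStatuses.set(id, "usable");
    }
  }
  return events;
}
```

In this model a session either produces exactly one "license-request" message or reports its keys as usable, never both, which is the behavior the text change is meant to permit.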
6.6.2 Update Key Statuses
"This can happen as the result of a load() or update() call or some other event, such as expiration."
should change to
"This can happen as the result of a generateKeyRequest(), load() or update() call or some other event, such as expiration."
6.8 Session Storage and Persistence
"The CDM should not store session data, including the Session ID, until update() is called the first time. Specifically, the CDM should not store session data during the generateRequest() algorithm. This ensures that the application is aware of the session and knows it needs to eventually remove it."
should change to
"Applications should expect that sessions using a type for which the Is persistent session type? algorithm returns true may store data at any time while the session is open."
If we accept Joe's change (and I think we should) we should consider changing generateKeyRequest() to something more generic. "init()" goes well with "load()" and "update()".
+1 to Mark's suggestion.
I have one additional text proposal for the Definitions section:
Key
"Such keys must only be provided to the CDM via an update() call. (They may later be loaded by load() as part of the stored session data.)"
should change to
"Such keys can be provided to the CDM via the initData or an update() call. (They may later be loaded by load() as part of the stored session data.)"
Adding a tie-in to the ongoing email thread -- https://lists.w3.org/Archives/Public/public-html-media/2015Sep/0003.html
Issue #52 is now resolved. There should be nothing blocking this issue from moving forward now. Looking back over concerns raised about this in various threads and in the comments here - this is what I come up with: 1) Possibility of interop issues 2) Need better definition for the use cases
I believe (1) is only an issue for existing EME/MSE players. For those players some work will likely need to be done, but since the specification has not been finalized that is not unexpected. Going forward I don't believe there is a large burden in requiring app developers to support the "no request" case.
For (2) -- the workflow is essentially the same as subsequent playback in the Persisted License use case. I have added a new use case, Keys Available in initData, to cover this. Hopefully this use case in combination with the proposed text changes above is enough of a guide for discussion.
My understanding from the telecon is that some believe this issue can and should be addressed independent of other features, such as key chaining and master keys (#53).
It is still unclear to me a) exactly what this issue now covers and b) that this change is generally useful in the absence of those features, out-of-band keys, or similar.
For the purposes of this discussion, I believe there are two categories of keys (or algorithms) that could be used to decrypt (or derive) a key from the initData:
Am I missing a category? In all cases, I am assuming that the secret key (or algorithm) is common for all instances of a key system since initData (e.g. PSSH box) should be generic to a key system and not individual users or devices.
I believe we agree that (2) is explicitly out of scope for this issue. That leaves (1). Is that correct? Are there other scenarios that are intended to be covered by this issue?
You are missing a category because (2) in your list is too broad. Issue #53 covers specifically long lived keys, not just any type of key chaining.
The proper list of key categories is:
The secret key or algorithm is only common to all instances of a key system for (1). Temporary key encryption keys can be issued to a user+client combination. Both key categories (1) and (2) are in scope and covered by this issue. Using keys from category (1) or (2) would result in the problem we are trying to avoid, namely that a key request message is required to be sent by the spec when it is not required for the implementation to function.
As a side note -- I notice that the Milestone for this was moved to V.Next. I would like to see it moved back to the current V1.
Yes, I grouped all key chaining in #53. It turns out that "chaining" isn't even mentioned there. Perhaps a separate feature request issue should be filed to cover the topic of key chaining.
Do I understand correctly that (2) is key chaining using "temporary" sessions (whereas (3) would use some type of persistent session)?
I maintain that this and related requests have a high opportunity cost, especially given the impact on core algorithms and assumptions and the questionable benefit to the web platform overall. Thus, vNext is appropriate.
Key chaining has been discussed and deferred multiple times, so I do not accept that it must be addressed in v1. (1) may actually work in the current spec text (after #52 was resolved). What is not supported is the optimization to avoid a message event. Given the limited interest in and utility of (1) overall, I find it difficult to justify the effort required of the editors to define the behavior and re-stabilize the spec, as well as the associated delay of LC/CR.
Yes (2) is key chaining using "temporary" sessions and (3) is using "persistent" sessions of some type TBD.
Applying a milestone like "vNext" implies it has been decided this feature will not be part of the current spec and as far as I am concerned that has not happened yet. I realize we disagree on this point.
How much additional definition of the behavior is required? I believe I have been pretty explicit above. I am sensitive to the effort required by the editors, but I am not clear on how much additional effort this really is.
For an app developer, the changes would be minimal. The application would need to make sure the keystatuseschange handler is attached when the session is created, rather than relying on the message handler to be triggered. This is a good idea in any event, since load() and generateRequest() already behave differently with respect to messages being sent. These changes would be compatible across all existing CDMs and would not require special per-CDM behavior.
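As a rough illustration of the application-side change described above, the sketch below attaches both a key status handler and a message handler before calling generateRequest(). `AppSketchSession` is a hypothetical stand-in for the browser's key session object, it fires events synchronously for brevity, and the `keysAlreadyAvailable` flag is an invented knob for simulating a CDM that already holds the keys. The key status event is spelled `keystatuseschange` here.

```typescript
// Minimal stand-in for the parts of a key session this sketch needs.
// A real application would use the browser's session object instead.
type AppSketchHandler = (detail: string) => void;

class AppSketchSession {
  private handlers = new Map<string, AppSketchHandler[]>();

  addEventListener(type: string, handler: AppSketchHandler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // Hypothetical knob: whether the CDM can already use the keys.
  generateRequest(keysAlreadyAvailable: boolean): void {
    if (keysAlreadyAvailable) {
      this.fire("keystatuseschange", "usable");
    } else {
      this.fire("message", "license-request");
    }
  }

  private fire(type: string, detail: string): void {
    for (const handler of this.handlers.get(type) ?? []) {
      handler(detail);
    }
  }
}

// Attach both handlers up front so the app works whether or not a
// license request is ever sent.
function startSketchPlayback(
  session: AppSketchSession,
  keysAlreadyAvailable: boolean,
): string[] {
  const log: string[] = [];
  session.addEventListener("keystatuseschange", (status) => log.push(`keys:${status}`));
  session.addEventListener("message", (type) => log.push(`send:${type}`));
  session.generateRequest(keysAlreadyAvailable);
  return log;
}
```

The same application code handles both paths: it forwards a license request when one arrives and otherwise simply observes that the keys became usable.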
First, we should clarify and agree on what exactly this issue covers and what we’re discussing. The summary relates to providing keys in Initialization Data, but the discussion includes at least three different use cases and much of the discussion (here and offline) focuses on whether at least one message event must be fired, which I believe is a minor detail of the overall request.
I see the following possibilities. We should agree on one of them and redefine this issue and/or file new one(s).
generateRequest()
- perhaps simple but serves no purpose on its own
Second, while we could go into more details about the effort required for each, both now and on the path to REC, and the risks they pose, that would distract from the broader point - and the bar I believe we should apply to all such requests.
Specifically, there is no compelling evidence that a) the web platform or spec will be better as a result of adding this feature; b) this feature is required in this version; and c) the proposed changes would improve interoperability. In fact, (1) is an impediment to content interoperability.
Any additional cycles put into this issue now are cycles we don't use to move the agreed-upon features of v1 to LC/CR. I expect that a variety of possible optimizations and improvements will be incubated, considered, and weighed for the next version and would strongly encourage you (and others) to provide such data as part of the incubation.
I think the high level question is to what extent we allow CDMs to implement functionality which is not explicitly described in the specification, provided they follow the standard API.
For example, because the API allows CDMs to initiate and continue message exchanges at any time, CDMs can implement key rotation, key renewal, secure heartbeats or any number of other features during streaming which are not described in the specification.
Option (A) opens up the possibility for a very slightly larger space of such unspecified, but API-compliant and interoperable, features - specifically the various optimizations Joe has described. Those features are not different in character from the examples I give just above.
If we decide that all such features should be fully specified, we should do that for key rotation etc. as well and we should be explicit in the specification about when CDMs can / cannot send messages so that those features cannot be implemented in an unspecified way.
Alternatively, if we retain the space for such features, we should extend that space for the useful features Joe has described.
A third approach would be to extend the space now (in V1) as proposed here with the intention of fully documenting these features as well as renewal, rotation etc. in V2 when we would have time for a full requirements / design discussion for each of those.
Mark has stated the meta-question very well. Let's not specify CDM behavior to a level of detail beyond what is required for this spec to be useful and interoperable.
To respond specifically to the 3 possibilities you mentioned -- Option (A) is what this issue is about. Option (B) was already decided by issue #52. Keys are not excluded from initData and therefore are allowed. The elements of Option (C) which are relevant to this spec are covered by issue #53.
Making the changes I have suggested to allow generateRequest() to send no messages will allow for a performance boost in cases where the CDM already has the keys available. At least one CDM can take advantage of this today, but there is nothing preventing other CDMs from taking advantage of this as well. This is not a theoretical advantage, we are benefiting from it today. That seems like a compelling motivation.
I agree with how Mark has positioned this question as well. It’s reasonable for us to allow variations within the EME framework without specifying the details of each unless there is a clear case to be made that they will lead to interoperability issues. The narrow case of allowing available keys to be used without generateRequest() shouldn’t cross this line. I also feel that the keys in initData topic doesn’t cross this line either. It’s a capability fully supported in the CENC specifications. We should strive to align with these specifications in our designs, and I think allowing streamlined implementations on messaging when doing so makes perfect sense.
@mwatson2 wrote:
For example, because the API allows CDMs to initiate and continue message exchanges at any time, CDMs can implement key rotation, key renewal, secure heartbeats or any number of other features during streaming which are not described in the specification.
While I agree that certain methods of key rotation, key renewal, and secure heartbeats are supported, I disagree that these examples support your point.
Those features are not different in character from the examples I give just above.
They are different because a) they affect the existing algorithms and assumptions, b) they affect or are an impediment to interoperable content, and c) there is no interoperable definition of how to support them nor an interoperable definition of what such content looks like.
In contrast, key renewal and secure heartbeats are implemented entirely within the license protocols without affecting the application or interoperability of the content. Not supporting those features does not in any way make it impossible to play the media. That said, I am open to defining these somewhere. Key rotation via requesting new licenses is also supported and follows all the existing algorithms and assumptions. On the other hand, key rotation via direct decryption of embedded encrypted keys is not currently supported for similar reasons as those in the previous paragraph.
A third approach would be to extend the space now (in V1) as proposed here with the intention of fully documenting these features as well as renewal, rotation etc. in V2 when we would have time for a full requirements / design discussion for each of those.
That is the opposite of the approach we should take. Once we have allowed everything, we are not going to be able to add restrictions. That means V2 would end up specifying whatever is already implemented, regardless of interoperability, security, privacy, etc.
V1 is working well in practice, so I don't see any need to break it open now, especially at the expense of likely creating future problems.
@steelejoe wrote:
Let's not specify CDM behavior to a level of detail beyond what is required for this spec to be useful and interoperable.
More importantly, let's not relax it such that the resulting implementations and media are not interoperable.
To respond specifically to the 3 possibilities you mentioned -- Option (A) is what this issue is about. Option (B) was already decided by issue #52. Keys are not excluded from initData and therefore are allowed. The elements of Option (C) which are relevant to this spec are covered by issue #53.
I want to reiterate that (B) really depends on (A). To clarify, (B) means allowing and specifying the direct use of keys in the initData instead of sending a message. As I noted, this is related to your use case (1).
In your use case (3), you said that #53 covers "Long-lived key encryption keys." (C) is the "Temporary key encryption keys" you referred to in your use case (2), as specified by my reference to (2). None of the possibilities for this issue cover #53 (your use case (3)) because that is already a separate bug.
You said this issue is about (A), but (A) alone does not appear to have any useful value (because keys would never be obtained).
I maintain that the scope of this issue is undefined.
At least one CDM can take advantage of this today, but there is nothing preventing other CDMs from taking advantage of this as well. This is not a theoretical advantage, we are benefiting from it today.
To be clear, I believe a future version of the spec should consider such optimizations. However, doing so today comes at the cost of significantly blocking progress on the already agreed upon functionality ("v1"). (I'll provide more details in my next reply.)
That said, specifically which of the above use cases does said CDM support? I agree that it would be great if everyone could take advantage of this, but a) there is no specification for how to do that or how one would get these benefits on the same content and b) the point of specifying things like this is to ensure that (a) is true, implementations are interoperable, and the behavior can be evaluated in the open.
David, I believe your principal concerns are stated here:
They are different because a) they affect the existing algorithms and assumptions, b) they affect or are an impediment to interoperable content, and c) there is no interoperable definition of how to support them nor an interoperable definition of what such content looks like.
My understanding of the proposal is that (a) and (b) do not hold. So, we have a difference of understanding as to the proposal which we need to resolve. I would agree that if (a) and (b) were true we should not do anything here. If (a) and (b) are not problems, then I think neither is (c).
Could you explain why you think this proposal affects existing algorithms and assumptions (specifically which ones) and is an impediment to interoperability?
My assumption has been that we are talking about optimizations. CDMs that did not support the optimizations would still be able to play the content.
Let me be clear, I do not object to optimizations - as I said, I think this will be an important topic for a future version of the spec. What I do object to is opening holes in the spec to allow preexisting not-openly-documented mechanisms that the community has not evaluated (e.g. for interoperability, extensibility, consistency) or weighed against alternatives.
Simply allowing such mechanisms in the current spec runs the risk of blessing non-interoperable mechanisms and/or creating de facto standards, both of which may limit our ability to define interoperable mechanisms in future versions. In addition, I am concerned that content that supports these mechanisms may not work with other implementations (consistently or at all).
I will note that no one has provided compelling evidence that this is a use case that the web platform must support in EME v1.
I am also concerned about significantly stunting the progress and recent momentum of this spec towards LC/CR and eventually REC. Doing the work to evaluate and define such optimizations now will significantly delay LC/CR, and even opening holes would likely delay it as we look for and deal with unintended consequences and bugs. Opening holes may also delay later stages if we have to deal with Formal Objection(s) or Director concerns (more below). After nearly four years, EME is still not in LC. We must make some tough choices to close out v1 so authors and users can benefit from the interoperability of the important and commonly used set of features we have already defined.
As a concrete example, this discussion has already consumed the time I had intended to spend following up on two complex bugs in the v1 feature set and reviewing pull requests for others.
Beyond the specifics of this issue (however that is eventually defined), recent comments touch on philosophical questions about how much flexibility CDMs should have to do what they want. That's a debate we can have, but it's a bit late for that for v1 (see above). I believe there is clear direction in the current spec text and that our actions have been clear, especially over the last year or two. (Note: Even if we reached agreement on changing the philosophy, that change and this issue still would require changes to the existing spec and assumptions, delaying v1 (see above).)
Unless the group agrees that we want to significantly delay CR to have a philosophical discussion, incorporate new use case(s), and restabilize the spec, we should table this discussion and get back to finishing the existing feature set.
While I do not want to further distract the work on v1 by starting that debate, below is some context that should give a good idea on where such a debate could end up.
Most W3C specs specify all functionality. EME is an outlier due to the nature of DRM robustness, but we should not abuse this exception. (Also, remember that the outstanding Formal Objection is related to interoperability.) In the words of the TAG EME Spec Review, "But just because the CDM's behavior is undefined, does not mean that EME as a whole becomes a free-for-all that can ignore how the web platform works."
In addition to defining APIs and behavior such that anyone can create interoperable implementations based on them, W3C specifications also address privacy, security, and other concerns of the web platform. The less we define, the less we and the community can evaluate these properties - in the open with many eyes representing much experience - and subsequently address any such issues in the spec.
For more specifics on how this applies to EME, I invite you to read the TAG EME Spec Review and explain how the philosophy being advocated in this thread is consistent. I would hate to get further down the process path (i.e. Director review) only to have to come back and change the spec to address concerns of which we were already aware, especially for some minor use cases.
Just some of the many relevant points from that review:
As a consequence of this, the capabilities of a pre-existing DRM system are not useful for guiding discussion. The goal of EME should be a common-denominator API that can be used to interface with all DRM systems equally.
If a vendor is in favor of adding a given capability to EME---perhaps a feature their CDM supports---then they should not do so through a side-channel or extension point of the EME API, but instead through the normal standardization process for web platform features.
In general, given that CDMs are underspecified, their author-facing scope should be normatively limited as much as is possible while still giving the desired robustness guarantees.
Regarding interoperable content, Microsoft's recent blog post says, "A key underpinning that makes this work is the development of ISO MPEG Common Encryption. By using common encryption, web media services can set up uniform content libraries that are compatible with more than one DRM solution. A service can choose to support more than one DRM without having to encrypt content for each individual DRM."
I do not believe the same is true for content that uses the mechanisms being discussed here, or at least there is no clear definition of how implementations could interoperably support such content. In reality, the features being discussed appear to be related to content packaged for a specific DRM system.
Let me respond to your last point regarding the interoperability of content. The use case I am describing involves key acquisition and in no way impacts the actual encryption of the content. Content with a DRM-specific PSSH that happens to support this feature would be completely interoperable with all CDMs, assuming all other EME packaging requirements were met. They would simply not benefit from this feature and would be required to make a key request.
Thanks for providing a detailed response, this is really useful. I think one root of the disagreement here is that there are several principles you cite on which we do not have group consensus:
Given the above, your application of a requirement for detailed disclosure and review of CDM mechanisms to this issue and not to others (rotation etc.) seems arbitrary. CDMs can presently do any number of things with embedded keys, internal state shared across sessions, etc., which will affect the number and timing of message exchanges. Common encryption guarantees interoperability (as Joe and the Microsoft quote attest), but people can do CDM-specific optimizations.
Personally, I think the TAG review was good and we should continue to push in the directions they recommend. We must certainly stick to the requirement for interoperability. But I'm not really in favor of arbitrary application of a particular interpretation: we should stick with the original assumption that CDMs control the number and timing of message exchanges. If we want to start constraining that we should try in V2.
@ddorwin
I will note that no one has provided compelling evidence that this is a use case that the web platform must support in EME v1.
We use this optimization today and it significantly reduces startup time for new streams and on channel change. Not having this supported in EME v1 would significantly slow those start times. Supporting this in v1 provides no interoperability issues, since it can be accomplished simply by the application waiting on key status instead of key message.
https://www.w3.org/wiki/HTML/Media_Task_Force/MSE_Ad_Insertion_Use_Cases
While I sympathize with Mark's framing of the meta-question, I worry about the priorities and interoperability. Also, Mark's framing doesn't take into account that having to repackage media files with additional PSSH boxes differs from other key system-specific variations in messages.
At present, I think the main impediments to interoperability are
I think we should prioritize getting all services to support all key systems over performance optimizations. For that reason, I'd prefer to see CDM vendors and streaming services deploy the W3C Common PSSH box format before effort is put into key system-specific optimizations.
As for whether methods or events should be renamed to be general enough as to not be misnamed if support for the optimization in question here is formally acknowledged now or in the future, I think we really need to stop renaming stuff. Slight misnomers are a lesser harm than breaking code written to the current spec.
In fact the common PSSH format [1] MUST be supported by key systems supporting Common Encryption. So, it should be the case that so long as content contains this common PSSH (or the application provides it) then we have achieved content interoperability across all key systems. If there is anything missing from the specification to ensure that, we should fix that now (is there?). If we could place requirements on the server side of the key-system then we would say that the server MUST respond to a request generated from a common PSSH with a response that does contain the requested keys. But I am not sure the server component is in scope.
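For readers unfamiliar with what a common PSSH carries, the sketch below reads the key IDs out of a version 1 'pssh' box. It assumes the usual ISO BMFF layout (32-bit size, 'pssh' type, version and flags, 16-byte SystemID, KID count, 16-byte KIDs, then a data size) and the common SystemID 1077efec-c0b2-4d02-ace3-3c1e52e2fb4b. The function names are hypothetical, and this is an illustration rather than a validated parser.

```typescript
// SystemID assumed for the common PSSH box, written without dashes.
const COMMON_SYSTEM_ID_HEX = "1077efecc0b24d02ace33c1e52e2fb4b";

function psshSketchToHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => (b < 16 ? "0" : "") + b.toString(16)).join("");
}

// Returns the key IDs (as hex strings) listed in a version 1 common 'pssh'
// box, or null if the bytes do not look like one.
function parseCommonPsshKeyIds(box: Uint8Array): string[] | null {
  if (box.byteLength < 32) return null;
  const view = new DataView(box.buffer, box.byteOffset, box.byteLength);

  // Box header: 32-bit big-endian size, then the four-character type 'pssh'.
  if (view.getUint32(0) !== box.byteLength) return null;
  if (view.getUint32(4) !== 0x70737368) return null;

  // Full box header: 1 byte version, 3 bytes flags. KIDs need version >= 1.
  const version = view.getUint8(8);
  const systemId = psshSketchToHex(box.subarray(12, 28));
  if (version < 1 || systemId !== COMMON_SYSTEM_ID_HEX) return null;

  // KID count, the KIDs themselves, then a 32-bit data size field.
  const kidCount = view.getUint32(28);
  const kidsEnd = 32 + kidCount * 16;
  if (kidsEnd + 4 > box.byteLength) return null;

  const keyIds: string[] = [];
  for (let i = 0; i < kidCount; i++) {
    keyIds.push(psshSketchToHex(box.subarray(32 + i * 16, 48 + i * 16)));
  }
  return keyIds;
}
```

The point for this discussion is that the common box identifies keys but carries no key material, so a CDM given only this box would normally still need a license exchange.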
So it seems to me that we have the kind of content interoperability that we need: any keysystem-specific key distribution models must be additive to the baseline model.
So, again, it seems arbitrary to allow some things and not others when there are clearly beneficial use-cases.
Regarding renaming, is it an option to rename things in the specification but recommend that implementations continue to support both old and new names for a while? That would enable us to avoid confusing naming but not break anything.
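A minimal sketch of what supporting both names could look like in an implementation, assuming the new name simply forwards to the old one. `RenameSketchSession` is hypothetical, and `init` is used only because it was floated earlier in this thread; nothing here reflects an agreed name.

```typescript
// Hypothetical session object exposing an old and a new method name.
class RenameSketchSession {
  calls: string[] = [];

  // Existing name: code written against the current spec keeps working.
  generateRequest(initDataType: string): string {
    this.calls.push(initDataType);
    return `request:${initDataType}`;
  }

  // Possible new name: forwards to the existing method, so both names
  // share one code path and behave identically.
  init(initDataType: string): string {
    return this.generateRequest(initDataType);
  }
}
```

Because the alias forwards to the existing method, dropping the old name later would only require moving the body, not changing behavior.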
We had a long discussion at the Sapporo F2F meeting of this issue: http://www.w3.org/2015/10/30-html-media-minutes.html#item21
@steelejoe - How do you recommend we proceed?
@ddorwin - I don't believe the Task Force have consensus to mark this issue as VNext or "feature request".
@paulbrucecotton: It is equally fair, if not more accurate, to say that we don’t have consensus in the WG to address this in v1. There is no standing rule that all issues or requests are v1 by default. On the contrary, at this point - after nearly four years without getting to LC and having exceeded the WG's lifetime - the default really should be the opposite, and we will need a very good reason and consensus to pull them into v1. (This is in fact how most such requests have been triaged. Of the two current exceptions, it appears one will be closed and the other is a registry entry.)
It is equally fair, if not more accurate, to say that we don’t have consensus in the WG to address this in v1.
I will not disagree with this assertion.
There is no standing rule that all issues or requests are v1 by default.
I will not disagree with this assertion.
On the contrary, at this point - after nearly four years without getting to LC and having exceeded the WG's lifetime - the default really should be the opposite, and we will need a very good reason and consensus to pull them into v1. (This is in fact how most such requests have been triaged. Of the two current exceptions, it appears one will be closed and the other is a registry entry.)
It is up to the Media Task Force to decide if and when it has the right set of features in EME and to decide when it wants to take that feature-complete specification to Last Call. In my role as Chair I have to help the TF find a consensus position on these matters.
My post on this issue was meant simply to clarify that, from my view, it did not appear that we had sufficient consensus to mark this issue as V.Next, but I could have equally said that the TF was undecided on whether this issue was a V1 issue.
I hope this clarifies the intent of my post.
The spec allows for Key System specific Initialization Data to be provided (either directly in the PSSH or via the generateRequest() call). Some Key Systems support or rely on title keys being present in encrypted form within the Initialization Data. The core issue is that the CDM may be able to use keys present in the Initialization Data to decrypt the content without any additional key requests.
This has implications for several of the algorithms.
6.2 Methods, generateRequest
"11. Run the queue a "message" event algorithm on the session, providing "license-request" and message."
Since keys may already be usable, the CDM could instead run the Update Key Statuses algorithm. E.g.
"11. If keys specified in the initData are not already available, run the queue a "message" event algorithm on the session, providing "license-request" and message. Otherwise run the Update Key Statuses algorithm."
6.6.2 Update Key Statuses
"This can happen as the result of a load() or update()"
This list should be augmented with generateKeyRequest.
6.8 Session Storage and Persistence
"The CDM SHOULD NOT store session data, including the Session ID, until update() is called the first time. Specifically, the CDM SHOULD NOT store session data during the generateRequest() algorithm. This ensures that the application is aware of the session and knows it needs to eventually remove it."
Since the keys may already be usable at this point, forcing update() to be called does not make sense.