w3c / encrypted-media

Encrypted Media Extensions
https://w3c.github.io/encrypted-media/

"tracked" sessions: architectural concerns pending resolution with TAG #85

Closed - ddorwin closed this issue 4 years ago

ddorwin commented 9 years ago

Pull request #54 was merged without addressing architectural concerns about “tracked” sessions. Unresolved questions are pending a discussion with the TAG. The outcome could result in modification (or removal) of “tracked” sessions.

This issue is a placeholder for that discussion and outcome.

Resolving #82 and #84 could help accelerate conclusion of this discussion.

mwatson2 commented 8 years ago

@ddorwin wrote:

My concern about this particular "at risk" feature is that there are pending substantive changes

I don't believe the changes that are pending are substantive. They describe more explicitly what is already required by the existing text. This is why the issue is classified V1NonBlocking.

Now that we have merged the session close re-factor under #181, the changes for #171 are very simple.

hsivonen commented 8 years ago

@ddorwin, I'd like to understand your current objections to the CDM requesting the browser to store a key usage record on the CDM's behalf in origin-partitioned storage when the browser shuts down a CDM instance. Specifically, this would not involve generating another EME message (synchronous messaging concern) at CDM shutdown. Specifically, this would not involve storing key usage records during playback, so even in the absence of "secure" storage, there wouldn't be a risk of rollback--just the risk of failing to write a record at all if either the CDM or the browser crashes (or electricity to the computer is cut, etc.). Also, the user deleting the storage between the CDM shutdown and next initialization for the same site would result in a lost record.

Your comment from April 14 strongly hints that the crux of your concern is that Chrome loses the association between a CDM instance and the origin to which the storage should be partitioned before the CDM instance is actually shut down. Is that the case? Is there a reason why the CDM instance can't have a potential (origin-partitioned) storage location assigned to it at CDM instance initialization time so that knowledge of the storage location would survive the destruction of the Document object that was responsible for the CDM instance getting initialized?
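
To make the shape of this concrete, here is a rough sketch of the kind of host-side arrangement I have in mind. Every name below is hypothetical; none of this is an existing Chromium, CDM, or EME interface - it only illustrates binding the origin-partitioned storage location at CDM initialization and performing a single fire-and-forget write at CDM shutdown.

```ts
// Hypothetical sketch only (names invented for illustration).
interface UsageRecordStore {
  // Fire-and-forget write performed by the browser on the CDM's behalf;
  // nothing is written during playback and no EME message is generated.
  write(sessionId: string, record: ArrayBuffer): void;
}

interface CdmHost {
  // The origin-partitioned location is resolved at CDM initialization time,
  // so it survives destruction of the Document that initialized the CDM.
  createUsageRecordStore(origin: string): UsageRecordStore;
}

function initializeCdm(host: CdmHost, origin: string) {
  const store = host.createUsageRecordStore(origin);
  return {
    // Called when the browser shuts down the CDM instance.
    shutdown(sessionId: string, record: ArrayBuffer): void {
      store.write(sessionId, record); // the single record write
    },
  };
}
```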

ddorwin commented 8 years ago

@hsivonen, most of the issues in your first paragraph were addressed over the last year and led us to this point. For example, requiring delayed application shutdown to send messages (and thus the ability to enforce concurrent stream limitations) was dropped, and the current definition does not require tamper-evident storage (and avoids trivial rollback attacks). The latter, however, leaves the feature in a state where it requires very high assurance that the single record write will occur when the page is closed, including when the tab or browser is closed.

https://github.com/w3c/encrypted-media/issues/45#issuecomment-147826243 summarizes the state before the discussion moved to the architectural issues. Specifically, the feature does not support enforcement the way alternative mechanisms do, and identifying abuse on the server while dealing with the uncertainty over time (there will always be some percentage of key usage records that are never received, and others that may not be received for days or weeks) could be quite complex, especially across the broad spectrum of client device form factors and usage models.


tl;dr: Persisting state is not an explicit goal/purpose of EME, yet this feature requires something that even APIs with the explicit goal of persisting data do not support: ensuring persistence during destruction of an object or the entire page. Requiring client implementations to accommodate a limited-utility and orthogonal capability is unnecessary, especially when other options have not been duly considered.

Specific to Chrome: Chrome treats the CDM like any other part of the page, such as the media player backing the <video> element, and, where possible, stores CDM data using the same mechanism as other site data. This helps keep the EME implementation consistent with the rest of the platform and allows reuse of established mechanisms.

Even if a parallel storage mechanism for CDMs were added to Chrome, Chrome would still need to manage the CDM instance/process lifetime outside that of the browsing context, including the associated renderer process - something that is not, and currently cannot be, normatively described. This would be inconsistent with the current goals to treat EME and CDMs like any other part of the web platform.

Even if solutions for both were implemented in Chrome, this would only address the issue for the Chrome browser. Any Chromium-based user agents, including Opera and Chromium Embedded Framework, that use Chromium’s content layer but not the entire Chromium browser may have to independently address the issues above. It is important to ensure that implementers continue to have freedom to innovate and avoid imposing orthogonal requirements.


As has become apparent from recent discussions above, though, this is not just an implementation issue - the reason these potential implementation issues exist is that the required behavior is outside the bounds of the current web platform, including the extent of normatively-definable behavior. Thus, even if there were not an implementation problem, we would still have the problem of being unable to normatively describe the required behavior. I’ll address this in more detail in my next comment.

Given the lack of evidence that this is a generally useful solution (as far as we know, only Netflix has actually deployed it, and mostly on platforms that have tamper-evident storage), it seems premature to consider this feature a Recommendation at this time, especially given all the other issues and implementation burden it involves.

ddorwin commented 8 years ago

The spec for MediaKeySession currently (after #171) says:

If a MediaKeySession object becomes inaccessible to the page and the Session Closed algorithm has not already been run, the User Agent MUST run the MediaKeySession destroyed algorithm before User Agent state associated with the session is deleted.

This sounds like a destructor but maybe even broader. Is there precedent for such normative text related to destruction or inaccessibility of an object?


The MediaKeySession Destroyed algorithm says:

The following steps are run in parallel to the main event loop: …

  1. Use cdm to execute the following steps:
     1. Close the session associated with session.
     2. If session's session type is "persistent-usage-record", store session's record of key usage, if it exists.
        NOTE: Since it has no effects observable to the document, this step may be run asynchronously, including after the document has unloaded.


So, when "a MediaKeySession object becomes inaccessible," the steps related to this feature "are run in parallel to the main event loop." Ignoring the destructor question, this might be okay if it was best effort. However, with an expectation of "at least 99%" reliability, the user agent would need to wait until the parallel steps complete before terminating the main loop. As far as I understand, there is currently no way to specify this behavior in the web platform (the current spec text likely does not provide the expected reliability), and there is no such precedent for delaying the teardown of the main event loop or browsing context. Such delays also seem to contradict trends in implementations towards quick shutdown.

The spec changes for #171 attempt to deal with this with a non-normative note that the persistence “may be run asynchronously, including after the document has unloaded.” This appears to be an acknowledgement of the issue above, but since there is no mechanism for normatively solving it (within the document lifetime), it relies on a non-normative suggestion for implementations of the specification to do work related to the browsing context outside the lifetime of that browsing context.

mwatson2 commented 8 years ago

@ddorwin wrote:

Persisting state is not an explicit goal/purpose of EME

I disagree with this statement. The goal/purpose of EME, from the beginning, was to provide access to Content Protection capabilities previously available only through plugins. Those pre-existing capabilities included persistent licenses and key release messaging. We documented the in-scope use-cases in our wiki which lists both persistent licenses and key release as "supported". Key release was in the very first EME draft proposed.

Whilst we have agreed on constraining the CDM features accessible by EME to a smaller set than those supported in plugins, it has never been proposed to exclude or deprioritize those features requiring persistent state (except insofar as they are optional).

Requiring client implementations to accommodate a limited-utility and orthogonal capability is unnecessary, especially when other options have not been duly considered.

The feature is optional, so no one is required to implement it.

We have had long, long discussions of alternatives, and I've provided detailed explanations of why we think license renewal is overkill for this problem, so I'm not sure why you say other options have not been duly considered.

So, when "a MediaKeySession object becomes inaccessible," the steps related to this feature "are run in parallel to the main event loop."

The following is not a major issue, but as commented earlier the "in parallel" here is really a noop. "in parallel" means only that pages cannot assume the steps will be complete before the next turn of the event loop. But in this case the effects of the steps can only be observed asynchronously (by loading a session and waiting for the release message) and in the page close case there may be no more turns of the event loop at all. Whether we say "in parallel" or not it looks the same as far as the page is concerned.
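
To illustrate the only way a page can observe the outcome, here is a sketch of the application-side flow using the draft API. The local typings and sendRecordToServer are placeholders for illustration, not anything any particular application ships.

```ts
// Sketch: on a later visit, load the "persistent-usage-record" session by its
// stored id; if the record survived, a "license-release" message arrives and
// is forwarded to the server. sendRecordToServer is a placeholder.
interface DraftUsageRecordSession extends EventTarget {
  load(sessionId: string): Promise<boolean>;
  remove(): Promise<void>;
}

async function reportStoredUsageRecord(
  mediaKeys: { createSession(sessionType: string): DraftUsageRecordSession },
  sessionId: string,
  sendRecordToServer: (record: ArrayBuffer) => Promise<void>,
): Promise<void> {
  const session = mediaKeys.createSession('persistent-usage-record');
  session.addEventListener('message', async (event) => {
    // The message carries the record of key usage (key ids, first/last decrypt times).
    await sendRecordToServer((event as MediaKeyMessageEvent).message);
    await session.remove(); // acknowledges receipt so the CDM can discard the record
  });
  const found = await session.load(sessionId);
  if (!found) {
    // Indistinguishable, to the page, from the user having cleared site data.
  }
}
```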

However, with an expectation of "at least 99%" reliability, the user agent would need to wait until the parallel steps complete before terminating the main loop.

I don't think this follows, or even makes sense. The main event loop in our specifications is an abstract thing which expresses the serialization of the execution of tasks for the page. There can, by definition, be no more tasks related to this object and may be no more tasks at all, so how does it make sense to talk about "before terminating the main loop" in this context? The only thing that could mean is that you expect more tasks to be executed after these steps, but that is not proposed.

Furthermore, this is an optional feature and the specification does not specify a reliability target. The 99% figure I gave is our (Netflix's) "happy path" target, considering only cases of graceful session close / page shutdown and where local storage has not been cleared.

Finally, as noted here, a possible implementation is that the CDM is provided with the necessary origin-specific context at initialization time and has the ability to post the storage task to some separate page-independent queue for later execution, even after the page has closed (as could also be done with ping).
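
Roughly, the shape I have in mind is the following. This is purely illustrative; the queue and every name here are hypothetical, not an existing browser interface.

```ts
// Hypothetical sketch: the origin-scoped context is captured at CDM
// initialization, and the final write is posted to a queue whose lifetime is
// independent of the page (as could also be done for ping/sendBeacon payloads).
interface PageIndependentQueue {
  post(task: () => Promise<void>): void; // survives document unload
}

function createCdmInstance(
  origin: string,
  queue: PageIndependentQueue,
  writeRecord: (origin: string, sessionId: string, record: ArrayBuffer) => Promise<void>,
) {
  return {
    onShutdown(sessionId: string, record: ArrayBuffer): void {
      // Nothing here touches the document or its event loop; the context the
      // task needs was captured when the CDM instance was created.
      queue.post(() => writeRecord(origin, sessionId, record));
    },
  };
}
```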

As far as I understand, there is currently no way to specify this behavior in the web platform (the current spec text likely does not provide the expected reliability), and there is no such precedent for delaying the teardown of the main event loop or browsing context. Such delays also seem to contradict trends in implementations towards quick shutdown.

The current text doesn't imply any particular level of reliability but does not constrain reliable implementation either. I can't see how the text constrains implementations (i.e., how it "does not provide the expected reliability"). AFAIK, there is no such thing in the web specifications as "teardown of the main event loop" - the UA decides when to stop executing tasks. The necessary storage scope can be captured earlier, independently of the browsing context, DOM, etc. So there is nothing that has to be delayed. The closest I can think of is if there were a requirement to execute a specific queued task, or make some UI change or other page- or user-visible change, after the storage of the persistent release data, but there is no such requirement.

In general, I find that these objections confuse specification / platform issues with implementation issues. Of course there are many, many aspects of browser implementation which are not addressed by the specification. The fact that some aspects of a feature are in that unspecified implementation realm is not a valid criticism of the feature - all features have this property - so long as the observable behavior is well-defined.

The implementation complexity of a given feature will vary from browser to browser because of the prior implementation choices they have made. Some browsers may choose not to implement this optional feature as a result. That is all fine in a competitive market. It's a dangerous path, IMO, to say that the interoperability benefits of a standard specification should be denied to all players because one player finds the feature more difficult to implement than others.

ddorwin commented 8 years ago

I think the next step is to get an updated response from the TAG (see my next comment). However, I want to address some points in @mwatson2's comment.

The goal/purpose of EME, from the beginning, was to provide access to Content Protection capabilities previously available only through plugins.

The purpose of EME, as stated in the first sentence of the Abstract, is to enable playback of encrypted content in HTMLMediaElements. This does not imply providing access to all content protection features available through legacy plugins. We have previously chosen not to expose other functionalities provided by such plugins (e.g. when they conflict with the principles or limitations of the web platform).

The feature is optional, so no one is required to implement it.

As has been discussed before, whether this feature is marked "optional" is meaningless when it is actually “required” - for the most basic access to content (online streaming) - by one of the leading users of the API.

The following is not a major issue, but as commented earlier the "in parallel" here is really a noop. "in parallel" means only that pages cannot assume the steps will be complete before the next turn of the event loop. But in this case the effects of the steps can only be observed asynchronously (by loading a session and waiting for the release message) and in the page close case there may be no more turns of the event loop at all. Whether we say "in parallel" or not it looks the same as far as the page is concerned.

Applications also assume that those steps will complete. If they do not complete, that is (as noted in this quote) observable by the application when it later loads the session.

The current text doesn't imply any particular level of reliability but does not constrain reliable implementation either. I can't see how the text constrains implementations (i.e., how it "does not provide the expected reliability").

Unless the feature or algorithm are labeled best effort - like other features to which it has been compared - the expectation must be the same as other web features and algorithms - essentially 100% reliable. I don't believe the current spec text would ensure that. For example, there is no thread join or other mechanism specified to ensure the parallel steps complete before exiting. Either the spec should include such text or it should state that the feature is best effort. The latter would be inaccurate given the intended usage and Netflix's reliability target.
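
To make the missing synchronization concrete, compare these two illustrative teardown shapes (user-agent-internal pseudocode, not spec text). A literal reading of the current algorithm gives the first; the reliability being assumed effectively requires the second, and nothing in the spec or the platform expresses that "join."

```ts
// Illustrative only: storeRecord stands in for the parallel "store the record
// of key usage" steps.
function teardownWithoutJoin(storeRecord: () => Promise<void>): void {
  void storeRecord(); // parallel steps kicked off; teardown proceeds immediately
}

async function teardownWithJoin(storeRecord: () => Promise<void>): Promise<void> {
  await storeRecord(); // teardown waits for the write to complete
}
```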

The main event loop in our specifications is an abstract thing which expresses the serialization of the execution of tasks for the page.

The main thread/event loop is fundamental to the web platform and synchronization. "The event loop" is defined in HTML5 and referenced in the unload algorithms. Behavior of other threads ("in parallel" steps) is not defined, and they are definitely not guaranteed to run to completion (see my paragraph immediately above).

Finally, as noted here, a possible implementation is that the CDM is provided with the necessary origin-specific context at initialization time and has the ability to post the storage task to some separate page-independent queue for later execution, even after the page has closed (as could also be done with ping).

While that is one possible implementation, that does not address these issues. The CDM is not defined as a special entity outside the scope of the browsing context, and any observable behaviors must be defined in terms of the context as with all web platform specs.

In general, I find that these objections confuse specification / platform issues with implementation issues.

I disagree, as I have outlined here and above. In addition to veering into areas, such as object destruction/inaccessibility, that are undefined in the web platform, the current text does not guarantee data will be persisted reliably. I believe it is clear that a simple literal implementation of the spec algorithms would not reliably persist the data without some unspecified synchronization mechanism. That is a specification / platform issue that has nothing to do with specific implementation(s).

It's a dangerous path, IMO, to say that the interoperability benefits of a standard specification should be denied to all players because one player finds the feature more difficult to implement than others.

It’s a dangerous precedent to let portions of a W3C Recommendation exist outside the defined web platform. “Interoperability benefits” seem questionable when the specification does not fully define all required behavior. Also, authors would benefit from a fully-specified and implementable specification that is widely available. Forcing the feature through as-is will not result in a consistent platform for authors. It also seems premature to finalize this feature in a Recommendation when, as far as we know, only one author has used it in production. (For example, see the potential issues for authors in the second paragraph of https://github.com/w3c/encrypted-media/issues/85#issuecomment-228917577.)

Finally, most of the “benefits of a standard specification” can be achieved without including this feature in the v1 Recommendation. Specifically, nothing prevents maintenance of a proposed specification or implementers from implementing or experimenting with such a feature while work continues to explore the necessary platform hooks. Such an approach will benefit the web as a whole rather than introducing a significant new browser behavior only useful to a tiny fraction of sites.

ddorwin commented 8 years ago

Since the last input from the TAG, the intended behavior has been further clarified (as outlined above). There appears to be disagreement on the implications and even the meaning of running steps "in parallel" and the "main event loop."

@travisleithead and @slightlyoff: Is the current text consistent with the TAG's existing understanding of the feature and feedback that "this feature doesn't seem to work well inside the currently spec'd web platform?" Does the TAG's "strong guidance is to move this to a V2" to allow investigation of the necessary hooks still apply?

mwatson2 commented 8 years ago

@ddorwin wrote:

As has been discussed before, whether this feature is marked "optional" is meaningless when it is actually “required” - for the most basic access to content (online streaming) - by one of the leading users of the API.

Our product plans should not really be relevant to this discussion, but for the record, Netflix works on the Chrome browser today without this feature, so it is not "required" in that sense.

Regarding the "in parallel" steps to store the record of key usage:

Applications also assume that those steps will complete. If they do not complete, that is (as noted in this quote) observable by the application when it later loads the session.

A failure of this assumption would not be observable as such, since there is another reason why the data might not be present in a subsequent session (the user has cleared it), and the application cannot distinguish that from some kind of write failure on page close.

Unless the feature or algorithm are labeled best effort - like other features to which it has been compared - the expectation must be the same as other web features and algorithms - essentially 100% reliable.

It is not required that this feature be 100% reliable. The level of reliability (essentially the frequency with which the write succeeds during normal page shutdown) is a quality-of-implementation issue. If it would help to note that in the specification, we could do so.

I don't see anything in the definition of "in-parallel" that implies those steps have a different status - in terms of whether they will run or not - compared to other steps. So, I still maintain that whether these steps are defined to be "in parallel" or not is unobservable. Still, they may certainly be carried out asynchronously with respect to other unload tasks and this is explicitly stated in a note.

mwatson2 commented 8 years ago

@ddorwin wrote:

Since the last input from the TAG, the intended behavior has been further clarified (as outlined above). There appears to be disagreement on the implications and even the meaning of running steps "in parallel" and the "main event loop."

I feel there is some disagreement as to the extent to which web specifications are prescriptive. As I understand it, they are prescriptive in terms of observable behavior and no more.

So, when we say steps are run "in parallel", we are making a statement about the lack of serialization of observable effects with respect to other observable steps that are not "in parallel", but no more. The specifications do not say anything about when such "in parallel" steps run or what may cause them not to be run. There is nothing to say that the status of the main event loop (running or not) affects this. I don't think this silence can be interpreted as meaning all "in parallel" steps are best effort.

Equally, if steps are run in the main event loop, but their effects are not observable until later, implementations are free to execute the steps later, provided the observable behavior is unchanged.

Nevertheless, I agree the reliability of the feature in question is a quality-of-implementation issue and we could note as much in the specification.

travisleithead commented 8 years ago

Assuming there is no 100% reliability requirement, as has been conceded earlier, and there is no user agent requirement to notify the page of completion (of the write), as confirmed earlier - such that the side effects would only be observable later (on the next page load or session) - then I see this feature largely boiling down to a sendBeacon or a[ping]-style feature.
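
For comparison, the best-effort pattern those features give a page looks roughly like this. sendBeacon is the real API; the URL and payload are placeholders.

```ts
// The payload and destination are captured synchronously; the user agent
// transmits them best-effort, possibly after the document unloads. The page
// only learns whether the data was queued, never whether delivery succeeded.
addEventListener('pagehide', () => {
  const payload = new Blob([JSON.stringify({ sessionId: 'example' })], {
    type: 'application/json',
  });
  // Returns true only if the UA accepted the data for later, best-effort delivery.
  if (!navigator.sendBeacon('https://example.com/usage', payload)) {
    // Nothing more the page can do; there is no completion or failure signal later.
  }
});
```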

Using the "destructor" of a JavaScript object as a trigger is still a concern for me. In general, APIs should not be designed around the garbage collection semantics of script engines, nor make GC semantics visible. This is only a small concern for me because there are no observerable side effects that the page could learn. It's just the use of this odd behavior that concerns me. If there were another hook that could be used I'd feel much more comfortable, but as Alex and I have said already, this hook is probably years away from being specified.

Outside of that concern, if there remains an architectural problem around running the persistent record write task outside of the observable event loop, that problem should extend to sendBeacon and a[ping] as well. Certainly aspects of how these three features queue and dispatch their payloads outside the lifetime of a document are not precisely specified. Despite this, I'm not so sure there remains an architectural concern.

@travisleithead and @slightlyoff: Is the current text consistent with the TAG's existing understanding of the feature and feedback that "this feature doesn't seem to work well inside the currently spec'd web platform?"

"work well" - it seems to work well-enough as sendBeacon and a[ping] in my estimation. Is it "inside the currently spec'd web platform"? Well, not precisely. :-)

Does the TAG's "strong guidance is to move this to a V2" to allow investigation of the necessary hooks still apply?

Alex and I have previously said that the hooks needed for precise specification are years away. This is long enough that it extends beyond v1. The salient question is whether that precise specification is necessary, and given sendBeacon and a[ping], I'm not convinced that it is necessary, or we have to call into question those specs as well.

mwatson2 commented 8 years ago

@travisleithead wrote:

Using the "destructor" of a JavaScript object as a trigger is still a concern for me. In general, APIs should not be designed around the garbage collection semantics of script engines, nor make GC semantics visible.

The trigger is that the object "becomes inaccessible to the page", which is well-defined entirely in terms of the JavaScript objects that exist and the variable references to them (i.e., things visible to the page, independent of implementation). Now, of course, when the implementation detects this state depends on its GC implementation, but this is still not visible to the page, because the stored information is not observable until later.

I recently discovered that the unload a document steps include a hook for unloading document cleanup steps defined by other specifications. If the unload a document steps always occur, then an alternative formulation would be to make that the trigger for all unclosed MediaKeySession objects and also specify that browsers MAY close any unclosed MediaKeySession objects that are not accessible to the page at any time. This wouldn't result in any observable behavior difference, but might be a better specification formulation.

travisleithead commented 8 years ago

Note, the TAG has closed https://github.com/w3ctag/spec-reviews/issues/73#issuecomment-236362924. After review in our Stockholm F2F meeting, we have found no architectural concerns with the feature as currently understood.

ddorwin commented 8 years ago

The following quotes are from the minutes of the referenced F2F.

@travisleithead said:

At least agreement that there's not a 100% requirement....Given not 100% reliability, and no side effects, I didn't see a super-strong concern.

@travisleithead, what is your understanding of the reliability requirement? @mwatson2 said, "we expect to receive secure release messages for at least 99% of sessions." The <= 1% messages not received includes crashes, data clearing, etc., so Netflix expects the user agent implementation to be over 99% reliable. In other words, the implementation should be designed for 100% reliability. Even if we assume 99% implementation reliability, are you saying there is a meaningful difference between requiring 99% vs. 100%?

@slightlyoff said:

As long as it's isomorphic to sendBeacon and we're not adding extra constraints on shutdown, that's ok.

I'm not sure those assumptions are true. While sendBeacon() sends the origin, which is cached in the synchronous part of the algorithm, it does not actually perform any origin- or browsing context-specific operations - certainly not in the asynchronous / "after the document has unloaded" steps. In contrast, this feature requires asynchronously storing origin-specific data that is generated by a CDM instance tied to the browsing context.

Also, there is no expected bad outcome for users of UAs that do not or cannot implement sendBeacon or <a ping> with 99-100% reliability, or that do not implement them at all. In contrast, with this feature the former could lead to users being denied access to content (due to false-positive concurrent stream detection), and the latter could lead to users being denied access to content entirely, or to higher qualities of content.

@slightlyoff said:

The open question -- got no response -- what is the limit? If you have to do behavioral monitoring of your users, what's the amount of information you need to get information you're obligated to expose. Seems the answer is "as much as possible".

@mwatson2, can you answer that open question?

mwatson2 commented 8 years ago

@ddorwin wrote:

@mwatson2 said, "we expect to receive secure release messages for at least 99% of sessions." The <= 1% messages not received includes crashes, data clearing, etc.

What I said was that we expect 99% of sessions where either there is graceful shutdown of the session before the page is closed or where the user revisits later, without clearing data. So, the 1% does not include data being cleared or where the user hasn't revisited the site. I don't think we include crashes either, but I'd have to check that.

@mwatson2, can you answer that open question?

Yes, but since the TAG have closed their issue, I'm not sure how useful that's going to be. The title of this issue ends "...pending clarification from the TAG", which we now have, so I believe we should now close this one.

FWIW - and I don't recall the question being asked before, so I'm not sure who Alex was waiting for a response from - the specification requires recording of the key ids that were used in the session and the first and last (wall clock) times content was decrypted by the session. This is sufficient for the use-cases we've discussed.

ddorwin commented 8 years ago

On the first item, thank you for the clarification. However, I don't think that affects the point, implementation design targets, or questions.

On the second item, @mwatson2 has said that "The level of reliability... is a quality-of-implementation issue," but there must be some level that is required for the feature to be useful. In order to facilitate interoperability, app compatibility, and implementations that are useful for authors, implementers need to know the necessary [minimum] level of reliability. Currently, we have 99+%. I'll let @slightlyoff clarify his question, but I think one aspect is whether such behavior analysis could succeed with meaningfully lower levels of reliability, such as that of sendBeacon and <a ping>.

mwatson2 commented 8 years ago

Aside from the fact that it does not need to be 100% reliable (or five 9's or similar), I don't agree that Netflix's current requirements are relevant here.

The feature is proposed to be optional, so if a particular browser does not meet the requirements of a particular site at a particular time, the site can behave as if the feature were not present (which it has to support anyway, because the feature is optional).

FWIW, several implementors have reached a level of reliability where we find the feature is useful, so it is certainly possible with reasonable effort.

ddorwin commented 8 years ago

Aside from the fact that it does not need to be 100% reliable (or five 9's or similar), I don't agree that Netflix's current requirements are relevant here.

The question was not necessarily about Netflix's current requirements. However, as the only author with implementation experience with this feature, Netflix's experience and usage is very relevant for the evaluation of the Candidate Recommendation.

The feature is proposed to be optional, so if a particular browser does not meet the requirements of a particular site at a particular time, the site can behave as if the feature were not present (which it has to support anyway, because the feature is optional).

Other than checking the user agent string, how would a site determine whether to "behave as if the feature were not present"? It seems like a design flaw for a web platform feature to a) have a critical dependency on reliability in order to be useful to applications, yet b) have no requirement or guidance for reliability, and c) provide no way to determine the level of reliability.
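
For example, the most a site could do is detect whether the session type is supported at all, as in the sketch below using requestMediaKeySystemAccess (the key system string is a placeholder). Even when this resolves, it tells the application nothing about how reliably records will actually be written.

```ts
// Sketch: presence detection only; reliability is not detectable.
// "com.example.drm" is a placeholder key system.
async function supportsUsageRecordSessions(): Promise<boolean> {
  const config: MediaKeySystemConfiguration[] = [{
    initDataTypes: ['cenc'],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
    persistentState: 'required',
    sessionTypes: ['persistent-usage-record'],
  }];
  try {
    await navigator.requestMediaKeySystemAccess('com.example.drm', config);
    return true;
  } catch {
    return false;
  }
}
```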

It is important to note that "behav[ing] as if the feature were not present" may (and likely does) include denying users access to content (or qualities of content) even though the implementation is fully capable of protecting the content and enforcing concurrent stream limitations.

FWIW, several implementors have reached a level of reliability where we find the feature is useful, so it is certainly possible with reasonable effort.

Two implementations use first-party OS-based DRMs that can write periodically rather than at teardown, and (as far as I understand) the third delays shutdown of the CDM and browser and stores the CDM data via a special path that is distinct from that used for other site data. We don't disagree that ~100% reliability is possible with such implementations. However, we don't think a W3C Recommendation should (effectively) force implementers down one of these paths in order to ensure their users have access to content on a handful of sites.

slightlyoff commented 8 years ago

To provide some color on the Stockholm consensus from the TAG meeting: what was debated was the extent to which the proposed feature pushes the boat out past what sendBeacon and <a ping> already provide. If the requirement is that logging be more reliable than those features, the TAG doesn't feel that the feature is in line with what could be explained in the near future, and it is therefore a risk to compatibility, platform coherence, and layering.

To the extent that folks here are OK with spec-ing this in terms of the language that <a ping> and sendBeacon() use for their processing, the feature seems roughly OK.

Hope that helps.

paulbrucecotton commented 7 years ago

@ddorwin - Please change the milestone for this issue from V1 to VNext since the "persistent-usage-record" feature is being removed from EME V1. See ISSUE-353.

ddorwin commented 7 years ago

Moved to VNext per the above comment.