"tracked" sessions: architectural concerns pending resolution with TAG

ddorwin commented 9 years ago

Pull request #54 was merged without addressing architectural concerns about “tracked” sessions. Unresolved questions are pending a discussion with the TAG. The outcome could result in modification (or removal) of “tracked” sessions.

This issue is a placeholder for that discussion and outcome.

Resolving #82 and #84 could help accelerate conclusion of this discussion.

mwatson2 commented 9 years ago

The group did not agree that the issue needed to be raised by the TAG. Of course companies are free to do so and that might result in TAG advice for the group to further consider. But it is not the case that we are blocked pending discussion with the TAG.

ddorwin commented 9 years ago

@mwatson2: This issue was carefully worded to reflect the situation. While I am unable to find a path to your conclusion that it implies that "tracked" is "blocked pending discussion with the TAG," I want to clarify that you are right - "tracked" isn't blocked pending a TAG discussion. However, I also believe that this discussion 'could result in modification (or removal) of “tracked” sessions.' Do you disagree?

mwatson2 commented 9 years ago

My comment was just to clarify that the unresolved questions and concerns are Google's, not shared by the group, as this wasn't stated either way in the issue description.

mwatson2 commented 8 years ago

I offered on our call to provide information about the fraction of time we expect to / do receive the secure release information. I have a qualified answer: considering only streaming sessions where either (a) the session is closed gracefully and the secure release exchange with the server completes, or (b) (a) does not hold, but the user later revisits the site and the Local Storage information from the session has not been cleared, we expect to receive secure release messages for at least 99% of sessions. And indeed, in practice we achieve this in the field on a desktop browser that has implemented secure release.
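To make the qualification concrete, here is a minimal sketch of how such a rate could be computed. It assumes a hypothetical per-session record with three flags; the field names and the sample data are invented for illustration and are not taken from any real measurement.

```python
from dataclasses import dataclass


@dataclass
class StreamingSession:
    closed_gracefully: bool       # case (a): release exchange completed at close
    revisited_with_storage: bool  # case (b): user returned, Local Storage intact
    release_received: bool        # server eventually got the secure release message


def qualifies(session: StreamingSession) -> bool:
    # A session counts if (a) holds, or if (a) fails but (b) holds.
    return session.closed_gracefully or session.revisited_with_storage


def release_rate(sessions: list) -> float:
    # Fraction of qualifying sessions for which the release message arrived.
    eligible = [s for s in sessions if qualifies(s)]
    if not eligible:
        raise ValueError("no qualifying sessions")
    return sum(s.release_received for s in eligible) / len(eligible)
```

Sessions that satisfy neither (a) nor (b) are excluded from the denominator entirely, which is why the 99% figure is a qualified one.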

paulbrucecotton commented 8 years ago

As of the Sapporo F2F meeting, we have no update from the TAG in either EME ISSUE-85 or the related TAG Issue-73.

Paul will continue to chase after @travisleithead and Alex to get feedback on this matter.

paulbrucecotton commented 8 years ago

@travisleithead - Can you please give us an update on the TAG discussion of this EME issue?

travisleithead commented 8 years ago

@slightlyoff and I are still discussing this. I hope we can make some progress this week.

paulbrucecotton commented 8 years ago

@travisleithead and @slightlyoff: Can you give us an update on your progress?

paulbrucecotton commented 8 years ago

@travisleithead and @slightlyoff: Would it be possible for you to attend a Media TF meeting on Tue Dec 15 to discuss your progress on this issue?

travisleithead commented 8 years ago

I know recent conferences, holiday travel, and vacation have been a factor in the slow progress from @slightlyoff and me. I'm available to join the call Dec 15th but am afraid I won't have much to report. Regarding subsequent calls, after the 16th, I won't be available until January 2016.

slightlyoff commented 8 years ago

Apologies.

paulbrucecotton commented 8 years ago

The W3C TAG has filed its review of this issue as "Architectural view on run-after-app-close behavior" at: https://github.com/w3ctag/spec-reviews/issues/73#issuecomment-171536298

Please ask your questions or add comments here on this review or on the TAG issue 73. If required I will request that Travis and/or Alex attend an upcoming Media TF teleconference to discuss this matter.

/paulc

jdsmith3000 commented 8 years ago

The TAG response seems clear that web specs should not require synchronous operations post shutdown, and puts particular emphasis on issues that would be encountered when documents are abruptly destroyed. It does not, however, make judgements on the EME spec or the persistent-usage-record feature, and in fact puts feature impact specifically out of scope, as stated in the second paragraph:

In this response, we only seek to clarify the architectural question of requiring steps to run after application close; we make no value judgement of the feature in question or implementation strategies vendors might choose.

In addition to not passing judgement on EME features, the guidance also leaves implementation choices open to implementers:

Implementations are welcome to add triggers and hooks to run operations on shutdown of specific web platform environments, of course.

This raises two points we should consider:

1. Does the EME spec require synchronous processing on shutdown? It does not, at least not anywhere in the spec language. There have been discussions about the value of writing data when the session closes, but it's not been established as a spec requirement. A valid implementation could save timed data throughout playback to provide a useful record of key usage.
2. Choices made by implementers are not restricted. The TAG judgement leaves open design choices made by implementers. In that context, a given implementation might choose to write data on shutdown. At least one current implementation does this now. That is allowable, but doesn’t establish any of its implementation choices as requirements.

Given the TAG guidance, we shouldn’t make changes to EME that require post shutdown processing. Beyond that, I don’t think the judgement invalidates the persistent-usage-record feature, and also don’t think that pull request #54 should be reverted.

mwatson2 commented 8 years ago

I agree with Jerry's comments and would go a little further:

- the discussion in the opinion on synchronous operations at shutdown refers to the execution of "arbitrary code" at page close or shutdown. I interpret this as referring to code supplied by the page. The execution of user agent code at shutdown is a different issue (and is clearly unavoidable, since it is the user agent which is shutting down the page).
- another valid implementation of the specification requirements would be to persist the secure release data some time after shutdown, in exactly the way Beacon or <a ping> send data after shutdown. The opinion says "These are reasonable models to follow".

... Mark

mwatson2 commented 8 years ago

In the absence of further comments, can we close this issue?

ddorwin commented 8 years ago

We interpret the text of the official TAG response differently, and below, I have described specific differences with the above interpretations. For the sake of resolving this as efficiently as possible, I propose that we ask the authors, @slightlyoff and @travisleithead, to clarify the intent of the text and accuracy of the interpretations.

@jdsmith3000 wrote:

It does not, however, make judgements on the EME spec or the persistent-usage-record feature...

The TAG opinion says the TAG “make[s] no value judgement of the feature.” That is, for example, whether it would be useful. Although the text is general, there is a clear conclusion on whether "a web-based feature should require executing steps in an environment that is already in the process of closing down."

Does the EME spec require synchronous processing on shutdown? It does not, at least not anywhere in the spec language.

Actually, that is exactly how the observable behavior is defined. My interpretation is that the TAG response recommends that specs do not define such behavior.

An approach to consider is to try and change the feature definition to avoid this behavior. However, as discussed before, one possible conclusion of that path is tamper-evident-storage, which is not always available, especially for non-first-party implementations.

Choices made by implementers are not restricted. The TAG judgement leaves open design choices made by implementers. In that context, a given implementation might choose to write data on shutdown. At least one current implementation does this now. That is allowable, but doesn’t establish any of its implementation choices as requirements.

In this specific case, the choice is a Hobson's choice. As the feature is currently defined, some implementations can make only one choice - one that the TAG discourages specs from requiring. I interpreted this text as allowing implementations choice in how they implement features, possibly not even web platform features. But such choices are optional, i.e. to enable optimizations. I don't think this was referring to a case where something is the only possible solution for a large portion of implementations.

@mwatson2 wrote:

- the discussion in the opinion on synchronous operations at shutdown refers to the execution of "arbitrary code" at page close or shutdown. I interpret this as referring to code supplied by the page. The execution of user agent code at shutdown is a different issue (and is clearly unavoidable, since it is the user agent which is shutting down the page).

This is missing the context. "Arbitrary code" is immediately contrasted with "declarative, canned behaviors." Beacons and <a ping>, the existing web platform features discussed before that text, fall into the latter category. The user agent can process these when the page loads and prepare the actions for when it unloads. In contrast, this feature requires the user agent to allow the CDM - a separate entity - to execute code on its behalf for the page when the page is closed.

- another valid implementation of the specification requirements would be to persist the secure release data some time after shutdown, in exactly the way Beacon or <a ping> send data after shutdown. The opinion says "These are reasonable models to follow".

This is misleading and omits important statements from the same paragraph. The TAG response does not say persisting such data "some time after shutdown" is a reasonable model to follow.

Beacons and <a ping> have three very important properties, two of which are mentioned in the surrounding text, that this feature does not.

1. They have a declarative semantic for canned behaviors that can be replayed by the user agent.
2. They “are designed not to interfere or block navigation of a document nor shutdown of a browsing context.” Of particular note is that they do not modify exposed or persisted per-origin state.
3. They are (very) best-effort, as noted in the response.

These properties allow user agents to build a list of actions - or requests to replay later - and maintain that list independent of the page or origin. Thus, when the page is closed, the user agent can process the actions without the page's context and at a time of its choosing. In addition, it is acceptable - even expected - that some (potentially significant) percentage of the requests will not succeed.
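The queueing model described above can be illustrated with a small sketch. This is a toy model under stated assumptions, not how any browser implements Beacon or <a ping>: the class `BeaconQueue`, its methods, and the `send` callback are all hypothetical names.

```python
from collections import deque


class BeaconQueue:
    """Toy model of a user agent's queue of canned, replayable requests."""

    def __init__(self):
        self._pending = deque()

    def queue(self, url: str, body: bytes) -> None:
        # Everything needed to replay the request is captured at call time,
        # while the page is still alive. No page or origin context is kept.
        self._pending.append((url, body))

    def flush(self, send) -> int:
        # Runs at a time of the user agent's choosing, after the page is gone.
        delivered = 0
        while self._pending:
            url, body = self._pending.popleft()
            try:
                send(url, body)
                delivered += 1
            except OSError:
                # Best effort: a failed request is dropped, not retried.
                pass
        return delivered
```

The point of the model is that `flush` needs nothing from the page: the queue holds complete requests, and losing some of them is acceptable.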

In contrast, for this feature:

1. The user agent does not even know that it is supposed to perform an operation until the page is closed. (Because they might need to perform an operation, user agent implementations must wait for the CDM instance to tear down before destroying the page and its context.)

2. Since the operation is to generate and store data for the origin that may later be read by the app/origin, the user agent must keep the page's browsing context alive to perform that operation.

3. We are told above it requires somewhere around 99% reliability.

There are a lot of details in the body of the TAG response, but I think the conclusion is clear:

In conclusion, the TAG does not believe that a web-based feature should require executing steps in an environment that is already in the process of closing down as described above. In general, the TAG favors designs that promote asynchronous or deferred actions; in contrast, requiring run-at-close steps as requested would likely be synchronous in order to reliably work in such a scenario, and therefore not appropriate for the web platform.

mwatson2 commented 8 years ago

Just a couple of points in response:

The user agent can process these when the page loads and prepare the actions for when it unloads. In contrast, this feature requires the user agent to allow the CDM - a separate entity - to execute code on its behalf for the page when the page is closed.

The relationship between user agent and CDM in an implementation is a software architecture issue. I don't see how we can say that there is a web-architecture-visible difference between the user agent executing code and a CDM executing code even though those may have different characteristics in any particular implementation.

The user agent does not even know that it is supposed to perform an operation until the page is closed.

This is true of <a ping> too, since it is not until the user follows the link that it is known the ping should be sent or which one should be sent of several on the page.

Since the operation is to generate and store data for the origin that may later be read by the app/origin, the user agent must keep the page's browsing context alive to perform that operation.

It's not clear to me that the entire browsing context is required: just as the user agent may keep a queue of pings or beacons to be sent later with their individual destinations the user agent may keep a queue of blobs to be stored later with their individual origins.

mwatson2 commented 8 years ago

IIUC, we are expecting further TAG advice on this issue, or at least clarification on their previous advice. However, I don't think we should make this a dependency on our progress. So, I think this issue should be reclassified V1NonBlocking (the implication being that if there is no further progress on this issue, the specification is left unchanged).

ddorwin commented 8 years ago

Yes, we are waiting for clarification from the TAG. If the feature were not currently in the spec, I would agree that V1NonBlocking makes sense. However, one interpretation of the most recent TAG response is that the feature is not acceptable as defined and thus should be removed from the spec unless/until it can be made acceptable. (For example, through incubation for VNext, which I think is probably the best path forward.) A Proposed Recommendation (PR) should not include such known issues, so we must get this clarity before V1 gets to that point, meaning the V1 milestone is appropriate.

jdsmith3000 commented 8 years ago

I agree that TAG clarification is still needed, but don't agree that in the absence of such clarification the feature should be pulled from the spec. Keeping this on V1 would force that if clarification isn't accomplished by our deadlines. Given the majority support for the feature in the TF, that does not seem appropriate.

I support moving this to V1NonBlocking based on that.

ddorwin commented 8 years ago

Reclassifying this issue as V1NonBlocking would indicate that the spec could reach V1 without TAG clarification. However, this issue does block V1 - the spec will not reach REC without a clear resolution of the related concerns raised about the current spec contents, and TAG clarification is the most appropriate next step towards a resolution.

jdsmith3000 commented 8 years ago

This would imply that any issue that requests TAG clarification should be considered blocking until such clarification is received. I'm asserting that is not appropriate, especially on an issue with general working group support. In this specific case, if TAG chooses not to provide additional clarification, then I believe the existing feature in the spec should remain. That would imply V1NonBlocking.

I understand there is a difference in interpretation of the previous TAG opinion, but my reading of it does not make it clear that the feature should be removed. There is considerable discussion of challenges for implementation. I believe that at least two CDMs have working implementations now, and that they are providing useful data.

If TAG informs us that the issue should be considered blocking until additional clarification is provided by them, then it should certainly be marked V1.

slightlyoff commented 8 years ago

We discussed this at the most recent TAG meeting and wrote a response: https://github.com/w3ctag/spec-reviews/issues/73#issuecomment-203990395

The TL;DR of which might be summarized as: this feature doesn't seem to work well inside the currently spec'd web platform, will probably take years to rationalize, and seems like a good candidate for a V2.

mwatson2 commented 8 years ago

Hi Alex,

Thanks, but I think the opinion is still based on a significant misunderstanding of the feature.

The feature does not require "low-latency notification" to the license provider. A much earlier version of the feature did, but it was removed more than a year ago (for some of the reasons discussed).

I do agree that specification of such a behavior would require some significant new platform infrastructure - for example, background task Workers of some kind that could handle the messaging. At the moment, though, I don't think anyone is requesting that.

In the meantime, I still don't see that there is any "web architecture" issue with the feature as specified since it requires only that the release information be persisted locally for later retrieval by the site. Writing a few KB to disk does indeed take some time and use some battery power but both are negligible. Even if that delay were a concern, the write-to-disk operation could be queued in memory to be performed later.

Furthermore, this feature has been implemented by three of the four major desktop browsers and is deployed and working with live services. It's an optional feature. Given this context, it's valuable for interoperability for us to specify it.

I would be more than happy to speak with the TAG about this if you feel there are still concerns with the clarification above.

...Mark

travisleithead commented 8 years ago

In the meantime, I still don't see that there is any "web architecture" issue with the feature as specified since it requires only that the release information be persisted locally for later retrieval by the site.

Indeed, the spec as written lays out the requirements for implementation--albeit in a very ambiguous way in some instances especially in regard to the CDM. And given those requirements, implementations have built the feature. Is it interoperable? It's hard to prove. What would the specific tests be? Many of the requirements weren't distilled down into spec specifics (and rightly so) because they are currently hard to quantify. I think that's what Alex and I are getting at--those specifics could be figured out, but it'll likely take some time.

Furthermore, this feature has been implemented by three of the four major desktop browsers and is deployed and working with live services. It's an optional feature. Given this context, it's valuable for interoperability for us to specify it.

I don't think we would disagree with this. Writing down what you've got in a document that could eventually become a REC would be great, and we are not suggesting throwing away your feature. We are suggesting extracting this part of the spec and re-locating it into a v2, (or new extension document in the WICG would be great), in the interest of unblocking the rest of EME to progress.

Finally, as Alex is fond of saying "you can't fail a TAG review". We're here to help and offer our thoughts; you're not required to take our advice :) Sorry we took so long to respond.

mwatson2 commented 8 years ago

In the meantime, I still don't see that there is any "web architecture" issue with the feature as specified since it requires only that the release information be persisted locally for later retrieval by the site.

Indeed, the spec as written lays out the requirements for implementation--albeit in a very ambiguous way in some instances especially in regard to the CDM.

Could you elaborate where you see the ambiguities?

And given those requirements, implementations have built the feature. Is it interoperable? It's hard to prove. What would the specific tests be? Many of the requirements weren't distilled down into spec specifics (and rightly so) because they are currently hard to quantify. I think that's what Alex and I are getting at--those specifics could be figured out, but it'll likely take some time.

All features in the specification need tests to demonstrate interoperability in order to advance to REC. If there is reason to believe this is going to be more difficult for this feature than for others (is there? why?) then the correct path is to mark it as a feature at risk and then see if we have interoperable implementations, not to preemptively throw it out.

Furthermore, this feature has been implemented by three of the four major desktop browsers and is deployed and working with live services. It's an optional feature. Given this context, it's valuable for interoperability for us to specify it.

I don't think we would disagree with this. Writing down what you've got in a document that could eventually become a REC would be great, and we are not suggesting throwing away your feature. We are suggesting extracting this part of the spec and re-locating it into a v2, (or new extension document in the WICG would be great), in the interest of unblocking the rest of EME to progress.

What's blocking progress is a single group participant who wants this feature removed, despite obvious market demand and existing implementations. It's valid to raise architectural concerns, but so far I don't see that they've been demonstrated as having any merit.

...Mark

travisleithead commented 8 years ago

Could you elaborate where you see the ambiguities?

Sure. The one that stood out to me from my previous read of the spec (and where I believe the controversy over running code at shutdown stems from, though again it's hard to pinpoint this) is the set of entry conditions for the Close all Sessions algorithm. Close all Sessions -> Session Close -> write the persistent-usage record per step 2.2. This chain is clear, but the entry points for Close all Sessions are where I see the ambiguity. They are requirements statements without any additional clarification or specifics. To enumerate:

In Section 5:

If the CDM instance represented by a MediaKeys object media keys becomes unavailable for any reason, then the user agent shall run the Close all Sessions algorithm on media keys.

and to a lesser extent, from 7.5.3 step 2.3:

If cdm is no longer usable for any reason, run the following steps:

and similarly step 2.4.3.5.3.clause1.2 of the same section:

If cdm is no longer usable for any reason then run the Close all Sessions algorithm on media keys.

The general statements "for any reason" are where I particularly noted the ambiguity and lack of detail. Hope this helps.
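The chain described above (Close all Sessions -> Session Close -> write the persistent-usage record) can be sketched as a toy model. This is an illustration only: the classes `KeySession` and `MediaKeysModel`, the method names, and the `reason` strings are hypothetical and do not reproduce the spec's algorithms step by step.

```python
class KeySession:
    def __init__(self, session_type: str):
        self.session_type = session_type
        self.closed = False


class MediaKeysModel:
    def __init__(self):
        self.sessions = []
        self.persisted_records = []

    def session_close(self, session: KeySession) -> None:
        if session.closed:
            return
        # Only this session type persists a record of key usage on close.
        if session.session_type == "persistent-usage-record":
            self.persisted_records.append(session)
        session.closed = True

    def close_all_sessions(self) -> None:
        for session in self.sessions:
            self.session_close(session)

    def on_cdm_unavailable(self, reason: str) -> None:
        # The entry condition is "for any reason": the model cannot tell a
        # crashed CDM process from a page being torn down.
        self.close_all_sessions()
```

In this model every caller of `on_cdm_unavailable` triggers the same persistence step, regardless of `reason`, which is the lack of detail noted above.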

If there is reason to believe this is going to be more difficult for this feature than for others (is there? why?) then the correct path is to mark it as a feature at risk and then see if we have interoperable implementations, not to preemptively throw it out.

That also sounds like a reasonable path forward to me.

mwatson2 commented 8 years ago

Thanks. Those statements were mostly introduced to ensure we described the general site-visible behavior when the CDM became unavailable for some implementation-specific reason (e.g. if it is in a separate process and that process crashes). We could clarify that.

For the secure release feature it is only necessary that the session close algorithm is run whenever the video element is closed (src changed, element removed from DOM or whole page closed). We could also clarify this.

Just to be clear, the steps that involve interaction with the page do not need to be run if the page is closing / closed. We could either make that explicit or assume it is covered by the rule that all indistinguishable-to-the-page implementations are valid (not running these page interaction steps is indistinguishable from running them because even if they are run there is no page to interact with).

mwatson2 commented 8 years ago

Ok, I have finally had a chance to review the text.

There are two places in the specification which refer to persisting the "record of key usage". This persistence step is (IIUC) the root of the concerns behind this issue. Note that persisting the "record of key usage" has no immediate page-visible effects: no messages are sent to the page, there is no low-latency notification. The page-visible effect of this persistence is that if the session is later reloaded, the record of key usage can be retrieved.

The first reference to persisting the "record of key usage" is in the definition of the "persistent-usage-record" session type, which starts:

A session for which the license and any key(s) it contains shall not be persisted and for which a record of key usage shall be persisted when the keys available within the session are destroyed.

The second reference to persisting the "record of key usage" is in the Session Close algorithm, as noted above.

In fact, this second reference is redundant with the first, since during Session Close, for this type of session, the keys available within the session are destroyed.

The secure release functionality depends on the first requirement and is not dependent on whether or when the Session Close algorithm is called. Ambiguity in when Session Close is called is caused by the implementation-specific nature of the CDM "becoming unavailable", but this should not impact the analysis of the secure release feature (specifically, if Session Close is never called except through explicit page action, secure release will still function).

So, the first requirement asks for a single step ("a record of key usage shall be persisted") to be executed when a specific event occurs ("when the keys available within the session are destroyed"). When this happens depends on the CDM, but it can indeed occur when a page is being closed. In this respect, however, it is no different from <a ping>, which requires several steps to be performed at this time.

Those steps are followed by the following:

This may be done in parallel with the primary fetch, and is independent of the result of that fetch.

where (I assume) the "primary fetch" is the fetch for the new page. Presumably this implies that the <a ping> steps can be executed in parallel with the closing of the old page (and therefore need not delay closing of that page). This is possible because there is no feedback to the page as a result of the <a ping>.

Per the TAG advice to align with existing features such as <a ping> and beacon, I suggest we introduce a similar possibility for secure release by adding the following text:

Persistence of the record of key usage may be performed asynchronously with other operations such as closing of the browsing context.

This admits implementation options where, for example, the user agent queues the data to be persisted for later execution, similar to <a ping> and beacon.
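A minimal sketch of that implementation option follows. It assumes the record of key usage is already in the user agent's hands at the moment the keys are destroyed; that is an assumption of this sketch, not something the specification text guarantees. The class `DeferredRecordStore` and its methods are hypothetical names.

```python
class DeferredRecordStore:
    """Toy model: queue records in memory, write them to storage later."""

    def __init__(self):
        self._queue = []  # pending writes, kept independent of any page
        self._disk = {}   # (origin, session_id) -> record of key usage

    def on_keys_destroyed(self, origin: str, session_id: str, record: bytes) -> None:
        # No storage I/O here, so closing the browsing context is not delayed.
        self._queue.append((origin, session_id, record))

    def flush(self) -> None:
        # Performed asynchronously, at a time of the user agent's choosing.
        while self._queue:
            origin, session_id, record = self._queue.pop(0)
            self._disk[(origin, session_id)] = record

    def load(self, origin: str, session_id: str):
        # Only the origin that created the session can retrieve its record.
        return self._disk.get((origin, session_id))
```

Each queued entry carries its own origin, so the later write does not need the original Document; the record becomes visible to the site only when the session is reloaded after `flush` has run.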

mwatson2 commented 8 years ago

Paul asked me to explicitly reiterate the reason why I stated that the latest TAG opinion was based on a misunderstanding. This is the reference to "low-latency notification", which I interpret to mean an immediate notification to the page at the time that the page is closing.

The feature does not describe or require any such notification.

The original TAG advice suggested following the model established by <a ping> and beacon. Per my comment above, <a ping> explicitly describes steps to be carried out at a time when the page is closing (specifically, when the user follows the link), providing flexibility for the UA to carry out these steps asynchronously if desired, because no interaction with the (closing) page is required.

I suggest we provide the same flexibility here and hopefully that will address this issue.

ddorwin commented 8 years ago

It is not just the persistence that is a problem - the user agent doesn't even have (or necessarily know to expect) the data when the page is closed. The data comes from the CDM instance, which is tied to the lifetime of the MediaKeys and/or MediaKeySession object.

Thus, even if implementations could (see below) parallelize the persistence, they would still need to either delay teardown or allow the implementation of the MediaKeys and/or MediaKeySession object to extend beyond the lifetime of the Document.

Furthermore, the required persistence for this feature is much different than fetching a preconfigured request. Per the spec, all "Persisted data MUST always be stored such that only the origin of this object's Document can access it." In other words, data must be stored per-origin and for the origin that created it. Thus, the persistence of the "record of key usage" depends on the Document and its origin. The user agent should not need to refer to the origin of the Document after it has been destroyed, and it sounds like specs don't currently have any way of specifying such behavior. In addition, it is quite reasonable and even beneficial [1] to the user for an implementation to use the same storage mechanisms it uses for other site data. Such mechanisms are likely to depend on the document, browsing context, and/or objects related to them, all of which are being destroyed.

As discussed above, this feature is quite different from ping and beacon in several important, though perhaps subtle, ways. Specifically, in contrast to the above, for ping and beacon:

- The user agent has all the data it needs well before the page is torn down.
  - Ping: It has the data during layout.
  - Beacon: The data is provided with the call.
  - This feature: The data is not provided - and whether there will even be data is not known - until after objects are destroyed as a result of the page being destroyed.
- The operations are initiated before page teardown.
  - Ping: Initiated when the user clicks the link.
  - Beacon: Initiated when sendBeacon() is called.
  - This feature: Initiated after objects are destroyed as a result of the page being destroyed.
- No context-specific operations are performed during or after the document has unloaded.
  - Both ping and beacon simply queue requests that are unrelated to the document once they are processed.
  - Note that the sendBeacon() Processing Model specifies that only the instantiation and fetching of the request "may be run even after the document has unloaded."
  - This feature: The origin-specific persistence is not initiated until after objects are destroyed as a result of the page being destroyed.
- The reliability expectation is very best effort.
  - This feature: Expected to have "at least 99%" reliability.

In conclusion, this feature is not like ping or beacon, and specs don't currently have any way of specifying such behavior. Thus, we recommend we follow the TAG's "strong guidance… to move this to a V2 (VNext) where it can receive the attention it really needs to be well-layered in the platform." Specifically, we recommend opening a new issue to define the "persistent-release-message" feature in a way that is consistent with the web platform (including whatever additions to the platform are necessary), marking that issue VNext, removing the current text from the V1 draft, and closing this issue.

We believe this is the best path forward for all parties because:

- Such attention and specification ensures maintainability in the web platform and implementations (vs. a one-off EME-specific mechanism).
  - Thus enabling interoperability and broad support across clients, which benefits users, authors/content providers, and the web platform.
- It unblocks and avoids delays in V1.
- It does not affect most existing applications and can be removed (and re-added) without disrupting the rest of the spec.

[1] "User agents SHOULD present the interfaces for clearing Distinctive Identifiers and Key System stored data in a way that helps users to understand this possibility and enables them to delete data in all persistent identification and storage features, including HTTP session cookies [COOKIES] and web storage, simultaneously."

travisleithead commented 8 years ago

@mwatson2 Thanks for the extra details. Here's what I took away:

1. Session close steps may be executing a bit too broadly as currently specified.
2. In the case where the persistent usage record is to be written and the original document has unloaded, there are no page-observable requirements (events, etc.) that would require keeping the page alive.
3. Session close and the requirement to write the record are distinct activities (one does not depend on the other).
4. "when the keys available within the session are destroyed" (i.e., the MediaKeySession is being destructed) is when the usage record is to be written.
5. (this is contested) writing the record is like <a ping> or sendBeacon, so it has precedent in the web platform.
6. TAG's "low-latency notification" meant synchronous work being done on page shutdown.

For (1) (3) @mwatson2 noted some additional spec clarifications which would be helpful.

On (6), I don't believe I was misinformed about the nature of the problem, but I can't speak for @slightlyoff. Regardless, thanks for ensuring we captured that feedback.

From what I understood @ddorwin contests (5) above, saying that the data relationship is wrong. Rather than page->async process as is done for <a ping> and sendBeacon, in persistent usage record, the data is flowing in reverse: async process->page where async process is the CDM.

@ddorwin also proposes some ways forward for the WG. In relation to how to proceed spec and process-wise, as the TAG, we'll politely excuse ourselves and get out of your way :)

My understanding of the MediaKeySession object is that it is a proxy object for the actual CDM singleton, shared among potentially many open browsing contexts. In this way it can manage keys and licenses for any and all related sessions. In this way, per origin, it acts much like the SharedWorker object whose lifetime may extend beyond that of the original opening document. It must have a somewhat complex relationship with multiple documents, and the concern @ddorwin raised about tracking which document's origin is relevant appears to be legitimate. As the TAG noted before, some additional spec work is necessary.

Personal aside: I imagine @ddorwin's concern noted above could be assuaged by describing an in-memory push model (periodic data flow from an active or closing page->CDM with relevant document/origin info for the persistent usage record) rather than defining it as an event caused by the CDM where data flows back into the page/user agent to be written. This would make the system behave much more like <a ping> et al., than otherwise. Just a thought.
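
To make the push model a little more concrete, here is a minimal sketch of one possible reading of it. Everything in it is hypothetical: `CdmSideRecorder`, `pushContext`, and `buildRecord` are illustrative names, not part of EME or of any user agent, and the usage data is treated as an opaque string. The point is only that the page hands over its origin while it is still alive, so nothing has to flow back to the document at teardown.

```typescript
// Hypothetical sketch of an "in-memory push model"; not an EME or UA API.

type StorageOrigin = string;

interface PushedContext {
  origin: StorageOrigin; // origin of the Document that owns the session
  lastPushedAt: number;  // timestamp of the most recent push (illustrative)
}

// Stand-in for a CDM-side component that may outlive the page.
class CdmSideRecorder {
  private contexts: Record<string, PushedContext> = {};

  // Page -> CDM: called periodically while the page is active or closing.
  pushContext(sessionId: string, origin: StorageOrigin, now: number): void {
    this.contexts[sessionId] = { origin: origin, lastPushedAt: now };
  }

  // CDM side: called when the session's keys are destroyed. It relies only
  // on context pushed earlier and never consults the (possibly gone) page.
  buildRecord(
    sessionId: string,
    usage: string
  ): { origin: StorageOrigin; usage: string } | null {
    const ctx = this.contexts[sessionId];
    if (!ctx) {
      return null; // no context was ever pushed for this session
    }
    delete this.contexts[sessionId];
    return { origin: ctx.origin, usage: usage };
  }
}
```

In this sketch a session that never pushed its context simply yields no record, which is one way such a design could degrade to best effort.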

mwatson2 commented 8 years ago

@ddorwin

It is not just the persistence that is a problem - the user agent doesn't even have (or necessarily know to expect) the data when the page is closed. The data comes from the CDM instance, which is tied to the lifetime of the MediaKeys and/or MediaKeySession object.

There must be some process by which the CDM is shut down and/or a specific session closed. As part of that process the CDM needs an opportunity to provide the data to be persisted. MediaKeys and/or MediaKeySession objects are the means by which pages interact with the CDM. They can be destroyed as soon as page interaction with the CDM is no longer available. Whether the CDM continues to exist after that in some implementation-internal form is an implementation choice.

The user agent has all the data it needs well before the page is torn down.

Ping: It has the data during layout.

Beacon: The data is provided with the call.

This feature: The data is not provided - and whether there will even be data is not known - until after objects are destroyed as a result of the page being destroyed.

In the ping case there remains one bit of data which is not available until the link is followed: the very fact that the rest of the information must be sent. Knowing the rest of this information in advance conveys no advantage at all - it is useless until it is time to send it.

More precisely, the information is available as soon as the CDM is informed that the session is to be closed. This is not necessarily "after objects are destroyed".

HTTP requests are associated with an origin as well. Deferring the ping or indeed the CDM data persistence does involve recording the origin and perhaps other information needed to execute the task. The origin associated with a MediaKeySession does not change, so this could be provided to the CDM at instantiation time.

The considerations seem very implementation specific, whereas our specification should be decided on issues that are implementation-independent (any implementation that has the same observable behavior is equally valid).

There is clearly a market demand for this optional feature as it is implemented and in use in live systems today in multiple browsers. I'm certainly prepared to believe it could be improved, but there is substantial interoperability benefit to specifying it now.

mwatson2 commented 8 years ago

@travisleithead I agree with your items (1)-(5), but have one minor clarification:

"when the keys available within the session are destroyed" (i.e., the MediaKeySession is being destructed) is when the usage record is to be written.

This is when the data to be persisted is available. It would be fine to actually write it to disk some time later (admitting a small possibility that it is lost due to a crash happening before that time).

On (6):

TAG's "low-latency notification" meant synchronous work being done on page shutdown.

Can you explain a bit more what you mean? There is all sorts of work to be done on page shutdown (drawing to the screen, freeing memory, freeing OS or hardware resources such as video decoders, microphone, camera, etc., queuing tasks - such as a ping - for asynchronous execution). What characterizes the kind of work-at-shutdown that is problematic (other than CPU load or expected duration)?

From what I understood @ddorwin contests (5) above, saying that the data relationship is wrong. Rather than page->async process as is done for <a ping> and sendBeacon, in persistent usage record, the data is flowing in reverse: async process->page where async process is the CDM.

I don't believe the specification requires the CDM to be an "async process". One can imagine implementations where there is a simple synchronous call to the CDM to close the session and return the data to be persisted. Implementation choices regarding internal threading models that do not affect page-observable behavior shouldn't qualify as a "web architecture" issue.

I would agree that if multiple implementors argue that a given feature is especially complex to implement, that is a valid reason for reconsidering the feature. But in a competitive market a single implementor should not have a veto on new features based on their own implementation considerations.

My understanding of the MediaKeySession object is that it is a proxy object for the actual CDM singleton, shared among potentially many open browsing contexts. In this way it can manage keys and licenses for any and all related sessions. In this way, per origin, it acts much like the SharedWorker object whose lifetime may extend beyond that of the original opening document. It must have a somewhat complex relationship with multiple documents, and the concern @ddorwin raised about tracking which document's origin is relevant appears to be legitimate.

I do not think the spec takes a position on whether there is a single CDM for all origins / documents or a separate instance for each origin / document. Certainly both are possible. (Even if the spec did take a position on the definition of "CDM", an implementation might still take either approach in terms of software components).

A software component providing CDM services for multiple origins needs to know the origin for each session for many reasons, not least to prevent information leakage. So, it certainly would already have the necessary context to scope the persisted data we are discussing here.

Personal aside: I imagine @ddorwin's concern noted above could be assuaged by describing an in-memory push model (periodic data flow from an active or closing page->CDM with relevant document/origin info for the persistent usage record)

IIUC, the document / origin information doesn't change once the session is established, but if it did then the implementation approach you describe could certainly be used without needing anything in the specification.

If the (software component providing the) CDM has the necessary context and if it outlives the page, then it needs only access to a global scope "writeDataForOrigin" service.

I do understand that one concern is that in one implementation approach there is a thread-of-execution that is dedicated to a single page and thus will end when that page is closed. If the CDM's only access to persistent store is through that thread-of-execution and the CDM is executing in a different thread-of-execution, then it is necessary for there to be cross-thread message passing to and from the CDM. Whether this is a problem is a matter of opinion, but the main point is that this is only one implementation approach and there are many many others: it is hardly a "web architecture" issue if it is so specific to a single approach.

travisleithead commented 8 years ago

TAG's "low-latency notification" meant synchronous work being done on page shutdown.

Can you explain a bit more what you mean? There is all sorts of work to be done on page shutdown (drawing to the screen, freeing memory, freeing OS or hardware resources such as video decoders, microphone, camera, etc., queuing tasks - such as a ping - for asynchronous execution). What characterizes the kind of work-at-shutdown that is problematic (other than CPU load or expected duration)?

This was meant only to restate your comment about the possible mis-understanding of the TAG, and to let you know that I heard and acknowledged it :) Nothing more.

mwatson2 commented 8 years ago

@travisleithead

TAG's "low-latency notification" meant synchronous work being done on page shutdown.

Can you explain a bit more what you mean?

This was meant only to restate your comment about the possible mis-understanding of the TAG, and to let you know that I heard and acknowledged it :) Nothing more.

Do you mean that you agree with my characterization as "immediate notification to the page at the time that the page is closing"? I.e., does "synchronous work" mean work done by the page JavaScript, rather than all those things I mentioned that the UA does?

travisleithead commented 8 years ago

I agree that the spec as far as I could tell does not have any synchronous requirements to write the data as part of some existing synchronous action. Tearing down of the browsing context is likely some async thing (which is not specified anywhere, unlike navigation) and adding a synchronous task to an asynchronous task makes the whole thing still asynchronous.

mwatson2 commented 8 years ago

I've made a proposal in #171 to clarify some of the points that arose in the above discussion, including making more explicit the procedure in which licenses / keys are destroyed, which needs to happen both for temporary and persistent-usage-record sessions.

I'd like to note as well that a possible implementation of this feature is for the CDM to be supplied with the storage context information (i.e. origin) on session creation and then for the CDM to be able to post a task to a global task queue on session destruction, this global task queue being independent of the page. The task contains the information to be stored and the storage context and can be executed asynchronously to the page close process.
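
The implementation approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `GlobalTaskQueue`, `CdmSession`, and `OriginScopedStore` are hypothetical names, not anything defined by the spec or by a real user agent, and asynchronous execution is modelled as an explicit `drain()` call rather than a real scheduler.

```typescript
// Hypothetical sketch; none of these types exist in EME or in any browser.

type StorageOrigin = string;

interface StorageTask {
  origin: StorageOrigin; // storage context captured at session creation
  sessionId: string;
  payload: string;       // opaque data the CDM wants persisted
}

// A queue owned by the user agent as a whole, independent of any page.
class GlobalTaskQueue {
  private pending: StorageTask[] = [];

  post(task: StorageTask): void {
    this.pending.push(task);
  }

  // Runs some time later, asynchronously to the page close process
  // (modelled here as an explicit call). Returns the number of tasks run.
  drain(write: (task: StorageTask) => void): number {
    const tasks = this.pending;
    this.pending = [];
    for (const task of tasks) {
      write(task);
    }
    return tasks.length;
  }
}

// CDM-side session: it receives the origin once, at creation time.
class CdmSession {
  constructor(
    private readonly origin: StorageOrigin,
    private readonly sessionId: string,
    private readonly queue: GlobalTaskQueue
  ) {}

  // Called on session destruction; needs no access to the Document.
  destroy(usageData: string): void {
    this.queue.post({
      origin: this.origin,
      sessionId: this.sessionId,
      payload: usageData,
    });
  }
}

// Per-origin store, so data is only readable under the origin that wrote it.
class OriginScopedStore {
  private data: Record<string, string[]> = {};

  write(task: StorageTask): void {
    const list = this.data[task.origin] || [];
    list.push(task.payload);
    this.data[task.origin] = list;
  }

  read(origin: StorageOrigin): string[] {
    return this.data[origin] || [];
  }
}
```

Note that in this sketch nothing is persisted until the queue is drained, which corresponds to the small window in which a crash could lose the record.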

ddorwin commented 8 years ago

The TAG’s "strong guidance" is to make this a VNext feature. It has been three weeks since the TAG’s concerns were reaffirmed.

Since then, related conversations have sparked two additional bugs: #171 and #180. Much of #171 and all of #180 require resolution before V1 only if we ignore the TAG’s recommendation and define behavior outside “the currently spec'd web platform.”

In the interest of helping us appropriately prioritize and make progress on issues blocking V1, we humbly request that we reach a resolution for #85 ASAP. We concur with the TAG’s recommendation, and we continue to recommend we follow the TAG's "strong guidance… to move [the "persistent-usage-record" session feature] to a [VNext] where it can receive the attention it really needs to be well-layered in the platform."

mwatson2 commented 8 years ago

The TAG’s "strong guidance" is to make this a VNext feature.

Except that this advice was based on a mis-understanding, per the discussion above.

mavgit commented 8 years ago

This is a feature we are planning to use, so I would like it to be kept in V1.

jdsmith3000 commented 8 years ago

@mavgit Can you provide a bit more detail on how this feature meets your needs and the importance you place on it for EME V1?

ddorwin commented 8 years ago

The TAG’s "strong guidance" is to make this a VNext feature.

Except that this advice was based on a mis-understanding, per the discussion above.

@mwatson2, I believe TAG addressed concerns around potential misunderstanding on April 18th. Would you accept the TAG's guidance if the TAG (re)clarified it understands the feature?

mwatson2 commented 8 years ago

@mwatson2, I believe TAG addressed concerns around potential misunderstanding on April 18th.

No, that note stated their understanding as "TAG's "low-latency notification" meant synchronous work being done on page shutdown" and subsequent discussion with @travisleithead did not dispute my characterization of this as synchronous work by the page. Also see this comment from @travisleithead.

The feature does not require any such "low-latency notification" on which the opinion was based.

Would you accept the TAG's guidance if the TAG (re)clarified it understands the feature?

They would have to explain why they saw an architectural problem, given a correct understanding.

paulbrucecotton commented 8 years ago

Can we get agreement to mark this feature "at risk" when EME transitions to Candidate Recommendation?

This will: a) give us more time to demonstrate the number of implementations we have of this feature, b) give us more time to discuss the matter further with the TAG, and c) still leave open the possibility of removing the feature when we transition to Proposed Recommendation.

mavgit commented 8 years ago

@jdsmith3000 This is used in support of download count or device count use cases associated with usage limits. Here is a reply from my security team:

The service permits clients to submit a secure assertion to a DRM/security endpoint that license(s) in connection with a business identifier are not authorized by way of confirming that licenses are not present in local cache. In current implementations, the application calls a release API passing in a business identifier. One or more licenses are deleted from local cache if one or more licenses are bound to a business-oriented attribute matching the specified business identifier. Since licenses may have previously expired or been removed from cache, actual license deletion would only occur in the event matching licenses are present. Regardless of whether license deletion executes, the license must no longer exist in local cache in order for the client to proceed to generating its assertion. Provided the client is able to guarantee licenses in connection with the business identifier are not present in local cache, the client generates a digitally signed assertion identifying the business identifier and optionally license identifiers. The secured assertion is submitted into a DRM/security endpoint. The distribution service is then able to disassociate the client with specified licenses in support of download / device count use cases associated with usage limits.
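
As a rough illustration of the release flow described above, here is a minimal sketch. All names (`License`, `ReleaseAssertion`, `releaseLicenses`) are hypothetical, and the `sign` callback is a stand-in for whatever signing the DRM client actually performs; no real cryptography or real release API is shown.

```typescript
// Hypothetical sketch of the release flow; not a real DRM or EME API.

interface License {
  licenseId: string;
  businessId: string; // business-oriented attribute the license is bound to
}

interface ReleaseAssertion {
  businessId: string;
  releasedLicenseIds: string[]; // optional detail; may be empty
  signature: string;
}

function releaseLicenses(
  cache: License[],
  businessId: string,
  sign: (message: string) => string // assumed signing primitive
): { remaining: License[]; assertion: ReleaseAssertion } {
  // Delete matching licenses if any are present. There may be none, for
  // example if they already expired or were removed earlier.
  const released = cache.filter((l) => l.businessId === businessId);
  const remaining = cache.filter((l) => l.businessId !== businessId);

  // At this point no license for businessId remains in the cache, which is
  // the precondition for generating the assertion.
  const releasedLicenseIds = released.map((l) => l.licenseId);
  const message = businessId + "|" + releasedLicenseIds.join(",");

  return {
    remaining: remaining,
    assertion: {
      businessId: businessId,
      releasedLicenseIds: releasedLicenseIds,
      signature: sign(message),
    },
  };
}
```

The assertion is produced whether or not anything was actually deleted, matching the description that deletion only occurs when matching licenses are present.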

ddorwin commented 8 years ago

@mwatson2 wrote:

No, that note stated their understanding as "TAG's "low-latency notification" meant synchronous work being done on page shutdown" and subsequent discussion with @travisleithead did not dispute my characterization of this as synchronous work by the page.

@mwatson2, can you explain how your characterization of "synchronous work by the page" (on shutdown/teardown) is different from "synchronous work being done on page shutdown?"

@paulbrucecotton wrote:

Can we get agreement to mark this feature "at risk" when EME transitions to Candidate Recommendation?

We agree that marking the feature "at risk" is the minimum required to ensure we do not put the schedule at risk if we do not resolve this before CR. However, keeping the feature in the spec through CR means we must continue to spend cycles debating related issues - at least one of which relates directly to the current discussion - before June 9th. Therefore, as before, I believe the prudent path forward is to file a VNext issue to track this feature, fork/branch the current spec text to enable continued development outside V1 and its schedule pressures, and remove the text from the V1 draft.

paulbrucecotton commented 8 years ago

@ddorwin wrote:

Therefore, as before, I believe the prudent path forward is to file a VNext issue to track this feature, fork/branch the current spec text to enable continued development outside V1 and its schedule pressures, and remove the text from the V1 draft.

If we fork/branch at CR then ALL changes that are made to the CR branch that are pertinent to the VNext branch must be made there as well. And since we have lots of V1NonBlocking and V1Editorial issues we plan to address during CR, this will mean more work for the Editors and WG during CR when we should be doing testing.

I believe a better plan is to fork/branch at PR just before we decide what "at risk" features need to be removed. This causes NO additional work for the Editors during CR.

ddorwin commented 8 years ago

@paulbrucecotton, my comments were not in relation to the main "VNext branch", your proposed plan, or "at risk" features in general. I did not mean that such a fork/branch would be the VNext branch.

Forking/branching before removal was simply a suggestion for continuing to iterate on text that is no longer in the mainline. There are other options as well, including creating a branch from a historical commit and applying a git revert. However, such mechanics are really irrelevant to the proposal, so I'll simplify it as follows:

File a VNext issue to track this feature, remove the text from the V1 draft, and continue discussion and development outside the constraints of the V1 schedule pressures.

My concern about this particular "at risk" feature is that there are pending substantive changes for which the same unresolved concerns apply and others that only need to be addressed by June 9th if this feature is still in V1.

mwatson2 commented 8 years ago

@ddorwin asked:

can you explain how your characterization of "synchronous work by the page" (on shutdown/teardown) is different from "synchronous work being done on page shutdown?"

By "work done by the page", I mean execution of page JavaScript.

The more general "synchronous work being done on page shutdown" could mean any work done by the CPU at the time the page is closed, before it is considered closed. This is hard to define, because a definition of "synchronous" in this context would mean that the work in question is done "before the page is considered closed", and I am not sure how we define "the point at which the page is considered closed".

In any case, I can't think of an implementation-independent definition of this point in which secure release requires work to be done before that and where there isn't already plenty of other work that has to be done (by the CPU). For example, if we are concerned about work that needs to be done before "all resources associated with the page are released", say, then there are obviously things like releasing hardware resources that have to be done. Queuing storage of the secure release message is trivial by comparison and very similar to queuing an <a ping>.

54 https://github.com/w3c/encrypted-media/pull/54 should be reverted.
