MUST clear requirement for deviceId

martinthomson commented 8 years ago

I apologize for not properly following the giant cluster of issues around #322, #326, #328, and #330, but I don't believe that the main reason I opened #322 was properly addressed.

The offending text is this:

All enumerable devices have an identifier that MUST be unique to the page's origin. As long as no local device has been attached to a live MediaStreamTrack in a page from this origin, and no stored permission to access local devices has been granted to this origin, then this identifier MUST NOT be reusable after the end of the current browsing session.

As noted in #322, this is not what browsers currently do. Nor is it what I believe that they should do. They should do what Harald says Chrome does. The other parts of the text are fine, but this isn't right.

I would be OK with "the user agent MAY generate a new identifier for every new realm created for the origin, see [[section with better details]]."

stefhak commented 8 years ago

I may have gotten this totally wrong, but isn't the reason for mandating that deviceId is not reusable to avoid it being used for fingerprinting? @martinthomson what precisely is the problem with the current text?

martinthomson commented 8 years ago

The current text requires that the deviceId is cleared at a very precise time. See the definition of browsing session.

The requirement to clear at that precise moment is a) a bad idea (see #322) and b) not what anyone does in practice. @jan-ivar just reminded me of this, asking if I would object to him landing a change that made Firefox compliant. That change is a lot of fairly complex code and I think we'd be better off taking Harald's view on this: treat it just like a cookie.

stefhak commented 8 years ago

@jan-ivar also pointed out in #322 that the current spec text is a result of addressing input given during the last call. This was settled June/July last year, and I think it has been in the spec since. That the spec has said this for a long time does not mean it is right, but personally I'm still a bit confused about why we would need to change it (especially if it means we open up a difficult privacy discussion again).

martinthomson commented 8 years ago

The text was added before the definition of browsing session was clear.

stefhak commented 8 years ago

Ok, that's a change. I don't have a strong personal opinion, but it feels awkward at this stage to make changes that could reopen a privacy discussion we've already closed, unless we have very good reasons to do so. I hope for more input (also from others) here.

stefhak commented 8 years ago

I just did a small test with Chrome Canary, by using the developer console. It seems that Canary does generate new deviceIds if the tab is closed and a new tab on the same origin is opened.

alvestrand commented 8 years ago

So this is about finding a good name to use for "the scope over which persisting temporary identifiers makes sense"? I don't have a good sense for either what that scope is or what its name is.

What's the term "realm" tied to these days? @martinthomson I see you use it above.

alvestrand commented 8 years ago

Looking for input from Edge, since we know what Firefox and Chrome currently do.

ShijunS commented 8 years ago

Edge implementation follows the current description in the spec. We don't persist any deviceId across browsing sessions when there is no persisted/stored permission for the specific origin. I think that makes sense for mitigating risk of fingerprinting, so would like to have this part of the spec stay as is.

martinthomson commented 8 years ago

@ShijunS is that "Browsing Session" as colloquially understood (the time that the browser window is open), or as defined in whichever spec defines it, which I can't find (the time that the origin is a top-level browsing context in any tab*)?

Note that I don't want to prevent Edge from doing as it does, but I don't want to force everyone else to follow suit when (at least in my opinion), the cost is high and there is no improvement regarding fingerprinting risk or privacy by taking the second interpretation.

@alvestrand https://gist.github.com/dherman/7568885#realms seems to do a good job of formally defining realms.

ShijunS commented 8 years ago

Adding more details per @martinthomson's question - When there is no persisted/stored permission for the specific origin, we will create a unique deviceId for each device when a webpage enumerates or accesses the devices for the first time. As long as the user doesn't navigate away, the deviceId will be kept the same for the same device. We discard the Ids at navigation, or when the tab / browser window is closed. If users (a) navigate back to the same origin, (b) open new tabs with the same origin, or (c) re-launch the browser to the same origin, we will provide another set of deviceIds. Hope this helps!
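
As a rough illustration of the lifetime described above, here is a minimal sketch of page-scoped, pre-grant deviceIds. It is not Edge's code: `BrowserModel`, `PageModel`, and the counter-based ids are hypothetical names chosen for readability, and a real user agent would mint unguessable random values.

```typescript
// Hypothetical model of page-scoped, pre-grant deviceIds (not Edge's code).
class PageModel {
  private ids: Record<string, string> = {};
  private mint: () => string;

  constructor(mint: () => string) {
    this.mint = mint;
  }

  // The first enumeration or access mints an id; later calls on the same
  // page return the same id for the same device.
  deviceIdFor(rawDevice: string): string {
    let id = this.ids[rawDevice];
    if (id === undefined) {
      id = this.mint();
      this.ids[rawDevice] = id;
    }
    return id;
  }
}

class BrowserModel {
  private counter = 0;

  // Every navigation, new tab, or relaunch yields a fresh page with an
  // empty id table, so ids from an earlier page are never handed out again.
  openPage(): PageModel {
    return new PageModel(() => `id-${this.counter++}`);
  }
}
```

Under this model, cases (a), (b) and (c) all reduce to calling `openPage()` again, which is why each of them produces a new set of ids.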

jan-ivar commented 8 years ago

This sounds like what my patch does FWIW (except I handle (b) slightly differently, in that I only drop pre-grant deviceIds when the last open tab of an origin closes.)
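
The variation for (b) described here could be sketched as a per-origin table that is dropped only when the last open tab of that origin closes. This is an illustrative model, not the Firefox patch: `OriginScopedIds` and its methods are hypothetical, and ids are counter-based only to keep the example deterministic.

```typescript
// Hypothetical model: pre-grant ids shared by all open tabs of an origin and
// dropped when the last of those tabs closes (not Firefox's actual code).
class OriginScopedIds {
  private counter = 0;
  private openTabs: Record<string, number> = {};
  private ids: Record<string, Record<string, string>> = {};

  openTab(site: string): void {
    this.openTabs[site] = (this.openTabs[site] || 0) + 1;
  }

  closeTab(site: string): void {
    const remaining = (this.openTabs[site] || 0) - 1;
    if (remaining <= 0) {
      // The last tab of this origin is gone: forget its pre-grant ids.
      delete this.openTabs[site];
      delete this.ids[site];
    } else {
      this.openTabs[site] = remaining;
    }
  }

  deviceIdFor(site: string, rawDevice: string): string {
    const table = this.ids[site] || (this.ids[site] = {});
    let id = table[rawDevice];
    if (id === undefined) {
      id = `id-${this.counter++}`;
      table[rawDevice] = id;
    }
    return id;
  }
}
```

With this shape, two tabs open on the same origin at the same time see the same id, which is the difference from the page-scoped behavior described for Edge.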

martinthomson commented 8 years ago

The question I was looking to answer was whether I could open tab A, get a deviceId, and then open tab B to the same origin and get the same deviceId. I'm not clear on whether that would work based on https://github.com/w3c/mediacapture-main/issues/359#issuecomment-218928754. Also, if I navigate away, then hit the back button, what then? Lots of other page state is maintained, would this be cleared? Would it then be inconsistent with the rest of the page state?

FWIW, it's fine if a browser decides to expire device identifiers on whatever timescale they deem appropriate. But while I think that the availability of that choice is important, I don't think that clearing in this specific fashion helps. Indeed, I would prefer if browsers did not clear like this, but I am happy to keep this a point of browser choice: we each make different trade-offs in this space.

stefhak commented 8 years ago

@ShijunS @jan-ivar it is not clear to me whether the part in the spec that says "if any local devices have been attached to a live MediaStreamTrack in a page from this origin ... this identifier MUST be persisted across browsing sessions." is met by your respective implementations, i.e. in situations when there are no persisted/stored permissions, but the user has at least once allowed a page of this origin to use a camera or microphone.

ShijunS commented 8 years ago

@stefhak, in our implementation, when there is no stored permission, another tab of the same origin will always have to ask for user permission again, and will have to get its own deviceId.

jan-ivar commented 8 years ago

@stefhak Firefox meets this. Try it here https://jsfiddle.net/Lqo4paed/

When I test Chrome it appears to meet it as well. When I test Edge it does not. @ShijunS that seems like a bug, though perhaps orthogonal to this discussion.

ShijunS commented 8 years ago

@jan-ivar, FWIW, I'd expect you have a stored permission with Chrome, correct?

jan-ivar commented 8 years ago

@ShijunS no. I had earlier, but removed it under "Manage Camera Settings..." / "Camera Exceptions", by hitting the [x] next to https://fiddle.jshell.net:443.

ShijunS commented 8 years ago

@Jan-Ivar, from one perspective, I'm not sure it is a critical user scenario if the user has to manually remove the stored permission. Meanwhile, if a user manually removes a stored permission, I'd expect the User Agent to take responsibility for mitigating the risk of fingerprinting.

Re Chrome implementation, it seems to me the deviceId is persisted, no matter whether the stored permission is removed or not, and no matter whether there is an active MediaStreamTrack or not. That seems true even when the browser is restarted. Maybe @alvestrand can comment on the intention of the Chrome implementation now and down the road.

jan-ivar commented 8 years ago

@ShijunS The user scenario (of not having permission persisted) is only marginal in Chrome, because Chrome implicitly persists permission. In Firefox and Edge, this user scenario is the most common.

As @stefhak mentions the spec is clear that deviceIds must be persisted to the origin after the first gUM grant now or in the past, not just the first persistent grant. Without this, deviceIds would be quite useless except in Chrome.

In other words, if a Firefox or Edge user grants their camera to site foo.com just once (even if they don't choose "Always Share" in Firefox) then foo.com should see persistent deviceIds from then on and ever (until the user clears cookies or hits forget site or something like that), because those users may never grant permission permanently, and still expect sites to remember their preferred camera from last time.

Looks like Edge has a bug here. Chrome matches Firefox here and is compliant.
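
To make the "remember the preferred camera" use case concrete, here is a minimal sketch of what a site might do with a persisted deviceId. `DeviceEntry` and `pickCamera` are hypothetical names; the entries only mirror the `deviceId` and `kind` fields that `enumerateDevices()` results carry, and where the site stores `savedId` between visits is left open.

```typescript
// Minimal shape of an enumerated device; mirrors the deviceId and kind
// fields of enumerateDevices() results.
interface DeviceEntry {
  deviceId: string;
  kind: string;
}

// Return the saved camera if it is still listed, else the first camera,
// else null. `savedId` is whatever the site stored on the last visit.
function pickCamera(devices: DeviceEntry[], savedId: string | null): DeviceEntry | null {
  const cameras = devices.filter((d) => d.kind === "videoinput");
  const remembered = cameras.filter((d) => d.deviceId === savedId);
  if (remembered.length > 0) {
    return remembered[0];
  }
  return cameras.length > 0 ? cameras[0] : null;
}
```

In a page, the chosen id could then be passed to `getUserMedia` as a `deviceId` constraint. If the user agent rotates ids between visits, the lookup silently falls back to the first camera, which is the breakage described above.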

This is a side-track though. Let's get back to pre-grant deviceIds and their lifespan.

stefhak commented 8 years ago

Thanks @jan-ivar, I was just going to make an input, but you did it before me (and formulated it much better than I could).

stefhak commented 8 years ago

I think we could have the same situation regarding privacy protection as we have in the spec now if we did two things

1. made the change Martin proposed, i.e. "the user agent MAY generate a new deviceId for every new..." rather than MUST, and
2. added text saying the deviceId MUST NOT be persisted if the user does not accept cookies (for this origin)

I'm not sure what the point would be though.

alvestrand commented 8 years ago

Request to all participants: can someone write a test for this behavior?

stefhak commented 8 years ago

Could a test for this be something along these lines?

Two origins are used, X and Y.

Preamble

Clear all cookies for X and Y

Phase 1

Open one tab at X and one at Y, enumerateDevices for both, log results as X1 and Y1, and compare (make sure they have different Id's)

Phase 2

Close X-tab, open a new tab at X, enumerateDevices, log result as X2, and compare (make sure the Id's differ for X1 and Y1)

Phase 3

Do gUM at X-tab for one of the devices, have the user reject access

Phase 4

Close X-tab, open a new tab at X, enumerateDevices, log result as X3, and compare (make sure the Id's differ for X1, X2 and Y1)

Phase 5

Do gUM at X-tab for one of the devices, have the user accept access (but not give persisted permission if that option is available)

Phase 6

Close X-tab, open a new tab at X, enumerateDevices, log result as X4, and compare (make sure X4 == X3)

Phase 7

Close X-tab, clear all cookies for X, open a new tab at X, enumerateDevices, log result as X5, and compare (make sure the Id's differ between X1, X2, X3 (==X4) and Y1)
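
The comparisons in this plan could be automated once the id lists have been logged by hand. The sketch below only checks the logs; it does not drive a browser. `PlanLogs`, `PlanResult`, and `checkPlan` are hypothetical names, and each log is assumed to be the list of deviceIds returned by one enumerateDevices call, with no duplicate ids inside a single log.

```typescript
type IdLog = string[];

interface PlanLogs {
  X1: IdLog;
  X2: IdLog;
  X3: IdLog;
  X4: IdLog;
  X5: IdLog;
  Y1: IdLog;
}

interface PlanResult {
  phase1: boolean;
  phase2: boolean;
  phase4: boolean;
  phase6: boolean;
  phase7: boolean;
}

// True when no id appears in both logs.
function disjoint(a: IdLog, b: IdLog): boolean {
  return a.every((id) => b.indexOf(id) === -1);
}

// True when both logs hold the same ids (order ignored, no duplicates assumed).
function sameIds(a: IdLog, b: IdLog): boolean {
  return a.length === b.length && a.every((id) => b.indexOf(id) !== -1);
}

// Phases 3 and 5 only change state, so they have no check of their own.
function checkPlan(logs: PlanLogs): PlanResult {
  const differsFromAll = (log: IdLog, others: IdLog[]): boolean =>
    others.every((other) => disjoint(log, other));
  return {
    phase1: disjoint(logs.X1, logs.Y1),
    phase2: differsFromAll(logs.X2, [logs.X1, logs.Y1]),
    phase4: differsFromAll(logs.X3, [logs.X1, logs.X2, logs.Y1]),
    phase6: sameIds(logs.X4, logs.X3),
    phase7: differsFromAll(logs.X5, [logs.X1, logs.X2, logs.X3, logs.Y1]),
  };
}
```

A browser that keeps the same ids across tab closes would show up here as `phase2` and `phase4` being false while `phase6` stays true.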

martinthomson commented 8 years ago

Close X-tab, open a new tab at X, enumerateDevices, log result as X2, and compare (make sure the Id's differ for X1 and Y1)

I don't believe that this should be a requirement. X1 and X2 may be the same as far as I am concerned. Same for X3.

alvestrand commented 8 years ago

if the IDs don't differ between X1 and X2, how can you test that they were not persisted?

martinthomson commented 8 years ago

The event that I'm looking for in between the test is clearing cookies, or something like that.

To be clear here, I am looking to have state for an origin be consistent and coherent. That means that all stores of persistent data have the same bounds on their persistence. If I can set a cookie for a week then X1 should also persist for a week. Same goes for local storage, indexedDB, and all the myriad other things that we persist.

I also separately believe that we should be making this stuff more visible, but that's a browser UX issue.
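
One way to picture the "same bounds on persistence" argument in this comment is a single per-origin store in which deviceIds sit next to cookies and are cleared by the same event. This is only a sketch of that model under stated assumptions: `OriginSiteData` is a hypothetical name, the cookie table stands in for all persistent stores, and ids are counter-based for readability.

```typescript
// Hypothetical per-origin site data in which deviceIds share the lifetime
// of cookies and other persistent state.
class OriginSiteData {
  cookies: Record<string, string> = {};
  private deviceIds: Record<string, string> = {};
  private counter = 0;

  deviceIdFor(rawDevice: string): string {
    let id = this.deviceIds[rawDevice];
    if (id === undefined) {
      id = `id-${this.counter++}`;
      this.deviceIds[rawDevice] = id;
    }
    return id;
  }

  // One clearing event bounds every persistent store for the origin.
  clear(): void {
    this.cookies = {};
    this.deviceIds = {};
  }
}
```

Under this model nothing ties id lifetime to tabs or browsing sessions: an id lasts exactly as long as a cookie set at the same moment would.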

stefhak commented 8 years ago

If we were to change to the model where deviceIds have the same life cycle as cookies, we IMO would also have to add that deviceIds can never be persisted as long as cookies are not allowed for the origin (as discussed in https://github.com/w3c/mediacapture-main/issues/359#issuecomment-219725515). I can understand the desire to unify the treatment of different things that are persisted, but I'm not sure this is the time and place to do that.

alvestrand commented 8 years ago

The original text was introduced in #218 and #219. We need to have a resolution that is consistent with that.

martinthomson commented 8 years ago

I don't see how that follows since the point of this is to address an inadvertent mistake in #219. The meaning of "browsing session" wasn't well understood at that point.

BTW, we can model "cookies not allowed" as "cookies are cleared after every iteration of the event loop".

alvestrand commented 8 years ago

@martinthomson I don't see a definition of "browsing session" that lasts until the next time cookies are cleared anywhere. Can you reformulate Stefan's test plan to show how it would look in the scenario you're proposing?

martinthomson commented 8 years ago

I agree, "browsing session" isn't defined like that. "Browsing session" doesn't relate at all to when cookies are cleared.

Here's a shorter reformulation:

Two tabs to X and Y -> X1 and Y1 are different.
Cookies are cleared for X -> X2 is different again.

Note that under my model, there is no way to test that persistence is longer when gUM permission has been granted. Browsers might clear identifier persistence on a shorter timescale, but that would depend on browser-specific behaviour (just as running the same test with cookies blocked would be different).

stefhak commented 8 years ago

If we were to treat deviceIds as cookies, then I guess we must have a way for the end user to check if the UA has persisted deviceIds for the site in question (and must enable the user to clear them), in the same way as the user can check (and clear) what cookies are stored for a site.

alvestrand commented 8 years ago

@martinthomson the concern addressed in #218 (as I read it) was that X1 is an invisible cookie - it will persist without the client being able to detect it, unlike cookies, which leave some trace on the client's machine.

Other definitions of "browsing session" that could be used would be:

Tab lifetime (that's what Stefan's test is suggesting)
Browser lifetime (substitute "exit browser" for "close tab" in Stefan's test)

stefhak commented 8 years ago

@dontcallmedom do you have any input on definition of "browsing session"?

And would there be a problem regarding using the Browser lifetime in e.g. Firefox OS or Chrome OS (i.e. would the browser always be "live" in such systems)?

stefhak commented 8 years ago

I did some testing according to https://github.com/w3c/mediacapture-main/issues/359#issuecomment-220522156. The results are quite depressing, so I'm sure I made some mistake.

Chrome Canary (53.0.2749.0):

Phase 1: X1 != Y1 - Pass. However, site X had once in the past been granted access, and now displays labels (it should not)

Phase 2: X2 == X1 - Fail

Phase 3: Site X had once been granted access. In spite of clearing cookies (and even after restarting the browser) site X has permanent access, with no way for the user to reject. Anyway, since X2 == X1 this would probably not matter much.

No point in further testing.

Firefox Nightly (49.0a1 (2016-05-25))

Phase 1: X1 != Y1 - Pass.

Phase 2: X2 == X1 - Fail. Even quitting and re-starting Firefox gives X2 == X1, clearing all cookies, quitting and restarting does not help either.

No point in further testing.

alvestrand commented 8 years ago

@aboba will try out the same thing for Edge (or supply stefhak with a VM).

martinthomson commented 8 years ago

@alvestrand:

@martinthomson the concern addressed in #218 (as I read it) was that X1 is an invisible cookie - it will persist without the client being able to detect it, unlike cookies, which leave some trace on the client's machine.

Isn't that a browser problem though? We could show this as a cookie in the cookie list and allow users to delete the value that has been persisted, or even block it. We could, but we don't.

martinthomson commented 8 years ago

Oh, and I meant to add. The same applies to DOM localStorage, and indexedDB (these have some exposure in Firefox, and I suspect that the same is true in Chrome, but you have to be willing to dig).

jan-ivar commented 8 years ago

@stefhak Firefox wfm when I close the last content-tab or quit-and-restart. How did you clear cookies? To isolate, add a

Phase 0: Open one tab at X and one at Y, enumerateDevices for both, log results as X0 and Y0, clear cookies, and enumerate again (X1 and Y1) and make sure X1 != X0 and Y1 != Y0.

Clear browsing data... (Chrome) [1] and clear your recent history (Firefox) [2] + cookies wfm. The individual-cookie-views don't work, even if I clear all, because they're not visible cookies.

I get:

Phase 0: Chrome pass, Firefox pass
Phase 1: Chrome pass, Firefox pass
Phase 2: Chrome fail, Firefox fail (Firefox w/my patch pass)
Phase 2 (quit+restart): Chrome fail, Firefox pass

The labels are a red herring, because persistent permissions are separate from cookies in both browsers.

[1] chrome://settings/clearBrowserData
[2] about:preferences#privacy

jan-ivar commented 8 years ago

I just noticed you had Phase 0 as Phase 7...

alvestrand commented 8 years ago

@martinthomson they're all browser problems. If we need a public agreement that browsers behave consistently, it becomes a spec problem. That's what a spec is, isn't it?

stefhak commented 8 years ago

@jan-ivar I think I used the methods you describe to clear persisted stuff for Chrome. On Chrome Canary I did chrome://settings/clearBrowserData "from the beginning of time" with the "Cookies and other site and plugin data" box checked. Would the fact that Chrome Stable is running be a problem? For Nightly I did "remove individual cookies" followed by "remove all". If I go the "clear recent history route", what boxes must be ticked? "Cookies" only?

martinthomson commented 8 years ago

@alvestrand, we have an agreement that UX is not the domain of specs. We also allow browsers sovereignty over choices regarding permissions and how they are managed. I don't see why this requires specification to the degree proposed.

stefhak commented 8 years ago

After clearing the way @jan-ivar describes for FF Nightly I get:

Phase 1: pass
Phase 2: pass
(Phase 3: not a test)
Phase 4: fail (but pass if Nightly is quit and re-started)
(Phase 5: not a test)
Phase 6: pass
Phase 7: pass

stefhak commented 8 years ago

@ShijunS tested on Edge with the following result:

Edge (Win10 November Update)

Phase-1: X1 != Y1. Pass.

Phase-2: X2 != X1 and Y1. Pass.

Phase-3: done

Phase-4: X3 != X2, X1, and Y1. Pass.

Phase-5: done

Phase-6: X4 != X3. Fail. Note: Based on current spec, I believe X4 != X3 is the expected behavior in this test. There is no active MediaStreamTrack on another X-tab. And, as an implementation precaution or limitation, Edge currently doesn’t allow sharing devices across tabs. So, if there is an X-tab using the device, the second X-tab won’t be able to access the specific device anyway.

Phase-7: X5 != X1, X2, X3, X4, and Y1. Pass

stefhak commented 8 years ago

@ShijunS a comment to Phase-6 X4 != X3: The spec currently says

"All enumerable devices have an identifier that MUST be unique to the page's origin. As long as no local device has been attached to a live MediaStreamTrack in a page from this origin, and no stored permission to access local devices has been granted to this origin, then this identifier MUST NOT be reusable after the end of the current browsing session.

However, if any local devices have been attached to a live MediaStreamTrack in a page from this origin, or stored permission to access local devices has been granted to this origin, then this identifier MUST be persisted across browsing sessions."

I read the last part as X4 == X3 is the right behavior in the test (where X had access to a local device, but the user closed the tab but later opened a new one at X).

stefhak commented 8 years ago

New test with Chrome 53.0.2754.0 canary (64-bit) gives:

Phase 1: Pass
Phase 2: Fail, clearing "Cookies and other site and plugin data" => Pass (X2 != X1)
(Phase 3: not a test)
Phase 4: Fail (need to clear "Cookies and other site and plugin data" to get Pass)
(Phase 5: not a test)
Phase 6: Pass
Phase 7: Pass

stefhak commented 8 years ago

Summing up the recent tests it seems that (as I read the spec) the following is the situation:

FF Nightly spec compliant except for clearing deviceId on browser (rather than browsing) session
Edge spec compliant except for not persisting deviceId if a site has once gotten access to input device
Chrome Canary spec compliant except for clearing deviceId only on clearing cookies and site data (rather than on browsing session boundaries)

ShijunS commented 8 years ago

@stefhak, regarding the Comment, there seems to be different understandings of the spec. Let me share my view and hopefully we can clarify together.

The spec defined two conditions in this case:

a. if "any local devices have been attached to a live MediaStreamTrack in a page from this origin", or
b. if "stored permission to access local devices has been granted to this origin"

Here are two options to read condition-a

1. A local device "has been" attached to a live MediaStreamTrack, and is still in use by a page from this origin.
2. A local device "had been" used in the past at least once by a page from this origin.

Option-1 is independent from condition-b, so it makes sense to me for the spec to define this as condition-a or condition-b.

Option-2 would need more clarification, for example:

This seems a superset of condition-b at least in typical user scenarios, so makes condition-b redundant.
It is not clear when this condition can be revoked, for example, when user clears cookies or histories, etc.
It is not clear how users would know that they are still providing the fingerprints even after they have purposely revoked a stored permission to use their local devices by the specific origin.
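
The two readings of condition-a discussed in this comment can be written down as a small predicate, which makes it easier to see why Phase 6 of the test plan comes out differently under each. This is an illustrative sketch, not spec text: `OriginHistory`, `Reading`, and `mustPersist` are hypothetical names.

```typescript
interface OriginHistory {
  // A local device is attached to a live MediaStreamTrack right now.
  trackLiveNow: boolean;
  // A local device was attached at least once, now or in the past.
  everAttached: boolean;
  // A stored permission exists for the origin (condition-b).
  storedPermission: boolean;
}

type Reading = "option1" | "option2";

// Whether the deviceId must be persisted across browsing sessions under
// the given reading of condition-a.
function mustPersist(h: OriginHistory, reading: Reading): boolean {
  const conditionA = reading === "option1" ? h.trackLiveNow : h.everAttached;
  return conditionA || h.storedPermission;
}
```

For the Phase 6 situation (a one-time grant, the tab since closed, nothing stored), option 1 does not require persistence, so X4 != X3 is acceptable, while option 2 requires it, so X4 == X3 is expected.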

w3c / mediacapture-main