Open ehsan opened 5 years ago
I agree that the web should support ways to accomplish all of these! Regarding logins, for example, something like https://github.com/mikewest/http-state-tokens by @mikewest is a path forward; there are a very large number of web payments efforts in progress, with a variety of privacy implications; and so on.
So I am particularly worried about the ads-funded-web-sites use case because it seems under-served relative to the other cross-site-data uses I can think of.
The other one worth highlighting is fraud and abuse prevention. This is of course important in the web economy, but lots of non-ads people care about this topic as well. And indeed the Privacy Pass work that I link to comes from CloudFlare's initiative to make CAPTCHA-based abuse prevention compatible with Tor's privacy needs — but the underlying crypto is broadly useful.
Let me also say that the various "It may be OK to..." APIs listed in this doc are examples, but surely not a complete list. If you think of other core use cases that are compatible with this privacy model and would need new APIs, then please do propose additions.
So I am particularly worried about the ads-funded-web-sites use case because it seems under-served relative to the other cross-site-data uses I can think of.
This misses the point. This proposal claims to be a "Privacy Model for the Web". Programmatically serving ads based on users' data is not a necessary component of "the Web", or at least not a component that users of the web would consider a necessary part of their experience. A publisher's ability to monetize content is a necessary component, and programmatic advertising is one way to achieve that (and also happens to have been abused in very privacy-invasive ways). Framing the proposal around the necessity of programmatic advertising influences the requirements in perverse ways. For example:
"(It seems like an open question whether or not this identity should be double-keyed, i.e. sharded by 3p as well: consistency across 3p's would make the identity easier to use, but that must be balanced against the risk to the central threat of joining across per-site identities.)"
Is consistency of a user's first-party identity across third parties helpful for the federated login use case? Or the payments use case? It mainly seems helpful for an advertising use case, but is presented with the assertion that it's "easier to use" because this model is framed around the advertising use case. Is the delegation of access to identity to a third party something that's important for use cases other than programmatic advertising?
We're agreed that "a publisher's ability to monetize content is a necessary component". I don't personally know of a way other than programmatic advertising to support a web anything like what we have today, which is why I want to do the hard work of making that ecosystem compatible with a better privacy model. But I certainly don't want to force any web site to support themselves in any particular way — if there are other tools sites need to flourish, then I want to make a web that supports those too.
I readily acknowledge that I know more about ads than about many other 3p use cases, and I'm sure that came out in my writing. I'm surprised by your particular choices of examples, though:
Consistency of per-first-party identity is useful any time two 3p's want to be able to talk to one another about a visitor to a site. If the payment provider 3p and the abuse prevention 3p want to share information about their fraud-likelihood estimates of a visitor, then of course it's easier if they can refer to that visitor by the same name.
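To make the single-keyed versus double-keyed distinction concrete, here is a minimal sketch. It assumes the browser derives identifiers with an HMAC over a device-local secret; that derivation, the `site_identity` function, and the example hostnames are all hypothetical illustrations and are not taken from the proposal.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical device-local secret; in this sketch it never leaves the browser.
BROWSER_SECRET = secrets.token_bytes(32)


def site_identity(first_party: str, third_party: Optional[str] = None) -> str:
    """Derive an opaque identifier scoped to a first-party site.

    With third_party=None the identifier is single-keyed: every 3p embedded
    on the site sees the same value. Passing a third party double-keys it,
    so each 3p on the same site sees a different value.
    """
    parts = [first_party] if third_party is None else [first_party, third_party]
    message = "\x00".join(parts).encode("utf-8")
    return hmac.new(BROWSER_SECRET, message, hashlib.sha256).hexdigest()[:16]


# Single-keyed: two 3p's on news.example can refer to the visitor by one name.
shared_name = site_identity("news.example")

# Double-keyed: the payment 3p and the fraud 3p get unrelated names.
pay_name = site_identity("news.example", "pay.example")
fraud_name = site_identity("news.example", "fraud.example")
```

In both variants the identifier changes when the first party changes, so neither lets a 3p join a visitor across sites on its own. The difference is only whether two 3p's on the same page can compare notes without some further mechanism.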
Delegation of access to identity seems core to measurement needs ("How many different people visited my site last week?"), not to advertising.
But honestly, browser primitives for delegation of access to identity are more about being sanitary than about any particular use case, ads-related or otherwise. I think that 1p's should have control over what the 3p's on their pages can and can't do, and of course the web is indeed moving in that direction with iframe feature policies. But lots of 3p's in today's web end up running code in a 1p context. I want to avoid 3p's clamouring to get into the 1p context simply because it allows implicit identity access.
I'd like to keep focusing on the main topic of this issue if that's OK.
The problem that I have with the current special treatment of advertisements (and really targeted advertisements, not just advertisements) in this proposal is that they are put on a high pedestal above and beyond all other concerns, and that this is informing issues like the requirements for identity on the web, the privacy model for the web, etc.
Logically it should be the other way around: we should establish for example what identity principles are in the interest of users and match user expectations, then figure out what those principles would mean in terms of technical modifications to the Web platform, and then look at how those modifications would impact existing use cases. Of those existing use cases, targeted advertising is just one. Where we find mismatches between them, we need to look into what solutions potentially exist and which ones match the principles we agreed on initially. It may very well turn out that some of the existing building blocks of targeted advertising would not work at all once we modify the Web platform to match users' expectations. If we come to such conclusions, then implementing what matches user expectations is what we would ultimately need to pick.
It makes it very difficult to collaborate on the technical parts of this proposal if we don't agree on the very basic principles of how designing for privacy works.
Hooray, this is great:
we should establish for example what identity principles are in the interest of users and match user expectations, then figure out what those principles would mean in terms of technical modifications to the Web platform, and then look at how those modifications would impact existing use cases.
I agree completely!
The model in my doc comes from considering the underlying identity principle "Identity is partitioned by First Party Site", or if you'd like "Nobody off the browser can reconstruct the user's multi-site browsing history." Everything else is an exploration of what sort of web might be built on that foundation. I was trying to convey that with the title "A Potential Privacy Model for the Web: Sharding Web Identity".
The Mozilla and WebKit privacy policies also have an underlying identity principle, and we should develop out answers to what they imply for the end state of the web. And I'm sure there are other contender principles as well.
There are a lot of ads-supported web sites today, and I do think we need to understand the implications for viability of such sites when we analyze the consequences of each of the choices. That is essentially my job right now.
we should establish for example what identity principles are in the interest of users and match user expectations, then figure out what those principles would mean in terms of technical modifications to the Web platform, and then look at how those modifications would impact existing use cases.
I agree completely!
The model in my doc comes from considering the underlying identity principle "Identity is partitioned by First Party Site", or if you'd like "Nobody off the browser can reconstruct the user's multi-site browsing history." Everything else is an exploration of what sort of web might be built on that foundation. I was trying to convey that with the title "A Potential Privacy Model for the Web: Sharding Web Identity".
So are you saying that the identity principles that you are basing your work on is only that "nobody off the browser can reconstruct the user's multi-site browsing history"? If this is indeed the only principle you have been using then I guess I can see how for example some of what's stated in this section of the doc matches that.
FWIW the text there says things like "This recognizes that composability is central to the Web — for example, it is unreasonable to expect 1p's to implement their own analytics or ad servers" or "A 3p who builds a per-1p user profile should be in the same position as any other company with whom the 1p chooses to share its user data", so you can probably see how a reader who doesn't share your mental context can reach a conclusion that this document has been written this way in order to make the existing advertising technology used on the web work.
However, I disagree with the notion that this principle sufficiently describes user expectations. For example, I do not see how this model matches the user expectations that research such as [1] or [2] has demonstrated (note: these are only two examples of such research, offered to help broaden the discussion here; they are not meant to mark the boundaries of user expectations in any way).
[1] https://www.niemanlab.org/2018/04/jason-kint-here-are-5-ways-facebook-violates-consumer-expectations-to-maximize-its-profits
[2] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3306006
Please note that in order to help understand the ideas outlined in this privacy model I'm using more information than only what is presented here in the text of the proposal (e.g. the claim in the blog.google post around "large scale blocking of cookies undermine people’s privacy" too.)
The Mozilla and WebKit privacy policies also have an underlying identity principle, and we should develop out answers to what they imply for the end state of the web. And I'm sure there are other contender principles as well.
There are a lot of ads-supported web sites today, and I do think we need to understand the implications for viability of such sites when we analyze the consequences of each of the choices. That is essentially my job right now.
I'm all for assessing the implications of the viability of advertisement supported sites in a world where all browsers come with privacy protections by default, and I would love to see something that actually compares the current world against that world, but so far I don't see that in the "52% study" here.
But while that is a very interesting topic to discuss, my main topic of interest is how to modify the web platform in order to make it match the expectations that users have around their privacy. Of course the impact of those modifications on ad-supported websites should be studied, but that is just one item in the long list of use cases that may be negatively impacted by any proposed modification.
I remain unconvinced that lifting advertisements to such a high pedestal for it to prescribe the parameters of an acceptable solution is in the interests of the users of the Web at large and of Firefox in particular.
So are you saying that the identity principles that you are basing your work on is only that "nobody off the browser can reconstruct the user's multi-site browsing history"?
Ah, I didn't intend to say that. In this model, we start with the invariant that you can't link identity across first parties (without some relevant user action). Then there remain lots of further decisions to make, many of which involve trade-offs between different things that people who use the web want and expect.
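One way to picture that invariant is as state that is keyed by the first-party site the user is visiting. The toy model below is only a sketch under that assumption; the `PartitionedStore` class and the hostnames are hypothetical and do not describe any browser's actual implementation.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class PartitionedStore:
    """Toy model of browser state keyed by (first-party site, embedded origin)."""

    def __init__(self) -> None:
        # One independent "jar" per (first party, embedded origin) pair.
        self._jars: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)

    def set(self, first_party: str, origin: str, key: str, value: str) -> None:
        self._jars[(first_party, origin)][key] = value

    def get(self, first_party: str, origin: str, key: str) -> Optional[str]:
        # Reads never cross partitions, so state written under one first
        # party is invisible when the same origin is embedded elsewhere.
        return self._jars.get((first_party, origin), {}).get(key)


store = PartitionedStore()
store.set("news.example", "tracker.example", "uid", "abc123")

same_site = store.get("news.example", "tracker.example", "uid")
other_site = store.get("shop.example", "tracker.example", "uid")
```

Here `same_site` holds the stored value while `other_site` is `None`: the embedded origin keeps a per-site view of the visitor but has nothing to join on across sites. The further decisions mentioned above, such as what a 3p may do within one partition, sit on top of this and are not modeled.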
Please note that in order to help understand the ideas outlined in this privacy model I'm using more information than only what is presented here in the text of the proposal (e.g. the claim in the blog.google post around "large scale blocking of cookies undermine people’s privacy" too.)
Yes, sorry that I didn't go into any detail about this. I said "Browsers impose limits (on cookies, fingerprinting, and other state) with the goal of preventing the joinability of these per-1p identities." Of course there is a huge amount of work bound up in what browsers will need to do to avoid replacement with covert and opaque tracking techniques.
I remain unconvinced that lifting advertisements to such a high pedestal for it to prescribe the parameters of an acceptable solution is in the interests of the users of the Web at large and of Firefox in particular.
I feel like the "can't link identity across first parties" basic principle stands on its own, and that the advertising use cases come into play in the trade-offs section. The prominence I've given them is of course influenced by the "52% study", exactly because I think such a huge revenue hit to most of the world's web sites would have a dramatic effect on things that people who use the web want and expect.
(FYI I've added a link to the follow-up whitepaper with more details about that experiment.)
I feel like the "can't link identity across first parties" basic principle stands on its own, and that the advertising use cases come into play in the trade-offs section. The prominence I've given them is of course influenced by the "52% study", exactly because I think such a huge revenue hit to most of the world's web sites would have a dramatic effect on things that people who use the web want and expect.
If monetary payouts going to publishers are how we are going to decide which negatively impacted use cases should be prioritized first and foremost, may I ask what use cases other than advertising you have studied to determine a similar impact, such that advertising was finally chosen as the winning one?
Why is advertising receiving a special status here? There are many other use cases on the web that will (presumably) be impacted by the implementation of this rough set of ideas. The proposal mentions some only in passing, but it gives a special status to advertising. It seems that the focus of this work is on how to keep the existing behavior-based targeted advertising systems working once the Privacy Model is adopted.
Why aren't other use cases such as logins, payments, shopping, reading content, communication with people, etc. receiving any special treatment similar to advertising? Wouldn't it be better to ensure that none of these use cases are negatively impacted, and not just advertising?