mikewest / privacy-budget


Asymmetrically unfair to competitors of Google #2

Closed: gitcnd closed this issue 4 years ago

gitcnd commented 5 years ago

Some companies, like Google, enjoy the luxury of having their analytics scripts run within the context of billions of web pages worldwide. In addition, they run search engines: they know what you looked for and how, they know what you click on, and when you get there, they see what you do. Their income is based on marketing this intelligence to their advertising customers.

Smaller analytics businesses do not enjoy such power. Worse, the larger companies take increasingly unfair steps to prevent the search intelligence they collect (e.g. search terms) from being available to their competitors, usually in the name of "privacy".

This privacy budget further erodes the business of small ad-intelligence and anti-fraud companies, while having little to no impact on large players.

The best solution I can think of is to compensate for this loss. There needs to be a companion standard to this one that gives those whose businesses have been negatively impacted some free and useful insight, restoring them to the position they were in beforehand. For example, Google could make an API available that gives those newly disadvantaged companies privileged access to the analytics data Google has collected about the visitor currently on the third-party site. To take one case (this is not an exhaustive suggestion): if a visitor arrived at an online store having just included the word "torrent" in their search, the store should have the right to know what Google knows about the search that led them there, so the store can be prepared for the vastly increased chance that this visitor is bad (unlikely to purchase, or likely to be fraudulent if they do).

michaelkleber commented 5 years ago

Hi Chris, sorry for the delay in responding.

This Explainer is part of Chrome's broader "Privacy Sandbox" effort, so please take a look at its Privacy Model explainer for some additional context.

The ideas in the Privacy Budget are about preventing fingerprinting techniques, which might otherwise let companies tie together a person's browsing behavior across many sites.

If you're objecting that you want to be able to tie together one person's activity on multiple sites, then feel free to open an issue on the Privacy Model itself. We expect lots of discussion over whether that is the right goal.

For the "torrent" example you give, I don't think the Privacy Budget idea has any impact one way or the other. For an API that's about helping fraud prevention even with less cross-site data available, see the Trust Token API explainer.

gitcnd commented 5 years ago

I note you completely ignored the topic... kinda strange... does your employer ban it?

michaelkleber commented 5 years ago

Hi Chris,

Can you try to explain how you think this API would be hurting you?

I'm not trying to ignore your question, I just don't understand what you object to here.

This work is about preventing the use of fingerprinting to track users across different sites. If you can explain why you feel the need to track users across sites, then we can talk about what alternative techniques there might be for your goals. That's what the whole Privacy Sandbox effort is about.

gitcnd commented 5 years ago

Hi Michael,

I was not specifically talking about me - I'm talking about companies who compete against Google in the online advertising and services marketplace who make use of visitor intelligence to support their advertising and/or anti-fraud operations.

Honestly, I can't understand how or why you do not understand my report. In case I was not entirely clear: I am accusing this effort of yours of being "anti-trust" behavior by Google against its competitors, and I suggested a mitigation you could put in place to make your efforts legal and honest.

Anti-trust is when a large corporation like Google does something that reduces competition in the marketplace. Anti-trust is illegal.

Google has an extremely robust analytics infrastructure that tracks users across different sites. Do a "view source" on this very page right now, and you will see Google Analytics code (UA-3769691-2) embedded in it. Google has its own tracking code embedded in approximately 52.9% of ALL WEB PAGES on the entire internet. It does not need to use fingerprinting, because it has such crushing dominance in alternative mechanisms. In addition, Google operates search engines, and has the luxury of being able to track user search terms as well. Google also operates email services, and has the luxury of being able to run advertising scripts against its customers' incoming email (without the permission of the sender), and to use the intelligence gained from this to target advertisements to the reader of the email. In my experience, when I email my customers who use Google services, your advertising system knows who I am and what I am writing about, and it places advertisements for my competitors on my customers' screens as they read my emails. I asked Google not to do this, and they refused. Because I rely on Google searches for my income, I cannot escalate my complaints, out of fear that Google will destroy my income if they decide to retaliate against me.

To be clear: Google's tracking of internet users is staggeringly effective, and the power of your company makes it impossible for victims of this infrastructure to complain without suffering.

COMPETITORS of Google do not enjoy the privilege of having their own tracking and analytics scripts running inside 52.9% of web pages like Google does, so to compete against Google, they are forced to rely on alternative mechanisms. Many years ago, one of these mechanisms, the browser Referer header that global search analytics businesses relied upon, was "disabled" by Google. Specifically, Google began (and still does to this day) intercepting clicks on search results and putting in place mechanisms that deliberately robbed target websites of the search intelligence formerly contained in those HTTP headers. This effectively destroyed every global search analytics business that was not itself a search engine, and now Google is the dominant global supplier of this business intelligence. Google claimed that withholding search terms from target sites was for "Privacy"; however, it still sells its own products and services using this intelligence, so the border where "Privacy" ends happens, coincidentally, to be exactly where Google's own products and services can profit from selling access to that data.

This new effort of yours is the latest in a line of "Privacy" initiatives that draw a line in the sand and decide that everything Google does to track users is perfectly fine, but that things everyone else does need to be eradicated at all costs, where "all costs" is of course cost to Google's competitors, and no cost to Google itself.

If what I've said is not already abundantly clear - let me explain it another way:

Google is using its dominant power and influence over internet standards, and its ownership of browser technology, to take clear and specific steps that prevent OTHER COMPANIES from tracking internet users, while NOT PREVENTING its own infrastructure from performing even more invasive and disrespectful tracking of its own (e.g. advertising my competitors directly to my customers).

Perhaps you are simply a privacy-aware programmer who has chosen to "look away" from what the analytics division of your own company is engaged in, and you've been distracted by the shiny coolness of the "anti hacker arms race" and the compelling excitement of trying to design mechanisms to block that stuff. Might I respectfully suggest that you take a step back and look at the big picture.

Finally, since I'm certain there's no chance whatsoever that my words will stop your progress: my suggestion for how to move forward without engaging in further illegal anti-trust behavior was to give all the disadvantaged companies you are destroying free access to the relevant tracking intelligence your company gathers. To illustrate my point, I provided an EXAMPLE (not an exhaustive list, just one example to help you understand what I'm trying to say). Pretend an anti-fraud company exists and is using fingerprinting to detect the return of identified "bad actors" who have arrived in the past. As you almost certainly know, 90%+ of all malicious activity is the same few people repeating their crimes. Fingerprinting is an extremely effective way to detect this, and to tarpit or poison these pests. In the post-fingerprinting world, that company is going to need a new way to operate, now that you have destroyed its livelihood. One possible way to do that (my example) would be to share search terms with that company: it might not be able to fingerprint pests anymore, but it could use AI, keywords, and other mechanisms based on the user's search terms to recognize when a visitor is likely to be a pest (e.g. their search included the word "torrent"). You're at Google, so you can probably think of loads more ways to help the businesses you're about to kill continue to survive. That is your legal obligation under anti-trust law, after all.

Sincere kind regards, Chris.

michaelkleber commented 5 years ago

Hi Chris,

Let me be extremely clear here, in case there is any doubt: Google really will need to play by the same rules as any other site.

This explainer is talking about how to prevent fingerprinting, but the overall Privacy Sandbox model is about all forms of recognizing the same user across different sites. So even a big player that knows a lot about me won't be able to use that information when I go to a new site, except in a few very limited and browser-controlled ways.

Since you specifically mention Google Analytics, I looked at their support page that talks about cross-domain tracking. It's very clear that GA uses first-party cookies to recognize users, which generally means that it only knows about a user's activity within a single domain. They do have a way to tie together two domains if you own both of them — that's basically the use case that Chrome wants to support with some standardized mechanism along the lines of First Party Sets. But none of this would let Google use what it knows about you (from search or email or whatever) when you're on a non-Google-owned site.

I understand that browser privacy changes have cut into information that you used to be able to get about people's activities from before they got to your (or your clients') sites. And we are very interested in offering ways for you to accomplish your goals, especially goals like fraud prevention, even in this new more-private web — that's what Trust Tokens are about, for example. But we need to find ways to achieve those goals that do not involve recognizing individual users across different sites.

gitcnd commented 5 years ago

No, Michael. Google is gigantic; it needs to play by a different set of rules from "any other site", because it ALSO needs to comply with anti-trust laws so that it does not shrink the market for all the smaller players that its market dominance allows it to crush.

Google does not need cookies as I am wearing out my patience trying to explain to you: your scripts are embedded in more than half of all web pages. Your servers directly receive visitor IP addresses, referrer and analytics site tracking tag data, fingerprintable TCP O/S data, user agent, language, and other static headers, and a swathe of javascript-collected analytics data as pretty much the entire global population goes about their daily internet browsing.

Point me to the exact wording in your "Privacy Sandbox Model" where your proposal blocks the ability for your employer to track me using any and all of the aforementioned techniques (I am particularly looking forward to learn how you propose to hide my static IP from google services!!). When you cannot, perhaps you might like to explain why you're engaged in "wiping out" the fragments of techniques that everyone else has to use, while doing nothing about the vastly more intrusive techniques that google has the unique privilege to exploit - and (assuming you do) why you believe your destruction of those tracking technologies is not an anti-competitive behavior by google, and violation of anti-trust laws.

Privacy is for everyone. If you want to offer it, you've got to stop violating it yourself first, mate, particularly when your employer's existing violation of my privacy is so vastly more egregious than the trivial fingerprinting minority you're working on.

And, for the record - I do not believe that "privacy" is a right, any more than the TSA believes you should be allowed to carry whatever you want onto an airline without being checked. There is a point at which you can go too far, where the so-called "benefit" to an insignificant minority of privacy-warriors is so vastly outweighed by the damage that's done through the facilitation of fraud, abuse, and crime newly made possible by those same techniques.

Someone needs to work out the balance. I know the idea of "do nothing because it causes less harm" is hard for you to swallow (especially if "doing something" happens to be your current job!), but do the math. Who do you think you are making this for, and what proportion of them are disgusting criminals? How truly certain are you that your employer's intent is genuine? Does it really care about privacy, or is this a deliberate attack against Google's competitors?

Either way, it's a moot point. Taking this away is clear anti-trust behavior, so you'll need to work out how to deal with that before you move forward.

ghostwords commented 5 years ago

> It's very clear that GA uses first-party cookies to recognize users, which generally means that it only knows about a user's activity within a single domain.

Google Analytics uses pixels to send visitor ID-linked events back to Google. The kind of cookie here is irrelevant to Google's ability to link Google Analytics data together.

michaelkleber commented 5 years ago

I'm not a lawyer, so if your goal is to talk about anti-trust regulation, you'll probably have more fun finding a different venue.

> Google does not need cookies as I am wearing out my patience trying to explain to you: your scripts are embedded in more than half of all web pages. Your servers directly receive visitor IP addresses, referrer and analytics site tracking tag data, fingerprintable TCP O/S data, user agent, language, and other static headers, and a swathe of javascript-collected analytics data as pretty much the entire global population goes about their daily internet browsing.

Here is the relevant quote, from this very explainer: "Some fingerprinting surfaces, such as UA string, IP addresses, and accept-language header, are passive in that they are available to every website whether they ask for them or not. For the purposes of privacy budget accounting, we will have to assume that each of these are being consumed by the site and therefore eat into the budget."

So yes, all the things you are talking about really are in scope. This explainer and its larger context are about how to make the web work without cross-site identity joining. This involves designing ways to stop it. Even stopping Google from doing it.

I've said this a lot of times, and you keep not believing me, so I don't know how to have a more productive conversation.

michaelkleber commented 5 years ago

@ghostwords "Google Analytics uses pixels to send visitor ID-linked events back to Google" — no really, the "visitor ID" is an ID that only refers to you while you're visiting this particular domain! When you are on a different domain, you have a different visitor ID.

That is exactly the privacy model we're discussing here: "Identity is partitioned by First Party Site"!

gitcnd commented 5 years ago

If you work for Google, you were instructed not to break the law. You don't need to be a lawyer to know that anti-trust is illegal, and if you don't feel qualified to judge whether your work is a crime, that's why Google has lawyers you can talk to and find out for sure. Law is not hard to understand. Breaking things for competitors in a way that gives Google an advantage is anti-trust, whether you like it or not. It's not an insurmountable problem; you just have to make sure that you undo the harm you did somehow. Either provide an alternative, or deliberately hurt yourself by the same amount so it's not anti-trust.

I observe that you did not, as I specifically asked, point to the exact wording where you hide my IP address from Google. Of course not, since you don't.

Please do not reply to this thread again until you supply the wording for how you hide my IP address from Google. Server-side fingerprinting is even worse than client-side, so until you solve that, the title of this thread remains 100% accurate.

bslassey commented 4 years ago

https://github.com/bslassey/ip-blindness proposes a way to hide IP addresses from HTTP applications.

gitcnd commented 4 years ago

Interesting idea, but not one that could ever work: the ubiquitous modern requirement for post-intrusion forensics guarantees that any alleged privacy would only be temporary and reversible after the fact.