patcg-individual-drafts / topics

The Topics API
https://patcg-individual-drafts.github.io/topics/

Privacy Risk: Aggregation and Household Tracking #15

Closed silver5753 closed 2 years ago

silver5753 commented 2 years ago

It sounds to me like this will make it easier to track households and, through them, individuals.

I'll give an example: Companies A, B, and C have access to the Topics API on different sites (along with passive "PII" like IP address, browser version, etc.). They then send this data to the open auction (RTB stream). At this point, companies M and N bidding on the auction can aggregate the topics from A, B, and C. They can also group them with other topics seen from the user's home IP address (for example, from another device or a roommate's device). Side note: all this info together should make it relatively "easy" to keep tracking the household through daily IP changes, because the combination of multiple devices, browsers, and top topics stays recognizable.

Now another party, Company X, wants to know whether they should bid on an ad placement from a specific IP. They ask companies M and N what that IP's top topics are, for a price. The result is that they get a list of the top 10+ topics of all users at that IP. For a larger price they can filter by device info. To make that last point clearer: any advertiser or third party can get a pretty good understanding of any residential address's topics ("browsing history"), not just the top 5.
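As a rough illustration of the aggregation described above, here is a minimal sketch with made-up data. The record layout, the function names, the IP addresses, and the topic labels are all hypothetical; nothing here comes from the Topics API itself or from a real RTB feed.

```python
from collections import Counter, defaultdict

# Hypothetical bid-request observations: (caller, ip, device, topics).
# Each caller (A, B, C) only sees a couple of topics per request.
observations = [
    ("A", "203.0.113.7", "phone", ["Fitness", "Travel"]),
    ("B", "203.0.113.7", "phone", ["Travel", "Cooking"]),
    ("C", "203.0.113.7", "laptop", ["Politics", "Autos"]),
    ("A", "203.0.113.7", "laptop", ["Autos", "Finance"]),
    ("B", "198.51.100.2", "phone", ["Music"]),
]


def aggregate_by_ip(observations):
    """Merge the topics seen by every caller into one counter per IP."""
    profiles = defaultdict(Counter)
    for _caller, ip, _device, topics in observations:
        profiles[ip].update(topics)
    return profiles


def top_topics(profiles, ip, n=10):
    """Return up to n of the most frequently observed topics for an IP."""
    return [topic for topic, _count in profiles[ip].most_common(n)]


profiles = aggregate_by_ip(observations)
household = top_topics(profiles, "203.0.113.7")
print(household)
```

In this toy data no single request carries more than two topics, yet the merged profile for the one IP ends up with six distinct topics across two devices.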

Also, if a household's aggregate data is "unique enough" compared to other households, even a user in the household who opts out of tracking can be classified/targeted based on passive data.
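One way to make "unique enough" concrete is to count how many households share exactly the same aggregated topic set. The sketch below assumes a hypothetical mapping from IP address to observed topics; the data and the function name are invented for illustration, and exact-match counting is only one possible (and fairly crude) uniqueness measure.

```python
from collections import Counter

# Hypothetical aggregated profiles: IP address -> set of observed topics.
household_topics = {
    "203.0.113.7": frozenset({"Travel", "Autos", "Fitness"}),
    "198.51.100.2": frozenset({"Music", "News"}),
    "192.0.2.44": frozenset({"Music", "News"}),
    "192.0.2.91": frozenset({"Travel", "Autos", "Politics"}),
}


def anonymity_set_sizes(household_topics):
    """Map each IP to the number of households sharing its exact topic set."""
    counts = Counter(household_topics.values())
    return {ip: counts[topics] for ip, topics in household_topics.items()}


sizes = anonymity_set_sizes(household_topics)
# Households whose topic set matches no other household in the data.
unique = sorted(ip for ip, size in sizes.items() if size == 1)
print(sizes)
print(unique)
```

A household with an anonymity set of size 1 is distinguishable by its topics alone, so any device behind that address, including one that has opted out, could be associated with the household's profile.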


jkarlin commented 2 years ago

I believe you're pointing at the fact that there are other pieces of fingerprintable information in the ecosystem today that can be used to track users, and the Topics API doesn't sit in isolation from them. IP Address and UA being primary examples.

I agree, and that's why it's important that we reduce the overall fingerprinting surface of the browser in parallel as we deprecate third-party cookies. Examples of work in this area include UA reduction, Apple's iCloud Private Relay, and Chrome's IP Blindness proposal.

silver5753 commented 2 years ago

That's correct. I was referring to the fact that no proposal to replace cookies will be isolated from the rest of the ecosystem. My specific point was that any "slightly unique" additional information, such as Topics API, might isolate a single browser, but will not be enough to stop fingerprinting an entire household.

The problem with sending explainable "topics" is that aggregators can sell the information to any buyer at any time (real-time bidding advertisers, law enforcement, political parties, etc.). Identifying what a household is interested in becomes a tabular data query. In this regard, the FLoC proposal was more privacy-friendly in that it at least obscured the information to human eyes and made it harder for non-industry members to ingest. Topics API might even be more concerning than keeping third-party cookies: today only massive aggregators can track user preferences and it's expensive, whereas with Topics API any medium fish will be able to track users (with minimal additional fingerprinting) and then sell the info.

Introducing new information (Topics API) while deprecating 3rd party cookies is not the same as simply deprecating 3rd party cookies. UA reduction has made very little headway (example: my phone still sends the version and brand: https://wtfismyip.com/headers). Safari blocks third-party cookies altogether, as do most other browsers. I would be happy to read about the IP Blindness proposal, but all I found online was an intro page from a year ago with no updates.

jkarlin commented 2 years ago

> That's correct. I was referring to the fact that no proposal to replace cookies will be isolated from the rest of the ecosystem. My specific point was that any "slightly unique" additional information, such as Topics API, might isolate a single browser, but will not be enough to stop fingerprinting an entire household.

Not without further work on reducing IP address information, I agree.

> In this regard, the FLoC proposal was more privacy-friendly in that it at least obscured the information to human eyes and made it harder for non-industry members to ingest.

The same issue existed with FLoC. Aggregators would do the work of translating cohorts to human-readable meanings and sell that in your scenario.

> Introducing new information (Topics API) while deprecating 3rd party cookies is not the same as simply deprecating 3rd party cookies.

While the API (and other APIs that the Privacy Sandbox is considering) releases some amount of user interests to callers, we feel that it is a step forward in user privacy compared to where we are today, while still allowing the open web to monetize and thrive.

silver5753 commented 2 years ago

> The same issue existed with FLoC. Aggregators would do the work of translating cohorts to human-readable meanings and sell that in your scenario.

Only FLoC aggregators with the resources to do so could have done this. By relaying the information ungrouped and unobscured, Topics API reduces the cost of, and therefore increases the availability of, user data to any buyer (politicians, governments, etc.).

> While the API (and other APIs that the Privacy Sandbox is considering) releases some amount of user interests to callers, we feel that it is a step forward in user privacy compared to where we are today, while still allowing the open web to monetize and thrive.

Contextual advertising is a wonderful thing: advertise based on the page being browsed. The web can still be monetized without infringing on user privacy and personal preferences.

jkarlin commented 2 years ago

> Contextual advertising is a wonderful thing: advertise based on the page being browsed. The web can still be monetized without infringing on user privacy and personal preferences.

Acknowledged, and thanks for the perspective. Given that this API is trying to provide for interest-based advertising, however, I'm not sure how much more there is to discuss here.

silver5753 commented 2 years ago

Feel free to factor in the household tracking I highlighted as an issue with the proposed implementation. More data = easier fingerprinting.