patcg-individual-drafts / topics

The Topics API
https://patcg-individual-drafts.github.io/topics/

Concerns about the impact of Topics on the broader ad tech ecosystem #84

Open AramZS opened 2 years ago

AramZS commented 2 years ago

Hello folks, I reviewed the current iteration of Topics API and wanted to make sure my feedback was passed along on this repo.

Briefly, I am concerned that:

Some initial suggestions for consideration:

I want to make two things clear:

1. I am not particularly concerned with discussing the privacy promise in this particular context. I think Topics provides a significant privacy improvement in multiple ways over the existing system and I do think it is effective as a potential off ramp in that way. I applaud the work done in this regard.
2. I have no doubt that Topics is accurate. I think that the likelihood that Topics can give accurate indications of user interest at the level of, if not better than, the current 3rd party cookie systems is high.

A longer article covering all the issues that concern me is here: https://aramzs.github.io/web-standards/2022/08/04/topics-api-review.html

I'm glad to answer any questions!

dmarti commented 2 years ago

Topics API is also likely to have impacts outside the world of adtech, by incentivizing a variety of behaviors that can create a worse web experience for users.

Topics API leaks valuable audience data from one site to another. As the proposal currently stands, any site that a user can be tricked into visiting, or that their device can be manipulated into visiting for them, is in a position to collect ad revenue based on a Topic that the user brings with them. Topics API gives people an additional incentive to create deceptive sites and drive traffic to them by deceptive or harmful means. (This is similar to some schemes that are common with today's third-party cookie situation.)

It may be possible to ameliorate the problem by gatekeeping which sites Topics API may be called on, so that sites that users choose to visit could use it among themselves, but sites that users distrust and bounce from would not be rewarded. Some possibilities:

There are several promising efforts in progress to try to reduce the payoffs available to deceptive or otherwise harmful sites. Without significant improvements, though, Topics API is likely to reward any activity that can somehow get a user to visit a site: spam, adware, black hat SEO, malware, and more.

More here: https://www.adexchanger.com/the-sell-sider/googles-topics-api-picks-on-smaller-publishers/

jkarlin commented 2 years ago

Hi Aram, thanks for your considered feedback, especially since it comes with suggestions! Some thoughts:

> There should be ways to open Topics transmission to URLs without requiring a DOM entity on-page as a script, or iframe, or some other thing that can impact performance meaningfully.

We’re working on this, and have been from the start #1. See PR #81 for the current proposal for providing Topics with headers.

> Sites should be capable of blocklisting specific Topics from appearing to Topics-participating-systems on their site without blocking the whole Topics system.

Interesting. What is the use case for this that you have in mind?

> Events in the real world may turn what are otherwise relatively innocuous topics into dangerous vectors for manipulation and the system should provide a mechanism by which it can be turned off universally around particular events.

As in, it should be possible to revoke certain topics at the browser level? Yes, this is something that I believe the browser should be able to do. You could effectively consider this a change in taxonomy.

Any revoked topics currently stored by servers or in 1p storage in the browser will still exist however. But at least we can prevent them from being used going forward.

> We need to figure out better access rules for Topics. While I understand the objections around FLoC's general access rules I think this goes too far in the other direction, and thus is likely to cause many of the ecosystem-level harms I am concerned about in a way that FLoC would not have.

If the ecosystem harm is performance related, we’re working to fix that with our exploration of adding Topics to headers as mentioned above. If the harm is about which entities have access to topics, Topics is maintaining status quo in order to not spread user information further than we already do with cookies, and I don’t see how that can be viewed as ecosystem harm.

Do take a look at the recent discussion in https://github.com/patcg-individual-drafts/topics/issues/82#issuecomment-1209735897, though, in case that helps with your concerns. It points out a way that the ecosystem could adopt Topics which should counteract both the calcification and on-page performance risks you raised, without the privacy downside of the old FLoC visibility free-for-all.

npdoty commented 2 years ago

I'm confused by the explicit goal of maintaining the status quo. It seems to me that the current status quo incentivizes getting your code to run on as many pages as possible so that you can surveil user activity around the web -- and that we have accepted that this status quo is harmful to performance and to privacy, unacceptable and in some cases illegal.

By maintaining the concept of a "witness", this design still encourages getting your code to run on as many pages as possible, and it seems likely that that code will both trigger the Topics API and perhaps also try to identify the user and build a profile on them using existing methods. That incentive seems like a benefit to larger trackers and a harm to performance and privacy.

AramZS commented 2 years ago

@jkarlin Thanks for your response, I took a look and have some answers to your questions:

> Hi Aram, thanks for your considered feedback, especially since it comes with suggestions! Some thoughts:
>
> > There should be ways to open Topics transmission to URLs without requiring a DOM entity on-page as a script, or iframe, or some other thing that can impact performance meaningfully.
>
> We’re working on this, and have been from the start #1. See PR #81 for the current proposal for providing Topics with headers.

Thanks! I'll check it out!

> > Sites should be capable of blocklisting specific Topics from appearing to Topics-participating-systems on their site without blocking the whole Topics system.
>
> Interesting. What is the use case for this that you have in mind?

I think it is inevitable that some Topics will be considered high value and some lower value. That might even change over time depending on what is going on in the real world. I think that, for the reasons I describe in the longer piece, most top level domains will not really have a choice about whether to turn on Topics if they wish to compete in the marketplace for digital advertising.

If that's the case, then as part of the process of blocking low-value, unaligned, or TLD-unsafe ads, and in order to help control their value in the ad tech ecosystem, the publisher or TLD should be able to block particular terms from appearing and potentially poisoning their CPMs in the predictive markets that will form around Topics. Like any other indicator, a Topic will shift the flavor of ads and the CPMs over time as it becomes weighted through recurrence in the ad tech systems that record it. Blocking a Topic lets domains participate in Topics without risking the overall flavor of the ads they get.
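As a rough illustration of what such a per-site blocklist could look like: this is a hypothetical sketch only. Nothing like it exists in the Topics proposal, and the function name, the blocklist format, and the topic labels are all made up for the example.

```python
def filter_blocked_topics(user_topics, site_blocklist):
    """Return the topics a caller would receive on a site that blocks some topics.

    Hypothetical: models a publisher-supplied blocklist applied before any
    topics are handed to callers on that publisher's pages.
    """
    # Compare case-insensitively so "Gambling" and "gambling" match.
    blocked = {topic.casefold() for topic in site_blocklist}
    return [topic for topic in user_topics if topic.casefold() not in blocked]


# Example: the publisher blocks one topic but stays in the system otherwise.
visible = filter_blocked_topics(["Fitness", "Gambling", "Travel"], {"gambling"})
print(visible)
```

The point of the sketch is that blocking is per topic rather than all-or-nothing: the site still exposes the remaining topics to participating systems.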

> > Events in the real world may turn what are otherwise relatively innocuous topics into dangerous vectors for manipulation and the system should provide a mechanism by which it can be turned off universally around particular events.
>
> As in, it should be possible to revoke certain topics at the browser level? Yes, this is something that I believe the browser should be able to do. You could effectively consider this a change in taxonomy.
>
> Any revoked topics currently stored by servers or in 1p storage in the browser will still exist however. But at least we can prevent them from being used going forward.

This is great! I think it would be good to include more specifics in the proposal about how this might work and why a browser might choose to do so. I also think it would be good if there were a way to make pause states clear to users.

> > We need to figure out better access rules for Topics. While I understand the objections around FLoC's general access rules I think this goes too far in the other direction, and thus is likely to cause many of the ecosystem-level harms I am concerned about in a way that FLoC would not have.
>
> If the ecosystem harm is performance related, we’re working to fix that with our exploration of adding Topics to headers as mentioned above. If the harm is about which entities have access to topics, Topics is maintaining status quo in order to not spread user information further than we already do with cookies, and I don’t see how that can be viewed as ecosystem harm.
>
> Do take a look at the recent discussion in #82 (comment), though, in case that helps with your concerns. It points out a way that the ecosystem could adopt Topics which should counteract both the calcification and on-page performance risks you raised, without the privacy downside of the old FLoC visibility free-for-all.

I think that #82 could help with the calcification of participants problem, but I don't think that Topics--as proposed--maintains the status quo. (I don't think of maintaining the status quo as particularly desirable, but putting that aside.) The issue here is that Topics will inevitably become the most trusted indicator for many buyers. Access to them will become extremely valuable--because they will be perceived as highly accurate--and prediction markets will inevitably grow around them.

Ironically, I'd argue that the status quo is currently a lot like FLoC. 3p access and fingerprinting tech means that basically any participant on the page can guess at a user fairly well using the fingerprint and their capacity to view the user in the current domain context and across other domains and via data joins. A new participant or a single publisher can conceivably use the same indicators as a huge publisher or a decade-old ad tech company to make new, unique, useful guesses about a user's interests, knowing that they basically have the same signal and can innovate on how they interpret it. Topics doesn't allow for that: there's only one signal, and access is mediated through scale.

As I've already stated, I'm not a particular fan of the current system, but I agree with what you're saying: I see Topics as trying to maintain the status quo. My concern here is that it isn't really doing that. If it is going to change that status quo, I'd prefer it be for the betterment of the ecosystem, but I don't really see it as being likely to do that either. Maybe delegated Topics access as described in #82 can help, but it is still dependent on particular gatekeepers being willing to cooperate in a way that the current system is not. There is still a push towards calcification of participants that Topics would drive.

I'm really not sure what the resolution could be. I think that FLoC, which was a lot closer to the status quo, had its issues, and I understand the objections and feedback it got. But at the same time, some of the objections people made to FLoC came out of a failure to understand how the ad tech ecosystem works... which, in reality, is a lot like FLoC. And some of the objections came out of a good understanding of how the ad tech ecosystem works and a desire to see it change and not be maintained. I understand that Topics is trying to walk a pretty fine line here, but I'm not sure it is going to manage it.

At the end of the day my biggest concern on this question is how Topics further intermediates between publishers and users and advertisers. Inevitably this pushes power further in the direction of ad tech middlemen... a locus of the ad tech ecosystem that already has too much power and too much willingness to abuse it. It seems inevitable that Topics would, for example, solidify the Ad Tech Tax if not increase it; since scale players will force themselves into involvement on-page in exchange for Topics term access at the bid level and intermediaries will build predictive markets of Shadow Topics as an excuse for further value extraction.

FLoC wasn't perfect on this either, but at least its approach didn't make things significantly worse in this space. I'd like to see an iteration of Topics that can address the concerns of FLoC critics and the concerns I see likely to arise among publishers. I'm not sure how, however. I wonder if maybe top level domains could get the capacity to audit Topics in non-real-time, or in some other way that will allow them to counteract bad predictive systems and to compete by building their own that are equally effective? I feel like this is a lesser exposure than FLoC was and might handle some concerns, but I'm not sure.

This is why I didn't really have a suggestion here; I'm honestly not sure what a solution could be that addresses all concerns. I think we might have to acknowledge that any solution is going to create change. The question then becomes: whom do we want to prioritize with those changes? I don't see the publishers/top level domains in the prioritization of constituencies in this particular design.

jkarlin commented 2 years ago

Note that the goal is not to maintain the status quo; it's to be more private than the status quo. Maintaining the status quo, in my earlier context, was just to say that we're not sharing access to cross-site user data beyond the level that cookies do. That is, access is sharded by domain.

I think a lot of what you're saying centers around:

> Ironically, I'd argue that the status quo is currently a lot like FLoC. 3p access and fingerprinting tech means that basically any participant on the page can guess at a user fairly well using the fingerprint and their capacity to view the user in the current domain context and across other domains and via data joins.

If the participant has the capacity to see the user in the current domain context and across other domains then they can see the topics. Which seems fine? Or are you suggesting that every player on a page today has access to the same data due to data joins? I don't believe that's what typically happens today. Cookie matching is frequently done to help identify users so participants can build their own cross-site database about a user. But if the participant wants access to somebody else's personalized view of a user that they built up over time then I imagine they'd have to pay for that.
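To make the "sharded by domain" access rule discussed in this thread concrete, here is a toy model of it. It is a simplified sketch, not the browser's algorithm: it ignores epochs, top-topic selection, and random noise, and the class name, method names, topic labels, and `.example` domains are all invented for illustration.

```python
class TopicsAccessModel:
    """Toy model: a caller only sees topics it has witnessed the user on."""

    def __init__(self):
        self._user_topics = set()   # topics inferred from the user's browsing
        self._witnessed = {}        # caller domain -> topics that caller observed

    def visit(self, page_topic, callers_on_page):
        """Record a page visit and which callers were present to witness it."""
        self._user_topics.add(page_topic)
        for caller in callers_on_page:
            self._witnessed.setdefault(caller, set()).add(page_topic)

    def topics_for(self, caller):
        """Return only the user topics this caller has itself witnessed."""
        seen = self._witnessed.get(caller, set())
        return sorted(self._user_topics & seen)


model = TopicsAccessModel()
model.visit("Fitness", ["big-adtech.example"])
model.visit("Travel", ["big-adtech.example", "small-adtech.example"])

print(model.topics_for("big-adtech.example"))    # ['Fitness', 'Travel']
print(model.topics_for("small-adtech.example"))  # ['Travel']
print(model.topics_for("new-entrant.example"))   # []
```

In this model a caller present on more pages receives more of the user's topics, which is the scale effect raised earlier in the thread, while a caller that has never seen the user receives nothing.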