patcg-individual-drafts / topics

The Topics API
https://patcg-individual-drafts.github.io/topics/

Update permissions policy to support separate permissions for retrieve and observe #92

Closed dmarti closed 1 year ago

dmarti commented 2 years ago

Currently there is one permissions policy for Topics API:

Because it is now possible to retrieve topics without modifying state (https://github.com/patcg-individual-drafts/topics/pull/80) there are now two applicable permissions: one for retrieving existing topics, and one for observing topics. Keep the existing Permissions-Policy for compatibility, and document that it will allow both retrieve and observe, and also add new policies that cover retrieve and observe only.

Related: #54

Also possibly related: #118 (If URL path and page content can be used as input to the classifier, make it a separate permission)

Related: https://github.com/patcg-individual-drafts/topics/issues/206 (a title Permissions-Policy option could also fix that issue)

(edited to match "observe" flag in https://github.com/patcg-individual-drafts/topics/pull/80, thanks to @patmmccann for comment)

zhengweiwithoutthei commented 2 years ago

+1, this is important for the use case where an SSP retrieves Topics on behalf of buyers in the way mentioned in #82 while stopping buyers from observing Topics from the publisher's pageviews. Buyers should not expand their footprint beyond their reach (i.e. advertiser sites). It also helps buyers avoid diluting the topics they observed on advertiser websites, which may have more commercial value than the topics of publisher sites.

dmarti commented 1 year ago

Added a link to a related issue.

patmmccann commented 1 year ago

I think Permissions-Policy: browsing-topics-observe might be generally simpler to understand than -modify

dmarti commented 1 year ago

@patmmccann Thank you, edited.

patmmccann commented 1 year ago

An additional use case here: many publishers are aware their topics are mis-assigned. ATPs and advertisers will prefer they don't contribute poor topics to the pool

dmarti commented 1 year ago

SSPs are able to select whether to train or not train on the page's topics when calling Topics API (see discussion at #225). This issue would give the option to the publisher as well.

michaelkleber commented 1 year ago

As we discussed in yesterday's Topics call, I am worried that this proposal would be bad for incentives and data quality: It seems like it would create a "prisoner's dilemma" environment in which every individual site might feel like they benefit from disallowing browsing-topics-observe, but then the entire Topics mechanism would collapse.

Right now (post-#80), there is indeed a party that can decide to "retrieve without observing". But that party is the API caller, and an API caller cannot decide to just "retrieve without observing" all the time: the per-caller topic filtering mechanism means that an API caller who never observes topics will also never retrieve topics.
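The per-caller filtering argument above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `TopicsModel` class, the caller names, and the topic names are hypothetical, and the model ignores epochs, the top-topics cut, and the random noise that the real API applies.

```typescript
// Toy model of per-caller topic filtering (hypothetical, heavily simplified).
class TopicsModel {
  // topic -> set of callers that observed the user on a page with that topic
  private observers = new Map<string, Set<string>>();

  // Record that `caller` observed the user on a page classified as `topic`.
  observe(caller: string, topic: string): void {
    const callers = this.observers.get(topic) ?? new Set<string>();
    callers.add(caller);
    this.observers.set(topic, callers);
  }

  // A caller only gets back topics that it has itself observed.
  retrieve(caller: string): string[] {
    const out: string[] = [];
    this.observers.forEach((callers, topic) => {
      if (callers.has(caller)) out.push(topic);
    });
    return out.sort();
  }
}

const model = new TopicsModel();
model.observe("ssp-a.example", "Travel");
model.observe("ssp-a.example", "Fitness");
// ssp-b.example calls the API on the same pages but always skips observation,
// so nothing is ever recorded for it.
const forA = model.retrieve("ssp-a.example");
const forB = model.retrieve("ssp-b.example");
```

In this model `forB` stays empty no matter how often the second caller retrieves, which is the incentive described above: a caller that never observes has nothing to retrieve.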

An individual site, however, doesn't have that incentive structure: their decision to disallow observation would only hurt other sites and would never hurt themselves.

If you have suggestions for a way to give sites an incentive structure similar to what API callers already have, then I would be happy to discuss more. But we can't restructure the API into a prisoner's dilemma.

dmarti commented 1 year ago

The API caller isn't going to never observe topics -- they're more likely to choose to observe or not depending on the site in order to curate the set of topics that they want to sell. The SSP now has the option to selectively observe on sites with better-paying topics to increase their revenue across all the sites they appear on.

The publisher is unlikely to choose the "never observe" option either -- otherwise they wouldn't be able to work with an SSP that expected sites they work with to contribute Topics API data. But the publisher might have a percentage of users whose interest in their topic they believe they can best monetize in other ways, and choose to hold them out from training while allowing the SSP to observe the rest of their traffic.

All of this is going to get A/B tested in a variety of combinations, and it's hard to predict now which options will be chosen by which parties and what the terms of Topics API training agreements between publishers and SSPs (and possibly other parties) will look like. Since everyone is still experimenting with this, it seems like giving SSPs and publishers the same level of optionality is more likely to get to a win-win result than restricting the option to only one set of players.

michaelkleber commented 1 year ago

But the publisher might have a percentage of users whose interest in their topic they believe they can best monetize in other ways, and choose to hold them out from training while allowing the SSP to observe the rest of their traffic.

The supported way for the publisher to do that is to disable the Topics API for those users — both retrieving and observing.

The publisher is unlikely to choose the "never observe" option either -- otherwise they wouldn't be able to work with an SSP that expected sites they work with to contribute Topics API data.

If publishers and SSPs want to agree on some standards for reasonable "retrieve without observing" behavior, they can absolutely do so. The pub could signal to the SSP for which users they want this behavior, and each one could check that the other was following that agreement. The publisher could also use the existing Permissions Policy to allow Topics API access only for the SSP they have an agreement with.

All of this is possible without changing the API in a way that introduces prisoner's-dilemma dynamics.

dmarti commented 1 year ago

The SSP can choose the retrieve without observe option, but the publisher can't. The SSP can make the observe/no-observe decision just before calling Topics API, and the SSP can retrieve topics from the browser whether or not they choose to allow the browser to observe.
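As a rough sketch of what "deciding just before calling" might look like on the caller side: the `skipObservation` option is the one referenced in this thread, while `fetchTopics`, the `TopicsDocument` interface, and the fake document are hypothetical names used only so the example runs outside a browser. The real method returns objects with more fields than the single `topic` field modelled here.

```typescript
// Minimal shape of the part of `document` used in this sketch (assumption:
// only the `topic` field of each returned entry is modelled).
interface TopicsDocument {
  browsingTopics(options?: { skipObservation?: boolean }): Promise<Array<{ topic: number }>>;
}

// Hypothetical SSP helper: `observe` is decided by the caller right before the call.
async function fetchTopics(doc: TopicsDocument, observe: boolean): Promise<number[]> {
  // Topics are returned either way; only the recording of the visit changes.
  const topics = await doc.browsingTopics({ skipObservation: !observe });
  return topics.map((t) => t.topic);
}

// Stand-in for a browser document that records which option each call passed.
const recordedSkipFlags: Array<boolean | undefined> = [];
const fakeDoc: TopicsDocument = {
  async browsingTopics(options) {
    recordedSkipFlags.push(options?.skipObservation);
    return [{ topic: 1 }];
  },
};
```

In a browser the first argument would be `document`. The publisher's page has no equivalent switch in this flow, which is the asymmetry being described.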

Right now the prisoner's dilemma is already set up, but it's a version where one "prisoner" (the SSP) can watch the interrogation of the other "prisoner" (the publisher) before making their own choice.

I understand that there was an original goal of having Topics API enforce reciprocity, but the problem here is that now reciprocity is required of some market participants but not others.

michaelkleber commented 1 year ago

Right now the SSP can choose "retrieve without observing", but as we discussed over in #225, doing so only affects that same SSP's future utility.

Letting a site force "retrieve without observing" would let the site do something that doesn't affect them, but degrades everyone else's future utility.

One is a trade-off that anyone can choose to make, the other is a prisoner's dilemma. That's why the API will support one but not the other.

dmarti commented 1 year ago

Let's plan to discuss at the next scheduled call. https://github.com/patcg-individual-drafts/topics/issues/115

patmmccann commented 1 year ago

Letting a site force "retrieve without observing" would let the site do something that doesn't affect them, but degrades everyone else's future utility.

One is a trade-off that anyone can choose to make, the other is a prisoner's dilemma. That's why the API will support one but not the other.

I think you misunderstand your role in the market; contracts and publisher - ATP relationships should govern this behavior, not you.

The API implementors on this chain made the decision to allow all publishers to be opted into a topics network without consultation by their publisher ad server. The GAM team highlights this as an advantage in their webinars, telling publishers to not worry, they'll handle everything. As @michaelkleber pointed out in discussion, his colleagues on the ad server team said it would be infeasible for them to achieve scale without this ability to absorb publishers into their network, so the Chrome team changed their plan to make topics opt-in; topics are now opt out. No other SSP has the capability of opting in all their publisher clients, as other SSP only run js on page after they win an auction. The outcome of your design is publishers have a forced choice with only one real path: participate in the topics network tied to the dominant publisher ad server, or the publisher will fail to monetize. This is a network that quality publishers likely want to avoid leaking data to, as it includes a large array of extremely low-quality publishers, publishers whose training even GAM wants to keep out of its topics network. We're not looking for publishers to just free ride, we're looking for an outcome where they don't feel coerced into the Google ad server topics network by its (API design-induced) sheer scale, as this problem has played out in the past to negative market outcomes with other Google products. Implementing skipObservation on the request instead of the response, as well as allowing permission policies by publishers on who can or cannot observe, seems like one step in this direction. We certainly understand your hesitation to let publishers free-ride in a topics network with no incentive not to, but your decision to hand to AdX alone an internet-wide topics network on day one and have other SSPs build one from scratch is what got us into this mess.
If AdX, or any other topics network provider, doesn't like publishers' free riding, they can handle that with contract negotiations, not browser coercion.

michaelkleber commented 1 year ago

  1. I am very much in favor of "contracts and publisher - ATP relationships" governing this behavior; indeed that has been exactly my position. So maybe we're in agreement already? But from the rest of your tone, I must be misinterpreting.

  2. Can you explain more about "other SSP only run js on page after they win an auction"? I was under the impression that the use of the Topics API from within prebid.js made it entirely feasible for other SSPs to call it pre-auction. If you're pointing out a shortcoming with the API design, and if it's not remedied by the introduction of the HTTP header API surface, then maybe it would be a good topic for live discussion in our recurring calls.

  3. You said "allowing permission policies by publishers on who can or cannot observe, seems like one step in this direction". That's great, because the Permissions-Policy allowlist mechanism already works that way! So again, perhaps we are already in agreement?

patmmccann commented 1 year ago

@michaelkleber we feel backed into a corner a bit with respect to the AdX topics network. Its sheer scale and a recent publication by DV360 that they will rely on it to achieve similar levels of spend as before cookie deprecation suggest publishers might be stuck using it or have their monetization tank. We want AdX to be able to include topics they have seen elsewhere on the internet on bid requests on our sites, but we don't want to leak data to the dregs of the internet, many of which are listed as AdX topics participants here: https://storage.googleapis.com/adx-rtb-dictionaries/sellers.json and include sites that steal our copyrighted content, sites sanctioned by the US State Department, etc. The breadth of the AdX network is astounding and includes nearly all sites labeled MFA by Jounce ( https://www.google.com/search?q=jounce+mfa+1%2F3&rlz=1C1CHBD_en-GBUS1034US1034&oq=jounce+mfa+1%2F3&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRigATIHCAIQIRigATIHCAMQIRigAdIBCDMxNjJqMGo0qAIAsAIA&sourceid=chrome&ie=UTF-8 ). We feel our best recourse is to participate in most topics networks with reasonable participant safeguards and disallow AdX topic observation.

To your point 3, this is not my understanding of something we can currently achieve via permissions policies, am I mistaken?

I apologize for my frustrated tone in my previous comment; I read your comments as 'tough cookies, we're intentionally choosing ad tech over publishers, go away.'

patmmccann commented 1 year ago

I was under the impression that the use of the Topics API from within prebid.js made it entirely feasible for other SSPs to call it pre-auction

Only in an opt-in sense: AdX has opted all its publishers in, while Prebid SSPs must persuade publishers one at a time. Are you suggesting we just change Prebid architecture to ram it down everyone's throat as AdX did? That would be unprecedented but, as you point out, may restore parity.

michaelkleber commented 1 year ago

I understand that you don't like the thing AdX announced. But surely you can see that your first response amounts to "We want an ad network to get data from other sites to help us, but not get data from us to help other sites." This is exactly the thing that might lead to nobody allowing Topics observation, and therefore could make the API useless for everyone — and also it makes Immanuel Kant sad.

We feel our best recourse is to participate in most topics networks with reasonable participant safeguards and disallow AdX topic observation.

Good news: this is indeed possible with Topics as it's implemented today! The Permissions Policy allowlist mechanism offers a way for a site to declare a list of origins that are allowed to use a permission, and deny it to anyone else:

Permissions-Policy: browsing-topics=("https://*.goodnetwork.com" "https://*.reasonablenetwork.com" "https://*.networkwithsafeguards.com")

This should, I think, let you pick who gets to use Topics on your sites, based on whatever criteria you want. I'm certainly not making any suggestions about what criteria those should be, what different ad networks should choose as defaults in a post-3rd-party-cookies world, etc. That is squarely in the realm of "contracts and publisher - ATP relationships", which, as we both said above, should not be the browser's role.
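For a site that keeps its list of approved networks in configuration, the header value above could be generated rather than written by hand. The helper below is a minimal sketch: `topicsPolicy` is a hypothetical name, the origins are the placeholder ones from the example header, and origin strings are passed through without validation.

```typescript
// Build a Permissions-Policy value that allows the browsing-topics feature
// only for the listed origins. Hypothetical helper; no origin validation.
function topicsPolicy(allowedOrigins: string[]): string {
  // Each origin is double-quoted and the list is space-separated inside
  // parentheses, matching the header shown above.
  const list = allowedOrigins.map((origin) => `"${origin}"`).join(" ");
  return `browsing-topics=(${list})`;
}

const headerValue = topicsPolicy([
  "https://*.goodnetwork.com",
  "https://*.reasonablenetwork.com",
]);
// headerValue is:
// browsing-topics=("https://*.goodnetwork.com" "https://*.reasonablenetwork.com")
```

The result would be sent as the value of the `Permissions-Policy` response header. An empty list produces `browsing-topics=()`, which disables the feature for every origin.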

patmmccann commented 1 year ago

Good news: this is indeed possible with Topics as it's implemented today!

I think we're talking about different things. We want to give partner entities, say Pubmatic, permission to gather topics they observed on other sites to help them have a better win rate on our sites without committing to being a topics trainer. This is not possible via permissions policies, correct? This is a case where both we and the SSP would prefer to have the previously trained segments in their bid request, but we might not want to go so far as joining the network. Using Pubmatic only as an example here because they are the furthest along, other than AdX, towards assembling a network.

michaelkleber commented 1 year ago

Ah sorry, indeed I misunderstood. Permissions Policy will let you say e.g. "only Pubmatic can use the Topics API on my site", but the question of whether Pubmatic does topics retrieval or topic observation (or both) is directly between you and them.

In this scenario, where you've vetted the ad networks to which you give Topics permissions, it seems reasonable for one part of that vetting process to be asking: "Do they have a setting that lets me tell them to retrieve-without-observing on my site? Do they honor that flag when it is set?"

patmmccann commented 1 year ago

"Do they have a setting that lets me tell them to retrieve-without-observing on my site? Do they honor that flag when it is set?"

This is exactly the feature request: we'd like to be able to enforce honoring it. Just as the SSP can decide not to train, the publisher should be able to prevent training as well; we're looking for balance.

michaelkleber commented 1 year ago

Right, I understand that, but (as we discussed above at great length), publisher sites using Permissions Policy to force that behavior leads to the Topics data likely having no value to anyone. In the example you suggest, this would be a discussion between you and Pubmatic, and presumably if every one of their publishers decided to do the same as you want to, Pubmatic would just choose to drop Topics entirely, because it would have no observations. That's a fine decision for any one caller to make, but of course we don't want to do it for the ecosystem as a whole; at that point we might as well just not have the API.

dmarti commented 1 year ago

Retrieve without observe has free riding problems if any party is able to do it.

Yes, publisher free riding would be a problem for the API, but so is free riding by third-party callers. An SSP can decide to retrieve and observe on sites with the highest-paying topics (or topics that are pieces of high-value topic sets), and also decide to retrieve but not observe on sites with lower-paying topics.

Right now the SSP can choose "retrieve without observing", but as we discussed over in https://github.com/patcg-individual-drafts/topics/issues/225, doing so only affects that same SSP's future utility.

This is true, and an SSP will be able to increase its future utility by choosing to do retrieve without observe on a site with low-value topics while selectively observing on sites with higher-value topics.

ML on the SSP side will be able to "learn" which sites to observe on in order to maximize revenue, so all SSPs will end up offering relatively few possible sets of high-value topics rather than topic sets that reflect raw browsing behavior by users.

It seems like the workable choices are to limit free riding by all parties or none.

michaelkleber commented 1 year ago

This is the same flawed analysis that you raised in #225. As you said on that issue, "the SSP should be able to call Topics API in an optimized way, in order to collect and present the topics data for a particular user that they believe to be most likely to attract a high bid."

I feel like this conversation is going around in circles. The feature you two are asking for would make the API's data quality worse, and would decrease the value of advertising. Why should anyone be in favor of "Please add a button that lets my site make Topics worse for all other sites", other than someone hoping that Topics will prove useless?

dmarti commented 1 year ago

Yes, we need to avoid wishful thinking here and try to figure out something testable about how Topics API actually works in practice. (None of the players are the villains here, everyone is just acting to best achieve their own goals within the actions allowed by the API)

Can we start by agreeing that

  1. Topics API was originally based on reciprocity (topic data in for topic data out), and
  2. When a party can retrieve topics from the browser without contributing topics it creates an opportunity for free riding?

patmmccann commented 1 year ago

You're right, we're going around in circles a bit. One last effort to persuade you, or at least a response to this point for the record:

The feature you two are asking for would make the API's data quality worse,

Better curated is not worse.

and would decrease the value of advertising.

This is likely not an accurate prediction, particularly if the results of the API are available in more places with this feature. It is easy to imagine a marquee publisher with very valuable data, e.g. Expedia, refusing to allow training ever. If the results of the Topics API are to be included in Expedia programmatic requests, the absence of this feature is a show stopper. In that sense, this feature would improve the value of advertising. This is largely a simpler example of our objection to the shady topics network run by AdX, in which mid-value data is leaked to very low-value sites.

michaelkleber commented 1 year ago

It seems to me that both of your comments come down to thinking about what the difference is between "free riding" and "curation".

Per-caller topic filtering means that a caller choosing whether or not to observe topics falls into the "curation" bucket. The per-site permissions policy that you ask for here falls into the bad bucket.

dmarti commented 1 year ago

@michaelkleber Yes, that's right if the set of sites and 3rd party callers is fixed, which it would be on any given day. We might be just thinking about different time periods here. In the medium term, we can anticipate

The extent to which callers can curate/free-ride affects the content creator and SSP decisions about what sites will be built and monetized.

michaelkleber commented 1 year ago

@dmarti I see your point about incentives, but aren't web site creators already much more incentivized to create sites about things that advertisers value, because those sites get more advertising revenue on purely contextual targeting? It seems like you're objecting to a hypothetical future second- or third-order effect that is utterly swamped by the first-order effect that has been the case for decades.

dmarti commented 1 year ago

@michaelkleber Yes, there is some incentive to make sites on ad-friendly topics, but the cost of producing a review, how-to, or news site on a topic relevant to advertisers tends to be orders of magnitude higher than the costs of producing a minimal low-effort deceptive site capable of getting ad revenue. What kinds of sites get made depends on the ability to monetize them (and, because of supply and demand, a high expected profit for making a deceptive site can lower the expected profit for making a legit site)

You are also right that a version of this problem has been in place for decades. Third-party cookies established the rules of a game. Some of the possible moves in the third-party cookie game are:

Right now the Topics API game appears to be designed in a way that is more biased against legit publishers and in favor of other parties than the third-party cookie game was. This is a risk because it will mean losing some of the marginal legit publishers when parties stop playing the old game and start playing the new one.

The rules of the Topics API game are not going to exactly match the rules of the third-party cookie game, and it's unrealistic to expect them to be fair to all parties. But it seems less risky to try to make the new game less biased against the publisher than third-party cookies have been.

michaelkleber commented 1 year ago

Once again, I flatly disagree with your unjustified assertion that Topics is "more biased against legit publishers" than 3p cookies. Publishers have the ability to control what parties can or cannot use Topics on their sites. This is in stark contrast with 3p cookies, where any party that ever touches the publisher page (even parties unknown to the pub, who merely manage to run an ad on the site) ends up able to learn detailed targeting information.

But irrespective of our disagreement about the nature of the problem, your proposed solution is completely unworkable. You are asking for a control that will let any site make Topics worse for everyone else. I hear you say that your goal is to make Topics worse for only sites that you think are your enemies, but you have not even proposed any mechanism for that, only a mechanism that would hurt everyone who uses the API.

dmarti commented 1 year ago

@michaelkleber You're right if you're only considering the time scale where the set of sites and callers is fixed. Over longer time intervals, though, SSPs and publishers can enter and leave the market, and Topics API tends to build market power for a few, widely present callers, or possibly one. (see https://github.com/patcg-individual-drafts/topics/issues/73)

If publisher free riding is unworkable, then SSP free riding is also unworkable. Both tend to reduce the accuracy of available data, just in different directions. (It's symmetrical -- SSPs can "curate" out the topics from low-value sites, publishers would be able to withhold some audience members with high-value topics.)

It would be useful for someone who is in neither the SSP or publishing business to study the effects of different free riding rules on the market here.

michaelkleber commented 1 year ago

Again, the per-caller filtering of topics means that the SSP and publisher stories are very much not symmetrical.

It would be useful for someone who is in neither the SSP or publishing business to study the effects of different free riding rules on the market here.

I agree — something like an independent academic exploration of this topic would be very interesting. Please re-open this issue if you find anything of the sort.

michaelkleber commented 1 year ago

(oops sorry, closing as "not planned", pending new information to consider)

dmarti commented 1 year ago

There is a summary of this issue in: Privacy Sandbox Progress Report Q3 Reporting Period - July to September 2023.

The CMA has also shared stakeholder feedback that publisher sites should be able to retrieve topics whilst opting out of training, or alternatively third-party callers should also be denied this option. While third-party callers can indeed decide to retrieve topics without observing, third-party callers cannot do this all the time, because the per-caller topic filtering mechanism means that an API caller who never observes topics will also never retrieve topics. In contrast, allowing publisher sites to retrieve topics while opting out of training would enable publisher sites to negatively impact the utility of the Topics API for the ecosystem as a whole, without it negatively impacting the utility of the Topics API for the site itself. Please see this GitHub issue for a more detailed discussion of the topic.

One needed clarification here is that the SSP free riding problem is not where an SSP chooses to always retrieve without observing; free riding happens when an SSP selectively chooses whether or not to observe. Free riding SSPs would likely be willing to observe on sites where the resulting topics would increase the value of a user's topic set, and can make the decision to observe or not at any time before calling.

SSP free riding is actually more harmful to the ecosystem as a whole than publisher free riding. Publisher free riding only limits the availability of some high-value topics to some bidders in some situations. SSP free-riding harms multiple web constituencies:

We understand that SSP free-riding is easier to implement for performance reasons than requiring pure reciprocity from SSPs. However, SSP free riding represents an ongoing risk to users and other stakeholders. Giving some optionality to publishers would put publishers of sites with high-value topics in a better position to influence behavior of SSPs in a more constructive direction. Please re-open.

*Legit sites keep their domains for a long time, often decades, but pirate sites churn through lots of domains on a scale of weeks. Users visit multiple domains to find a new source of pirate content when a site they previously used was shut down. If an SSP always observes on every domain, this domain seeking behavior will likely populate much of the Topics API data for those users, which gives SSPs that monetize pirate sites a strong incentive to observe selectively.

See also: meeting notes from 23 Oct 2023

dmarti commented 6 months ago

Related issue to address this problem at the attestation level: #266