patcg-individual-drafts / topics

The Topics API
https://patcg-individual-drafts.github.io/topics/

Toward an oligopoly of “Topic providers”? #73

Open lbdvt opened 2 years ago

lbdvt commented 2 years ago

“Only callers that observed the user visit a site about the topic in question within the past three weeks can receive the topic”.

This restriction incentivizes websites to integrate with third parties that have a large observation footprint, to maximize the chance of getting user topics. As these third parties integrate with more websites, their “observation” capacity grows. This self-reinforcing loop is likely to lead to a few (or one?) “Topics providers” integrated with a large number of websites in any given market, with a head start for ad tech companies that already have a large footprint.

Such a restriction does not exist with third-party cookies. Cookie matching enables ad tech companies to collaborate. For example, Ad Tech company A, which makes observations on site 1, can leverage these observations on site 2 without being directly integrated on site 2, provided it has cookie matching in place with another Ad Tech company integrated on that second site.

“The exception to this filtering is the 5% random topic, that topic will not be filtered.”

This exception, which introduces a fixed amount of noise, reinforces the “Topics provider” position. The larger the observation capacity, the better the signal-to-noise ratio.

For example, let’s take an Ad Tech company integrated with a set of sites that observes the 5 top topics for 10% of the users on this set (which can already be a challenge). When calling the Topics API, the Ad Tech company will get:

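For this 10% of users: a true Topic 95% of the time, and a random Topic 5% of the time.

For the 90% of remaining users: no Topic 95% of the time, and a random Topic 5% of the time.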

The result is a signal-to-noise ratio (true Topic to random Topic) below 2:1 (34% of topics returned by the API will be a random Topic).

On the other end of the spectrum, an Ad Tech company with a very large integration that could observe the 5 top topics for 70% of its users would have a signal-to-noise ratio of 13:1 (only 7% of topics returned by the API will be a random Topic).
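
The arithmetic behind these two ratios can be checked with a short sketch. It assumes a simplified model: for a user whose top topics the caller has observed, the API returns a true topic 95% of the time and a random topic 5% of the time; for every other user it returns only the 5% random topic. The function name and parameters are illustrative, not part of the API.

```python
RANDOM_TOPIC_RATE = 0.05  # assumed share of answers that are an unfiltered random topic


def signal_to_noise(observed_share, random_rate=RANDOM_TOPIC_RATE):
    """Return (true-to-random ratio, random share of all returned topics)."""
    # True topics come only from observed users, and only on non-random answers.
    true_share = observed_share * (1 - random_rate)
    # Random topics are returned for every user, observed or not.
    random_share = random_rate
    ratio = true_share / random_share
    noise_fraction = random_share / (true_share + random_share)
    return ratio, noise_fraction


for share in (0.10, 0.70):
    ratio, noise = signal_to_noise(share)
    print(f"observed {share:.0%}: ratio {ratio:.1f}:1, random share {noise:.0%}")
# observed 10%: ratio 1.9:1, random share 34%
# observed 70%: ratio 13.3:1, random share 7%
```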

This “Topics provider” position provides advantages: it ensures first access to a user's topics, and guarantees on-page integration.

What are your thoughts on this? How could such a market concentration outcome be avoided?

dmarti commented 2 years ago

Related issue: #38. One possible way to address centralization risks would be to increase the probability that a random topic will be returned to a caller that is present on more sites. ( https://github.com/patcg-individual-drafts/topics/issues/38#issuecomment-1136612789= )

jkarlin commented 2 years ago

“Such a restriction does not exist with third-party cookies. Cookie matching enables ad tech companies to collaborate. For example, Ad Tech company A, which makes observations on site 1, can leverage these observations on site 2 without being directly integrated on site 2, provided it has cookie matching in place with another Ad Tech company integrated on that second site.”

Is there not then a strong incentive to cookie match with a large third party? I disagree that this is fundamentally changing the desire to have a large footprint. Sites will just need to make that explicit (rather than server-server cookie matching) by including a domain that has a larger footprint (e.g., both A and B call the API or A+B form a common domain) on their site. Issue #55 is a discussion of how that kind of explicit integration might work.

“The exception to this filtering is the 5% random topic, that topic will not be filtered.”

“This exception, which introduces a fixed amount of noise, reinforces the ‘Topics provider’ position. The larger the observation capacity, the better the signal-to-noise ratio.”

This is an interesting point. We’d originally required that the random topics be filtered (required witnessing) as well, but then realized that wouldn’t help with plausible deniability or ensuring that all topics had some representation, so we decided that they shouldn’t be filtered. That decision is coming into question and @dmarti has also pointed out that we could have a constant signal-to-noise ratio by increasing noise as the caller’s footprint increases. All of this should be added to the discussion about what to do with noise.
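
As a rough sketch of what a constant signal-to-noise ratio could mean: under a simplified model where a caller gets a true topic only for the share of users it has observed, and a random topic at rate r for everyone, the rate that holds a target ratio can be solved directly from observed_share * (1 - r) / r = target. The model, the function name `noise_rate_for_target`, and the 2:1 target are assumptions made for illustration; none of them come from the proposal.

```python
def noise_rate_for_target(observed_share, target_ratio):
    """Random-topic rate r solving observed_share * (1 - r) / r == target_ratio."""
    # Rearranging gives r = observed_share / (target_ratio + observed_share).
    return observed_share / (target_ratio + observed_share)


for share in (0.10, 0.40, 0.70):
    r = noise_rate_for_target(share, target_ratio=2.0)
    achieved = share * (1 - r) / r  # recompute the ratio as a sanity check
    print(f"observed {share:.0%}: random rate {r:.1%}, ratio {achieved:.1f}:1")
```

With a 2:1 target in this toy model, the random rate rises from about 4.8% for a caller observing 10% of users to about 25.9% for a caller observing 70%.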

lbdvt commented 2 years ago

“Is there not then a strong incentive to cookie match with a large third party? I disagree that this is fundamentally changing the desire to have a large footprint.”

As of today, a small DSP integrated on a few advertiser sites can rely on its own collection of user signals on these sites to define audiences and buy at scale on any SSP, provided it has a cookie match in place with these SSPs, without needing to be directly integrated on publishers' pages.

With Topics, this kind of integration cannot happen: to get signals about a user's visits to site A while that user is on site B, an Ad Tech company must be integrated on both A and B.

“Sites will just need to make that explicit (rather than server-server cookie matching) by including a domain that has a larger footprint (e.g., both A and B call the API or A+B form a common domain) on their site. Issue https://github.com/patcg-individual-drafts/topics/issues/55 is a discussion of how that kind of explicit integration might work.”

Indeed, and by doing that, sites will select only third parties with a large footprint to be integrated as Topics providers on their pages, since the others will have a limited reach that cannot be compensated for by a mechanism similar to cookie matching. Hence a dynamic toward market concentration of Topics providers.

michaelkleber commented 2 years ago

Hey @lbdvt, please take a look at the comment https://github.com/patcg-individual-drafts/topics/issues/82#issuecomment-1209735897 that I just left on another issue. I think you might really like it as an answer to your "oligopoly" concern. DSPs are used to cookie matching that puts the SSPs in control, but Topics works much better if the flow moves in the opposite direction, where the DSPs get to decide who can receive the information from advertiser-page visits!

lbdvt commented 2 years ago

Thanks @michaelkleber for this reply!

1/ Indeed, https://github.com/patcg-individual-drafts/topics/issues/82#issuecomment-1209735897 provides an interesting way to have a DSP work with multiple SSPs: As a DSP, if I call the SSPs I work with when a user visits a page from an advertiser I work with, those SSPs will be able to get my topics for my users when they visit a publisher page they work with.
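
A toy model of this flow, building on the rule quoted at the top of the thread that a caller only receives a topic if it observed the user on a site about that topic within the past three weeks. The class `ObservationLog`, the caller names, and the topic labels are all hypothetical; the sketch does not use the real browser API and ignores the 5% random topic.

```python
from datetime import datetime, timedelta

OBSERVATION_WINDOW = timedelta(weeks=3)


class ObservationLog:
    """Per-user record of which caller saw the user on a page about which topic."""

    def __init__(self):
        self._seen = []  # list of (caller, topic, timestamp)

    def record_visit(self, topic, callers, when):
        # Every caller present on the page observes the page's topic.
        for caller in callers:
            self._seen.append((caller, topic, when))

    def topics_for(self, caller, user_topics, now):
        # Keep only the user's topics that this caller observed within the window.
        recent = {
            topic
            for seen_by, topic, when in self._seen
            if seen_by == caller and now - when <= OBSERVATION_WINDOW
        }
        return [t for t in user_topics if t in recent]


now = datetime(2022, 8, 15)
log = ObservationLog()
# On an advertiser page, the DSP also lets its partner SSP observe the visit.
log.record_visit(
    "Fitness",
    callers=["dsp.example", "ssp-a.example"],
    when=now - timedelta(days=3),
)

user_topics = ["Fitness", "Travel"]
print(log.topics_for("ssp-a.example", user_topics, now))  # ['Fitness']
print(log.topics_for("ssp-b.example", user_topics, now))  # []
```

Only the SSP that the DSP invited onto the advertiser page can later receive the topic on a publisher page; an SSP that was not present gets nothing from that visit.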

This "topics matching mechanism" solves, I think, part of the "oligopoly of topics providers" question.

2/ I think that the "fixed amount of noise" part of the question, introduced by the 5% random topic, remains, unless I missed something?

Indeed, an (SSP, DSP) pair with a small number of sites visited per user will still get a lower signal-to-noise ratio than an (SSP, DSP) pair with a large number of sites visited per user.

3/ And it also brings an interesting new market dynamic.

As of today, through cookie matching:

With "topics matching":

It creates some kind of "data coop". As a DSP partnering with an SSP, I bring my users' topics, but I also get the user topics from other DSPs partnering with that SSP. As with any data coop, various actors will have different incentives to participate depending on what they bring and what they get.