patcg-individual-drafts / topics

The Topics API
https://patcg-individual-drafts.github.io/topics/

Different topics on different domains #277

Open eysegal opened 8 months ago

eysegal commented 8 months ago

Hi, I thought the purpose of Topics was to show which topics users are reading about across sites without disclosing the user's identity. However, in my own browser I get different topics on different sites. Why is that? Is it just based on the context of the page I'm currently visiting?

jkarlin commented 8 months ago

Each week you get 5 new topics. Each site you visit that calls the API gets one of those 5, and the assignment is sticky: that site keeps getting the same topic for the rest of the week. We distribute topics this way explicitly to make it harder to track users across sites. Because site A sees topic 1 and site B sees topic 2 for the same user, it's harder for A and B to collude and determine, based on topics, that it's the same person than if both sites saw the same topic.
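A minimal sketch of the per-site, per-week stickiness described above. It assumes the "random but sticky" choice can be modeled as a hash of a per-browser secret, the epoch number, and the site; `topic_for_site`, `user_key`, and the example sites are hypothetical, and this is not Chrome's actual implementation.

```python
import hashlib


def topic_for_site(top_topics: list[str], site: str, epoch: int, user_key: bytes) -> str:
    """Return one of this epoch's top topics for `site` (hypothetical helper).

    The choice is deterministic in (user_key, epoch, site), so a site keeps
    getting the same topic for the whole epoch, while different sites get
    independently chosen topics.
    """
    # Hash a per-browser secret together with the epoch and site so the index
    # is stable within a week but looks unrelated from one site to the next.
    material = user_key + epoch.to_bytes(8, "big") + site.encode("utf-8")
    digest = hashlib.sha256(material).digest()
    index = int.from_bytes(digest[:8], "big") % len(top_topics)
    return top_topics[index]


top5 = ["News", "Investing", "Travel", "Cooking", "Fitness"]
key = b"per-browser-secret"  # hypothetical secret, never shared with sites
print(topic_for_site(top5, "news.example", 42, key))
print(topic_for_site(top5, "finance.example", 42, key))
```

Because the secret never leaves the browser, two sites comparing notes see two unrelated picks from the same top 5, which is the collusion resistance described in the comment.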

eysegal commented 7 months ago

Thank you for your answer. But the point of Topics is to compensate for the loss of cross-site tracking. If I get "News" on a general news site and "Investing" on a financial site, that doesn't help advertising, because when the user is on those sites I already know the current site's context. What I need is what the user is reading about on other sites (as we do with cookies).

jkarlin commented 7 months ago

Which topic you get on which site is chosen at random from the top 5 topics of the previous week's browsing. The chosen topic is not otherwise correlated with the page you're currently visiting.
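One way to picture "the top 5 from the previous week's browsing": count how often each topic came up across last week's page visits and keep the five most frequent. The counting rule, the `weekly_top_topics` name, and the sample history are assumptions for illustration; the thread does not say exactly how topics are ranked.

```python
from collections import Counter


def weekly_top_topics(page_topics: list[list[str]], k: int = 5) -> list[str]:
    """Rank topics by how many visited pages mapped to them (hypothetical helper)."""
    # Flatten the per-page topic lists and count every occurrence.
    counts = Counter(topic for topics in page_topics for topic in topics)
    # most_common orders equal counts by first appearance.
    return [topic for topic, _ in counts.most_common(k)]


history = [["News"], ["News", "Investing"], ["Travel"], ["News"],
           ["Investing"], ["Cooking"], ["Fitness"], ["Sports"]]
print(weekly_top_topics(history))  # ['News', 'Investing', 'Travel', 'Cooking', 'Fitness']
```

Nothing here looks at the page the user is on right now, which matches the point that the chosen topic is not tied to the current page.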

eysegal commented 7 months ago

OK, so it's only one topic per week? That's odd; it does seem to be correlated, but maybe that's by chance.

omriariav commented 7 months ago

@jkarlin, have you considered returning more than one topic when the topic of the site the user is currently visiting is the same as the topic the browser is about to return? According to the Nov 21st update, it looks like Topics will return two of the top five topics; what about ensuring that at least one of them differs from the current website's topic?

jkarlin commented 7 months ago

> @jkarlin, have you considered returning more than one topic in case the current site topic the user is visiting is equal to the topic that the browser is about to return?

The API returns up to 3 topics (one for each of the past 3 weeks), so technically we do return multiple topics per call. Note that even though up to 3 topics are returned, the maximum rate at which one site can learn new topics is one per week.
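A rough sketch of why returning up to three topics still limits a site to one new topic per week: each call gathers the site's sticky topic from each of the three most recent epochs, and that window slides forward by one epoch per week. The `topics_returned` helper, the `site_topic_by_epoch` mapping, and the de-duplication step are assumptions; the thread does not specify ordering or de-duplication.

```python
def topics_returned(site_topic_by_epoch: dict[int, str], latest_epoch: int, window: int = 3) -> list[str]:
    """Collect a site's sticky topic from each of the last `window` epochs (hypothetical helper)."""
    topics: list[str] = []
    for epoch in range(latest_epoch, latest_epoch - window, -1):
        topic = site_topic_by_epoch.get(epoch)
        # Skip epochs with no topic for this site and drop repeats.
        if topic is not None and topic not in topics:
            topics.append(topic)
    return topics


site_history = {1: "News", 2: "Travel", 3: "News", 4: "Cooking"}
week3 = topics_returned(site_history, latest_epoch=3)  # ['News', 'Travel']
week4 = topics_returned(site_history, latest_epoch=4)  # ['Cooking', 'News', 'Travel']
# Sliding the window forward one week adds one epoch, hence at most one new topic.
print(set(week4) - set(week3))  # {'Cooking'}
```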

Ensuring that the returned topic differs from the current site's topic is relatively expensive, as it would require running our classifier on the current site when determining which topics to return. It could be done, but we prefer to run the classifier only once per week.
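To make the cost concrete, here is a hedged sketch of the filter being discussed: the only way to know the current site's topic at call time is to classify it then. `exclude_current_site_topics` and `toy_classifier` are hypothetical placeholders, and the string check in `toy_classifier` stands in for the browser's real classifier.

```python
from typing import Callable


def exclude_current_site_topics(
    candidates: list[str],
    current_site: str,
    classify: Callable[[str], set[str]],
) -> list[str]:
    """Drop candidate topics that match the current site's own topics (hypothetical)."""
    # This is the extra per-call cost: the classifier runs on the current site
    # every time topics are requested, instead of once per week.
    site_topics = classify(current_site)
    remaining = [t for t in candidates if t not in site_topics]
    # Design choice for this sketch: if every candidate matched, keep the originals.
    return remaining or candidates


calls: list[str] = []


def toy_classifier(site: str) -> set[str]:
    calls.append(site)  # record each invocation to show how often it runs
    return {"Investing"} if "finance" in site else {"News"}


print(exclude_current_site_topics(["Investing", "Travel"], "finance.example", toy_classifier))  # ['Travel']
print(len(calls))  # 1
```

Every API call adds one classifier run, which is the overhead the once-per-week design avoids.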

omriariav commented 7 months ago

Thanks for the answer, @jkarlin. As for returning a topic that differs from the current site's topic: since the current website was already classified within the epoch, can we pull this data locally?
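A minimal sketch of what this question proposes, assuming the browser could keep the per-site results of its weekly classifier run and consult them at call time. `WeeklySiteTopics`, `run_weekly`, `lookup`, and `toy_classifier` are hypothetical, and the thread does not say whether Chrome retains such per-site data. The sketch also exposes a gap: a site missing from last week's history has no stored result, so it would still need classifying or would go unfiltered.

```python
from typing import Callable, Iterable, Optional


class WeeklySiteTopics:
    """Keeps the output of the most recent weekly classifier run (hypothetical)."""

    def __init__(self, classify: Callable[[str], set[str]]) -> None:
        self._classify = classify
        self._by_site: dict[str, set[str]] = {}

    def run_weekly(self, visited_sites: Iterable[str]) -> None:
        # The once-per-week batch job: classify each distinct site from the
        # past week and keep the results instead of discarding them.
        self._by_site = {site: self._classify(site) for site in set(visited_sites)}

    def lookup(self, site: str) -> Optional[set[str]]:
        # Call-time check with no classifier run: None means the site was not
        # in last week's history, so its topic is unknown without classifying.
        return self._by_site.get(site)


runs: list[str] = []


def toy_classifier(site: str) -> set[str]:
    runs.append(site)  # count classifier invocations
    return {"Investing"} if "finance" in site else {"News"}


cache = WeeklySiteTopics(toy_classifier)
cache.run_weekly(["finance.example", "news.example", "finance.example"])
print(cache.lookup("finance.example"))   # {'Investing'}
print(cache.lookup("new-site.example"))  # None
print(len(runs))  # 2
```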