Feedback on taxonomy v2 ramp up plan

leeronisrael commented 1 year ago

As previously announced, we have ramped up the updated taxonomy (version 2) to 50% of Chrome Beta, Dev and Canary. Assuming metrics look good, then the next step will be to ramp up in Chrome Stable. Beginning the week of 8/28, our current plan is to ramp up to 1% for two weeks then begin the ramp up to 100%.

During the ramp up, users may have a mixture of topics from the v1 and v2 taxonomy. Developers will be able to differentiate between taxonomy versions by inspecting A) the topic id (>349 is from taxonomy v2) or B) the taxonomyVersion property in the response object.
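
As a rough illustration of the two checks described above, here is a minimal sketch. The `TopicEntry` interface and both helper names are hypothetical; only the `topic` id, the `taxonomyVersion` property, and the 349 cutoff come from this thread, and the property is assumed to be a string such as "1" or "2".

```typescript
// Hypothetical shape of one returned topic, limited to the two fields
// mentioned in this thread.
interface TopicEntry {
  topic: number;            // topic id
  taxonomyVersion?: string; // assumed to be a string such as "1" or "2"
}

// Per the thread, ids above 349 come from taxonomy v2.
const LAST_V1_TOPIC_ID = 349;

// Option A: infer from the id alone. An id at or below 349 may exist in
// either taxonomy, so only "new in v2" can be asserted this way.
function isV2OnlyTopic(entry: TopicEntry): boolean {
  return entry.topic > LAST_V1_TOPIC_ID;
}

// Option B: prefer the explicit property and fall back to the id check.
// Returns undefined when the version cannot be determined.
function taxonomyVersionOf(entry: TopicEntry): string | undefined {
  if (entry.taxonomyVersion !== undefined) {
    return entry.taxonomyVersion;
  }
  return isV2OnlyTopic(entry) ? "2" : undefined;
}
```

The id check alone cannot say that a topic came from v1, so callers that need the exact version would probably want to rely on the property when it is present.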

We are interested in feedback from the ecosystem on our ramp up plan. For example, should we instead ramp up to n% and hold for some fixed number of weeks, to give developers time to update their integrations? If so, what is n?

patmmccann commented 1 year ago

I think you should just go 100% on the new taxonomy with little to no ramp-up time. It should be simple to switch for any implementor. As there aren't many commercial products built upon the api yet and everyone is in testing mode, better to rip the bandaid off before it gets even more painful.

lbdvt commented 1 year ago

I agree with Patrick's suggestion of a quick ramp-up.

piwanczak commented 1 year ago

+1 to Patrick's suggestion. Faster ramp up is preferable

bmayd commented 1 year ago

Following up from comments in today's lightly attended meeting.

I think this issue may be assuming that classifier versioning and taxonomy versioning are the same thing. I suspect that isn’t the case, and that we need to consider each, and how each is deployed, separately.

When we use the Topics API, we’re not employing a taxonomy, we’re employing a classifier which produces signals expressed as taxon IDs. We use these signals to understand the relationship between the classifier and the larger context in which we seek to apply it by correlating the signals it produces with other context attributes. What the actual signals are doesn’t really matter so long as we can tell whether a signal received was derived in the same way as signals we may previously have encountered or not. Said another way, we need a unique signal for each output of each classifier version so that we can distinguish materially different outputs. Whether we use an entirely unique value or a unique combination of classifier version and taxon ID doesn’t really matter so long as we can distinguish known signals from new ones.
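
To make the "unique combination" idea concrete, here is a minimal sketch of a composite key built from classifier version and taxon ID. All names (`Signal`, `signalKey`, `SignalRegistry`) are hypothetical, and it assumes the caller can actually obtain a classifier version string, which is not a given.

```typescript
// Hypothetical representation of one classifier output.
interface Signal {
  classifierVersion: string; // assumed to be available to the caller
  taxonId: number;
}

// Combine classifier version and taxon ID into one key, so the same
// taxon ID from two classifier versions counts as two distinct signals.
function signalKey(s: Signal): string {
  return `${s.classifierVersion}:${s.taxonId}`;
}

// Tracks which signals have been seen before.
class SignalRegistry {
  private known = new Set<string>();

  // Records the signal and returns true if it had not been seen before.
  observe(s: Signal): boolean {
    const key = signalKey(s);
    const isNew = !this.known.has(key);
    this.known.add(key);
    return isNew;
  }
}
```

With this keying, taxon ID 1 from classifier "1" and taxon ID 1 from classifier "2" are treated as unrelated signals until the caller has correlated each of them separately.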

I guess for completeness we’d need a combination of taxon ID, classifier version and some notion of how they mapped to a taxonomy version. The latter, however, seems likely to create confusion as what one classifier calls "Arts & Entertainment" may be very different for a given use case than what a different classifier version applies that label to. We probably shouldn't indicate they are the same, though it isn't clear how best to indicate they are different. Also not clear what a taxonomy version indicates, presumably some combination of description changes and additions and removal of entries.

patmmccann commented 1 year ago

not clear what a taxonomy version indicates, presumably some combination of description changes and additions and removal of entries.

@bmayd it is this

https://github.com/patcg-individual-drafts/topics/blob/main/taxonomy_v2.md

vs this

https://github.com/patcg-individual-drafts/topics/blob/main/taxonomy_v1.md

This issue is unrelated to model version, which appears to iterate [improve?] without warning.

bmayd commented 1 year ago

Thanks Patrick. I was following up on a conversation we had in the meeting today and a suggestion I made then that taxonomy version is a lot less important than classifier version. The latter is generally what matters from an operational perspective. The lack of classifier version information and a well-defined classifier update process makes depending on Topics unnecessarily risky.

Taxonomy version, on the other hand, conveys relatively little useful operational information in the absence of specific classifier version information and doesn't provide nearly enough intelligence to support relying on Topics for production use.

bmayd commented 1 year ago

@leeronisrael Can you say something about how the taxonomy update will be communicated to users and how it will impact choices users make, like blocked topics?

leeronisrael commented 11 months ago

Thanks for the question, @bmayd. A few things to note:

Today, users can block topics reactively (i.e., after they’ve been assigned to them). We announced our plans to introduce proactive topic blocking in this June blog post.
If a user blocks a topic, then all its descendants are also blocked. For example, if a user blocks “Autos & Vehicles” then all descendants such as “Classic Cars” and “Boats & Watercraft” are also blocked.
When the updated taxonomy rolls out, all topics blocked by a user will continue to be respected. For example, if a user blocked “Autos & Vehicles” and the updated taxonomy includes a new descendant, then that descendant will also be blocked.
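
The descendant rule in the notes above can be sketched as a prefix check. This is only an illustration, not Chrome's implementation: it assumes topics are identified by slash-separated paths such as "/Autos & Vehicles/Classic Cars", and the function name and the "Some New Topic" entry are hypothetical.

```typescript
// Returns true if topicPath is one of the blocked topics or a descendant
// of one. Paths are assumed to look like "/Parent/Child".
function isBlocked(topicPath: string, blocked: readonly string[]): boolean {
  return blocked.some(
    (b) => topicPath === b || topicPath.startsWith(b + "/")
  );
}

// A block recorded against the parent topic.
const blockedTopics = ["/Autos & Vehicles"];

// Existing descendants are blocked.
const classicCars = isBlocked("/Autos & Vehicles/Classic Cars", blockedTopics);

// A descendant added by a later taxonomy (hypothetical name) is blocked
// too, with no change to the stored block list.
const newDescendant = isBlocked("/Autos & Vehicles/Some New Topic", blockedTopics);
```

Because the check is made against the parent at lookup time, nothing stored for the user has to change when a taxonomy update adds descendants.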

We plan to communicate the update to users through our help center. You can see the existing article for Topics here.

bmayd commented 11 months ago

@leeronisrael thanks for the responses.

Regarding:

When the updated taxonomy rolls out, all topics blocked by a user will continue to be respected. For example, if a user blocked “Autos & Vehicles” and the updated taxonomy includes a new descendant, then that descendant will also be blocked.

Will users get notified if the updated taxonomy includes new Topics that are being blocked by existing choices? Using your example, a new "Luxury Sedans" Topic is added and I have "Autos & Vehicles" blocked, but I'm interested in luxury cars -- would I know it is available, but blocked because of an existing block?

Quick questions about this from the blog post:

This means users will be able to curate the set of available topics they are interested in by removing selected topics.

Does "available topics" refer to the entire taxonomy or the Topics that have been assigned to the user?

User controls updates will be available by early next year.

Why the long timeline on this? I assume users will be more likely to enable and support Topics if they have a greater sense of control and getting that engagement during initial adoption phases means folks are less likely to just turn it off entirely.

Also, has thought been given to allowing users the ability to turn off publication of Topics so they can learn what topics are applied to them prior to deciding if they want to allow them to be used by advertisers and maybe some history of Topics assigned?

leeronisrael commented 11 months ago

Thank you all for the feedback. We’ve heard from some in the ecosystem that a faster ramp up is preferable, while others have requested a few weeks to evaluate the impacts of the v1 vs. v2 taxonomy. Given this, here’s our plan:

We are currently at 1%, monitoring metrics
We expect to ramp up to 10% by the first week of October
We will hold at 10% for roughly 4 weeks
We expect to begin ramping up to 100% in early November

dmarti commented 11 months ago

@bmayd Giving users a sense of control (edit: if it's based on real understanding of what a topic set means to the caller) is a good idea. The raw list of topics, though, doesn't provide the user with enough information to understand how they're being assigned to a "cohort" of inferred group memberships by the caller. (The set of topics returned for a user is likely fed to ML on the caller side.)

There is an ML explainability problem here (https://github.com/patcg-individual-drafts/topics/issues/221#issuecomment-1649975686) that seems to be more challenging for users than the available information can help them with—users would need to see their inferred group memberships in addition to their topics lists.

bmayd commented 11 months ago

Giving users a sense of control is a good idea. The raw list of topics, though, doesn't provide the user with enough information to understand how they're being assigned to a "cohort" of inferred group memberships by the caller. (The set of topics returned for a user is likely fed to ML on the caller side.)

@dmarti I have a hard time agreeing with your first sentence given what follows it. The implication of a UI that allows a user to review and disable Topics is that users can make meaningful decisions about topics based on taxonomy labels, but considering that:

taxonomy labels are someone's attempt to describe their interpretation of a taxonomy ID output by a classifier
the classifier is encoding a constellation of input signals from potentially many sources as that single ID
for a given ID, different ML models may generate dramatically different, potentially diametrically opposed, decisions because they are picking up on different dimensions of the signals encoded into the ID

I don't see how a user can meaningfully exercise control by including or excluding Topics, and I think giving them a sense of control will end up being counterproductive.

dmarti commented 11 months ago

@bmayd Thank you, that's a good point. Giving users a sense of control is a good idea if that sense of control is justified based on understanding what they're controlling--what personal attributes or group memberships are encoded by their topic sets. I agree with you on all the points in your comment.

leeronisrael commented 10 months ago

The updated taxonomy (version 2) is now ramped up to 10% of traffic. We are currently planning to ramp up to 100% in roughly 4 weeks.

patmmccann commented 10 months ago

@leeronisrael I just noticed on https://github.com/patcg-individual-drafts/topics/blob/main/taxonomy_v2.md all the new topics are out of range of the old topics https://github.com/patcg-individual-drafts/topics/blob/main/taxonomy_v1.md. This seems to suggest the taxonomies actually have no collisions where the same id has two different meanings. If so, that's great news, and should suggest the transition will be even easier.

leeronisrael commented 10 months ago

This seems to suggest the taxonomies actually have no collisions where the same id has two different meanings.

That's correct! Topics included in both v1 and v2 retain their v1 IDs. Newly added topics are enumerated beginning at 350 (v1 had 349 topics).

patcg-individual-drafts / topics

Feedback on taxonomy v2 ramp up plan #236