jonasz / product_level_turtledove


Proposed Changes #8


appascoe commented 4 years ago

As discussed last week, here's my list of proposed changes to product-level TURTLEDOVE, along with some graphs and discussion.

Proposed Changes

  1. *We argue that in ecommerce it is the product, not the creative (a collection of products), that would be the most natural conceptual unit behind Turtledove mechanisms.* I would strike this. Products are a natural fit for retail advertisements, but may not be suitable for B2B services, general branding, etc. NextRoll itself has many clients that continue to use static creative. I would consider this proposal as supplemental to TURTLEDOVE, with an admittedly important use case.

  2. I would like to add that requests to ../.well-known/fetch-ads aren't necessary when the browser is on the advertiser's site. We can eliminate minimum audience thresholds by writing web bundles at that moment. Moreover, being able to send web bundles to the browser immediately is critical for product advertising use cases such as the "identity recommendation" algorithm, which recommends recently viewed products. (It's also useful because ads delivered to users who have been to an advertiser's site more recently are significantly more valuable, so eliminating the delay preserves publisher revenue.) ../.well-known/fetch-ads is still useful as it provides an opportunity to update recommendations, or the creatives intended to be shown in the browser, hours or days after the browser has been to the advertiser's site.
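To make this concrete, here's a rough sketch of what writing a product bundle from the advertiser's page could look like. The API name (`storeProductBundle`) and request shape are purely illustrative, modeled loosely on TURTLEDOVE's `navigator.joinAdInterestGroup()` style; they are not part of any spec:

```typescript
// Hypothetical advertiser-page script: runs while the user is on the
// advertiser's site, so no ../.well-known/fetch-ads round trip is needed
// and no minimum audience threshold applies at write time.
interface ProductBundle {
  productId: string;
  webBundleUrl: string; // pre-rendered creative component for this product
  expiryMs: number;     // when the stored bundle should be discarded
}

// Illustrative API shape only; not an actual browser API.
const nav = navigator as Navigator & {
  storeProductBundle?: (bundle: ProductBundle) => Promise<void>;
};

export async function recordViewedProduct(productId: string): Promise<void> {
  if (!nav.storeProductBundle) return; // feature not available in this browser
  await nav.storeProductBundle({
    productId,
    webBundleUrl: `https://advertiser.example/bundles/${productId}.wbn`,
    expiryMs: Date.now() + 30 * 24 * 60 * 60 * 1000, // keep for 30 days
  });
}
```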

Below, I've attached graphs that demonstrate how minimum audience thresholds can drastically affect the performance of other types of clients beyond larger retail players. This is another reason they should be a fallback plan rather than the primary mechanism for ad/product selection.

  1. *...but conceptually, we just need to get rid of very rare products and then just run the old algorithm.* Very rare products are important for ad delivery in small campaigns that are run by merchant platforms. NextRoll is a partner with both Yelp and Rakuten, and the graphs below demonstrate the effects of including and excluding them in our analyses. We should try to solve for these advertisers' use cases.

  2. *and shuffles them before they are passed to the "template" web bundle.* I don't think this is feasible for performance reasons. I understand the desire to avoid ordering-based pranks. However, there are a couple of reasons I object to this. One is that Facebook shared results showing that removing user information from their ranking algorithm dropped publisher revenues by 40-50%. The second is that at NextRoll we blend recommendation algorithms to hedge bets, though some products are clearly superior to others; in the situation of a carousel ad, the first item seen can significantly affect engagement with the ad (from scrolling through to other products, to overall CTR).

I'd expect ordering-based pranks, such as assembling text, to be handled by a sufficient audit through an SSP. Audits are part of the ecosystem today to increase trust, and I see no reason this should change.

Graphs and Discussion

At NextRoll we replicated the graphs in this proposal with our own data. As I said, we're partners with marketplace platforms such as Yelp and Rakuten (our two largest partners). Their merchants purchase advertising through the marketplace, and we're one of the channels these marketplaces leverage. As such, NextRoll frequently runs campaigns with modest audience sizes and very small product catalogs. The behavior and performance of these marketplaces and campaigns are so radically different from those of advertisers who come to us directly that we frequently need to separate them out for analysis. The graphs below should demonstrate this (Rakuten identified as "R" and Yelp identified as "Y"):

*[Four graphs: retained CTR and fraction of retained events as a function of audience threshold, with marketplaces (R = Rakuten, Y = Yelp) included and excluded.]*

As can be seen, our retained CTR drops off a cliff for many advertisers at an audience threshold of 30 when marketplaces are included. Note that our fraction of retained events actually improves when they're included. Essentially, what's happening is that there are lots and lots of merchants with such modest deliveries that they don't account for a significant number of events in our dataset, but this large number of advertisers would be adversely affected by an audience threshold of 30.
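For clarity, the retention numbers behind these curves are computed along these lines (a simplified sketch; the data shapes here are illustrative, not our actual pipeline):

```typescript
// An event survives a given audience threshold only if its product's
// audience size meets it; the two retained fractions are then recomputed
// over the surviving events.
interface AdEvent {
  advertiserId: string;
  productId: string;
  audienceSize: number; // users attached to the event's product
  clicked: boolean;
}

function retainedStats(events: AdEvent[], threshold: number) {
  const retained = events.filter(e => e.audienceSize >= threshold);
  const clicks = (es: AdEvent[]) => es.filter(e => e.clicked).length;
  return {
    // fraction of all events that survive the threshold
    retainedEventFraction: retained.length / Math.max(1, events.length),
    // fraction of original clicks that survive (one reading of "retained CTR")
    retainedClickFraction: clicks(retained) / Math.max(1, clicks(events)),
  };
}
```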

This is just one of the reasons why we at NextRoll are arguing so strongly for the ability to add web bundles while on the advertiser's site without any audience size thresholds. These small clients should be able to participate in the overall digital ad ecosystem so they can attempt to grow their brands. This comes at no cost to privacy, since first-party cookies are sticking around.

jonasz commented 4 years ago

Hi Andrew,

Thank you for the suggestions and the data. I am happy to see that PLTD would allow you to retain a high fraction of clicks and impressions at reasonable product-level audience size thresholds.

Still, I understand it is an important use case for NextRoll to be able to serve rare products in ads.

To comment on your points 2 and 3: I think it is important to understand the product-level proposal as "policy mechanisms should be applied to products, not to collections of products," and not as an argument in favor of minimum audience thresholds or of delays in calls to fetch-ads.

Product-level TURTLEDOVE opens up the possibility of dropping audience thresholds altogether, as described at https://github.com/jonasz/product_level_turtledove#auditability, and we'd be happy to explore this idea further. I was wondering: would that satisfy your use case?

> *and shuffles them before they are passed to the "template" web bundle.* I don't think this is feasible for performance reasons. I understand the desire to avoid ordering-based pranks. However, there are a couple of reasons I object to this. One is that Facebook shared results showing that removing user information from their ranking algorithm dropped publisher revenues by 40-50%.

Note that the product-level proposal only mentions shuffling the items after they've been chosen for the particular user (so it affects the ordering, not the selection of items). Are you sure this is what FB did in the study you're referencing? In our experience the drop in revenue would be orders of magnitude lower. Of course the drop is there and, if possible, at RTB House we'd prefer not to shuffle, but it's not a deal breaker for us.
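To make the distinction concrete, here is a minimal sketch (the function name and data shapes are illustrative, not from the proposal): the personalized model still decides which products fill the ad's slots; only the display order handed to the "template" web bundle is randomized.

```typescript
// Select-then-shuffle: personalization picks WHICH products appear;
// only their ordering is randomized afterwards.
function selectThenShuffle<T>(
  rankedCandidates: T[], // already ranked by the personalized model
  slots: number,
): T[] {
  const chosen = rankedCandidates.slice(0, slots); // selection is untouched
  // Fisher-Yates shuffle of the chosen items only
  for (let i = chosen.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [chosen[i], chosen[j]] = [chosen[j], chosen[i]];
  }
  return chosen;
}
```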

To answer point 1: of course, the product-level approach should be understood as an extension to TD; "single-product", "no-product" and "static" ads would still be supported in PLTD. Perhaps the confusion comes from the definition of "ecommerce" and "product"? We meant it as defined at https://en.wikipedia.org/wiki/E-commerce.

appascoe commented 4 years ago

Hey, Jonasz,

Sorry for the delay. I was on vacation all last week.

As for points 2 and 3, I don't see much difference between the number of users attached to a product and the number of users attached to an interest group. My understanding is that the same differential privacy policies would have to apply to either. I suppose my point is that writing the interest groups and/or the set of products to target a user with while they're on the advertiser's site should be part of this proposal, as it resolves the issue with rare products (and, as an additional benefit, small interest groups) with no effect on privacy.

As for the section on auditability, I must admit it's not clear to me what the mechanism to eliminate thresholds would be. Could you please elaborate further?

With the study FB did, unfortunately I don't have a URL to share. It's something that was mentioned in a Web Advertising Advisory Board meeting. At least, my understanding was that they solely changed their ranking algorithm to not leverage cookie data and saw this steep drop. I don't know if this was ranking over a small selection of ads, or a total ranking over all available ads. Perhaps @benjaminsavage could elucidate.

Point 1 isn't a huge sticking point for me. I just think that products being a "natural unit" is more applicable to some clients than to others.

jonasz commented 4 years ago

Hi Andrew,

> As for the section on auditability, I must admit it's not clear to me what the mechanism to eliminate thresholds would be. Could you please elaborate further?

The audience thresholds are just a means towards a goal - the goal being to make it harder for advertisers to produce ads that violate user comfort, e.g. by displaying personally identifiable information in them.

I think the same goal could be achieved by a different strategy: the ads (and the web bundles behind them) would be made publicly accessible before serving, so that they can be audited and labelled.

The audit and labelling could rely on trained reviewers, as well as on a feedback mechanism for users.

I think the audit-based approach could be much more effective than the minimum audience thresholds (and so there would be no reason to forbid rare item recommendations). Moreover, it'd allow for a flexible implementation of various other policies, if desired ("no graphic content", "no landing pages with malware", etc.).
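As a rough illustration of what such labelling might look like (all names and policy labels below are hypothetical, not part of the proposal):

```typescript
// Sketch of how audit results might be attached to a publicly accessible
// creative under the audit-based approach.
type PolicyLabel =
  | "no-pii"
  | "no-graphic-content"
  | "no-malware-landing-page";

interface AuditRecord {
  bundleUrl: string;     // publicly accessible creative being labelled
  labels: PolicyLabel[]; // policies the creative was found to satisfy
  auditor: string;       // e.g. an SSP or an independent reviewer
  auditedAtMs: number;   // when the audit happened
}

// A browser or exchange could then gate serving on the labels it
// requires, instead of on a minimum audience threshold.
function servable(record: AuditRecord, required: PolicyLabel[]): boolean {
  return required.every(label => record.labels.includes(label));
}
```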

Would that work for you?

@michaelkleber, I'd also be very curious to hear your thoughts on that. Would you see any concerns with adopting an audit-based approach and allowing rare item recommendations?

michaelkleber commented 4 years ago

I do not believe that browsers want to get into the role of looking at potential ads and evaluating their contents for likelihood of violating user comfort.

appascoe commented 4 years ago

Yeah, I'm not sure how the auditing is supposed to work to solve the problem here. Making the ads publicly accessible before serving, while perhaps technically doable, is not realistically feasible for an audit. Many ads are shown within minutes, even seconds, after leaving an advertiser's site. As such, this can't be a preventative measure, only a punitive one.

Beyond that, what percentage of ads would we expect to be unsafe? I suppose I was asking this question back in the issue on TURTLEDOVE: https://github.com/WICG/turtledove/issues/36#issuecomment-649717026 . (My point there was that unless this is a widespread UX issue, the penalty to publisher CPMs is very high.) But here, if these UX issues happen rarely, and we're talking about drawing from a pool of all programmatic ads shown on the internet, you're going to need a significant sample size to even detect the issue. That's a lot of trained reviewers, potentially.
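To put rough numbers on that: if a fraction p of served ads is unsafe, a random audit of n ads misses every violation with probability (1 − p)^n, so detecting at least one with probability 1 − δ requires roughly n ≥ ln(δ)/ln(1 − p). For example (illustrative rates, not measured ones):

```typescript
// Minimum number of independent reviews needed to catch at least one
// violation with probability 1 - delta, if a fraction p of ads is unsafe.
function reviewsNeeded(p: number, delta: number): number {
  return Math.ceil(Math.log(delta) / Math.log(1 - p));
}

// If 1 in 10,000 ads is unsafe, catching one with 95% probability
// takes roughly 30,000 reviews.
console.log(reviewsNeeded(1e-4, 0.05)); // ≈ 29956
```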

Employing a feedback mechanism for the user sounds very reasonable, though.

jonasz commented 4 years ago

Product validation latency and the cost of such a validation effort are, of course, open questions. I think reasonable tradeoffs could be reached here, but that would require a more detailed discussion.

However, if the browser teams prefer to avoid taking that role on themselves, I guess there is no need to pursue that idea further.

Thanks for the feedback!

benjaminsavage commented 4 years ago

Here is the blogpost about the study we performed: https://developers.facebook.com/blog/post/2020/06/18/value-of-personalized-ads-thriving-app-ecosystem/

That study removed personalization from the selection of which ad to show, and thus applied to selecting from a large corpus of possible ads. The effect of removing personalization from just product selection within a multi-product ad would have to be measured in a separate study, but I would anticipate a significant impact on ad performance. Many merchants have extremely large catalogs of items, and showing random products will tend to surface far less relevant content that people would be far less likely to engage with.