google / ads-privacy

Apache License 2.0

Bidding accuracy in Dovekey #14

Open jonasz opened 3 years ago

jonasz commented 3 years ago

Hi,

In Dovekey the ability to bid accurately is severely limited by the necessity to materialize bids in the key value store. In fact, we don't see a way to implement a satisfactory system under such a limitation.

The bid value depends on many (more or less independent) factors, like interest_group, slot_id, number of recent ad impressions, time from the last visit to the advertiser's page, device type, and more. In real-life systems, the list of such signals can easily grow to over one hundred, each signal having between a few and a few hundred million values.

If we want to account for those signals, the key space very quickly grows well beyond what could be physically materialized in a key-value store. (The order of magnitude would be gigantic; our intuition is that it is closer to 100^100 than to anything that can be managed in a physical system.)
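To make the growth concrete, here is a rough sketch that counts the entries a fully materialized store would need. The signal names are taken from the list above, but the cardinalities are made up for illustration; only the last case (100 signals with 100 values each) mirrors the 100^100 intuition.

```javascript
// Hypothetical cardinalities, chosen only for illustration.
const signalCardinalities = {
  interest_group: 1000n,
  slot_id: 100000n,
  recent_impressions_1d: 20n,
  time_last_seen: 48n,
  device_type: 50n,
};

// A single long key needs one entry per combination of signal values,
// so the number of keys is the product of the cardinalities.
function materializedKeyCount(cardinalities) {
  return Object.values(cardinalities).reduce((acc, n) => acc * n, 1n);
}

const fiveSignals = materializedKeyCount(signalCardinalities);
console.log(fiveSignals.toString()); // 4800000000000 keys for just five signals

// 100 signals with 100 values each: 100^100 = 10^200 combinations.
const hundredSignals = Array(100).fill(100n).reduce((acc, n) => acc * n, 1n);
console.log(hundredSignals.toString().length); // 201 decimal digits
```

Even the five-signal case lands in the trillions of keys, and every additional signal multiplies that figure again.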

I was wondering, have you considered a design where the bidder, during a single auction, can specify multiple keys, and combine the values into the final bid in a custom way? So instead of a single query with a long key:

let bid_value = dovekey.get('ig=WeReallyLikeShoes-athletic-shoes,slot_id=151513513,recent_impressions_1d=8,recent_impressions_5m=3,time_last_seen=17h,device_type=iphone7,...')

We would get

let bid_value = dovekey.get('ig=WeReallyLikeShoes-athletic-shoes,slot_id=151513513') * dovekey.get('recent_impressions_1d=8') * ...

That seems like a simple way to greatly optimize the size of the KV store, and to allow for more accurate bidding at the same time.

Of course, the snippet above is just a discussion starter; the amount of browser-side flexibility for key construction and bid computation would require a more detailed discussion.
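For the sake of discussion, the decomposed lookup could be sketched roughly as follows. This is only an illustration: the `Map` stands in for the Dovekey KV server, `combinedBid` is a hypothetical helper, the stored values are invented, and falling back to a neutral multiplier of 1 for a missing component is an assumption rather than anything the design specifies.

```javascript
// Hypothetical in-memory stand-in for the Dovekey KV server.
const mockDovekey = new Map([
  ['ig=WeReallyLikeShoes-athletic-shoes,slot_id=151513513', 2.0], // base bid
  ['recent_impressions_1d=8', 0.5], // multiplier for recent exposure
  ['device_type=iphone7', 1.2],     // multiplier for device type
]);

// Look up each component key and multiply the values together.
// Assumption: a component that is not in the store contributes a factor of 1.
function combinedBid(store, keys) {
  return keys.reduce(
    (bid, key) => bid * (store.has(key) ? store.get(key) : 1),
    1
  );
}

const combinedBidValue = combinedBid(mockDovekey, [
  'ig=WeReallyLikeShoes-athletic-shoes,slot_id=151513513',
  'recent_impressions_1d=8',
  'device_type=iphone7',
  'time_last_seen=17h', // not in the store, so it falls back to 1
]);
console.log(combinedBidValue);
```

With components stored separately, the store needs roughly the sum of the per-component key counts rather than their product. The price is that a purely multiplicative combination cannot express interactions between signals that sit in different components.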

Best regards, Jonasz

ardianp-google commented 3 years ago

Thank you for your interest in Dovekey.

Bid accuracy and scalability are important concerns for Dovekey, which are also voiced in TERN’s explainer.

Dovekey strives for simplicity as a core design principle. We agree that there is always going to be a tradeoff between efficiency and simplicity. How to arrive at the right balance is an open question (perhaps an empirical one). Compared to classical TURTLEDOVE variants, where the bids are computed in the browser, Dovekey’s main advantage is that it simplifies DSP and SSP adoption: the bids, the brand safety logic, and other publisher rules are computed and enforced on the server side, similar to how this works today. Server-side computation also makes it possible to authenticate bids.

What we have in mind is to use this KV server like a cache. Let us assume we have an initial state of the KV server (possibly empty). For every ad request, we can construct a key that takes into account the various signals you mentioned and use it to retrieve the bid. The DSP can then use the aggregate reporting API to find out which keys should be populated to provide coverage for the dimension combinations that occur in practice. This way, the DSP doesn’t need to materialize the entire feature space.
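The cache-like flow described above might look roughly like the sketch below. None of these names are Dovekey APIs: `BidCache`, `lookup`, and `populate` are hypothetical, the miss counter is only a stand-in for what the aggregate reporting API would tell the DSP, and the default bid on a miss and the `minCount` threshold are assumptions.

```javascript
// Hypothetical sketch of a KV server used as a cache; not a Dovekey API.
class BidCache {
  constructor(defaultBid = 0) {
    this.store = new Map();      // key -> bid, stands in for the KV server
    this.missCounts = new Map(); // key -> misses, stands in for aggregate reports
    this.defaultBid = defaultBid;
  }

  // Return the stored bid, or record a miss and fall back to the default bid.
  lookup(key) {
    if (this.store.has(key)) return this.store.get(key);
    this.missCounts.set(key, (this.missCounts.get(key) || 0) + 1);
    return this.defaultBid;
  }

  // DSP side: populate keys that were missed at least `minCount` times.
  populate(computeBid, minCount = 1) {
    for (const [key, count] of this.missCounts) {
      if (count >= minCount) {
        this.store.set(key, computeBid(key));
        this.missCounts.delete(key);
      }
    }
  }
}

const cache = new BidCache(0);
const requestKey = 'ig=WeReallyLikeShoes-athletic-shoes,slot_id=151513513,device_type=iphone7';
const firstBid = cache.lookup(requestKey); // miss: default bid
cache.lookup(requestKey);                  // second miss for the same key
cache.populate(() => 1.5, 2);              // DSP fills keys seen at least twice
const laterBid = cache.lookup(requestKey); // hit: populated bid
console.log(firstBid, laterBid);
```

Only keys that actually occur in traffic ever get populated, which is why the full feature space never has to be materialized. Combinations that are rare or not yet reported still receive the default bid.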

As a tradeoff, Dovekey may result in a decrease in accuracy and coverage, since:

Your proposal to decompose the keys into several components, so that the bid can be reconstructed using simple mathematical operations, seems to be an important enhancement to the design. Let us think through how this change could be introduced into Dovekey.