prebid / prebid-server

Open-source solution for running real-time advertising auctions in the cloud.
https://prebid.org/product-suite/prebid-server/
Apache License 2.0

tmax adjustments for individual adapters/bidders #3965

Open scr-oath opened 1 week ago

scr-oath commented 1 week ago

As an example need, we have a bidder that's only in one location, however we have deployed PBS around the world. We need to adjust that bidder's network latency buffer accordingly in locations that are "farther" away so that it is told the appropriate amount of time it can use to answer.

Proposal:

bretg commented 4 days ago

Thanks @scr-oath. If we supported the "secondary bidders" feature, different timeouts for secondary bidders would make sense.

But until then, I'm not real fond of having different timeouts for different bidders. PBS sends only one response to the client, so there's only one timeout value. It doesn't make sense to me that bidderA would be told 200ms when bidderB gets 300ms because their endpoint is far away. We might as well let bidderA have that extended time.

FWIW, we added region-specific timeout adjustment in https://github.com/prebid/prebid-server/issues/2398 -- the way this works specifically in PBS-Go is described in https://docs.prebid.org/prebid-server/endpoints/openrtb2/pbs-endpoint-auction.html#pbs-go-1

For instance, we've changed our network buffer in APAC because our network there isn't as good as in other regions. But all bidders get the same tmax value.

bretg commented 4 days ago

Actually: give bidders the same timeout, but optionally decrement the tmax actually sent to bidders.

Slind14 commented 2 days ago

My understanding is that this is not about the PBS Adapter timeout at all. It is about signaling the external bidder how much time they have to run the auction. This signaling should consider the network latency between the originating PBS instance and the bidder.

E.g. if both are in the same region and the network latency is < 20ms, then the bidder may use 800ms, however if it is in another region with a network latency of < 200ms, then the bidder may only use 400-600ms to not be timed-out

I believe the best approach here would be for PBS to measure the pure network latency, calculate the P90 and apply that to the tmax calculation.

linux019 commented 1 day ago

Another potential issue is that large timeouts (>1s) will increase the queue of HTTP requests to bidders, the number of working goroutines, and the number of active connections, because PBS spawns a new goroutine for each bidder request.

scr-oath commented 1 day ago

> My understanding is that this is not about the PBS Adapter timeout at all.

AGREE. As I understand it, globally at the moment, there are a few settings involved in deciding the tmax sent to bidders (see https://github.com/prebid/prebid-server/blob/master/config/config.go#L1336-L1353):

  1. The entire auction has a tmax chosen through either the request or a configured max
  2. The time already taken when the bidder requests happen is subtracted off
  3. There is a configured value for "how much work/time the PBS will do/take after responses come in" that is also subtracted off
  4. A buffer for network latency is subtracted as well

The resulting tmax is intended to be the amount of time a bidder can spend in its handler and still have its response cross the network in time for the auction to wrap up.

The actual timeout should be the tmax reported to the bidder + the network latency.
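The four steps and the timeout rule above can be sketched in Go as follows. The struct and field names are hypothetical labels for the values described in the list; they are not the actual PBS-Go config fields, and flooring at zero is an assumption of this sketch.

```go
package main

import "time"

// TmaxInputs groups the values from the list above.
// All names here are illustrative assumptions, not PBS-Go identifiers.
type TmaxInputs struct {
	AuctionTmax    time.Duration // 1. tmax from the request or a configured max
	Elapsed        time.Duration // 2. time already spent before bidder requests go out
	ResponsePrep   time.Duration // 3. time PBS needs after bidder responses come in
	NetworkLatency time.Duration // 4. buffer for network latency to the bidder
}

// BidderTmax is the tmax signaled to the bidder: the auction budget minus
// everything that is not the bidder's own processing time, floored at zero.
func BidderTmax(in TmaxInputs) time.Duration {
	t := in.AuctionTmax - in.Elapsed - in.ResponsePrep - in.NetworkLatency
	if t < 0 {
		return 0
	}
	return t
}

// BidderTimeout is how long PBS actually waits for the bidder:
// the signaled tmax plus the network latency buffer.
func BidderTimeout(in TmaxInputs) time.Duration {
	return BidderTmax(in) + in.NetworkLatency
}
```

For example, with a 1000ms auction tmax, 50ms already elapsed, 100ms reserved for response preparation, and a 200ms latency buffer, the bidder would be told 650ms and PBS would wait up to 850ms.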

> E.g. if both are in the same region and the network latency is < 20ms, then the bidder may use 800ms, however if it is in another region with a network latency of < 200ms, then the bidder may only use 400-600ms to not be timed-out

YES This is the crux - if a PBS server is deployed to multiple regions, but a bidder is only in one location, then network latency will be higher for them in "farther" (or slower) regions.

> I believe the best approach here would be for PBS to measure the pure network latency, calculate the P90 and apply that to the tmax calculation.

This is an interesting extension to the idea - yet, while I love the idea of measuring the exact thing, I do wonder about the added complexity for something that can be mostly statically determined and tuned.

I'm curious how one might measure network latency, though I feel it's perhaps a distraction from the feature. Would this require each bidder to set up a "ping" endpoint, something that does zero work and just answers a request as fast as possible, so that the p99 of the resulting observed time could be dynamically set as the amount to subtract from the tmax?