prebid / prebid-server

Open-source solution for running real-time advertising auctions in the cloud.
https://prebid.org/product-suite/prebid-server/
Apache License 2.0

Prebid Caching #663

Closed bretg closed 5 years ago

bretg commented 6 years ago

This is a proposed set of additions around server-side caching that affects Prebid Server, Prebid Cache, Prebid.js, and Prebid SDK.

Background

Several Prebid use cases require that ad response creatives be cached server-side and subsequently retrieved upon Prebid’s bidWon event. Client-side caching is effective for the standard use case of client-side bidAdapters competing for web display ads through Prebid.js. Other integration types such as Prebid for Mobile, Prebid Video, and Prebid for AMP either cannot use client-side caching, or pay an undesirable performance penalty to do so.

Prebid's cache offering is the Prebid Cache server, which works both standalone and in conjunction with Prebid Server to implement some of these caching use cases.

Use Cases

Scenarios supported by this set of requirements:

  1. As a web publisher, I want to be able to use Prebid.js to serve video ads using a mix of bidders that support server-side and client-side caching of VAST XML. I want to be able to define the TTL when caching from the client so that certain adunits (e.g. longer videos) can have custom TTL values.
  2. As a web publisher, I want to be able to use Prebid.js to serve video ads via Prebid Server, with the ability to define caching behavior.
  3. As an app developer, I want to be able to use Prebid SDK and Prebid Server to implement header bidding and minimize network traffic by utilizing server-side caching. I don't want the creative body returned in the result, in order to save my users' network bandwidth and speed up my application.
  4. As an operator of a prebid server cluster, I want to be able to host multiple independent datacenters in a region to provide fault tolerance.

New Requirements

These are features not currently supported by the Prebid caching infrastructure.

  1. The system should allow the publisher to define what gets cached in each supported scenario: either the whole bid or just the creative body.
  2. The system should allow the publisher's request to define whether the creative body (adm) should be returned even when cached. The default should be 'yes', because that's the current Prebid behavior.
  3. A full URL to the cached asset should be returned in each bid response. This URL should be made available to renderers in all cache scenarios, including from the prebidServerBidAdapter.
  4. The page should be able to specify an ad cache time-to-live (TTL) for each AdUnit. This is because some adunits may require longer cache periods than others. E.g. one customer wants to have a video unit where the VAST XML is cached for an hour while the default is 5 mins.
    1. Max TTLs should be configurable for each cache host company.
  5. Separate system mediaType default TTLs must be specifiable by the Prebid Server host company for video and for mobile. The hard coded system default TTL should be 300 seconds (5 mins) for both.
  6. The caching system should allow each publisher to define their own TTL values by mediaType that override the system defaults.
  7. Prebid Server should use TTLs in this priority order:
    1. Request-specified TTL (e.g. this particular adunit has a TTL of 90 mins) (subject to configured Max TTL)
    2. Publisher mediaType configured TTL (e.g. all video for this publisher has a TTL of 60 mins) (server config)
    3. Format configured TTL (e.g. video on this cluster generally has a TTL of 30 mins) (server config)
    4. Hardcoded system default TTL (e.g. 5 min overall default) (server config)
  8. Operational reporting: Prebid Cache should log failed cache-writes and failed cache-reads as metrics.
  9. The system should support writing to multiple Prebid Cache servers. This enables operational redundancy so the same cache ID can be read from a cluster that didn't necessarily host the auction request. It would be better to do this with a distributed cache system, but this option could be useful for Prebid Server host companies. (A sketch of this fan-out follows this list.)
    1. The HTTP return code should reflect the result of the primary (local) cluster write.
    2. Failures writing to a secondary cluster should be logged as a metric and to the local log file.
  10. Prebid Server should return an additional key-value pair when an item is cached: hb_cache_hostpath. This value should be configurable for each cluster. It could be used by the Prebid Universal Creative to parameterize the cache settings for better portability.
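
To make requirement 9 concrete, here is a minimal Go sketch of the multi-cluster write fan-out. The CacheClient and Metrics interfaces and their method names are illustrative assumptions, not the actual Prebid Server or Prebid Cache APIs.

package pbscache

import (
	"context"
	"log"
)

// CacheClient is a hypothetical wrapper around one Prebid Cache cluster.
type CacheClient interface {
	Put(ctx context.Context, payload []byte) (int, error)
	Name() string
}

// Metrics is a hypothetical metrics sink (see requirement 8).
type Metrics interface {
	RecordSecondaryCacheWriteError(cluster string)
}

// writeToClusters writes the same payload to the primary and to every
// secondary cluster. The returned status code reflects only the primary
// write (9.1); secondary failures are logged locally and counted as a
// metric (9.2). A real implementation would likely fan out concurrently.
func writeToClusters(ctx context.Context, primary CacheClient, secondaries []CacheClient, m Metrics, payload []byte) (int, error) {
	status, err := primary.Put(ctx, payload)
	for _, c := range secondaries {
		if _, serr := c.Put(ctx, payload); serr != nil {
			log.Printf("secondary cache write to %s failed: %v", c.Name(), serr)
			m.RecordSecondaryCacheWriteError(c.Name())
		}
	}
	return status, err
}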

Security

Security requirements for caching:

  1. The system should attempt to prevent specific cache IDs from being written by unauthorized sources. The goal is to prevent an attack where malware is inserted into the cache on a valid key that might be retrieved by a user.
  2. The system should be able to detect suspicious cache write behavior, such as one client inserting a large number of entries.
  3. All cache writes and retrievals should be done over HTTPS.

Proposed OpenRTB2 request and response

Request extensions:
{
…
  "imp": [{
      "exp": 3600,    // openRTB location for request TTL
      ...
  }],
  "ext": {
    "prebid": {
      "cache": {
        "vastXml": {
          "returnCreative": false   // new: don't return the VAST, just cache it
        },
        "bids": {
          "returnCreative": true
        }
      }
    }
  }
… 
}
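
As a point of reference, one way the proposed ext.prebid.cache object might be modeled server-side in Go. The struct and field names below are illustrative assumptions, not the existing openrtb_ext definitions.

package pbscache

// ExtRequestPrebidCache mirrors request.ext.prebid.cache in the proposal above.
type ExtRequestPrebidCache struct {
	VastXML *CacheOptions `json:"vastXml,omitempty"`
	Bids    *CacheOptions `json:"bids,omitempty"`
}

// CacheOptions carries the new returnCreative flag. A nil/absent value means
// "return the creative", preserving current Prebid behavior (requirement 2).
type CacheOptions struct {
	ReturnCreative *bool `json:"returnCreative,omitempty"`
}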

Response extensions:

{
…
  "seatbid": [{
    "bid": [{
      …
      "ext": {
        "bidder": {
          ...
        },
        "prebid": {
          "targeting": {
             …
             "hb_cache_hostpath": "prebid.adnxs.com/pbc/v1/cache"
             … 
          },
          "cache": {
             "vastXml": {
                 "url": "FULL_CACHE_URL_FOR_THIS_ITEM",
                 "cacheId": "1234567890A"
             },
             "bids": {
                 "url": "FULL_CACHE_URL_FOR_THIS_ITEM",
                 "cacheId": "1234567890B"
             }
          }
        }
      }
    }]
  }],
… 
}

Proposed Prebid.js Configuration

Prebid.js needs to be updated to allow the publisher to specify caching parameters. Suggested config:

pbjs.setConfig({
  "cache": {
    url: "https://prebid-server.pbs-host-company.com/cache",
    ttl: 300
  },
  "s2sConfig": {
    …
    "video": {           // new format selector
      "ext.prebid": {    // merged into the openRTB2 request
        "cache": {
          "vastXml": {
            returnCreative: false
          }
        }
      }
    }
    …
  }
});

Appendix - Changes to current systems

If all of the requirements above are to be implemented, these are the changes that would be required.

Prebid.js - better support for s2s video header bidding

Prebid Cache

Prebid Server

Prebid SDK - in a server-side caching scenario

Prebid Universal Creative

(Note: async caching feature split out into https://github.com/prebid/prebid-server/issues/687)

bretg commented 6 years ago

Updated response example

bretg commented 6 years ago

Made a number of updates after feedback from AppNexus team:

Going to discuss the "two-endpoint" architecture with the team tomorrow.

bretg commented 6 years ago

Got feedback from another internal review that the ttl parameter on the PBC query string is unnecessary -- it's already supported within the protocol packet. So the proposal is to update PBJS to take a ttl argument on the cache object in setConfig and add it appropriately to the cache request.

dbemiller commented 6 years ago

Could you give more details on what you mean by "within the protocol packet?"

bretg commented 6 years ago

Followup on the "two-endpoint" architecture. We've confirmed that both Redis and Aerospike support a mode where a given key can't be overwritten, and that performance of this mode is good. There's a slight cost (~10%). The proposal is that we make this feature configurable so PBS host companies can make the tradeoff between security and performance. So we don't intend to split out the uuid-specification feature to a separate endpoint -- instead, added requirement 21:

  1. The cache server should also have a configuration which defines whether uuid is accepted as a parameter. The general idea is that a PBS cluster will run in one of two modes: either the caching layer prevents cache entries from being overwritten or the cache won't accept UUIDs on the request, which disables the 'asynchronous cache' feature.
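
A rough sketch of how this configuration check might be enforced in Prebid Cache, assuming hypothetical config and payload types (not the real Prebid Cache code):

package pbscache

import "errors"

// CacheConfig is an assumed slice of Prebid Cache configuration.
type CacheConfig struct {
	// AllowExternalUUIDs controls whether a client may supply its own cache key.
	// Clusters whose storage layer is not running in a no-overwrite mode should
	// leave this false, which disables the asynchronous-cache feature.
	AllowExternalUUIDs bool
}

// PutEntry is an assumed shape for one item in a cache write request.
type PutEntry struct {
	UUID  string `json:"uuid,omitempty"`
	Value string `json:"value"`
}

var errExternalUUID = errors.New("externally supplied uuid is not accepted on this cluster")

// validatePut rejects caller-specified UUIDs unless the cluster opts in.
func validatePut(cfg CacheConfig, p PutEntry) error {
	if p.UUID != "" && !cfg.AllowExternalUUIDs {
		return errExternalUUID
	}
	return nil
}
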
bretg commented 6 years ago

Could you give more details on what you mean by "within the protocol packet?"

Apparently Prebid Cache Go and Prebid Cache Java have diverged more than I realized. The Java version supports an 'expiry' attribute on the POST. And a uuid key.
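
For concreteness, a sketch of a cache write payload carrying the 'expiry' and 'uuid' fields mentioned here. The "puts"/"type"/"value" envelope follows commonly documented Prebid Cache payloads; treat all field names as assumptions rather than a verified spec for either implementation.

package pbscache

import "encoding/json"

type cachePut struct {
	Type   string `json:"type"`             // e.g. "xml" for VAST
	Value  string `json:"value"`            // the creative body
	Expiry int    `json:"expiry,omitempty"` // per-entry TTL in seconds (PBC-Java)
	UUID   string `json:"uuid,omitempty"`   // caller-specified key (PBC-Java)
}

type cachePutRequest struct {
	Puts []cachePut `json:"puts"`
}

// buildCacheBody builds the JSON body a client would POST to Prebid Cache.
func buildCacheBody(vast string, ttlSeconds int) ([]byte, error) {
	return json.Marshal(cachePutRequest{
		Puts: []cachePut{{Type: "xml", Value: vast, Expiry: ttlSeconds}},
	})
}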

dbemiller commented 6 years ago

The proposal is that we make this feature configurable so PBS host companies can make the tradeoff between security and performance.

Imagine the experience of a publisher who wants to switch PBS host companies, or one who starts out running PBS themselves and decides to use a host company instead because it's more trouble than it's worth.

Imagine a publisher trying to read documentation to figure out how to use PBS, if the behavior depends on configs that they can't even see, or which a host company might change at any time without their notice.

This seems like a bad idea for everyone involved.

bretg commented 6 years ago

Here's the proposed story:

This does not appear to be an unreasonable or unworkable situation.

Having a two-VIP architecture adds fairly significant complexity in setup and debug. So it would only be utilized by PBS host companies that want to support asynchronous caching. So really it comes down to what sort of complexity is required to support asynchronous caching:

1) two-VIP solution

2) single-endpoint solution that relies on the no-overwrite storage configuration

Both cases require configuration, but #2 has fewer moving parts to break.

bretg commented 6 years ago

We do need to address the divergence between PBC-Go and PBC-Java. More on that in a separate conversation.

dbemiller commented 6 years ago

Might be a good idea to break this proposal into smaller pieces. Many parts of it are good ideas no matter what... but there's a lot to discuss about this async one.

Our consensus over here was basically: "let's run an experiment." Config & publisher-facing options are great if there are legitimately good reasons to make different choices... but they're horrible if one way is just "better".

My intuition here is that async would just be better across the board... but intuition counts for much less than concrete math or experimental data.

If you're open to this, I can open a new issue for it and we can discuss in more detail.

bretg commented 6 years ago

Yes, we can leave the async caching feature aside for now.

Have split out the relevant requirements into a separate issue -- https://github.com/prebid/prebid-server/issues/687

dbemiller commented 6 years ago

Min and max TTLs should be configurable for each cache host company.

Max TTL config makes sense because host companies have hardware capacity... but what's the use-case for min TTLs?

The caching system should allow each publisher to be able to define their own TTL values by mediaType that override the system defaults.

The publisher already has per-AdUnit cache control in (4)... so this introduces a data redundancy in the request.

I see how this would be a convenient option for publishers... but it's worth noting that the Prebid Server API isn't really publisher-facing. Publishers use PBS through Prebid.js, and edit Stored Requests through a GUI.

Prebid Server should use TTLs in this priority order:

Asking for clarification: where do you see the configs the host company sets in this hierarchy?

Our opinion was that the "max TTL" config took precedence over everything, because only the host company knows what their hardware can support.

hhhjort commented 6 years ago

Adding some support for reading exp from the imp and bid, and sending a ttl to prebid cache appropriately. Short term this will help optimize cache utilization. #684

bretg commented 6 years ago

what's the use-case for min TTLs?

It doesn't make sense to cache for less than a couple of seconds - it's an edge case, but the idea is to avoid read misses.

where do you see the configs the host company sets in this hierarchy?

Most of them are host company configs

The idea behind PBS account-level config is that overrides will be rare and can be supported as config by the PBS host company for important accounts.

Also - updated the cache response to be able to carry cache urls for both vastXml and bids. This accounts for the use case where both are requested.

hhhjort commented 6 years ago

Since we have stored requests, I am not sure that publisher-level default TTLs are really needed. Stored requests do provide even more granular control, with the downside that the TTL must be set per stored request rather than simply per media type. I am not against it per se, but would rather wait and see if there is demand before adding it preemptively.

There is also the issue of adding too many controls on the TTL. The more rules we have as to how to set the TTL, the more difficult it becomes to debug why the cache expires when it does. And of course the system needs to run through the entire logic tree to determine the actual TTL on every cache request, which can eat up resources and add latency.

For min TTL, I think it may be better to just let the ads fail to cache and have the issue caught quickly, rather than trying to second guess what the publisher meant. For example, let us say that we have a default TTL of 5 minutes, but the publisher realizes it can sometimes take a bit more than 5 minutes before the cache call is made. They want to bump it up to 10 minutes, but accidentally set it to 10 seconds instead. Now if we had a min TTL of 2 or 5 minutes, that TTL might still be enough to get the majority of the publisher's calls. But it could lead to a lot of confused debugging as they try to determine why the bump in TTL did not improve the cache performance, and perhaps degraded it. If however we let the 10 second TTL stand, they should recognize and catch the issue fairly quickly, and get the TTL they actually want in place much sooner.

dbemiller commented 6 years ago

Most of them are host company configs

Yeah... sorry, I wasn't clear. I meant to ask about the Max TTL allowed by PBS host. You listed it as a requirement in (4), but it wasn't clear where it sat in the hierarchy of (7).

It seems to me like that should take the highest precedence, since otherwise it's a hardware liability for the PBS host.

bretg commented 6 years ago

Here's the pseudo-code implemented by PBS-Java

if imp.exp is set, use min(imp.exp, configuredMaxTTL)
else if ext.prebid.cache.*.ttlseconds is set, use min(ttlseconds, configuredMaxTTL)
else if an account ID is available from request.app.publisher or request.site.publisher
       and mediaType config for that account is set up, use that
else if mediaType config for the cluster is set up, use that
else, finally, just use the hardcoded default
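
For the Go code base, a rough equivalent of the pseudo-code above. The config shape, field names, and account lookup are illustrative assumptions, not the actual PBS configuration keys.

package pbscache

// TTLConfig is an assumed shape for the host-company server config.
type TTLConfig struct {
	MaxTTLSeconds        int                       // hard cap set by the PBS host
	AccountMediaTypeTTLs map[string]map[string]int // accountID -> mediaType -> seconds
	ClusterMediaTypeTTLs map[string]int            // mediaType -> seconds
	DefaultTTLSeconds    int                       // hardcoded system default (300)
}

func capTTL(ttl, max int) int {
	if max > 0 && ttl > max {
		return max
	}
	return ttl
}

// resolveTTL applies the priority order from requirement 7. impExp is imp.exp
// and reqTTL is ext.prebid.cache.*.ttlseconds; zero means "not set".
func resolveTTL(cfg TTLConfig, impExp, reqTTL int, accountID, mediaType string) int {
	if impExp > 0 {
		return capTTL(impExp, cfg.MaxTTLSeconds)
	}
	if reqTTL > 0 {
		return capTTL(reqTTL, cfg.MaxTTLSeconds)
	}
	if accountID != "" {
		if byType, ok := cfg.AccountMediaTypeTTLs[accountID]; ok {
			if ttl, ok := byType[mediaType]; ok {
				return ttl
			}
		}
	}
	if ttl, ok := cfg.ClusterMediaTypeTTLs[mediaType]; ok {
		return ttl
	}
	return cfg.DefaultTTLSeconds
}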

Here are the server config values in the PBS-Java PR

It may be reasonable to place the account-level values in the Accounts DB table at some point, but for now we don't envision these values changing much, don't really want to encourage non-standard timeouts, and reading/caching/updating DB entries is harder than static config.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.