prebid / prebid-server

Open-source solution for running real-time advertising auctions in the cloud.
https://prebid.org/product-suite/prebid-server/
Apache License 2.0
425 stars 720 forks source link

Support video cache #324

Closed bretg closed 6 years ago

bretg commented 6 years ago

Behavior:

The cacheMarkup flag causes PrebidServer to cache the bids and remove the creative body from the response to the client.

The PrebidServerBidAdapter places cacheMarkup at the global s2sConfig level.

Problem:

When the same web page contains both video (cached) and banner (uncached), the s2sConfig.cacheMarkup flag has to be changed in between requestBids() calls.

Proposed solution:

Define cacheMarkup per media type.

This allows for more control over when Prebid Server caches and responds with hb_cache_url and hb_cache_id. The client should to be able to tell the server what it wants to do for responses of different media types.

The proposed mechanism is to add a prebidServer-options extension to the bidRequest to allow the client to define what should happen for each mediaType. E.g.

bidRequest: {
  ext: {
     prebidServer: {
         options: {
             bidCaching: {
                 banner: {
                      cacheMarkup: false
                 },
                 video: {
                     cacheMarkup: true
                     // could be more options in the future, e.g.  ttl: 300 and cacheScope: [all|markup]
                 }
             }
         }
     }
   }
}

When cacheMarkup is true for the given bid response based on mediaType, Prebid Server caches and responds with hb_cache_url and hb_cache_id.

There could also be a default cacheMarkup value for each mediaType:

bretg commented 6 years ago

@mkendall07 and @dbemiller - we've discussed this internal to Rubicon, and are willing to build it if you guys sign off on the concept and design.

dbemiller commented 6 years ago

Yes, we definitely do need this feature. I also agree it belongs in the bidRequest.ext.

There's one issue I know of, and I have a few questions about the motivation.

Backwards compatibility

I tried to get peoples' attention on #216... but there was no response from Rubicon, so we just discussed use-cases offline and started building it. I kept commenting in that thread, so you can follow the discussion if you're interested. Cacheing the VAST was discussed. A full-blown "cache by mediaType" feature was not. Since Prebid Mobile has already been updated to use bidRequest.ext.prebid.cache.bids = {}, we can't really break things for them anymore.

The good news is that I knew a feature like this was coming, so we can add support backwards-compatibly. For example, we could extend it to something like this:

bidRequest: {
  ext: {
     prebid: {
        cache: {
           bids: {} // This becomes optional, but one of bids or markup would be required.
           markup: {
             video: {
                // could add options in the future, e.g.  ttl: 300 and cacheScope: [all|markup]
             }
           }
         }
      }
   }
}

Motivation

I have a few questions about these I strongly believe that APIs should be extensible for future use-cases... but I also strongly believe that we shouldn't expose options which aren't worth the documentation burden, learning curve for our users, and time it takes to debug broken page setups.

With that in mind:

  1. Is there any reason that a pub would want to cache banner JSON, but not video or native? The use-case I'm aware of is Mobile, which wants to cache all Bid data.

  2. Is there any reason to cache just the ad markup for ad types besides video? The use-case I'm aware of is video players which require a URL which returns only VAST content. If there are no others, then we shouldn't include options like bidCacheing.banner.cacheMarkup.

  3. Is there a chance that pubs might need to cache the ad markup and full bid JSON? If there's any shadow of uncertainty here, we should use a different targeting key (for example: hb_cache_id for full bids, and hb_cache_adm_id for just the ad markup) so that we haven't shot ourselves in the foot later.

moonshells commented 6 years ago

Use case is important, it shows what functionalities we should provide. We may need to collect it from all bidders for all media types and channels. Only after that we know which level of flexibility/API/config we should expose as we are designing Prebid, then we create industry standard. For cache, the design may include what should Prebid do if there are multiple bidders and they may or may not want to use different cache endpoints.

dbemiller commented 6 years ago

The use-cases that I know of are:

  1. Cache the JSON for the highest bid for each Bidder in each Imp. (for mobile)
  2. Cache the VAST XML for all video bids which are the highest for each Bidder in each Imp. (for web).

It sounds to me like my (2) describes this issue too.

These are some concrete enhancements which we think we'll want in the future, but aren't prioritized or on our roadmap to implement anytime soon:

  1. Cache the JSON for all Bids involved in a Deal (for mobile).
  2. Cache the VAST XML for all Video Bid involved in a Deal (for web).

I'm also suspicious that we may need to cache both VAST XML and JSON, for Mobile Video. Nobody is able to confirm or deny, so... in the interest of flexibility, I consider it a possibility.

From @bretg's comment above, it sounds like there's another possible case:

  1. For any cached bid, let the publisher set the cache TTL.

If there are others, it's better to know sooner rather than later. There are never any guarantees, but it minimizes the likelihood that we'll find ourselves having to make a breaking change (or live with an ugly API) in the future.


This is the API design process which I've found works pretty well:

  1. Design an "idealized" API which covers all the known use-cases. If that's impossible, prioritize them by likelihood & importance and cover as many as you can.
  2. Consider which parts of that API would be optional params. These often overlap with the "potential" use-cases, because any critical use-cases would be implemented immediately.
  3. Implement and document an API which supports only the required params + any "MVP-worthy" optional ones.

Although there are never any guarantees, these steps minimize the likelihood of deprecated options and breaking changes in the future.

The best "idealized" API I could come up with in #216 (and tweaked to support the TTL case added here) is something like this:

{
  // If bidrequest.prebid.cache exists, then one of vastxml or bids should be defined.
  // If the user wanted to cache nothing, then they should omit bidrequest.prebid.cache
  bids: {
    // Both optional
    deals: false
    ttlseconds: 300
  },
  vastxml: { // "vastxml" name intendeds to signal that this only caches Video bids. Better names may exist
    // deals & ttlseconds exist here too
  }
}

If we like that, the next step would be to publicize this:

{
  // Only one of bids or vastxml is required
  bids: {},
  vastxml: {}
}

However... this design has a flaw (only one that I can see). If "cache bids and vastxml" will never be a valid use-case, then it offers too much flexibility. If a user asks us to cache both ways, then the hb_cache_id field is ambiguous. Does it point to the VAST, or the Bid JSON?

If we're certain that "cache bids and vastxml" will never be a future use-case, then my idealized API is kinda crappy and we should rethink it. If we're not certain, then we'll need to return vastxml IDs in a separate hb_* key so that we don't avoid that ambiguity.

moonshells commented 6 years ago

Make a lot of sense. I don't know what Mobile video wants to cache either. So far your "idealized" API should be the best solution. Another question, now we are using both cache.url and cacheMarkup, will this be an issue for DFP to find out the cache_url of the winner bid when there're multiple bidders?

dbemiller commented 6 years ago

Let's assume that we may need to cache both XML and Bid JSON for the same bid someday.

Make request.ext.prebid.cache accept this:

{
  // Only one of bids or vastxml is required
  bids: {},
  vastxml: {}
}

Today, we support the bids arg (and it's required). If present, we return an hb_cache_id targeting key to the highest overall bid for each Imp. We also return an hb_cache_id_{bidder} key for the highest Bid made by each Bidder in each Imp.

For this feature, we would add vastxml, make bids optional (so that only one of the two is required on each request), and return hb_vast_id and hb_vast_id_{bidder} targeting keys using the same conditions as above. The only exception is that these keys would only appear on the highest bid if it's a video bid, obviously, since other bids don't have VAST to cache.

dbemiller commented 6 years ago

The original comment from @bretg said:

This allows for more control over when Prebid Server caches and responds with hb_cache_url and...

I'm not sure what he means, though.

To be clear: hb_cache_url is not a targeting param today, in the openrtb endpoint or the legacy one. The legacy endpoint does return the cache URL in the JSON response... so maybe that's what he meant. If that would be helpful in OpenRTB, let's add a response.seatbid[i].bid[j].ext.prebid.cache object like this:

{
  bidurl: "url/to/full-bid.json",
  "vasturl": "url/to/vast-markup.xml"
}
hhhjort commented 6 years ago

That is a good point about where the cache URL info can be stored. If information does not need to be stored in targeting parameters, then best to keep it out to keep the parameters from being overloaded. I think there is a limit to the maximum number of bytes that can be sent as targeting parameters, so excessive use may eventually hit that wall.

hhhjort commented 6 years ago

So currently I am thinking the cache API would look something like this:

"cache": {
    "bids": {}
    "vastxml": {}
}

By default bids will put cache IDs into the targeting parameters, while vastxml will just include it somewhere in the response. I see room within bids and vastxml for optional flags to turn supplying the cache parameters in the targeting parameters on/off, if such a use case ever arises. More likely, force_cache as an addition to vastxml, as I am thinking by default if the bidder supplies a VAST URL in bid.nurl (needed if bid.adm is omitted) we just pass that URL along as the VAST URL.The force_cache parameter will force prebid server to call and cache the bid.nurl if that is where the VAST lives, rather than just passing the original URL.

Caching the VAST from bid.nurl is problematic as that is also supposed to be the win notification. We must call it in order to cache it, but we cannot guarantee that that bid will be the eventual winner. So that will generate reporting discrepancies as the bidder will record a win while the final adserver chooses some other ad to deliver. So I would not want to support force_cache if a strong use case does not exist for it.

moonshells commented 6 years ago

@dbemiller I totally agree with your design: for this feature, build bids args with hb_cache_id and vastxml with hb_vast_id. And how about the legacy endpoint, should we replace cacheMarkup with something like cacheBids and cacheVastXml? How should we implement this feature?

dbemiller commented 6 years ago

I don't care about the legacy endpoint at all. IMO our goal should be to encourage pubs to upgrade prebid.js versions and stop using it so we can delete it all from the project ASAP. If you find value in it and want to update the legacy API, I'll approve just about anything that doesn't violate the prebid code of conduct, doesn't have egregious performance drawbacks (no double-cache calls), and is written cleanly enough that I can tell it won't segfault :).

Caching the VAST from bid.nurl is problematic as that is also supposed to be the win notification. We must call it in order to cache it, but we cannot guarantee that that bid will be the eventual winner.

This is a really good point.. wasn't even on my radar for this project. In prebid.js we worked around this by making our own VAST which wraps that NURL. You can see how it's done here. If we need to cache a bid with a NURL, we should do the same in PBS.

I am thinking by default if the bidder supplies a VAST URL in bid.nurl (needed if bid.adm is omitted) we just pass that URL along as the VAST URL.

This is a great thought, but... unfortunately I think we need to cache these too. I'm told that DFP forces pubs to hardcode the domain which will be serving their video creatives. NURLs can come from any domain, so... we still need to cache something and return a UUID for them in the targeting keys.

moonshells commented 6 years ago

383

bretg commented 6 years ago

(Picking up this thread)

This covers the known use cases:

We're going to start building out this feature in Prebid-Server Java as defined here, and the prebidServerAdapter updates will follow. But we need to figure out what changes to make there. How about we support passing arbitrary ext.prebid objects through s2sConfig?

pbjs.setConfig("s2sConfig": {
   ...
   openrtbExt: {
      "cache": {
        "vastxml": {
             ttlseconds: 300
         }
      }
   }
   ...
});

Since a full json-merge would probably be too expensive in the browser, I think we can just copy any objects found in s2sConfig.openrtbExt into bidRequest.ext.prebid. Don't believe pbsAdapter sets anything in request.ext.prebid right now.

This design would allow a page to specify other PBS extensions as well:

dbemiller commented 6 years ago

hmm... I think that API is good from a PBS side, but should point out: TTLs are tricky, because everyone has some stake in them.

So... if we put a ttlseconds option in the API, I do think we need to be clear that it should be considered a "best effort" request, and not a guarantee. Other than that it looks great though.

As for the s2sConfig... that's probably a discussion to take to the Prebid.js project?

I liked your idea a lot at first glance... but after thinking about it a while, it definitely has some warts that I would point out. It mostly affects the Prebid.js community, though, so it's probably better had over there in a design issue.

bretg commented 6 years ago

Getting good questions from the engineering team that's forced me to re-evaluate my position on targeting variables. Right now we have hb_cache_id for mobile and hb_uuid for video. I had wanted to get us down to one: hb_cache_id, but this design pushes us back to two targeting variables.

Here's a summary of the PBS pseudo-code as I'm understanding it from above:

if request.ext.prebid.cache.bids object is specified:

if request.ext.prebid.cache.vastxml object is specified:

Notes:

dbemiller commented 6 years ago

That's a fair summary... and yes, I agree that hb_uuid makes more sense. It's a lousy name... but no reason to make people rewrite all their line items for it.