Conditional permissioning of id's & data

patmmccann commented 3 years ago

Type of issue

Feature Request

In the publisher committee, it was noted that several publishers may want to offer conditional permissioning. Under this scheme, user segment and contextual segment data may be transmitted when IDs are not available, and vice versa, but never both together. These publishers would also like to be able to specify a priority.

Closely related to #5814, #3687, and #4472, and to the concept of allowedBidders, e.g. at https://github.com/prebid/Prebid.js/blob/f61311769823d0431cd90a05c125d88905c0555e/test/spec/modules/prebidServerBidAdapter_spec.js#L1518

One possible scheme, extending from #5814

ext: {
  prebid: {
    data: {
      eidpermissions: [   // prebid server will use this to filter user.ext.eids
        {"source": "sharedid.org", "bidders": ["*"], "suppressIdWithData": true},
        {"source": "neustar.biz", "bidders": ["bidderB"], "suppressDataWithId": true},
        {"source": "id5-sync.com", "bidders": ["bidderA","bidderC"]}
      ]
    }
  }
}
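To make the scheme concrete, here is a minimal sketch of how a server might apply such a list for a single bidder. It assumes (the issue does not pin this down) that `suppressIdWithData` drops that eid whenever `user.data` is non-empty, that `suppressDataWithId` drops `user.data` whenever that eid is still being sent, and that sources with no entry are open to all bidders. `applyEidPermissions` and `bidderAllowed` are hypothetical names, not Prebid Server code.

```javascript
// Hypothetical sketch, not Prebid Server code: filter one bidder's view of
// user.ext.eids and user.data according to an eidpermissions list.
function bidderAllowed(perm, bidder) {
  return perm.bidders.includes('*') || perm.bidders.includes(bidder);
}

function applyEidPermissions(user, eidpermissions, bidder) {
  const hasData = Array.isArray(user.data) && user.data.length > 0;
  const permFor = (source) => eidpermissions.find((p) => p.source === source);

  // Keep the eids this bidder may see; an eid with no permission entry is
  // assumed open to all bidders. suppressIdWithData drops the eid when data exists.
  const eids = (user.ext?.eids ?? []).filter((eid) => {
    const perm = permFor(eid.source);
    if (!perm) return true;
    if (!bidderAllowed(perm, bidder)) return false;
    return !(perm.suppressIdWithData && hasData);
  });

  // suppressDataWithId drops the user data if that eid is still being sent.
  const suppressData = eids.some((eid) => permFor(eid.source)?.suppressDataWithId);

  return {...user, data: suppressData ? [] : user.data, ext: {...user.ext, eids}};
}
```

Evaluating the two flags in this fixed order is only one design choice: for bidderB in the example, the presence of data suppresses sharedid.org, and the surviving neustar.biz ID then suppresses the data, so bidderB receives only the neustar.biz ID.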

bretg commented 3 years ago

@patmmccann - I would like to flesh out the requirements a little more. I want to be clear about the scope and about what you mean by "priority". The example you provide implies that eids are always secondary: they never show up if FPD is present.

Prebid.js and Prebid Server, both of which control the data sent to bid adapters, may have data that is sensitive to publishers and their users. This data includes: User IDs, First Party Data, and specifically user segments and contextual segments within First Party Data.

1) Both Prebid.js and Prebid Server must allow publishers to control which sensitive data is sent to each bid adapter.
2) User segments and contextual segments must be controlled per existing FPD controls, i.e. setBidderConfig() in Prebid.js and, for Prebid Server, ext.prebid.bidderconfig and ext.prebid.data.bidders[]. Note: controlling segments in this way is an extension of what PBS currently protects for AMP and App, since ext.prebid.data.bidders currently only covers {site/app/user}.ext.data.
3) It should be possible for a publisher to define, on a per-bidder basis, whether to:
   A) send both FPD and IDs,
   B) send neither FPD nor IDs,
   C) send IDs only if no FPD is available, or
   D) send FPD only if no IDs are available.
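Requirement 3 can be read as a per-bidder policy choice. Here is a minimal sketch, assuming that under C the FPD itself is still sent (and under D the IDs are), and that "available" means non-empty. The `Policy` values and `applyPolicy` are hypothetical names, not an existing Prebid API.

```javascript
// Hypothetical policy names for the four per-bidder options A-D.
const Policy = Object.freeze({
  BOTH: 'both',                 // A) send FPD and IDs
  NEITHER: 'neither',           // B) send neither
  IDS_IF_NO_FPD: 'idsIfNoFpd',  // C) send IDs only if no FPD is available
  FPD_IF_NO_IDS: 'fpdIfNoIds',  // D) send FPD only if no IDs are available
});

// "Available" is taken to mean non-null and non-empty.
const isEmpty = (x) => x == null || (Array.isArray(x) ? x.length === 0 : Object.keys(x).length === 0);

// Returns what a single bidder should receive under its policy.
function applyPolicy(policy, {fpd, eids}) {
  switch (policy) {
    case Policy.BOTH:
      return {fpd, eids};
    case Policy.NEITHER:
      return {fpd: undefined, eids: undefined};
    case Policy.IDS_IF_NO_FPD:
      // FPD always goes through; IDs only when there is no FPD.
      return {fpd, eids: isEmpty(fpd) ? eids : undefined};
    case Policy.FPD_IF_NO_IDS:
      // IDs always go through; FPD only when there are no IDs.
      return {fpd: isEmpty(eids) ? fpd : undefined, eids};
    default:
      throw new Error(`unknown policy: ${policy}`);
  }
}
```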

Questions:

Would some IDs be allowed to go to all bidders? Would the existence of those "open" IDs affect FPD?
Likewise, some FPD may exist in non-protected areas (e.g. AdUnit). Does that count as FPD for the purpose of suppressing IDs?

patmmccann commented 3 years ago

Re: Would some IDs be allowed to go to all bidders? Would the existence of those "open" IDs affect FPD?

The example above imagines suppressIdWithData as an ID-specific setting that does not differ per bidder: the ID is either completely suppressed or available.

I agree that suppressDataWithId needs to be hashed out with respect to which data it covers, and that suppressIdWithData may need a concept of qualifying data. I was imagining any data in the user segment field, but I agree some further config may be needed to qualify it.

bretg commented 3 years ago

One could imagine the requirements here becoming complicated, like "if specific FPD attributes X, Y, or Z are present, then skip ID A". My sense is that we need to keep the Prebid feature simple; anything more complicated should be implemented as publisher-specific logic in a BID_REQUEST event callback.

Bid adapter modules obtain FPD and EIDs separately. EIDs are controlled per bidder by default because adapters obtain them through the bid request:

const eids = utils.deepAccess(bidderRequest, 'bids.0.userIdAsEids');

FPD can be controlled per-bidder through the setBidderConfig mechanism. Bidders call:

pbjs.getConfig('ortb2');

and PBJS core will give them either the global or the bidder-specific config as appropriate.

So here's a proposal that would allow two relatively simple requirements:
1) PBJS core turns off certain bidder-specific FPD if certain eid data is present.
2) The userId module turns off certain sub-modules if certain ortb2 fields are set.

Proposed Details

In this approach, a publisher should avoid specifying two-way permissions for the same bidder, because it's confusing. If both "skip" instructions are specified, the implication of the proposal is that FPD takes priority over EIDs.

Anything more complicated than what's provided here can be implemented by the publisher as a BID_REQUEST event callback.

Limiting FPD given EIDs

Update PBJS core to support an optional 'skipIfEid' field on setBidderConfig:

pbjs.setBidderConfig({
   bidders: ["bidderA"],  // one or more bidders
   skipIfEid: ["eid-name1", "eid-name2"], // ignore this config if the bidder is allowed to see the named eids
   config: {
      ortb2: { ... }
   }
});

When getConfig() is called from an adapter, it loops through the bidderConfig entries:
1) If it finds one relevant to this bidder:
   a) Check whether it has skipIfEid and whether the userId module is loaded.
      i) If so, grab that bidder's eids from the current auction's bidrequest object.
      ii) If bids.0.userIdAsEids contains any of the named eids, ignore this bidderConfig entry.
      iii) If bids.0.userIdAsEids doesn't contain any of the named eids, this entry can be used after removing skipIfEid.
      iv) When called from the User ID module (see below), bids.0.userIdAsEids will be empty, so the FPD will be set.
   b) Else, if there is no skipIfEid, use this entry.
2) Move on to the next bidderConfig entry.
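The lookup above can be sketched as follows, under assumptions the proposal leaves open: the names in skipIfEid are compared against each eid's `source`, the first usable entry wins, and the global ortb2 is returned when no entry applies. `resolveOrtb2` is a hypothetical helper, not the actual `getConfig()` implementation.

```javascript
// Hypothetical sketch of the skipIfEid lookup; not the actual getConfig() code.
// bidderConfigs: entries shaped like setBidderConfig({bidders, skipIfEid, config}).
// bidderEids: the bidder's bids.0.userIdAsEids ([] when called from the User ID module).
function resolveOrtb2(bidderConfigs, bidder, bidderEids, globalOrtb2, userIdLoaded = true) {
  const presentSources = new Set(bidderEids.map((eid) => eid.source));
  for (const entry of bidderConfigs) {
    // 1) Only entries relevant to this bidder are considered.
    if (!entry.bidders.includes(bidder)) continue;
    // a) ii) Ignore the entry if the bidder can see any of the named eids.
    if (entry.skipIfEid && userIdLoaded &&
        entry.skipIfEid.some((name) => presentSources.has(name))) {
      continue;
    }
    // a) iii) / b) Usable entry: return its ortb2 (skipIfEid is not passed on).
    return entry.config.ortb2;
  }
  // 2) No usable entry: fall back to the global config.
  return globalOrtb2;
}
```

Because the User ID module would call this with an empty eid list, any skipIfEid entry never blocks the FPD it sees, which matches step iv.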

The pbsBidAdapter would pass skipIfEid through on ext.prebid.bidderconfig.
Prebid Server would be modified to follow similar logic as PBJS.

Limiting IDs given FPD

Update the user ID core module to support a new skipIfOrtb2 field:

userSync: {
 userIds: [
  {
    name: "sharedId",
    bidders: [ 'bidderA', 'bidderB' ],
    skipIfOrtb2: ["site.ext.data","user.ext.data"],
    ...

Before writing to bids.0.userIdAsEids or other output, the module should:
1) Check whether skipIfOrtb2 is specified.
   a) If so, call getConfig('ortb2') as if the bid adapter were calling it. At this point bids.0.userIdAsEids isn't set, so any skipIfEid will be ignored.
2) Loop through the eids.
3) If skipIfOrtb2 is present:
   a) Loop through its values. If any of the named JSON paths is present in ortb2, skip this eid.
   b) If none of the JSON paths are present, add the eid as usual.
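A minimal sketch of those steps, assuming dot-separated paths and that a path counts as "present" when it resolves to a non-null value. `eidsForBidder`, `hasPath`, and the submodule object shape (`{name, bidders, skipIfOrtb2, eids}`) are hypothetical.

```javascript
// True if a dot-separated path such as "user.ext.data" resolves to a non-null value.
function hasPath(obj, dottedPath) {
  let node = obj;
  for (const part of dottedPath.split('.')) {
    if (node == null || typeof node !== 'object' || !(part in node)) return false;
    node = node[part];
  }
  return node != null;
}

// submodules: [{name, bidders, skipIfOrtb2, eids}], where eids is what that submodule produced.
// bidderOrtb2: what getConfig('ortb2') would return for the bidder, computed with no eids,
// so any skipIfEid entries are ignored (step 1a).
function eidsForBidder(submodules, bidder, bidderOrtb2) {
  const out = [];
  for (const sub of submodules) {
    if (sub.bidders && !sub.bidders.includes(bidder)) continue;
    // 3a) skip this submodule's eids if any named path is present in ortb2.
    const skip = (sub.skipIfOrtb2 ?? []).some((p) => hasPath(bidderOrtb2, p));
    // 3b) otherwise add them as usual.
    if (!skip) out.push(...sub.eids);
  }
  return out;
}
```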

The pbsBidAdapter passes skipIfOrtb2 through on ext.prebid.data.eidpermissions:

"ext": {
  "prebid": {
    "data": {
      "eidpermissions": [
        {
          "source": "sharedid.org",
          "skipIfOrtb2": ["site.ext.data","user.ext.data"],
          "bidders": ["bidderA"]
        }
      ]
    }
  }
}

Prebid Server is updated to apply similar logic as PBJS before passing to bid adapters.

joshuakoran commented 3 years ago

We should explain how buyers can calculate value if there is no ID present. For example, a prior frequency of 2 is worth x, while a different prior frequency is worth y. Currently, machine-learning algorithms combine exposure information (context, geo, time, frequency, etc.) with marketer goals to back out the value associated with an impression and to inform future bidding and direct buying.

Thus, enabling engagement and suppression (like frequency capping) is important, but any impairment in buyers' value calculation will have a corresponding negative impact on publisher revenues.

ian-lr commented 3 years ago

To ensure that this control has teeth, I suggest we also consider auditing ID systems and bidders before they can qualify as compatible with, or 'honoring', this proposal.

The risks I see are:

Bidders may collect userID or contextual information outside of Prebid-governed data channels. For example, a bidder could attempt to directly call window.ats.retrieveEnvelope() rather than read it through the eid set.
UserIDs may collect contextual information in the process of generating or recalling their ID. If the userID vendor makes such information available to bidders through a side channel, this could negate the value of Prebid controlling for ID versus context.

The use of passively-collected data elements (HTTP headers like IP address and referrer URL) should be documented by bidders and userIds. If publishers wish to technically enforce availability of these elements, we could explore use of Prebid server as a proxy that could truncate/aggregate/obfuscate/remove these elements.

slayser8 commented 3 years ago

@joshuakoran, as I mentioned in the meeting today, we'll be addressing frequency capping and measurement through first-party data signals in other issues. In fact, if you'd like to help build those out or give feedback on proposals (like the Permutive frequency-capping proposal I've been seeking feedback on for months), I'm more than happy to point you in the right direction. But for now, we'll need this paired with other proposals to ensure no publisher data leaks to buyers.

joshuakoran commented 3 years ago

@slayser8 Yes please! Happy to help work with whichever teams are focused on completing the virtuous circle of engagement, measurement, optimization (to reallocate budgets or adjust bid prices to higher value placements).

antlauzon commented 3 years ago

I would like to add functionality here that would allow for an RTD provider to become a permissioned, trusted third party. This approach would allow for a trusted module to handle first party data on behalf of the publisher and process it before it is handed upstream to the bid adapters.

gmcgrath11 commented 3 years ago

@anthonylauzon what is an example of such data and such processing?

antlauzon commented 3 years ago

For instance, we would like to allow a universal exclusion of bid adapters from FPD access that would still allow the FPD object to be read by a real-time data module.

antlauzon commented 3 years ago

I would like to suggest that we use a storageManager like object for FPD. This would allow us to standardize on an interface that we can use to both query and set the data in FPD, as well as potentially allowing for some form of object-level permissioning.
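As an illustration of the idea, here is a hypothetical sketch (not an existing Prebid API) in which each module gets its own FPD handle and reads and writes are checked against per-module-type permissions. It also covers the earlier example: bid adapters are denied FPD access while an RTD module can still read and write it.

```javascript
// Hypothetical sketch of a storageManager-like FPD interface (not a Prebid API).
// permissions maps a module type to the actions it may perform, e.g.
// {rtd: ['read', 'write'], bidder: []}.
function createFpdManager(ortb2, permissions) {
  const allowed = (moduleType, action) => (permissions[moduleType] ?? []).includes(action);

  // Each module asks for its own handle, identified by type and name.
  return function getFpdHandle(moduleType, moduleName) {
    return {
      // Read a dot-separated path; denied reads return undefined.
      get(path) {
        if (!allowed(moduleType, 'read')) return undefined;
        return path.split('.').reduce((node, key) => node?.[key], ortb2);
      },
      // Write a dot-separated path, creating intermediate objects as needed.
      set(path, value) {
        if (!allowed(moduleType, 'write')) {
          throw new Error(`${moduleName} may not write FPD`);
        }
        const keys = path.split('.');
        const last = keys.pop();
        let node = ortb2;
        for (const key of keys) {
          if (node[key] == null) node[key] = {};
          node = node[key];
        }
        node[last] = value;
      },
    };
  };
}
```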

bretg commented 3 years ago

standardize on an interface that we can use to both query and set the data in FPD

While this is a pretty good idea, @anthonylauzon, we just made a dramatic change to the interface used by publishers, and I'd like to avoid another such change.

How about instead we develop a "mergeConfig()" convenience function that makes it easier for RTD modules to inject FPD and segments?
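Here is a sketch of the kind of deep merge such a helper might perform. It assumes arrays are appended, so an RTD module adds segments rather than replacing them, and plain objects are merged recursively. `mergeOrtb2` is a hypothetical name, not the Prebid.js implementation.

```javascript
// Hypothetical deep merge of the kind a mergeConfig() helper might perform,
// so an RTD module can add FPD/segments without overwriting existing config.
const isPlainObject = (v) => v != null && typeof v === 'object' && !Array.isArray(v);

function mergeOrtb2(target, source) {
  const out = {...target};
  for (const [key, value] of Object.entries(source)) {
    if (Array.isArray(out[key]) && Array.isArray(value)) {
      out[key] = [...out[key], ...value];       // append, e.g. new segments
    } else if (isPlainObject(out[key]) && isPlainObject(value)) {
      out[key] = mergeOrtb2(out[key], value);   // recurse into nested objects
    } else {
      out[key] = value;                         // otherwise the new value wins
    }
  }
  return out;
}
```

Returning a new object rather than mutating the publisher's config keeps the original intact, at the cost of a copy per merge.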

antlauzon commented 3 years ago

What we are pushing for, if this proposal gets developed, is module-level permissions, not just bid-adapter-level permissioning. My personal view is that code-level FPD permissions in the JavaScript could be messy and ineffective, but I want to make sure that RTD modules are able to access permissioned FPD.

bretg commented 3 years ago

module-level permissions, not just bid adapter level permissioning

Fair enough. GDPR set the precedent for "modulecode" being broader than bid adapters: it includes modules of all types.

gglas commented 3 years ago

@gmcgrath11 is this still a priority?

gglas commented 3 years ago

@patmmccann we should prioritize this in the scope of the rest of the roadmap

jdwieland8282 commented 2 years ago

This topic came up in the Identity PMC today, and we are very much in favor of adding this functionality.

abhinavsinha001 commented 2 years ago

I see this as a priority-based rule problem. Here is a proposed rule format that I feel can address most of the cases.

Input Sample Data

eids= ['a.com','b.com','c.com']
segments= ['dp1.com/segtax=1','dp1.com/segtax=2','dp2.com/segtax=1']
ortbobjects=['ortb2.user.buyeruid','ortb2.user.id']

Simple Priority Rule (Pass All segments and do not pass any user Ids & eids)

{
  "bidders": ["bidderA","bidderB"],
  "rules": [   
    {
      "id": 1,
      "data_preference": [
        {
          "priority": 1,
          "object": "ortb2.user.data.segments",
          "key": ["*"]
        },
        {
          "priority": 2,
          "object": "ortb2.user.eids",
          "key": ["*"]
        },
        {
          "priority": 3,
          "object": "ortb2.user",
          "key": ["id","buyeruid"]
        }
      ]
    }
  ]
}

Filtered data after rule id = 1

eids= []
segments= ['dp1.com/segtax=1','dp1.com/segtax=2','dp2.com/segtax=1']
ortbobjects=[]

Complex / Granular Priority Rule

{
  "bidders": ["bidderA","bidderB"],
  "rules": [
    {
      "id": 1,
      "data_preference": [
        {
          "priority": 1,
          "object": "ortb2.user.eids",
          "key": ["a.com"]
        },
        {
          "priority": 2,
          "object": "ortb2.user.data.segments",
          "key": ["dp1.com/segtax=1"]
        }
      ]
    },
    {
      "id": 2,
      "data_preference": [
        {
          "priority": 1,
          "object": "ortb2.user.data.segments",
          "key": ["dp2.com/segtax=1"]
        },
        {
          "priority": 1,
          "object": "ortb2.user.eids",
          "key": ["c.com"]
        }
      ]
    },
    {
      "id": 3,
      "data_preference": [
        {
          "priority": 1,
          "object": "ortb2.user.eids",
          "key": ["*"]
        },
        {
          "priority": 1,
          "object": "ortb2.user.data.segments",
          "key": ["*"]
        },
        {
          "priority": 2,
          "object": "ortb2.user",
          "key": ["id","buyeruid"]
        }
      ]
    }
  ]
}

Rules are executed in ascending order of id

Filtered data after execution of rule id 1

Rule id 1 says that data for key a.com in the ortb2.user.eids object has higher priority than data for key dp1.com/segtax=1 in the ortb2.user.data.segments object.
Another way of interpreting this rule: the publisher doesn't want to send segments from dp1.com/segtax=1 when source a.com is present in eids.

eids= ['a.com','b.com','c.com']
segments= ['dp1.com/segtax=2','dp2.com/segtax=1']
ortbobjects=['ortb2.user.buyeruid','ortb2.user.id']

Filtered data after execution of rule id 2

Rule id 2 says that (for the data remaining after rule id 1 executes) data for key c.com in the ortb2.user.eids object has the same priority as data for key dp2.com/segtax=1 in the ortb2.user.data.segments object, so both can be sent together.
Another way of interpreting this rule: the publisher is OK with sending segments from dp2.com/segtax=1 along with source c.com in eids.

eids= ['a.com','b.com','c.com']
segments= ['dp1.com/segtax=2','dp2.com/segtax=1']
ortbobjects=['ortb2.user.buyeruid','ortb2.user.id']

Filtered data after execution of rule id 3

Rule id 3 says that (for the data remaining after the earlier rules execute) all data in the ortb2.user.eids object has the same priority as data in ortb2.user.data.segments, so both can be sent together. But id and buyeruid in the ortb2.user object have lower priority, so they should be removed.
Another way of interpreting this rule: the publisher is OK with sending all other segments along with eids, but not along with any user.id / user.buyeruid.

eids= ['a.com','b.com','c.com']
segments= ['dp1.com/segtax=2','dp2.com/segtax=1']
ortbobjects=[]

Rule Execution Details:

Rules will be executed in id sequence.
The same priority within data_preference means all of that data will be sent.
The rules can be extended to any ortb object or RTD module data.
I propose building this as a separate rule-engine module.

jdwieland8282 commented 2 years ago

Hi @abhinavsinha001, does a non-wildcard value in the value object mean those fields will be removed or kept?

abhinavsinha001 commented 2 years ago

Wildcards in values are used as matching criteria. The keep/remove logic works based on priority: whichever matching values have the higher priority are kept, and lower-priority values are discarded. If the priority is the same, both are kept.

bretg commented 2 years ago

Sorry @abhinavsinha001 , but I don't follow several aspects of the syntax you propose above.

Are all rules executed once one of them matches?
I disagree with the choice of the field name "value" when the values are really attributes (e.g. buyeruid) and not actually values (e.g. 39w9uauiejiou).
It might help if you annotated the 2nd example; I'm having trouble understanding the starting conditions, which rules fired, and why certain field values were removed.
Why label the priority explicitly when they're in an ordered array already?

abhinavsinha001 commented 2 years ago

@bretg

The rules execute in order of priority, i.e. rule.priority, and for each rule the data is selected/filtered based on data_preference priority.
Agreed; I have not refined the naming convention. Maybe object & key would make more sense.
Will update to make it clear.
Relying on array sequence is generally not a good approach and is error-prone, hence I have added the priority explicitly. When implementing, the array will be sorted on priority.

bretg commented 2 years ago

Making more sense now with the text. Thanks @abhinavsinha001. But still questions:

Filtered data after execution of rule id 2

Why is c.com missing from the example output when it says "publisher is ok with sending segments from dp2.com/segtax=1 along with source c.com in eids"?

And then c.com re-appears in the example output of Rule 3. (?)

Filtered data after execution of rule id 3

But id and buyeruid in ortb2.user object has lower priority so it should be removed.

So this id and buyeruid are always removed, or only if there's any data in ortb2.user.eids or ortb2.user.data.segments?

So here's how I'm gathering the algorithm:

For each `rule` entry
  Sort datapreference elements by priority    
  // not fond of this extra browser CPU burn, but it's required if the group wants to let pubs place elements out of order with explicit priority signals

  set foundMatchPriority=0
  For each `datapreference` entry
    if foundMatchPriority>0 and element.priority > foundMatchPriority  // in the delete phase
      if element matches requestData, remove requestData
    else // still looking for a match
       if element matches data, set foundMatchPriority=element.priority

close?

abhinavsinha001 commented 2 years ago

The c.com missing and then reappearing was a typo; fixed it.
Regarding id & buyeruid: according to rule id 3 they have lower priority, so yes, they will be removed, but only if either eids or segments are present.

And your pseudocode logic is pretty close. I modified it a bit; hope it makes more sense now :)


For each `rule` entry
  Sort datapreference elements by priority
  // not fond of this extra browser CPU burn, but it's required if the group wants to let pubs place elements out of order with explicit priority signals

  set maxMatchPriority=99 // 1 means highest priority, 99 means lowest
  For each `datapreference` entry
    if object.key exists
      if maxMatchPriority < entry.priority  // if current data priority is lower, delete it
        remove object.key from requestData
      else
        set maxMatchPriority=entry.priority
    else                                    // no matching object & key found, so continue
      continue
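The revised pseudocode translates fairly directly into JavaScript. In this sketch the request data is simplified to a map from object path to the keys currently present, and `applyPriorityRules` and `matchingKeys` are hypothetical names; it works on a copy so the input is left untouched.

```javascript
// Hypothetical sketch of the priority-rule algorithm above.
// requestData maps an object path to the keys present in it, e.g.
// {'ortb2.user.eids': ['a.com'], 'ortb2.user': ['id', 'buyeruid']}.
const LOWEST_PRIORITY = 99; // 1 = highest priority, 99 = lowest

// Keys of pref.object that this data_preference entry matches ("*" matches all).
function matchingKeys(requestData, pref) {
  const present = requestData[pref.object] ?? [];
  return pref.key.includes('*') ? [...present] : present.filter((k) => pref.key.includes(k));
}

function applyPriorityRules(config, requestData) {
  // Work on a copy so the caller's data is not modified.
  const data = Object.fromEntries(
    Object.entries(requestData).map(([obj, keys]) => [obj, [...keys]]));
  // Rules run in ascending order of id.
  const rules = [...config.rules].sort((a, b) => a.id - b.id);
  for (const rule of rules) {
    const prefs = [...rule.data_preference].sort((a, b) => a.priority - b.priority);
    let maxMatchPriority = LOWEST_PRIORITY;
    for (const pref of prefs) {
      const matched = matchingKeys(data, pref);
      if (matched.length === 0) continue;  // no matching object & key: next entry
      if (maxMatchPriority < pref.priority) {
        // A higher-priority entry already matched: drop this lower-priority data.
        data[pref.object] = data[pref.object].filter((k) => !matched.includes(k));
      } else {
        maxMatchPriority = pref.priority;
      }
    }
  }
  return data;
}
```

Run against the Input Sample Data with the simple and the complex rule sets, this reproduces the filtered outputs listed earlier: the simple rule leaves only the segments, and the complex rules drop dp1.com/segtax=1 and the user id/buyeruid while keeping all three eids.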

bretg commented 2 years ago

Thanks for the clarifications, Abhinav. Frankly, I think this syntax is going to be hard to learn, write, and debug, but it's up to the identity committee, who will be documenting and supporting it.

dgirardi commented 2 years ago

@abhinavsinha001 what is the effect of rule 2 in your example? If I try to run the pseudocode in my head, it has no effect?

abhinavsinha001 commented 2 years ago

@dgirardi Yes, correct, it doesn't do anything. I just wanted to showcase the behaviour where, if object rules are at the same priority, all the objects are retained (they are compatible), and the rule has no effect unless there is a lower-priority object rule. The actual filtering is showcased in rule #3.

dgirardi commented 2 years ago

I'm going to float an alternate proposal to see how it lands: if we define a way to encode simple predicates in JSON, we could offer it as a control over each item in ortb2. In pseudo-pseudocode,

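// `root` is the top-level ortb2 object, so the predicate can inspect data outside the
// sub-object currently being walked (e.g. user.eids). Call as filter(ortb2, ['ortb2'], predicate).
const isObject = (value) => value !== null && typeof value === 'object' && !Array.isArray(value);

function filter(ortb2, path, predicate, root = ortb2) {
   Object.entries(ortb2).forEach(([key, value]) => {
       if (predicate({ortb2: root, path, key, value})) {
            delete ortb2[key];
       } else if (isObject(value)) {
            filter(value, [...path, key], predicate, root);
       }
   });
}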

I was hoping to find a lightweight, JSON-friendly expression language ready to use for the predicate, but had no luck. I came up with this (inspired by CloudFormation condition expressions):

{
  "bidders": ["bidderA", "bidderB"],
  "filter": {
    "OR": [
      {
        "AND": [
          {
            "EQ": [
              {"REF":  "path"},
              ["ortb2", "user", "data", "segments"]
            ]
          },
          {
            "EQ": [
              {"REF":  "key"},
              "dp1.com/segtax=1"
            ]
          },
          {
            "EXISTS": {
              "REF": ["ortb2", "user", "eids", "a.com"]
            }
          }
        ]
      },
      {
        "AND": [
          {
            "EQ": [
              {"REF":  "path"},
              ["ortb2", "user"]
            ]
          },
          {
            "IN": [
              {"REF":  "key"},
              [
                "id",
                "buyeruid"
              ]
            ]
          },
          {
            "OR": [
              {
                "!EMPTY": {
                  "REF": ["ortb2", "user", "eids"]
                }
              },
              {
                "!EMPTY": {
                  "REF": ["ortb2", "user", "data", "segments"]
                }
              }
            ]
          }
        ]
      }
    ]
  }
}

This is meant as an example recreating the same rule as in @abhinavsinha001's example above; the idea is to have something that can easily be "compiled" to a single predicate, in this case the equivalent of:

const pathEquals = (a, b) => a.length === b.length && a.every((part, i) => part === b[i]);
const notEmpty = (obj) => obj != null && Object.keys(obj).length !== 0;

function predicate({ortb2, path, key, value}) {
  return (
    pathEquals(path, ["ortb2", "user", "data", "segments"]) &&
    key === "dp1.com/segtax=1" &&
    ortb2.user?.eids?.["a.com"] != null
  ) || (
    pathEquals(path, ["ortb2", "user"]) &&
    ["id", "buyeruid"].includes(key) &&
    (notEmpty(ortb2.user?.eids) || notEmpty(ortb2.user?.data?.segments))
  )
}

The advantage would be that predicates are a more general concept and more likely to be applicable elsewhere; they are also (I believe) more intuitive for people with some technical background.

The disadvantage is that to keep the "compilation" lightweight, the expression language might be too unwieldy. I'm sure that smarter people have come up with solutions prettier than mine - if you know of any bring them up!
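To gauge how light the "compilation" step could be, here is a sketch of a compiler for the operators used in the example (OR, AND, EQ, IN, EXISTS, !EMPTY, REF). The operator semantics, the rule that anything other than a single-operator object is a literal, and the names `compile`, `compileRef`, and `filterOrtb2` are assumptions. Like the example predicate, it treats eids and segments as objects keyed by source or segment id.

```javascript
// Hypothetical sketch: compile the JSON filter language into one predicate.
const OPERATORS = {
  OR: (args) => (ctx) => args.some((f) => f(ctx)),
  AND: (args) => (ctx) => args.every((f) => f(ctx)),
  // Deep equality via JSON; good enough for the strings and arrays used here.
  EQ: ([a, b]) => (ctx) => JSON.stringify(a(ctx)) === JSON.stringify(b(ctx)),
  IN: ([needle, list]) => (ctx) => list(ctx).includes(needle(ctx)),
  EXISTS: ([ref]) => (ctx) => ref(ctx) != null,
  '!EMPTY': ([ref]) => (ctx) => {
    const v = ref(ctx);
    return v != null && (typeof v !== 'object' || Object.keys(v).length > 0);
  },
};

// {"REF": "path" | "key" | "value"} reads the predicate's arguments;
// {"REF": ["ortb2", ...]} reads an absolute path from the root ortb2 object.
function compileRef(ref) {
  if (typeof ref === 'string') return (ctx) => ctx[ref];
  return (ctx) => ref.slice(1).reduce((node, k) => node?.[k], ctx.ortb2);
}

function compile(expr) {
  if (expr !== null && typeof expr === 'object' && !Array.isArray(expr)) {
    const entries = Object.entries(expr);
    if (entries.length === 1) {
      const [op, arg] = entries[0];
      if (op === 'REF') return compileRef(arg);
      if (op in OPERATORS) {
        const args = Array.isArray(arg) ? arg.map(compile) : [compile(arg)];
        return OPERATORS[op](args);
      }
    }
  }
  return () => expr; // anything else (strings, arrays) is a literal
}

// Walk ortb2 and delete every key the predicate matches; paths start at ["ortb2"].
function filterOrtb2(ortb2, predicate) {
  const walk = (obj, path) => {
    for (const [key, value] of Object.entries(obj)) {
      if (predicate({ortb2, path, key, value})) {
        delete obj[key];
      } else if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
        walk(value, [...path, key]);
      }
    }
  };
  walk(ortb2, ['ortb2']);
  return ortb2;
}
```

Compilation happens once per rule; each node in ortb2 then costs a handful of closure calls, which is in line with the "ugly but fast" trade-off described above.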

abhinavsinha001 commented 2 years ago

@dgirardi Thanks for proposing this generic rule template.

I started with something very similar, but did not want the development and processing overhead of a full-blown rule engine (as you have already called out). Hence, for this use case, I suggested a custom, trimmed-down, priority-based version that we can get started with quickly and that addresses all requirements.
In the long run, a generic rule engine definitely makes sense if we want to support very complex cases across other use cases as well.

dgirardi commented 2 years ago

@abhinavsinha001 There's a tradeoff between language pleasantness and "compilation" complexity/overhead. The example I gave above would be all the way at the "ugly but fast" end; I don't think it'd be significantly heavier than the more ad-hoc system you proposed. But it's definitely more time-consuming for the publisher to read and write, if you ignore the learning curve, which is the part I am not sure about. My sense is that if you have any technical training at all, you can start writing predicate expressions very quickly; you don't need to learn about domain-specific priority groups. But I don't know how technical an audience this is likely to find.

jdwieland8282 commented 2 years ago

This is the proposal that has come out of the Privacy and Identity PMC, posting here for comments.

jdwieland8282 commented 2 years ago

DataController feedback from the Taxonomy PMC and Identity PMC has been merged. Please add additional comments to the doc.
