google / ads-privacy

Scaup: How does the browser associate user lists with specific interest groups? #27

Open brodrigu opened 3 years ago

brodrigu commented 3 years ago

The Scaup proposal details how an MPC-powered ML model can be used to generate similar-audience user lists from user profile data provided by ATPs. These user lists are then used to assign TURTLEDOVE interest groups to the user/browser.

How does the browser/MPC know what the interest group is? Can the ATP provide a mapping of events / profile data that should correlate with an interest group?

p-j-l commented 3 years ago

Before I answer these, would you mind if I clarify the questions that you’re asking?

Are you asking which events (e.g. conversions, clicks, views) are defined as desirable for finding a similar audience, and how those events will be marked in the browser using some sort of JavaScript API?

brodrigu commented 3 years ago

Hi @pjl-google,

After today's walkthrough in the W3C IWABG meeting, I think I have a working assumption about how this would work. Can you confirm?

  1. User visits a page, and the browser communicates with an ATP to get events or interest groups to associate with the user.
  2. Browser sends these events and interest groups to the MPC servers (via some yet-to-be-discussed private and secure protocol) for storage and use in ML models.
  3. MPC servers create or update models using this new input as training data.
  4. Browser asks MPCs for similar interest groups the user should be in.

Users who have an event/IG vector similar to other users will get the IGs of those similar users. This is something of a black box to the ATP: the ATP cannot specify the model that results in a given IG (at least at this time).

gangwang-google commented 3 years ago

A couple of small clarifications (in bold):

  1. User would visit a page, and the browser communicates with an ATP, either directly (e.g. a direct HTTP request, or the ATP’s JavaScript on the page) or indirectly (e.g. via an SSP), to get user profile data (e.g. the vertical of the current page, not interest groups) to associate with the user.
  2. Browser sends this user profile data, along with the interest groups of which the browser is a member, to the MPC servers (via some yet-to-be-discussed private and secure protocol) for storage and use in ML models.
  3. MPC servers create or update models to predict interest groups that the user should join for the Similar Audience use case, or to predict other objectives per business need, using this new input as training data.
  4. Browser asks MPCs for interest groups the user should be in.

Users who have an event/IG vector similar to other users will get the IGs of those similar users. The ATP controls when to start the training process and when each browser should request the trained model for prediction. The ATP specifies the model type (e.g. k-NN vs Neural Network) and the model parameters (e.g. the k in k-NN, and the number of layers, size of each layer, etc. for Neural Network).
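
To make the "similar users get the IGs of those similar users" idea concrete, here is a minimal plaintext sketch of a k-NN style prediction. Everything in it is an assumption made for illustration: the vector layout, the cosine similarity measure, the `min_votes` threshold, the function names, and the sample data are not taken from the Scaup proposal. It also ignores MPC entirely and works on unprotected data, which the real MPC servers would not do.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_interest_groups(query_vector, query_groups, training_rows, k=2, min_votes=2):
    """Suggest interest groups held by the k most similar training users.

    training_rows is a list of (profile_vector, set_of_interest_groups).
    k is the kind of model parameter an ATP would specify; min_votes is a
    hypothetical threshold added only for this sketch.
    """
    ranked = sorted(
        training_rows,
        key=lambda row: cosine_similarity(query_vector, row[0]),
        reverse=True,
    )
    neighbours = ranked[:k]
    # Count how many of the nearest neighbours belong to each interest group.
    votes = Counter(group for _, groups in neighbours for group in groups)
    # Keep groups with enough votes that the user is not already in.
    return sorted(
        group for group, count in votes.items()
        if count >= min_votes and group not in query_groups
    )


# Hypothetical training data: (profile vector, interest groups already joined).
TRAINING = [
    ([1, 1, 0, 0], {"running-shoes"}),
    ([1, 1, 1, 0], {"running-shoes", "fitness-trackers"}),
    ([0, 0, 0, 1], {"garden-tools"}),
]

suggested = predict_interest_groups([1, 1, 0, 0], set(), TRAINING, k=2, min_votes=2)
```

With this sample data the two nearest neighbours both hold `running-shoes`, so that is the only group suggested. Lowering `min_votes` to 1 would also suggest `fitness-trackers`.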

The idea is that the browser is going to ask the MPC servers which Interest Groups this particular user should be in. The MPC servers will securely evaluate all of the ML models that they have and send back a list of Interest Groups. ATPs will be in control of what training data goes into the ML model training process, but the models themselves will be constructed and evaluated by the MPC servers.
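
As a rough illustration of "evaluate all of the ML models and send back a list of Interest Groups", the sketch below merges the outputs of several models into one response. The `ModelSpec` and `TrainedModel` types, the `interest_groups_for` function, the ATP names, and the toy predictors are all hypothetical, and the evaluation here is ordinary plaintext Python rather than the secure evaluation the MPC servers would perform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ModelSpec:
    """What an ATP specifies: the model type and its parameters (hypothetical shape)."""
    atp: str
    model_type: str  # e.g. "k-NN" or "neural-network"
    params: Dict[str, int] = field(default_factory=dict)


@dataclass
class TrainedModel:
    """A model built from ATP-supplied training data, paired with its spec."""
    spec: ModelSpec
    predict: Callable[[List[float]], Set[str]]


def interest_groups_for(profile: List[float], models: List[TrainedModel]) -> List[str]:
    """Evaluate every model on the profile and return the merged interest groups."""
    result: Set[str] = set()
    for model in models:
        result |= model.predict(profile)
    return sorted(result)


# Toy stand-ins for trained models; real models would come out of MPC training.
MODELS = [
    TrainedModel(
        ModelSpec("atp-a.example", "k-NN", {"k": 5}),
        lambda p: {"running-shoes"} if p[0] > 0.5 else set(),
    ),
    TrainedModel(
        ModelSpec("atp-b.example", "neural-network", {"layers": 2, "layer_size": 16}),
        lambda p: {"garden-tools"} if p[1] > 0.5 else set(),
    ),
]

response = interest_groups_for([0.9, 0.8], MODELS)
```

The browser only sees the merged list, which matches the description above: the ATP influences the result through the spec and the training data, not by evaluating the model itself.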

Did that help?

Thanks, Gang Wang