patcg-individual-drafts / hybrid-proposal

Tracking users via sybil impression/destination/reporting sites #1

Open Stebalien opened 1 month ago

Stebalien commented 1 month ago

NOTE: I'm asking about this document. If I'm in the wrong repo, please point me in the right direction.

I'm grappling with what I think is a fundamental issue and trying to understand how IPA/PPA solves it. Specifically, the following tradeoff:

  1. There is some protection against sybil impression/destination/reporting sites: An attacker can only spend their own privacy budget without affecting the rest of the system.
  2. The system can be saturated: An attacker can spend everyone's privacy budget.
  3. The system is not private: An attacker can exceed the "safe" privacy budget by combining information from multiple sybils.

This is possibly related to https://github.com/patcg-individual-drafts/ipa/issues/57, but I'm wondering how that was solved in PPA. Specifically, I'm concerned about a party that can pretend to be a large number of:

  1. "sources" (website where you can see an ad).
  2. "reporting sites" (advertisers).
  3. "destinations" (website where, e.g., you buy something).
  4. "user agents" (browsers).

Such a party could:

  1. Define $N \times W$ impressions from distinct sources and with distinct destinations/reporting sites. Let's say that the first $W$ of these are the "baseline" impressions ($M_0$) and that there are $N-1$ sets of "marker" impressions ($M_1, \dots, M_{N-1}$).
  2. Save all $N \times W$ impressions on a bunch of sybil browsers to get above the reporting threshold with some probability (I assume there is such a threshold?).
  3. Whenever a user visits a target "from" website, use iframes, redirects, etc. to save some user-specific $M_i$ marker impressions from $W$ distinct sources (with $W$ distinct destinations and $W$ distinct reporting sites).
  4. Whenever a user visits a target "to" website, use iframes, redirects, etc. to request "conversions" from all possible $N \times W$ impressions (using $N \times W$ distinct destination sites and reporting sites).

Given a sufficiently large $W$, no amount of noise/differential privacy can hide the fact that the signal for $M_i$ is stronger than the signal for $M_0$. What am I missing?
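To make the signal-versus-noise argument concrete, here is a minimal simulation sketch. It rests on assumptions that are not taken from the IPA/PPA documents: each of the $W$ sybil sites receives an independent count with Laplace noise of scale $1/\epsilon$ (sensitivity 1), the sybil-browser contribution is identical across marker sets and therefore omitted, and the attacker simply sums the $W$ noisy reports per marker set. Under those assumptions the summed signal grows like $W$ while the noise standard deviation grows like $\sqrt{2W}/\epsilon$, so the signal-to-noise ratio is roughly $\epsilon\sqrt{W/2}$. The function names `guess_marker` and `success_rate` are hypothetical.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. Exp(1) draws, scaled, is Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def guess_marker(true_marker, n_sets, w_sites, epsilon, rng):
    """Return the attacker's guess for which marker set the user holds.

    Index 0 is the baseline set M_0; indices 1..n_sets-1 are marker sets.
    Assumes each sybil site gets an independent noisy count (sensitivity 1).
    """
    scale = 1.0 / epsilon
    totals = []
    for m in range(n_sets):
        signal = 1.0 if m == true_marker else 0.0
        # Sum the W per-site noisy reports for this set.
        totals.append(sum(signal + laplace(scale, rng) for _ in range(w_sites)))
    # The attacker picks the set with the largest summed report.
    return max(range(n_sets), key=totals.__getitem__)

def success_rate(n_sets, w_sites, epsilon, trials=200, seed=0):
    """Fraction of trials in which the attacker identifies the user's marker."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        true_marker = rng.randrange(1, n_sets)  # never the baseline M_0
        hits += guess_marker(true_marker, n_sets, w_sites, epsilon, rng) == true_marker
    return hits / trials
```

Comparing, say, `success_rate(8, 1, 1.0)` with `success_rate(8, 400, 1.0)` shows the effect of increasing $W$ at a fixed per-site $\epsilon$: with one site the guess is mostly noise, while with hundreds of sites the marker set stands out.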

bmcase commented 1 month ago

@Stebalien thanks for opening this issue!

> NOTE: I'm asking about this document. If I'm in the wrong repo, please point me in the right direction.

The document you mentioned does have a new repo for keeping track of the design and issues: https://github.com/patcg-individual-drafts/hybrid-proposal. In the PATCG we refer to this as the "hybrid proposal", since it is a hybrid of several preceding proposals. It would be great if we could move the discussion there.

The issue and the set of tradeoffs you describe are indeed an interesting challenge to address and, as you suggest, they come fundamentally from site fragmentation of the base DP guarantee. Sections 2 and 5 of the document "DP Budgeting for Hybrid Proposal" discuss this issue at more length and explore best-effort methods to mitigate it. In your described scenario specifically, a cross-site rate limiter with a short time window would provide a line of defense against registering one event with many redirects as if it had happened on many sites.
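As a rough illustration of the rate-limiter idea, here is a minimal on-device sketch. It is not taken from the hybrid proposal or the budgeting document: the class name `CrossSiteRateLimiter`, its parameters `max_sites` and `window_seconds`, and the policy of counting distinct sites inside a sliding window are all assumptions made for the example. It also assumes timestamps passed to `allow` never decrease.

```python
from collections import OrderedDict

class CrossSiteRateLimiter:
    """Hypothetical limiter: at most `max_sites` distinct sites may register
    events within any sliding window of `window_seconds`."""

    def __init__(self, max_sites, window_seconds):
        self.max_sites = max_sites
        self.window_seconds = window_seconds
        # site -> time of its most recent accepted event, oldest first.
        self._last_seen = OrderedDict()

    def allow(self, site, now):
        """Return True if `site` may register an event at time `now`."""
        # Drop sites whose last accepted event has left the window.
        while self._last_seen:
            _, seen_at = next(iter(self._last_seen.items()))
            if now - seen_at < self.window_seconds:
                break
            self._last_seen.popitem(last=False)
        # Reject a new site once the window already holds max_sites sites.
        if site not in self._last_seen and len(self._last_seen) >= self.max_sites:
            return False
        # Record the event and keep the mapping ordered by recency.
        self._last_seen[site] = now
        self._last_seen.move_to_end(site)
        return True
```

With `CrossSiteRateLimiter(3, 10.0)`, a redirect chain that touches five sybil sites within one second would have only its first three registrations accepted, while a site that is already inside the window can keep registering events. A real design would also have to decide what counts as a "site" and how the limiter interacts with per-site budgets.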

The issue in patcg-individual-drafts/ipa#57 for IPA with a matchkey provider was more pronounced and not easily mitigated. With the hybrid proposal, site fragmentation of the privacy guarantee still needs to be addressed, but there are many more tools at your disposal since privacy budgeting is done on the device.