private-attribution / ipa

A raw implementation of Interoperable Private Attribution
MIT License
41 stars 23 forks source link

DP Padding #1225

Closed bmcase closed 2 weeks ago

bmcase commented 1 month ago

PR to add DP padding for both OPRF and the new aggregation.

Right now the code sets parameters internally to PaddingParameters::default() which are moderate DP parameters. If we would like to set the parameters at run time we can add to the query config in a future PR.

Additional parameter sets are used for some tests:

bmcase commented 1 month ago

With draft:

However, the added rows by default seem to be majorly slowing down the CI tests, so will need to scale that back so they don't take so long.

Also created a table of parameters (in the test code) and printed the results here for expected number of dummies for different parameter sets.

bmcase commented 3 weeks ago

Working on updating the approach for DP padding to add separate batches of dummies for OPRF and then right before Breakdown Reveal Agg. To do so working on introducing a Paddable trait so we can reuse as much of the code in either case. Have this doc to summarize the approach, but having issues with trait bounds. @benjaminsavage can you take a look at this new direction? Latest code on the PR but this doc summarizes. https://docs.google.com/document/d/1heIeBE1lMSnmRGAUoopmDuOnKLkR1-_CA9EpqcbPdug/edit#heading=h.f83tb89xflfw

bmcase commented 3 weeks ago

Re @andyleiserson got email of comment but can't find to reply inline.

On ipa-core/src/protocol/ipa_prf/oprf_padding/insecure.rs:

Maybe OPRFPaddingDp and Error should move out of this file, if they're used for the secure mode of operation? (It may make sense to do that as a follow-up, though.)

Yes, I agree it may make sense to move some code around and do some renaming. Would probably put moving into a new PR to make it easier to review and would do a few things

  1. rename the folder oprf_padding to something like padding_dp and move out of ipa_prf since also used for aggregation.
  2. I'm not surer why insecure.rs is called that, but would move the Gaussian code out of it and into dp and move the OPRFPaddingDP code into distributions.rs.
  3. Will also rename OPRFPaddingDP to PaddingDP since used for both OPRF and Breakdown Reveal Agg.
codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 85.88771% with 93 lines in your changes missing coverage. Please review.

Project coverage is 92.62%. Comparing base (108119f) to head (3000b74). Report is 1 commits behind head on main.

Files Patch % Lines
ipa-core/src/protocol/ipa_prf/oprf_padding/mod.rs 83.45% 91 Missing :warning:
ipa-core/src/protocol/ipa_prf/oprf_padding/step.rs 50.00% 1 Missing :warning:
ipa-core/src/test_fixture/ipa.rs 85.71% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1225 +/- ## ========================================== - Coverage 92.74% 92.62% -0.12% ========================================== Files 198 200 +2 Lines 30780 31409 +629 ========================================== + Hits 28546 29093 +547 - Misses 2234 2316 +82 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

andyleiserson commented 3 weeks ago

Re @andyleiserson got email of comment but can't find to reply inline.

Sorry about that. I accidentally clicked the button to post a single comment rather than start a review. Then I deleted the comment and re-posted it as part of the review.

The code rearrangements you list make sense to me, and I agree it would be better to do in a separate PR.

bmcase commented 2 weeks ago

Ran draft runs with the default DP parameters

        OPRFPadding::Parameters {
            oprf_epsilon: 5.0,
            oprf_delta: 1e-6,
            matchkey_cardinality_cap: 10,
            oprf_padding_sensitivity: 2, // should remain 2
        }
          AggregationPadding::Parameters {
            aggregation_epsilon: 5.0,
            aggregation_delta: 1e-6,
            aggregation_padding_sensitivity: 10, // for IPA is most natural to set
                                                 // equal to the matchkey_cardinality_cap
        }

No really impact on draft run performance.

Branch
Query input size # rows for oprf # rows for aggregation Query time Draft link
DP Padding with default parameters 1000 360 9220 7 sec https://draft-mpc.vercel.app/query/view/lone-putty2024-08-27T1549
Main 1000 NA NA 7 sec https://draft-mpc.vercel.app/query/view/inept-black2024-08-27T1557
Main 100,000 NA NA 27 sec https://draft-mpc.vercel.app/query/view/sudsy-cords2024-08-27T1748
DP Padding with default parameters 100,000 654 9214 27 sec https://draft-mpc.vercel.app/query/view/elect-leach2024-08-27T1752

Even with default parameters there are some tests in the CI that run very slowly, so I am also adjusting back to relaxed parameters for the OPRF so the tests complete quickly. I will follow up with a PR that lets us pass in a selection for the parameters.

bmcase commented 2 weeks ago

@andyleiserson or @benjaminsavage I think this PR is ready to merge now. Ran some draft runs above and have set the parameters to make CI tests run fast enough. Will follow up with a PR that lets us pass in a choice of more conservative parameters. If one of you can merge, that'd be great as I don't have merge access. If you feel this is too many commits, Alex sometimes asks to use the merge option which squashes into one commit on main. Thanks