private-attribution / ipa

A raw implementation of Interoperable Private Attribution
MIT License
40 stars 23 forks source link

Integrating DP noise for OPRF IPA #1133

Closed bmcase closed 4 weeks ago

bmcase commented 2 months ago

This PR integrates adding DP into the oprf_ipa function. The DP parameters are also added to come as query inputs.

In a future PR we'll add a feature flag and environment variables to make it harder for the no-DP option for testing to be used accidentally in production.

Currently two network tests are failing, but creating a PR so folks can start to review.

bmcase commented 2 months ago

All tests are passing except these tests for compact gate which are failing with this error: unexpected narrow for ProtocolGate(/ipa_prf/d_p/noise_gen) => AggregationStep(aggregate00)

having run one of the tests as cargo test --test compact_gate --lib compact_gate_cap_8_no_window_semi_honest -p ipa-core --no-default-features --features "cli web-app real-world-infra test-fixture compact-gate"

I'm not really sure about the latest approach to compact gates, but is there something particular that needs to be done to support it when adding new steps? cc @akoshelev @benjaminsavage @martinthomson do you know what may be going on?

Edit: so was chatting more with Ben about it. I have derived compact steps for DP but probably what is missing in them is a reference out to AggregationStep::Aggregate as a child of NoiseGen uses the aggregate_values function.

In dp/step.rs I have

#[derive(CompactStep)]
pub(crate) enum DPStep {
    NoiseGen,
    ApplyNoise,
}

what is over in aggregation/step.rs is

#[derive(CompactStep)]
pub(crate) enum AggregationStep {
    #[step(child = BucketStep)]
    MoveToBucket,
    #[step(count = 32, child = AggregateValuesStep)]
    Aggregate(usize),
}

Noise gen uses Aggregate but not MoveToBucket.

martinthomson commented 2 months ago

Try this:

#[derive(CompactStep)]
pub(crate) enum DPStep {
  #[step(child = AggregationStep)]
  NoiseGen,
  ApplyNoise,
}
bmcase commented 2 months ago

Thanks @martinthomson , I think that works with another child of AppleNoise also

#[derive(CompactStep)]
pub(crate) enum DPStep {
    #[step(child = crate::protocol::ipa_prf::aggregation::step::AggregationStep)]
    NoiseGen,
    #[step(child = crate::protocol::boolean::step::SixteenBitStep)]
    ApplyNoise,
}
codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 86.20038% with 73 lines in your changes missing coverage. Please review.

Project coverage is 92.17%. Comparing base (e7e3e52) to head (2040849).

Files Patch % Lines
ipa-core/src/protocol/dp/mod.rs 79.03% 52 Missing :warning:
ipa-core/src/helpers/transport/query/mod.rs 26.66% 11 Missing :warning:
ipa-core/src/test_fixture/ipa.rs 87.87% 4 Missing :warning:
ipa-core/src/cli/playbook/mod.rs 94.33% 3 Missing :warning:
ipa-core/src/protocol/ipa_prf/mod.rs 97.77% 2 Missing :warning:
ipa-core/src/cli/playbook/ipa.rs 83.33% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1133 +/- ## ========================================== - Coverage 92.23% 92.17% -0.07% ========================================== Files 197 197 Lines 29951 30265 +314 ========================================== + Hits 27626 27897 +271 - Misses 2325 2368 +43 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

bmcase commented 2 months ago

Finished refactoring the DP module to use a struct NoiseParams to hold all the internal dp parameters.

After trying many things, was not able to get clap to parse a complex enum coming directly from the command line, such as

   #[cfg_attr(feature = "clap", arg(long, default_value = "WithDp{3.0}"))]
   pub dp_params: DpParams,

so I instead pass two parameters with_dp and epsilon and once read in construct the dp_params enum from those. I can comment with more to this issue https://github.com/private-attribution/ipa/issues/1149 about what we might like to rather do if we revisit parsing query params in the future.

Tests/checks are all passing now. I tried to create draft runs but that is crashing even on main right now.

martinthomson commented 2 months ago

This approach seems reasonable. One option you might try if you want to take another swing: make the parameter pub epsilon: Option<f64> and then impl From<Option<f64>> for DpParams.

bmcase commented 1 month ago

@akoshelev I finished addressing feedback and configuring default parameters to work for draft runs. I think we can go ahead and merge this PR now.

In the future will make a couple changes.:

bmcase commented 1 month ago

Here is a table of draft runs with the default DP params.

Size fixed at 1000. 

Max_breakdown_key always 256 (draft selection doesn’t change this anyway)

Max_trigger_value fixed at 7. 

You can see that query time does increase as per_user_credit_cap increases, but query time is at most around 30 sec. 


branch
per_user_credit_cap Query time Draft query link
main 32 5 sec https://draft-mpc.vercel.app/query/view/bone-cult2024-08-01T1634
main 64 5 sec https://draft-mpc.vercel.app/query/view/stout-sneak2024-08-01T1635
main 128 4 sec https://draft-mpc.vercel.app/query/view/each-intro2024-08-01T1626
main 256 Error: Invalid value specified for per-user cap: 256. Must be one of 8, 16, 32, 64, or 128. https://draft-mpc.vercel.app/query/view/found-favor2024-08-01T1633
#1133 Binomial DP 32 8 sec https://draft-mpc.vercel.app/query/view/macro-jog2024-08-01T1635
#1133 Binomial DP 64 17 sec https://draft-mpc.vercel.app/query/view/swish-panic2024-08-01T1622
#1133 Binomial DP 128 32 sec https://draft-mpc.vercel.app/query/view/tidy-cite2024-08-01T1629
#1133 Binomial DP 256 Error: Invalid value specified for per-user cap: 256. Must be one of 8, 16, 32, 64, or 128. https://draft-mpc.vercel.app/query/view/here-drink2024-08-01T1630


bmcase commented 1 month ago

@akoshelev finished addressing or replying to your feedback. Also added table of daft runs.

just please make sure you squash all commits into one

This is a merge option, right? I don't think I can do this as don't have merge access. Can you do when you merge?

akoshelev commented 1 month ago

Here is a table of draft runs with the default DP params.

Size fixed at 1000.

Max_breakdown_key always 256 (draft selection doesn’t change this anyway)

Max_trigger_value fixed at 7.

You can see that query time does increase as per_user_credit_cap increases, but query time is at most around 30 sec.

How do these results compare with OPRF IPA w/o noise?

akoshelev commented 1 month ago

This is a merge option, right? I don't think I can do this as don't have merge access. Can you do when you merge?

sure, I'll do it

bmcase commented 1 month ago

@akoshelev

How do these results compare with OPRF IPA w/o noise?

The rows in the table with branch as main are these without the noise generation. Essentially without noise the performance doesn't scale with the per_use_credit_cap as it does when we add noise.

akoshelev commented 1 month ago

Got it, thanks! Can you also run it on a large input (50M rows) with 32 per user cap to compare?

bmcase commented 1 month ago

@akoshelev figured out a better way to handle Eq for DpMechanism -- we can just remove it. Must have been something else earlier was needing that we don't use in the end.

Running draft on 50M for main. 2hrs already. How long do you expect main to take for that?

akoshelev commented 1 month ago

yea it normally takes around 5hrs to finish :(

bmcase commented 4 weeks ago

@akoshelev completed a few draft runs for larger scale queries and it looks like there is negligible performance impact as the query size increases (as would be expected). We would expect about 8 sec total time added and for these large queries that appears to be much smaller than the normal variance across different runs of the same code.

For per_user_credit_cap = 32 and the default DP parameters.

Branch and size
per_user_credit_cap Query time Draft query link
#1133 Binomial DP  5M input rows 32 40 min 00 sec https://draft-mpc.vercel.app/query/view/camp-beat2024-08-05T1841
Main5M input rows 32 40 min 53 sec https://draft-mpc.vercel.app/query/view/sewn-rinse2024-08-03T2137
#1133 Binomial DP  50M input rows 32 6 hr 38 min https://draft-mpc.vercel.app/query/view/elect-goon2024-08-06T0146
main  50M input rows 32 6 hr 35 min https://draft-mpc.vercel.app/query/view/spumy-wing2024-08-02T0042

akoshelev commented 4 weeks ago

agree, lets merge this