private-attribution / ipa

A raw implementation of Interoperable Private Attribution
MIT License
41 stars 23 forks source link

Test encrypted input #1237

Closed eriktaubeneck closed 2 weeks ago

eriktaubeneck commented 3 weeks ago

For the in-market test (and in general in production), we're constructing 3 files of encrypted data. #1216 built a tool to encrypt, and to test decryption. This PR adds a test which reads in those files and runs a test query based on those encrypted files.

One issue here is that it requires adding another feature (in-memory-infra) to the cli::crypto module, that's only used for testing. Relatedly, it also requires making runner::oprf_ipa::OprfIpaQuery public. I think it might be preferable to move this elsewhere, and I'd love some feedback on where best this test should live to minimize these types of changes.

As for implementing this for actual use (the next step, which I will do in a separate PR), we have two options:

  1. The report_collector binary could read in all 3 files, create the buffers, and upload them as it does today.
  2. We could create a new helper end point / modify the existing /create endpoint so that it takes in a URL of the input data and downloads that.

We will need (2) eventually for sharding, but it's not immediately obvious all that needs to be done to make it happen here. Curious if others have thoughts on which approach to take.

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.62%. Comparing base (fbc56af) to head (836ff97). Report is 9 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1237 +/- ## ========================================== - Coverage 92.82% 92.62% -0.20% ========================================== Files 200 200 Lines 31166 31413 +247 ========================================== + Hits 28930 29097 +167 - Misses 2236 2316 +80 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

andyleiserson commented 3 weeks ago

I think it makes sense to put this test with the integration tests in ipa-core/tests/helper_networks.rs. It could either be a new test or a modification of one of the existing ones.

The integration tests may currently all be using real HTTP networking rather than the in-memory implementation, but I don't think it has to be that way -- if there's a reason this test needs to use in-memory networking, it would be fine to have some integration tests in that configuration.

(2) We could create a new helper end point / modify the existing /create endpoint so that it takes in a URL of the input data and downloads that.

We will need (2) eventually for sharding, but it's not immediately obvious all that needs to be done to make it happen here. Curious if others have thoughts on which approach to take.

There is some overlap with the data transfer involved in sharding, but I'm not sure it's really interchangeable. I think it would be better to leave a pull-from-URL input mechanism and sharding as future, separate tasks.

martinthomson commented 3 weeks ago

The feature usage seems fine to me. You can do a bunch of things with tags if it’s only for tests.

eriktaubeneck commented 2 weeks ago

I think it makes sense to put this test with the integration tests in ipa-core/tests/helper_networks.rs. It could either be a new test or a modification of one of the existing ones. The integration tests may currently all be using real HTTP networking rather than the in-memory implementation, but I don't think it has to be that way -- if there's a reason this test needs to use in-memory networking, it would be fine to have some integration tests in that configuration.

Let me move it there and see if it works. There's no hard dependency on using in-memory networking, I was just trying to replicate the test in ipa-core/src/query/runner/oprf_ipa.rs.

There is some overlap with the data transfer involved in sharding, but I'm not sure it's really interchangeable. I think it would be better to leave a pull-from-URL input mechanism and sharding as future, separate tasks.

Sounds good. I'll just work on updating the existing report_collector cli to accept 3 encrypted files.

eriktaubeneck commented 2 weeks ago

As-is, I don't think this test will run in CI, it probably makes sense to add a config so it does.

@andyleiserson dumb question, but what config controls this?

andyleiserson commented 2 weeks ago

As-is, I don't think this test will run in CI, it probably makes sense to add a config so it does.

@andyleiserson dumb question, but what config controls this?

.github/workflows/check.yml controls what runs in CI (and pre-commit is what gets run before pushing).