private-attribution / ipa

A raw implementation of Interoperable Private Attribution
MIT License

add scalar batch invert #1450

Open tyurek opened 1 week ago

tyurek commented 1 week ago

There's one part of the PRF code that can likely be sped up by using batch inversion of Fp25519 elements. This PR adds a function that calls the Scalar batch_invert in curve25519-dalek, which can be used here.

The function I wrote is not memory efficient at all, because I couldn't figure out how to work effectively with `<V as Vectorizable>::Array`. I'd appreciate some suggestions here :) Using Vecs appears to be necessary, since the dalek batch_invert code relies on them. To re-implement that code using arrays, we would need to fork dalek to expose UnpackedScalar and the pack/unpack methods.
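For context, batch inversion replaces n field inversions with one inversion plus O(n) multiplications (Montgomery's trick), which is what dalek's `Scalar::batch_invert` does internally. Below is a minimal, self-contained sketch of the trick over a small illustrative prime field; the modulus `P` and all helper names here are hypothetical stand-ins, not the Fp25519 code in this PR.

```rust
// Illustrative modulus only -- the real code works over Fp25519 scalars.
const P: u64 = 1_000_000_007;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

// Single inversion via Fermat's little theorem: x^(P-2) mod P.
fn inv_mod(x: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (x % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Invert every (nonzero) element of `xs` in place using one field
/// inversion plus 3(n-1) multiplications -- Montgomery's trick.
fn batch_invert(xs: &mut [u64]) {
    let n = xs.len();
    if n == 0 {
        return;
    }
    // Prefix products: prefix[i] = x0 * x1 * ... * xi.
    let mut prefix = Vec::with_capacity(n);
    let mut acc = 1u64;
    for &x in xs.iter() {
        acc = mul_mod(acc, x);
        prefix.push(acc);
    }
    // One inversion of the total product.
    let mut inv = inv_mod(prefix[n - 1]);
    // Walk backwards, peeling off one inverse per element:
    // inv * prefix[i-1] = (x0..xi)^-1 * (x0..x{i-1}) = xi^-1.
    for i in (1..n).rev() {
        let xi_inv = mul_mod(inv, prefix[i - 1]);
        inv = mul_mod(inv, xs[i]);
        xs[i] = xi_inv;
    }
    xs[0] = inv;
}
```

Note the `Vec` of prefix products -- this scratch allocation is exactly the memory-efficiency concern raised above.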

Even so, this version appears to be faster:

- 2M records in 6:12: https://draft-mpc.vercel.app/query/view/sore-deal2024-11-21T1848
- 4M records in 12:12: https://draft-mpc.vercel.app/query/view/roast-loft2024-11-21T2017

Compared to current main:

- 2M in 6:22: https://draft-mpc.vercel.app/query/view/gaudy-whack2024-11-21T1856
- 4M in 12:27: https://draft-mpc.vercel.app/query/view/beta-porch2024-11-21T2003

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 93.51%. Comparing base (f42e334) to head (ed44a9f).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1450      +/-   ##
==========================================
- Coverage   93.53%   93.51%   -0.02%
==========================================
  Files         230      230
  Lines       40683    40698      +15
==========================================
+ Hits        38053    38060       +7
- Misses       2630     2638       +8
```

:umbrella: View full report in Codecov by Sentry.



andyleiserson commented 1 week ago

My guess is that the overhead of the allocation is negligible relative to the cost of the inversion. But a few ideas:

akoshelev commented 1 week ago

I made the same change a few weeks ago and the perf boost was negligible. Hopefully that's not the case here.

akoshelev commented 1 week ago

Found the code: https://github.com/private-attribution/ipa/pull/1393

akoshelev commented 1 week ago

@andyleiserson pointed out that it was not the same thing: I did batch decompress, while this PR does batch invert.

tyurek commented 4 days ago

@andyleiserson I went with the try_into approach for now. What is your approach for checking generated code like this?
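For anyone following along, the `try_into` approach referred to here is the standard way to turn the `Vec` that dalek's batch_invert requires back into the fixed-size array the vectorized code expects. A hedged sketch, with hypothetical names (`N` stands in for the vectorization width):

```rust
// Convert a Vec back into a fixed-size array. `TryFrom<Vec<T>>` for
// `[T; N]` fails (returning the Vec) if the length is not exactly N,
// so the Debug bound is needed for expect() to print the error.
fn vec_into_array<T: std::fmt::Debug, const N: usize>(v: Vec<T>) -> [T; N] {
    v.try_into()
        .expect("batch invert should return exactly N elements")
}
```

The conversion is by move, so no per-element copy loop is needed; whether the optimizer also elides the underlying memcpy is the codegen question discussed below.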

Also, how hard would it be to implement Iterable and the ability to index for SharedValueArray?

andyleiserson commented 4 days ago

What is your approach for checking generated code like this?

Either extract something that can be compiled standalone and use https://rust.godbolt.org/, or use objdump -d to inspect my own build. Or, a different way of checking this would be to find or create a suitable unit test or benchmark, and use cargo flamegraph to make sure there isn't a big chunk of time going somewhere you don't expect. (However, one problem with reading the flamegraphs is that inlining can obscure where the time is going.)

Also, how hard would it be to implement Iterable and the ability to index for SharedValueArray?

It already has IntoIterator. The problem with some kinds of iteration and indexing is that SharedValueArray can also be a bit vector. I'm also somewhat opposed to indexing into SharedValueArray on principle -- vectorized implementations should always be repeating the same operation for each element of the vector.
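To illustrate the principle in the last sentence, a vectorized value is operated on whole, applying the same operation to every lane rather than indexing lanes individually. A trivial sketch (names illustrative, not the ipa API):

```rust
// Apply one operation uniformly across all lanes of a fixed-size
// "vectorized" value -- no per-lane indexing.
fn double_all<const N: usize>(lanes: [u64; N]) -> [u64; N] {
    lanes.map(|x| x * 2)
}
```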