lschoe / mpyc

MPyC: Multiparty Computation in Python
MIT License
367 stars 76 forks source link

`SecInt` bit widths seemingly not enforced #33

Closed samcowger closed 1 year ago

samcowger commented 2 years ago

Suppose I run the following program on three parties, with input hardcoded purely to demonstrate this issue:

from mpyc.runtime import mpc
async def main():
    await mpc.start()
    a0 = mpc.input(mpc.SecInt(8)(300))
    a1 = a0[0] + a0[1] + a0[2]
    print(await mpc.output(a1))
    await mpc.shutdown()
mpc.run(main())

(I'm executing this program on localhost in three separate shells via python prog.py -M 3 -I {0,1,2} -B 8000, Python 3.10.5.)

At first glance, I would expect this program to either:

Instead, the program produces the result 900. Should I expect this behavior, or is there some way I can leverage the library's other types to achieve one of the potential behaviors I listed above?

lschoe commented 2 years ago

Right, currently the responsibility for keeping values in range (if desired) is with the programmer indeed. So, for inputs and outputs there is an opportunity to provide some range checks, but in general it is not possible (and certainly not without a heavy performance penalty) to check if values of secure integers etc. are within the type's nominal range.

For your example, if in fact all three parties use an input in the range of 8-bit signed integers, like each party using 100 as input value, the nominal range check would pass for all the inputs, but the sum a1 = sum(a0) would already be out of range. It's the programmer's task to avoid such overflows, if these would be a problem. Whether this should give an error upon output is not so clear; if the output protocol is run the actual value, even if it's out of range, has been revealed to all parties, and therefore they might have a look at it anyway. Issuing warnings if input or outputs are out of range still makes sense of course.

To automatically reduce all integer values modulo 256 say for 8-bit integers is a common and reasonable approach but unfortunately that would kill the performance. Simple arithmetic with secure integers +,- and even * are efficient as long as we don't require any range checks or automatic modulo 256 reductions.

Interestingly, however, it's actually advantageous in some cases to let intermediate values go out of range. Secure integers are implemented as secret-shared values modulo a prime, and this prime $p$ is like 30-bits longer than the nominal range. This "headroom" is needed for operations like secure comparisons. But this headroom also means that simple arithmetic can be done beyond the nominal range, modulo $p$, and the results will still be meaningful if the final value is known to be within the range $-p/2$ to $p/2$. We use this kind of ideas in multilateration.py for example, where the linear algebra is done modulo the prime attached to the secure integer type.

So, to keep the door open for these advanced methods, no strict range checking is currently enforced. This also gives some more time to evaluate different options, like having a symmetric range of ${-2^{\ell-1}, \ldots, 2^{\ell-1}}$ for $\ell$-bit signed integers, which may be useful sometimes. Other times the usual asymmetric range ${-2^{\ell-1}, \ldots, 2^{\ell-1}-1}$ is more appropriate, which fits well with two's complement bit representations. The choice could be left to the programmer.

This needs to be handled before MPyC can go into version 1.0 at some point, so something for next year.

Do you have some use cases already? That would be very nice of course.

samcowger commented 2 years ago

Thanks for your explanation, I understand why you'd want to avoid such a performance penalty. My use cases include implementing, e.g., hashing algorithms that rely on arithmetic on $l$-bit (unsigned) integers being, generally, modulo $2^l$.

Looking through your multilateration example, I didn't immediately see an explicit reference to the prime attached to integers. Where/how do you choose that prime?

lschoe commented 2 years ago

To see how the primes for secure ints are generated, a good starting point is this piece of code from the mpyc.sectypes module: https://github.com/lschoe/mpyc/blob/7625e22bb9b731b7c6dd611fa2bf76401e21b94a/mpyc/sectypes.py#L619-L646

Ultimately, the primes are generated using probabilistic Miller-Rabin tests, either using gmpy2.is_prime() or the stub provided in mpyc.gmpy. The candidates for testing primality at a given bit length, however, are traversed in a deterministic order, to make sure that all MPyC parties end up with the same prime for a secure integer type.

Secure arithmetic modulo $2^\ell$ is indeed nontrivial. It's built-in already for secure integers. For example, using the asyncio REPL with MPyC preloaded, you can do things like this:

$ python -m mpyc
>>> await mpc.output((secint(99) + secint(100) + secint(101)) % 2**8)
44
>>> await mpc.output((secint(99) + secint(100) + secint(101)) // 256)
1

The secure mod $2^\ell$ reductions are internally done using secure bit decomposition for the $\ell$ least significant bits of a secure integer. The underlying protocols aren't cheap, but you can try and see how it performs.

If your goal is to achieve secure MPC-based hashing, the SHA3 family may be a good option. SHA3 is quite MPC-friendly as the nonlinear part is like doing 1600 GF(2) multiplications in parallel. For this the secure type mpc.SecFld(2) can be used. With the upcoming support for NumPy arrays in MPyC this leads to simple implementations with good performance:

$ python sha3.py -M3 -n10
2022-09-02 11:02:00,285 Start MPyC runtime v0.8.8
2022-09-02 11:02:00,595 All 3 parties connected.
Input: b'hello123hello123hello123hello123hello123hello123hello123hello123hello123hello123'
openssl_sha3_256 Output: 0x79edfc8a7d4e68eaebca61a7dd2c5eefcf0b9ae868dd025afc322a4e8cb01505
2022-09-02 11:02:01,827 Stop MPyC runtime -- elapsed time: 0:00:01.540749

MPyC v0.8.8 will be available on GitHub soon.

lschoe commented 1 year ago

MPyC v0.8.8 with support for secure (secret-shared) NumPy arrays is now available on GitHub, check it out!

Threshold SHA3 is one of the new demos, see sha3.py.

samcowger commented 1 year ago

Cool! Thank you for the advice, and for the SHA3 pointer. I think my original question has been sufficiently addressed, so I'll go ahead and close this issue.