lschoe / mpyc

MPyC: Multiparty Computation in Python
MIT License
367 stars · 76 forks

Effect of using the --no-prss flag #57

Closed RetoKrummenacher closed 1 year ago

RetoKrummenacher commented 1 year ago

Hi,

I'm trying to use the library for some simple aggregations over pandas dataframes, where each row is a computing party, to conduct some runtime experiments for MPC. As you explained in detail in #4, this will have some limitations due to PRSS.

I noticed that it is possible to run it without PRSS (the --no-prss flag). As far as I understand from your code, this then amounts to plain Shamir sharing with random-coefficient polynomials, as implemented in thresha.random_split().

However, I can't see from the code where PRSS would actually have an influence.

While the computation time decreases as expected, the communication costs in bytes are the same with and without PRSS, as you can see from the outputs below. Only the time used for setting up the connections increases. Does this imply the PRSS shares are created during connection setup? I thought PRSS is actually used to decrease communication costs.

I assume 22 bytes are for sending the share to all other parties and then another 22 bytes to reconstruct the secret with T parties. Is that correct?

I would appreciate it if you could explain the effects of using the --no-prss flag.

With PRSS:

python addition.py -M50 -T4  --log-level=debug
2023-06-09 16:05:35,123 Start MPyC runtime v0.9
2023-06-09 16:06:01,465 All 50 parties connected.
Total : 35890
2023-06-09 16:06:16,771 Stop MPyC -- elapsed time: 0:00:15.262|bytes sent: 1166
2023-06-09 16:06:16,772 Bytes sent per party: 0 44 44 44 44 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22
2023-06-09 16:06:16,773 Synchronize with all parties before shutdown
2023-06-09 16:06:16,776 Closing connections with other parties
Execution time including connecting: 0:00:41.692

Without PRSS:

python addition.py -M50 -T4 --no-prss --log-level=debug
2023-06-09 16:05:17,697 Start MPyC runtime v0.9
2023-06-09 16:05:18,880 All 50 parties connected.
Total: 35890
2023-06-09 16:05:18,935 Stop MPyC -- elapsed time: 0:00:00.054|bytes sent: 1166
2023-06-09 16:05:18,935 Bytes sent per party: 0 44 44 44 44 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22
2023-06-09 16:05:18,936 Synchronize with all parties before shutdown
2023-06-09 16:05:18,941 Closing connections with other parties
Execution time including connecting: 0:00:01.263
lschoe commented 1 year ago

Hi @RetoKrummenacher,

The effect of the no-prss flag is to disable the use of pseudorandom secret sharing (PRSS). This means that the generation and exchange of random keys for PRSS is skipped (in mpc.start()), which is useful when running MPyC with a larger number of parties $m$. For threshold $t$, the number of keys handled by each party grows as $\binom{m}{t}$, hence exponentially in $m$ for the default setting $t=\lfloor(m-1)/2\rfloor$.

For $m=50$ parties as in your case, one way out is to set the threshold to a small value, like $t=4$ as you actually do. This allows your program to run even with PRSS enabled. The effect of the no-prss flag in your example is still significant, as the time for the setup of the PRSS keys is about 40 seconds already: each of the 50 parties is exchanging $\binom{50}{4}=230300$ keys with each of the other 49 parties.
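The growth of the per-party key count is easy to check directly. A quick sketch (plain Python, using `math.comb` for the binomial coefficient):

```python
from math import comb

m = 50
# the number of PRSS keys handled by each party grows as C(m, t)
for t in (4, 12, 24):  # t = 24 is the default floor((m - 1) / 2)
    print(f"t={t}: {comb(m, t):,} keys")
```

With $t=4$ this already gives 230,300 keys; at the default threshold $t=24$ the count exceeds $10^{13}$, which is why PRSS setup becomes infeasible for large $m$.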

Now, depending on what your MPyC program is actually doing, for the current version of MPyC either of two things may happen when you use the no-prss flag. Namely, your program will complete mpc.start() quickly even when $\binom{m}{t}$ gets large, and subsequently it will either:

1. run to completion as usual, because it never needs PRSS, or
2. raise a NotImplementedError at the first operation that still requires PRSS.

The first behavior applies to your example, and also to the following MPyC demos: helloworld.py, oneliners.py, ot.py, unanimous.py, parallelsort.py, sha3.py. This lets you run these demos with many parties (e.g., helloworld.py with more than $m=1000$ parties on Linux, localhost).

All remaining demos will currently raise the NotImplementedError. A goal for this (calendar) year is to support no-prss for all functionality in MPyC. This means that, for instance, the joint generation of a secret-shared random value will then be done using an interactive protocol between the parties, whereas this is currently done without any interaction using PRSS.
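As a rough sketch of what such an interactive protocol could look like (this is an illustration over a toy prime field, not MPyC's actual implementation): each party Shamir-shares a random contribution of its own, every party locally sums the shares it receives, and the result is a sharing of a random value that no coalition of at most $t$ parties controls, at the cost of one round of communication.

```python
import secrets

P = 2**31 - 1  # illustrative prime modulus, not MPyC's actual field
m, t = 5, 2

def shamir_share(secret, m, t):
    # random polynomial of degree at most t with constant term = secret
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            for x in range(1, m + 1)]

def recover(points):
    # Lagrange interpolation at x = 0
    s = 0
    for xi, yi in points:
        li = 1
        for xj, _ in points:
            if xj != xi:
                li = li * xj % P * pow(xj - xi, -1, P) % P
        s = (s + yi * li) % P
    return s

# each party contributes a random value and shares it (one communication round)
contribs = [secrets.randbelow(P) for _ in range(m)]
dealt = [shamir_share(r, m, t) for r in contribs]

# party i's share of the joint random value: the sum of the shares it received
joint = [(i + 1, sum(d[i] for d in dealt) % P) for i in range(m)]
```

Any $t+1$ of the `joint` shares reconstruct the sum of all contributions, which is uniformly random as long as at least one contributor is honest.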

RetoKrummenacher commented 1 year ago

Thank you for that very fast reply. Much appreciated.

My code is actually based on your sum example in oneliners.py, meaning it just sums up $M$ individual secret integers, one on each party.

The built-in limit of $m=256$ mentioned in #4 then seems no longer valid. I was able to run helloworld.py with $M=1000$ parties.

What I still don't quite understand is how MPyC computes a sum of $M$ secret inputs, in terms of secret sharing, when PRSS is disabled. I can't see the advantage of using PRSS in terms of secrecy, as its efficiency is clearly limited.

lschoe commented 1 year ago

An important point to stress here is that PRSS indeed is for pseudorandom secret sharing, where pseudorandom actually applies to the secret being (created and) shared. We have no control over this value, so PRSS cannot be used by itself for "ordinary" secret sharing, where we want to share a given value.

In the oneliners demo we are using Shamir secret sharing inside mpc.input() to let parties distribute their private inputs between all parties in the secure computation. This in turn relies on a call to random_split() from the mpyc.thresha module, where parties will generate coefficients at random for a polynomial of degree (at most) $t$, setting the constant coefficient equal to the given secret (private input).
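In spirit, random_split() does something like the following. This is a simplified sketch over a toy prime field; MPyC's actual code is vectorized and works with its own finite field types:

```python
import random

P = 2**61 - 1  # toy prime modulus, not MPyC's actual field

def random_split(secret, t, m):
    # coefficients [secret, c1, ..., ct] of a random polynomial
    # of degree at most t, with constant term equal to the secret
    coeffs = [secret % P] + [random.randrange(P) for _ in range(t)]
    shares = []
    for x in range(1, m + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % P
        shares.append(y)
    return shares
```

Any $t+1$ shares determine the polynomial, and hence the secret at $x=0$, while $t$ or fewer shares reveal nothing about it.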

To see the benefits of PRSS you can look for calls to pseudorandom_share() in the MPyC runtime, e.g. this one inside mpc._randoms():

    shares = thresha.pseudorandom_share(field, m, self.pid, prfs, self._prss_uci(), n)

This call creates, at each party, a share of a pseudorandom value. These shares also lie on a polynomial of degree (at most) $t$, but this is done without any interaction between the parties. The flip side is that each party spends time proportional to $\binom{m-1}{t}$ in a call to pseudorandom_share().
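A stripped-down sketch of the PRSS idea may help (toy prime field, SHA-256 as a stand-in PRF, and keys simply generated in place rather than exchanged pairwise during setup, so this is not MPyC's actual implementation):

```python
import hashlib
from itertools import combinations

P = 2**31 - 1  # toy prime field
m, t = 5, 2

def prf(key, counter):
    # stand-in pseudorandom function mapping (key, counter) to a field element
    h = hashlib.sha256(f"{key}|{counter}".encode()).digest()
    return int.from_bytes(h[:8], "big") % P

# one key per subset A of t parties; held by every party NOT in A
subsets = list(combinations(range(1, m + 1), t))
keys = {A: f"key{A}" for A in subsets}  # stand-in for the exchanged PRSS keys

def g(A, x):
    # the degree-t polynomial with g(0) = 1 and g(j) = 0 for all j in A
    num = den = 1
    for j in A:
        num = num * (x - j) % P
        den = den * (0 - j) % P
    return num * pow(den, -1, P) % P

def prss_share(i, counter):
    # party i computes its share locally, with no interaction: it only
    # uses the C(m-1, t) keys for the subsets it does not belong to
    return sum(prf(keys[A], counter) * g(A, i)
               for A in subsets if i not in A) % P
```

All $m$ shares lie on one polynomial of degree at most $t$ whose value at 0 is $\sum_A f_{K_A}(\mathrm{counter})$, so any $t+1$ of them reconstruct the same pseudorandom value; the $\binom{m-1}{t}$ PRF evaluations per party are the cost of avoiding interaction.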

RetoKrummenacher commented 1 year ago

Your first point makes me think that my experimental code does not need PRSS at all:

This is the code. Each party has its private secret taken from a pandas dataframe. The goal is to add up the secrets over all $M$ parties. The experiments run for different $M$ and $T$, measuring the total runtime together with the computation time and the bytes sent, which MPyC reports by default.

    import datetime
    import time

    from mpyc.runtime import mpc

    async def addition(df):
        st_add = time.time()
        secint = mpc.SecInt(32)
        await mpc.start()
        # get this party's individual secret from the dataframe df
        my_int = int(df.iloc[mpc.pid]['Value to sum'])
        # secret-share the input with all parties; yields a list of M shares
        my_input = mpc.input(secint(my_int))
        total = sum(my_input)
        print('Total:', await mpc.output(total))
        await mpc.shutdown()
        elapsed = round(time.time() - st_add, 3)
        print('Execution time:', str(datetime.timedelta(seconds=elapsed))[:-3])

If I understand you correctly, this implementation does not use PRSS at all. Meaning that running it without the --no-prss flag does increase the runtime, but the individual private secrets are summed in the same way with or without PRSS?

Also, because the PRSS keys are not distributed between the parties, the bytes sent reported by MPyC are equal with and without PRSS?

lschoe commented 1 year ago

Right, this program does not use PRSS, as mpc.input() and mpc.output() perform standard Shamir secret sharing. The PRSS keys that are exchanged by default are not counted towards the total bytes sent, as they do not depend on the computation proper.
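The reason the sum needs no PRSS (and no extra communication) is the linearity of Shamir sharing: adding shares pointwise yields shares of the sum, so sum() is a purely local operation. A toy illustration of this, not MPyC code:

```python
import random

P = 2**31 - 1  # toy prime field
m, t = 5, 2

def share(secret):
    # Shamir-share a secret: random degree-t polynomial, evaluated at x = 1..m
    coeffs = [secret % P] + [random.randrange(P) for _ in range(t)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            for x in range(1, m + 1)]

def recover(points):
    # Lagrange interpolation at x = 0
    s = 0
    for xi, yi in points:
        li = 1
        for xj, _ in points:
            if xj != xi:
                li = li * xj % P * pow(xj - xi, -1, P) % P
        s = (s + yi * li) % P
    return s

inputs = [11, 22, 33, 44, 55]       # one private input per party
dealt = [share(v) for v in inputs]  # like mpc.input(): every party shares
# like sum(my_input): each party i adds the shares it holds, locally
sums = [sum(d[i] for d in dealt) % P for i in range(m)]
```

Reconstructing from any $t+1$ of the summed shares yields the sum of all inputs, which is what mpc.output(total) does in your program.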

RetoKrummenacher commented 1 year ago

Ok. Then I will run the experiments without PRSS. Many thanks for your patience and the detailed answers.