Some doubts about the sgn function

ChangSZ commented 3 years ago

I am sorry to bother you. I have some content that I don't understand, when I read your project source code. I hope I can get your help. I don't understand how this part works：

        if not EQ:  # a la Toft
            s_sign = (await self.random_bits(Zp, 1, signed=True))[0].value
            e = [None] * (l+1)
            sumXors = 0
            for i in range(l-1, -1, -1):
                c_i = (c >> i) & 1
                r_i = r_bits[i].value
                e[i] = Zp(s_sign + r_i - c_i + 3*sumXors)
                sumXors += 1 - r_i if c_i else r_i
            e[l] = Zp(s_sign - 1 + 3*sumXors)
            g = await self.is_zero_public(stype(self.prod(e)))
            h = 3 - s_sign if g else 3 + s_sign
            z = (c - a_rmodl + (h << l-1)) / (1<<l)

Questions:

What is the role of 3*sumXors?
Is there any relevant literature on the part of the comparison operation in MPYC?

Looking forward to your reply.

lschoe commented 3 years ago

Hi, happy to answer your questions.

Maybe first, to set the context for the piece of code that you've quoted, note that function mpc.sgn(a) actually combines three modes of comparison. The default comparison is in LT&EQ mode, which evaluates the sign(um) of a, which has three possible values: -1 if a<0 else 0 if a==0 else 1. The other modes are EQ mode and LT mode, which evaluate a==0 and a<0, respectively.

The above code fragment relates to the LT (less than) mode. A call like mpc.sgn(a-b, LT=True) will evaluate a<b, returning a secret-shared bit defined as 1 if a<b else 0.

The code for the LT mode has been adapted from the original code by Tomas Toft, which he added to VIFF, I think in 2007. That's where the comment "a la Toft" refers to. The underlying protocol was later documented in the PETS 2009 paper Privacy-Preserving Face Recognition.

The MPyC code differs quite a lot from the original VIFF code, but the essential ideas of the original protocol are retained. In particular, the use of 3*sumXors is retained. Here, the factor of 3 is critical: as soon as you lower this factor to 2 in either (or both) of the two places where it occurs, the protocol will be broken.

You can easily experience this by "hacking MPyC" yourself and then running some of the included unittests. Simply edit the file mpyc/runtime.py and then run, for example, python test_runtime.py in the directory mpyc/tests. You should see failures if you use 2*sumXors.

Intuitively, the term 3*sumXors is included to "lock" the value r_i-c_i for the first time that r_i differs from c_i, processing the bits from high to low. Also depending on the value of s_sign, which is -1 or 1, both with 50% probability, this will ensure that at the end the list e will either contain exactly one zero or no zeros at all. And these two cases for e will occur both with 50% probability, independent of the value of the (secret-shared) input a to the protocol. This fact ensures that we safely reveal whether the product of all elements of e is equal to 0, or not.

I hope this explanation helps a bit to reduce, or even eliminate, your "doubts" about this code fragment?

ChangSZ commented 3 years ago

Hi, thank you so much for your reply. The above information is perfect to answer my "doubts".

I noticed the comment "a la Toft" before, but I didn't understand what it means. When I Google Search "a la Toft", I didn't get any valuable information, lol. I read many papers in Tomas Toft, but the implementation is similar to this: 《Unconditionally Secure Constant-Rounds Multi-party Computation for Equality, Comparison, Bits and Exponentiation》. This is very different from MPYC and is relatively complicated.

The implementation in the MPYC project is really clever, learned.

Thanks again for your reply.

lschoe commented 3 years ago

Thx, happy to hear that!

Indeed, the DFK+06 paper you are mentioning is aimed at achieving a constant round complexity of O(1) for secure comparisons etc. The resulting protocols are often mostly of theoretical interest, because the hidden constants can be quite "enormous". Like, for a secure less-than a<b, they first compute the bit decomposition of a and b in 114 rounds and then do a bitwise less-than in 19 rounds, for a total of 133 rounds. Moreover, the number of secure multiplications for l-bit numbers goes up to O(l log l), again with a very large hidden constant: in total 110 l log_2 l + 140 l secure multiplications (only 22 l of which for the bitwise less-than).

In general, the computational complexity will go up if one starts to squeeze things in fewer rounds. Vice versa, if one relaxes the round complexity, the computational complexity can be lowered. With a linear round complexity of O(l) one can minimize the computational effort. But for common values like l=16, l=64, and l=256 say, using O(l) rounds will degrade the overall performance too much.

In MPyC we strike a nice balance by aiming for a logarithmic round complexity of O(log l). And with a small hidden constant, like a kind of optimal round complexity of log_2 l only. In addition, we keep the computation complexity at O(l), with a small hidden constant. The call self.prod(e) is the only part of a secure LT comparison in MPyC that takes a nonconstant number of rounds, namely log_2 len(e) = log_2 (l+1) rounds. So, for l=64 say, we only have about 6 rounds, and even for l=1024, we are at about 10 rounds only.

Other parts have been optimized too. For instance, compared to the original implementation in VIFF, the protocol saves the computation of one secure random bit. Random bits are quite costly to generate securely. The random bit is saved by observing that there is no need to break ties like it is done in Section 5 of Privacy-Preserving Face Recognition: to break ties for the case a=b, they work with a'=2a and b'=2b+1, increasing the bit length of the numbers by 1. In MPyC, the tiebreaker for mpc.sgn(a) would be used to handle the corresponding case a=0: for i=l-1, ..., 0, we then have r_i=c_i and hence that e[i]= s_sign because sumXors=0 throughout. Only at the end we have e[l]= s_sign-1, and this value is equal to 0 with 50% probability, as required for privacy.

ChangSZ commented 3 years ago

I am very happy to receive your message.

The content you mentioned above is very clear, helping me understand the optimization which made by MPyC in the actual engineering better.

The method of saving the computation of one random bit in mpc.sgn(a) is very clever. The computation performance is improved, under the premise of ensuring the correctness of the calculation and the safety of privacy. There are many optimizations in MPyC, which make MPC to have a good computation speed in the Python programming language environment.

Math is so amazing. Live and learn. Thank you for explaining.

lschoe / mpyc

Some doubts about the sgn function #26