Loss of precision when using multiple maths functions

lschoe / mpyc

MPyC: Multiparty Computation in Python

MIT License


Loss of precision when using multiple maths functions #29

Closed JoaoDiogoDuarte closed 2 years ago

JoaoDiogoDuarte commented 2 years ago

Hi,

I am trying to implement a simple function which outputs the correlation coefficient on secret shared values. This function uses multiple mpc.add, mpc.sub, mpc.mul and mpc.div. I have verified that my implementation is correct.

The issue is that every one of these mpc maths functions causes a slight loss in precision, and these losses add up, so the function output is quite different from the correlation coefficient computed on the plaintext. For example:

Plainly computed pearson 
-0.6625992397006207
Computed in 0.0014712810516357422 seconds

Securely computed pearson 
-0.5963393168058246
Computed in 0.24330759048461914 seconds

Error between values -0.0662599228947961

Do you know if there is any way to mitigate this, please? I am relatively new to MPyC so I may be missing something obvious!

Thanks! All the best, Joao

lschoe commented 2 years ago

Hi Joao, welcome! So, I assume you are using secure fixed-point numbers via mpc.SecFxp(). Which formula are you using to compute the correlation coefficient?

If you are already using a numerically stable formula, then you can still increase the bit length, e.g. using mpc.SecFxp(64) or mpc.SecFxp(80) to override the use of the default 32-bit fixed-point numbers (16-bit integer part, 16-bit fractional part), with 64-bit or 80-bit numbers. Also, you may need to scale the range of numbers appearing in your inputs.
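To illustrate why more fractional bits help, here is a minimal plain-Python simulation of fixed-point arithmetic. It does not use MPyC at all: the helper names (to_fxp, fxp_mul, fxp_div, chained_error) are hypothetical, and the choice of 32 fractional bits as the counterpart of a 64-bit number is an assumption extrapolated from the 16/16 split of the default described above. It only shows how truncation after each multiply and divide accumulates, and how that drift shrinks with a longer fractional part.

```python
def to_fxp(value, f):
    """Encode a float as an integer scaled by 2**f (f fractional bits)."""
    return round(value * (1 << f))


def from_fxp(a, f):
    """Decode a scaled integer back to a float."""
    return a / (1 << f)


def fxp_mul(a, b, f):
    """Multiply two scaled integers, then drop the extra f fractional bits."""
    return (a * b) >> f


def fxp_div(a, b, f):
    """Divide two scaled integers, keeping f fractional bits in the quotient."""
    return (a << f) // b


def chained_error(f, steps=20):
    """Multiply and divide by 1.1 repeatedly; return the drift from the start value."""
    start = 0.7
    x = to_fxp(start, f)
    c = to_fxp(1.1, f)
    for _ in range(steps):
        # Each round trip truncates twice, so x can only shrink.
        x = fxp_div(fxp_mul(x, c, f), c, f)
    return abs(from_fxp(x, f) - start)


for f in (16, 32):
    print(f"{f} fractional bits: drift {chained_error(f):.3e}")
```

The drift with 16 fractional bits is several orders of magnitude larger than with 32, which is the same effect that makes a chain of secure multiplications and divisions lose accuracy at the default bit length.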

JoaoDiogoDuarte commented 2 years ago

Good suggestion! I'll try that out. I also changed my implementation to use fewer mpc divides, which solved part of the issue.

I am using the standard formula shown in the attached image:

And I am using SecFxp 64.

Thanks!

lschoe commented 2 years ago

Yeah, sounds good. That formula behaves OK numerically.

Since it was on the TODO list anyway, I've now added a secure method for the correlation coefficient to MPyC v0.7.10. You can find it in the mpyc.statistics module, as method correlation(), next to two more methods, covariance() and linear_regression(). These methods are the secure equivalents of the ones that were recently added to the statistics module in Python 3.10.0.

Using the same formula, the implementation of these methods relies on mpc.in_prod() (and mpc.sum()) throughout to get good performance. Otherwise, with secure fixed-point numbers, a lot of time would be wasted on secure mpc.trunc() calls if you wrote, for instance, sum(a * b for a, b in zip(x, y)) instead of mpc.in_prod(x, y).
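The difference between the two ways of writing an inner product can be sketched with a plain-Python fixed-point simulation. This is not MPyC code: the functions sum_of_products and inner_product are hypothetical helpers, and the plain bit shift stands in for what would be a secure truncation protocol. The sketch only counts how many truncations each variant needs.

```python
F = 16  # fractional bits, as in the default 16-bit fractional part


def encode(value):
    """Encode a float as an integer scaled by 2**F."""
    return round(value * (1 << F))


def decode(a):
    """Decode a scaled integer back to a float."""
    return a / (1 << F)


def sum_of_products(x, y):
    """Truncate after every multiplication: one truncation per term."""
    terms = [(a * b) >> F for a, b in zip(x, y)]
    return sum(terms), len(terms)


def inner_product(x, y):
    """Accumulate the double-scaled products, then truncate once."""
    acc = sum(a * b for a, b in zip(x, y))
    return acc >> F, 1


xs = [0.1, 1.3, -2.7, 3.9]
ys = [1.7, -0.3, 0.9, 2.1]
fx = [encode(v) for v in xs]
fy = [encode(v) for v in ys]

exact = sum(a * b for a, b in zip(xs, ys))
naive, naive_truncs = sum_of_products(fx, fy)
fused, fused_truncs = inner_product(fx, fy)

print(f"exact         : {exact:.6f}")
print(f"per-term trunc: {decode(naive):.6f} ({naive_truncs} truncations)")
print(f"single trunc  : {decode(fused):.6f} ({fused_truncs} truncation)")
```

In this simulation a truncation is a cheap shift, so only the count matters; in a secure setting each truncation is a protocol step, which is why doing it once per inner product rather than once per term saves time.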