lschoe / mpyc

MPyC: Multiparty Computation in Python
MIT License
367 stars 76 forks source link

Problems with division on numpy arrays #76

Closed MarcT0K closed 10 months ago

MarcT0K commented 11 months ago

Hi, While experimenting with some arrays of fixed-point numbers, I discovered two problems with the division operator:

np.divide

Calling np.divide on SecureFixedPointArray will call _norm. This function will fail on the following instruction:

    l = type(a).bit_length

I suppose this function works on scalars but fails on secure arrays since they have no bit_length attribute. I suppose we can quickly fix this by taking the bit_length of any element in the secure array, but I let you tell me whether you would prefer a more general refactoring to solve this issue.

np.reciprocal

To circumvent this problem, I tried to call directly np.reciprocal on my array. However, it seems like there is a numerical issue in this function. For example, with the following inputs [ 1. 36. 1. 16. 49. 1. 81. 49. 64. 121.], I obtain [3.73109370e+28 -3.68142988e+28 3.73109370e+28 2.55816413e+28 -3.71193020e+28 3.73109370e+28 3.82035502e+28 -3.71193020e+28 -1.00870767e+28 -1.77230424e+28].

I haven't investigated the origin of this behaviour yet.

As for my previous PRs, if we agree on a fix, I can implement them and submit a PR.

lschoe commented 11 months ago

Currently, fixed-point division for Numpy arrays is only defined for a scalar divisor. The function np_reciprocal() cannot be used here as that one applies to the inverse in a finite field.

Do you have an application with large arrays as divisor? Or an array with a few elements? In the latter case, you can take the reciprocal by using mpc._rec() (directly or indirectly) elementwise.

Future extension will have a vectorized implementation of _rec().

MarcT0K commented 11 months ago

Do you have an application with large arrays as divisor?

I have relatively large arrays (around 300K elements).

If needed, I can implement a vectorized version of _rec() and submit a PR. I skimmed through the code. If I understand well, the main task is to vectorize find() which is used in _norm.

MarcT0K commented 11 months ago

After a second check, this extension will also require to extend np_to_bits to fixed-point arrays.

lschoe commented 10 months ago

New version 0.9.5 includes a simple stub for a vectorized mpc._norm(). The remaining part of mpc._rec() is also vectorized this way.