lschoe / mpyc

MPyC: Multiparty Computation in Python
MIT License

Unexpected error while experimenting with large matrices #78

Closed: MarcT0K closed this issue 1 year ago

MarcT0K commented 1 year ago

Hi, I am experimenting with some large matrix multiplications and have run into an unexpected issue. Put simply, I am multiplying two large matrices with 100 rows and an increasing number of columns. Below 3K columns everything works fine, but at 3K columns and above I get an error I don't understand:

Traceback (most recent call last):
  File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/asyncoro.py", line 431, in <lambda>
    task.add_done_callback(lambda t: _reconcile(decl, t))
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/asyncoro.py", line 355, in _reconcile
    givn = task.result()
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/asyncoro.py", line 283, in _wrap_in_coro
    return await awaitable
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/asyncoro.py", line 271, in __await__
    val = self.coro.send(None)
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/runtime.py", line 3672, in np_random_bits
    _r = thresha.np_pseudorandom_share(field, m, self.pid, prfs, self._prss_uci(), h)
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/thresha.py", line 171, in np_pseudorandom_share
    s = sum(prf_S(uci, (n,)) * _f_S_i(field, m, i, S)
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/thresha.py", line 171, in <genexpr>
    s = sum(prf_S(uci, (n,)) * _f_S_i(field, m, i, S)
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/thresha.py", line 257, in __call__
    dk = shake_128(self.key + s).digest(n_ * l)
ValueError: [digital envelope routines: EVP_DigestFinalXOF] not XOF or invalid length
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/runtime.py", line 795, in np_trunc
    r_bits = await self.np_random_bits(Zp, f * n)
Traceback (enclosing MPyC coroutine call):
2023-10-02 08:01:11,376 Exception in callback _SelectorSocketTransport._call_connection_lost(ConnectionRes...eset by peer'))
handle: <Handle _SelectorSocketTransport._call_connection_lost(ConnectionRes...eset by peer'))>
Traceback (most recent call last):
  File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 978, in _call_connection_lost
    super()._call_connection_lost(exc)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 736, in _call_connection_lost
    self._protocol.connection_lost(exc)
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/asyncoro.py", line 147, in connection_lost
    raise exc
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 856, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer
2023-10-02 08:01:11,376 Exception in callback _SelectorSocketTransport._call_connection_lost(ConnectionRes...eset by peer'))
handle: <Handle _SelectorSocketTransport._call_connection_lost(ConnectionRes...eset by peer'))>
Traceback (most recent call last):
  File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 978, in _call_connection_lost
    super()._call_connection_lost(exc)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 736, in _call_connection_lost
    self._protocol.connection_lost(exc)
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/asyncoro.py", line 147, in connection_lost
    raise exc
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 856, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "/home/mdamie/sparsesecureml/venv/bin/benchmark", line 11, in <module>
    load_entry_point('securesparsecomputations', 'console_scripts', 'benchmark')()
  File "/home/mdamie/sparsesecureml/securesparsecomputations/benchmark.py", line 410, in run
    mpc.run(main())
  File "/home/mdamie/sparsesecureml/venv/lib/python3.9/site-packages/mpyc/runtime.py", line 177, in run
    return self._loop.run_until_complete(f)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 640, in run_until_complete
    raise RuntimeError('Event loop stopped before Future completed.')

I am not sure where to start debugging, since the experiment only fails for larger matrices. The matrices are fixed-point arrays.

Moreover, a few months ago I ran the same experiments with integer arrays and could go up to 10K columns (I stopped there to avoid running out of memory with larger matrices). Could the problem come from the fixed-point arrays?

Looking forward to any input or intuition about this.

lschoe commented 1 year ago

I recognize this error. It occurs when the output length requested in a call to the SHAKE function, which is used in the implementation of PRSS (pseudorandom secret sharing), gets too long. The error occurs for lengths of 2**31 bytes and more on Python 3.9 (on Windows).
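To make the length limit concrete, here is a minimal sketch in plain Python (standard library only). It is not MPyC's code and not a fix MPyC is known to use: the function name xof_bytes_chunked, the chunk size and the 4-byte counter used for domain separation are all hypothetical. It shows how a long pseudorandom stream could be produced without ever asking shake_128 for 2**31 bytes or more in a single call.

```python
from hashlib import shake_128

# Threshold reported in this thread: one SHAKE request of 2**31 bytes or more fails.
MAX_XOF_BYTES = 2**31 - 1


def xof_bytes_chunked(seed: bytes, nbytes: int, chunk: int = 2**30) -> bytes:
    """HYPOTHETICAL workaround, not MPyC's implementation.

    Produces nbytes of pseudorandom output while never requesting more than
    `chunk` bytes from SHAKE-128 in one call. Each chunk is derived from
    seed || counter (4-byte little-endian), so the result differs from a
    single shake_128(seed).digest(nbytes) call.
    """
    if not 0 < chunk <= MAX_XOF_BYTES:
        raise ValueError("chunk must be positive and below the reported limit")
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        take = min(chunk, nbytes - len(out))
        out += shake_128(seed + counter.to_bytes(4, "little")).digest(take)
        counter += 1
    return bytes(out)


# Small demo: 100 bytes produced in 16-byte pieces.
stream = xof_bytes_chunked(b"key", 100, chunk=16)
print(len(stream))
```

Because the chunked output is a different function of the seed than one long digest, every party would have to use the same scheme for PRSS shares to stay consistent; the sketch only illustrates the idea of bounding each request.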

You can try running this with PRSS disabled, using the --no-prss option on the command line. Limiting the bit length of the fixed-point numbers will also help. In your case, it is in particular the number of fractional bits that needs to be limited, because the problem originates from a call to np_trunc, which uses secure random bits internally for probabilistic rounding. As the traceback shows, np_trunc calls np_random_bits(Zp, f * n), so the number of random bits requested grows linearly with the number of fractional bits f.
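A back-of-the-envelope sketch of why fewer fractional bits help, under loudly stated assumptions: the f * n count of random bits comes from the traceback, but bytes_per_element (the SHAKE output consumed per random bit) and the example sizes are hypothetical values chosen for illustration, not figures measured from MPyC or from this experiment.

```python
# Threshold reported in this thread: SHAKE requests of 2**31 bytes or more fail.
LIMIT = 2**31


def estimated_prss_bytes(n_entries: int, frac_bits: int, bytes_per_element: int) -> int:
    """Rough size of one SHAKE request made for np_trunc's random bits.

    np_trunc requests f * n random bits (see the traceback); bytes_per_element
    is a HYPOTHETICAL per-bit cost in SHAKE output, not a value taken from MPyC.
    """
    return frac_bits * n_entries * bytes_per_element


def max_frac_bits(n_entries: int, bytes_per_element: int, limit: int = LIMIT) -> int:
    """Largest number of fractional bits whose estimate stays below `limit`."""
    return (limit - 1) // (n_entries * bytes_per_element)


# Hypothetical sizes for illustration only.
n = 3000 * 3000
print(estimated_prss_bytes(n, 16, 16) >= LIMIT)  # True
print(max_frac_bits(n, 16))  # 14
```

Because the estimate is linear in frac_bits, halving the number of fractional bits halves the request size, which is consistent with the advice above; running with --no-prss avoids this PRSS code path altogether.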

MarcT0K commented 1 year ago

Thanks, I'll rerun my experiments with this option and update this issue.

The error occurs for lengths of 2**31 bytes and more on Python 3.9 (on Windows).

FYI, I am on a Debian server with Python 3.9.

MarcT0K commented 1 year ago

The no-prss option seems to fix the problem. Thanks again!