initc3 / HoneyBadgerMPC

Robust MPC-based confidentiality layer for blockchains
GNU General Public License v3.0

"Random" Batch reconstruction failure #452

Closed sbellem closed 4 years ago

sbellem commented 4 years ago

This error occurs intermittently when running the asynchromix app. The app may run fine for a few epochs and then fail at some later epoch, which is not always the same one.

In the case shown below, it happened after 2 successful epochs.

hbmpc.peer0.io_1        | 2020-06-25 04:27:44,238:[mpcprogrunner.py:_load_pp_elements:57]:[INFO]: number of elements is: 400                                                                                        
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,241:[mpcprogrunner.py:_mpc_loop:229]:[INFO]: [0] MPC initiated:2                                                                                                      
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,245:[<string>:prog:4]:[INFO]: [0] Running MPC network                                                                                                                 
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,347:[reed_solomon.py:_optimistic_update:323]:[CRITICAL]: Optimistic decoding failed                                                                                   
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,347:[reed_solomon.py:_optimistic_update:323]:[CRITICAL]: Optimistic decoding failed                                                                                   
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,436:[batch_reconstruction.py:batch_reconstruct:184]:[ERROR]: [BatchReconstruct] P1 reconstruction failed!                                                             
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,440:[batch_reconstruction.py:batch_reconstruct:184]:[ERROR]: [BatchReconstruct] P1 reconstruction failed!                                                             
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,444:[mpc.py:cb:195]:[ERROR]: Batch reconstruction for share_array (id: 2) failed!                                                                                     
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,444:[mpc.py:cb:195]:[ERROR]: Batch reconstruction for share_array (id: 3) failed!                                                                                     
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,445:[misc.py:print_exception_callback:17]:[CRITICAL]:                                                                                                                 
hbmpc.peer0.io_1        | Exception:                                                                                                                                                                                
hbmpc.peer0.io_1        | <Task finished coro=<AsyncMixin.__call__() done, defined at /usr/src/HoneyBadgerMPC/honeybadgermpc/progs/mixins/base.py:42> exception=HoneyBadgerMPCError('Batch reconstruction failed!')>

hbmpc.peer0.io_1        | Traceback (most recent call last):                                                                                                                                                        
hbmpc.peer0.io_1        |   File "/usr/src/HoneyBadgerMPC/honeybadgermpc/progs/mixins/base.py", line 49, in __call__                                                                                                
hbmpc.peer0.io_1        |     return await cls._prog(context, *args, **kwargs)                                                                                                                                      
hbmpc.peer0.io_1        |   File "/usr/src/HoneyBadgerMPC/honeybadgermpc/progs/mixins/share_arithmetic.py", line 42, in _prog                                                                                       
hbmpc.peer0.io_1        |     f, g = await gather(*[(j - u).open(), (k - v).open()])                                                                                                                                
hbmpc.peer0.io_1        | honeybadgermpc.exceptions.HoneyBadgerMPCError: Batch reconstruction failed!                                                                                                               
hbmpc.peer0.io_1        | )                                                                                                                                                                                         
hbmpc.peer0.io_1        | 2020-06-25 04:27:44,447:[misc.py:print_exception_callback:17]:[CRITICAL]:                                                                                                                 
hbmpc.peer0.io_1        | Exception:                                                                                                                                                                                
hbmpc.peer0.io_1        | <Task finished coro=<MPCProgRunner._mpc_loop() done, defined at /usr/src/HoneyBadgerMPC/apps/asynchromix2/mpcprogrunner.py:128> exception=HoneyBadgerMPCError('Batch reconstruction failed!')>
hbmpc.peer0.io_1        | Traceback (most recent call last):                                                                                                                                                        
hbmpc.peer0.io_1        |   File "/usr/src/HoneyBadgerMPC/apps/asynchromix2/mpcprogrunner.py", line 246, in _mpc_loop
hbmpc.peer0.io_1        |     result = await ctx._run()
hbmpc.peer0.io_1        |   File "/usr/src/HoneyBadgerMPC/honeybadgermpc/mpc.py", line 254, in _run
hbmpc.peer0.io_1        |     return result.result()
hbmpc.peer0.io_1        |   File "<string>", line 6, in prog
hbmpc.peer0.io_1        |   File "/usr/src/HoneyBadgerMPC/apps/asynchromix/butterfly_network.py", line 51, in iterated_butterfly_network
hbmpc.peer0.io_1        |     result = await batch_switch(ctx, xs_, ys_, k)
hbmpc.peer0.io_1        |   File "/usr/src/HoneyBadgerMPC/apps/asynchromix/butterfly_network.py", line 15, in batch_switch
hbmpc.peer0.io_1        |     ms = (await (sbits * (xs - ys)))._shares
hbmpc.peer0.io_1        |   File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
hbmpc.peer0.io_1        |     self._context.run(self._callback, *self._args)
hbmpc.peer0.io_1        |   File "/usr/src/HoneyBadgerMPC/honeybadgermpc/utils/misc.py", line 18, in print_exception_callback
hbmpc.peer0.io_1        |     raise ex
hbmpc.peer0.io_1        |   File "/usr/src/HoneyBadgerMPC/honeybadgermpc/progs/mixins/base.py", line 49, in __call__
hbmpc.peer0.io_1        |     return await cls._prog(context, *args, **kwargs)
hbmpc.peer0.io_1        |   File "/usr/src/HoneyBadgerMPC/honeybadgermpc/progs/mixins/share_arithmetic.py", line 42, in _prog
hbmpc.peer0.io_1        |     f, g = await gather(*[(j - u).open(), (k - v).open()])
hbmpc.peer0.io_1        | honeybadgermpc.exceptions.HoneyBadgerMPCError: Batch reconstruction failed!
hbmpc.peer0.io_1        | )
hbmpc.peer0.io_1        | 2020-06-25 04:27:50,694:[preprocessor.py:_offline_mixes_loop:69]:[INFO]: [0] Bits finished in 10.578808546066284

amiller commented 4 years ago

Let's try to reproduce this in a non-random way. Everything in hbMPC that is random should also support running in a "pseudorandom" mode with a seed. The AsyncIO concurrency in single-process mode should be deterministic as well, with no race conditions. We should aim to reproduce this reliably and catch it, since it's unclear whether the bug is in the application or in batch reconstruction.
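
As a rough illustration of the seeded mode described above, here is a minimal sketch. It is not hbMPC code: `party`, `run_epoch`, and `replay` are hypothetical names, and the only assumption is that each party draws from its own generator derived from one master seed while all parties run in a single asyncio event loop.

```python
import asyncio
import random


async def party(party_id, rng, log):
    # Hypothetical stand-in for one MPC party: draw a pseudorandom value,
    # yield to the event loop once, then record what was drawn.
    value = rng.randrange(2 ** 16)
    await asyncio.sleep(0)
    log.append((party_id, value))


async def run_epoch(seed, n=4):
    # One generator per party, all derived from a single master seed, so the
    # values a party draws do not depend on how the tasks interleave.
    master = random.Random(seed)
    rngs = [random.Random(master.getrandbits(64)) for _ in range(n)]
    log = []
    await asyncio.gather(*(party(i, rngs[i], log) for i in range(n)))
    return log


def replay(seed):
    # Running the same seed twice should give the same trace.
    return asyncio.run(run_epoch(seed))
```

If a failing run can be tied to a seed, calling `replay` with that seed again should walk through the same sequence of values, which is what makes the failure catchable in a debugger or a test.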

amiller commented 4 years ago

We are closing this, as we determined it was most likely caused by inconsistent fake preprocessing data in a sharedata/ folder (in particular, every node attempted to create the fake preprocessing data, leading to race conditions).
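
One possible way to avoid that kind of race, sketched here under assumptions rather than taken from the actual fix: each node writes its candidate data to a private temporary file, and only the first node to publish it under the shared name wins, so every node ends up reading identical data. The function name `ensure_fake_preprocessing` and the file name `fake_preprocessing.json` are hypothetical.

```python
import json
import os
import tempfile


def ensure_fake_preprocessing(directory, make_data):
    # Hypothetical helper: create the shared file at most once, then read it.
    final_path = os.path.join(directory, "fake_preprocessing.json")
    if not os.path.exists(final_path):
        # Write the complete candidate to a private temp file first, so no
        # reader can ever observe a half-written shared file.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(make_data(), f)
        try:
            # Hard-linking fails if the target already exists, so exactly
            # one node publishes its data.
            os.link(tmp_path, final_path)
        except FileExistsError:
            pass  # another node won the race; use its data instead
        finally:
            os.unlink(tmp_path)
    with open(final_path) as f:
        return json.load(f)
```

A simpler alternative with the same effect is to have a single designated node generate the data while the others wait for it; the sketch above just avoids needing a coordinator.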