lschoe / mpyc

MPyC: Multiparty Computation in Python
MIT License
367 stars 76 forks source link

Sorting a list of tuples #37

Closed MarcT0K closed 1 year ago

MarcT0K commented 1 year ago

Hi, I am doing some experiments using MPyC and I am stuck on a little problem: I would like to sort a list of tuples but I have an error. I don't understand yet all the details of the library so I would like your insights before implementing my sorting procedure manually.

Here is a minimal example of the problem:

from mpyc.runtime import mpc
from mpyc.seclists import seclist

def example_tuple_sort():
    secint = mpc.SecInt(64)
    res = seclist([], sectype=seclist)
    res.append(seclist([secint(1), secint(2)], sectype=secint))
    res.append(seclist([secint(0), secint(4)], sectype=secint))
    res.sort(key=lambda tup: tup[1])

If you try to run this function, you obtain the following error:

<ipython-input-235-ca59bed29c48> in example_tuple_sort()
      4     res.append(seclist([secint(1), secint(2)], sectype=secint))
      5     res.append(seclist([secint(0), secint(4)], sectype=secint))
----> 6     res.sort(key=lambda tup: tup[1])
      7 

~/.local/lib/python3.10/site-packages/mpyc/seclists.py in sort(self, key, reverse)
    331         if key is None:
    332             key = lambda a: a
--> 333         runtime._sort(self, key)
    334         if reverse:
    335             self.reverse()

~/.local/lib/python3.10/site-packages/mpyc/runtime.py in _sort(self, x, key)
   1113                     if i & p == r:
   1114                         a, b = x[i], x[i + d]
-> 1115                         x[i], x[i + d] = self.if_swap(key(a) >= key(b), a, b)
   1116                 d, q, r = q - p, q >> 1, p
   1117             p >>= 1

~/.local/lib/python3.10/site-packages/mpyc/runtime.py in if_swap(self, c, x, y)
   1641 
   1642         if isinstance(x, list):
-> 1643             z = self._if_swap_list(c, x, y)
   1644         else:
   1645             d = c * (y - x)

~/.local/lib/python3.10/site-packages/mpyc/asyncoro.py in typed_asyncoro(*args, **kwargs)
    384             while True:
    385                 try:
--> 386                     coro.send(None)
    387                 except StopIteration as exc:
    388                     runtime._pc_level -= 1

~/.local/lib/python3.10/site-packages/mpyc/runtime.py in _if_swap_list(self, a, x, y)
   1622             await self.returnType((stype, x[0].integral and y[0].integral), 2, n)
   1623 
-> 1624         a, x, y = await self.gather(a, x, y)
   1625         if f:
   1626             a = a >> f  # NB: no in-place rshift!

~/.local/lib/python3.10/site-packages/mpyc/runtime.py in gather(self, *obj)
    146 
    147     def gather(self, *obj):
--> 148         return asyncoro.gather_shares(self, *obj)
    149 
    150     async def barrier(self, name=None):

~/.local/lib/python3.10/site-packages/mpyc/asyncoro.py in gather_shares(rt, *obj)
    223         return _SharesCounter(rt._loop, obj)
    224 
--> 225     return _AwaitableFuture(_get_results(obj))
    226 
    227 

~/.local/lib/python3.10/site-packages/mpyc/asyncoro.py in _get_results(obj)
    198 
    199     if isinstance(obj, (list, tuple)):
--> 200         return type(obj)(map(_get_results, obj))
    201 
    202     return obj

~/.local/lib/python3.10/site-packages/mpyc/asyncoro.py in _get_results(obj)
    198 
    199     if isinstance(obj, (list, tuple)):
--> 200         return type(obj)(map(_get_results, obj))
    201 
    202     return obj

~/.local/lib/python3.10/site-packages/mpyc/seclists.py in __init__(self, x, sectype)
     60 
     61         if sectype is None:
---> 62             raise ValueError('sectype missing')
     63 
     64         i = 0

If I understand well, it seems like the sorting procedure swaps the lists and re-instantiates the seclist object, but the parameter sectype is not provided which causes the error. By the way, if you give me some instructions, I can contribute to the repo and fix this potential bug (I am not sure whether the problem is a bug or my code for now).

Thank you for your help and thank you for this great library! :)

lschoe commented 1 year ago

Hi, a quick reply to help you on the right track hopefully.

If you want to sort length-2 tuples of secure integers on the second component, you can do it like this:

from mpyc.runtime import mpc

secint = mpc.SecInt(64)
res = []
res.append([secint(0), secint(4)])
res.append([secint(1), secint(2)])
res = mpc.sorted(res, key=lambda pair: pair[1])

Rather than Python tuples, you need to use Python lists here, as entries of res, so I've renamed "tup"to "pair". The call to mpc.sorted() then sorts the pairs on their second component.

Using seclist for lists of pairs is not supported, as the elements of a seclist are currently restricted to secure numbers (int, fxp, fld).

MarcT0K commented 1 year ago

Thank you very much for your quick reply. Initially, I had used seclist to have a secure sort but I hadn't noticed that mpc.sorted could perform a secure sort on Python lists. I tried a few things and I saw that I can only sort on one coordinate at a time: I cannot write mpc.sorted(res, key= lambda pair: (pair[1], pair[0])) as in Python sorted function. If yes, how would you suggest extending the existing code: create a SecureTuple class that have a dedicated comparison operation?

Later, in my work, I will probably need to shuffle a list of tuples. I saw that there is a shuffle function but it takes as input a sectype. Is there a trick/function I should know to shuffle a list of elements?

lschoe commented 1 year ago

Yeah, right, to sort for example pairs of integers lexicographically, convert the pairs to instances of a class that overloads lt() with the appropriate comparison. The demo for ID3 decision trees shows an example of such a class: https://github.com/lschoe/mpyc/blob/98766105a1567a59e35fab4234a59fc0a407118c/demos/id3gini.py#L100-L105 The class is then used as key for sorting operations: https://github.com/lschoe/mpyc/blob/98766105a1567a59e35fab4234a59fc0a407118c/demos/id3gini.py#L79

For shuffling I've extended the code for mpc.random.shuffle() a bit to handle your case of shuffling a list of tuples. Instead of tuples you need to use lists like above: https://github.com/lschoe/mpyc/blob/98766105a1567a59e35fab4234a59fc0a407118c/tests/test_random.py#L39-L46 That may already cover what you'll need?

MarcT0K commented 1 year ago

Thank you very much for your precious help.

Following your suggestion, I was able to sort my list :smile:

Moreover, I tried the shuffle function and it works!

I have no remaining blocking points but I have two last questions to better understand the philosophy of MPyC:

  1. The shuffle function takes as input a secure type. Why is the type necessary? (e.g., I can't shuffle tuples storing elements of different types) Is it mandatory for security reasons?
  2. The output of comparison of secure types (e.g., secint(3) == secint(2)) cannot be used in boolean operations. I implemented my boolean operations using arithmetical operators (e.g., a or b is implemented as a + b - a*b) but I would like to know whether there are more optimized ways to compute boolean operators. Should I use mpc.all([a,b]) for and operator and mpc.any([a,b]) for or operator?
lschoe commented 1 year ago

OK, great! You may try to cover other cases of shuffling yourself, like tuples of mixed type, see below, where functions like mpc.convert() will come in handy. More efficient protocols for shuffling are also projected, like based on the results in Chapter 3 of Niels de Vreede's PhD thesis Assorted algorithms and protocols for secure computation, with follow-up work in Noah van der Meer's Bachelor thesis Root Finding over Finite Fields for Secure Multiparty Computation.

The philosophy behind MPyC is quite extensive I would say, but one aspect is to stay as close as possible to Python. So, that is also part of the answers to your questions here.

  1. A basic reason why mpc.random.shuffle() takes a sectype as first argument is to make it consistent with the other functions in mpc.random, which all have a sectype as first argument. The sectype argument can be used in calls like mpc.random.shuffle(secint, list(range(11))) to shuffle a publicly known list, which will be returned as a list of secure integers. You are right that the sectype may often be inferred from the elements in the given list. And that would also allow tuples with items of different secure types, e.g., a list of tuples all of type tuple[secint, secfxp]. This should be supported at some point in MPyC, as the goal is to support all list/tuple operations for heterogenous data, as long as these make sense in an MPC setting. Shuffling a list of type list[tuple[secint, secfxp]] makes sense, but for a list of type list[Union[secint, secfxp]] we would anyway see the type of each element (secint or secfxp) and it need not make that much sense to shuffle such a list (although the order of elements of the same type is still random). For strictly homogeneous data, there will be a random shuffle for secure Numpy arrays.

  2. Things like if secint(3) == secint(2) are not allowed indeed, because the outcome of a secure comparison cannot be used to publicly branch on it. (Meilof Veeningen has made a package oblif that actually supports oblivious brancing for MPyC.) For boolean arithmetic with secure integers (containing bits) it's best to use the overloaded operators &, | , ~, like this a = secint(1); b = secint(0); c = ~(a & b) == ~a | ~b. These operations are already efficient.

MarcT0K commented 1 year ago

This answers all my questions, thank you very much!