lschoe / mpyc

MPyC: Multiparty Computation in Python
MIT License
367 stars 76 forks source link

Questions about the implementation of secure ID3 #38

Closed DylanWangWQF closed 1 year ago

DylanWangWQF commented 1 year ago

Hi, @lschoe
I try to runmpyc for some benchmark results on secure ID3 decision tree training. I checked your paper "Practical Secure Decision Tree Learning in a Teletreatment Application", there are three algorithms that provide different security levels.

Currently, I'm not that familiar with the protocols in the paper In mpyc, it seems that you only implement SID3. https://github.com/lschoe/mpyc/blob/98766105a1567a59e35fab4234a59fc0a407118c/demos/id3gini.py#L18

Do we implement SID3T which provides a higher security level? I want to compare it with SID3T.

lschoe commented 1 year ago

Hi @DylanWangWQF

So, the demo only implements the SID3 protocol from the paper. The paper also reports on the implementations of SID3T and SID3P. These implementations were all done in VIFF at the time.

You may consider doing your own implementation of SID3T in MPyC, starting from the current demo, or rather starting from the NumPy variant np_id3gini.py (which is much faster). In the demo np_lpsolver.py you'll find two more examples of the use of the argmin() method with secure arrays.

DylanWangWQF commented 1 year ago

Hi @lschoe

BTW, are the original implementation in VIFF available currently?

lschoe commented 1 year ago

Hi @DylanWangWQF

OK, I'm attaching a copy (as a text file) of the Python 2 code ID3-Gini2T.py.txt that we used at the time for the paper. It's for the SID3T protocol. You may be able to write your own version of SID3T in MPyC based in this code.

This code ran on our TUE extension of VIFF, also relying on Marcel Keller's boost extension of VIFF, to get better performance at the time. The np_id3gini.py demo improved a lot since then, like for the KRKPA7 dataset it runs well below 4s on my somewhat dated i7-4790 desktop, compared to 46s reported in the paper on a slightly older i5-3470.

Also note that the code actually reveals the secretly computed decision tree, but this is not essential, only to be able to print it somewhat easily.

DylanWangWQF commented 1 year ago

@lschoe Thanks a lot! This really saves me time. I will try to implement a version in MPyC based on this code.