Can you give the training data file and the training output file (used by Post_processing.ipynb)?

CHEN-CONGCONG commented 4 years ago

As used in the pictures:

1. Post_processing.ipynb

2. secondary.cpp

snwagh commented 4 years ago

@MrChencc, the files/train_data_* and others contain just a few MNIST samples and were used only for debugging purposes, so they won't be of much use. To reproduce the correctness runs, I've provided pre-trained data for the SecureML network architecture (in files/preload/SecureML/). Once you run the MPC inference, the code will output the SecureML.txt file. You need to point the Post_processing.ipynb script to the correct path for it.

In general, you can take a pre-trained network and use or modify the code in secondary.cpp to import the pre-trained model as well as a sample batch of inputs. You can then feed the same data to the plaintext model, trained on any platform such as TF or PyTorch, and compare its output with the output of the MPC. Post_processing.ipynb is a small script that counts the number of misclassifications after the appropriate fixed-point-to-float conversions. If you need to reproduce the results for the other networks as well, I can try to retrieve the data from the servers.
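For anyone who wants to replicate the post-processing step outside the notebook, here is a minimal Python sketch of a fixed-point-to-float conversion followed by a misclassification count. It is not the notebook's actual code. The ring width (32 bits), the number of fractional bits (13), and the output layout (whitespace-separated unsigned integers, one row of class scores per sample) are all assumptions to check against your build and against the real SecureML.txt; parse_outputs and the other function names are hypothetical.

```python
from typing import List, Sequence

# Assumed encoding parameters; verify against the ring type and
# fixed-point precision configured in your build.
BIT_WIDTH = 32
FRAC_BITS = 13


def fixed_to_float(x: int, bit_width: int = BIT_WIDTH, frac_bits: int = FRAC_BITS) -> float:
    """Read an unsigned ring element as two's complement, then remove the scale."""
    x %= 1 << bit_width
    if x >= 1 << (bit_width - 1):
        x -= 1 << bit_width
    return x / (1 << frac_bits)


def parse_outputs(text: str, num_classes: int) -> List[List[int]]:
    """Hypothetical layout: whitespace-separated integers, num_classes per sample."""
    values = [int(tok) for tok in text.split()]
    if len(values) % num_classes != 0:
        raise ValueError("output length is not a multiple of num_classes")
    return [values[i:i + num_classes] for i in range(0, len(values), num_classes)]


def argmax(row: Sequence[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def count_misclassifications(outputs: List[List[int]], labels: Sequence[int]) -> int:
    """Decode each row of MPC outputs and compare its argmax with the true label."""
    if len(outputs) != len(labels):
        raise ValueError("number of output rows and labels differ")
    errors = 0
    for raw, label in zip(outputs, labels):
        scores = [fixed_to_float(v) for v in raw]
        if argmax(scores) != label:
            errors += 1
    return errors
```

For example, with three classes the text "0 8192 0\n4294959104 0 4096" decodes to the rows [0.0, 1.0, 0.0] and [-1.0, 0.0, 0.5]; against labels [1, 1] the second row is predicted as class 2, so count_misclassifications reports 1 error.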

CHEN-CONGCONG commented 4 years ago

@snwagh,

Thank you very much, it helped me a lot.

You said that the files/train_data_* are only for debugging, so how do I use a complete dataset such as MNIST to generate a pre-trained model? It seems that all parties in the code use files/train_data_A and files/train_data_B.

Also, if it is convenient for you, I would like the pre-trained data for the other networks, similar to files/preload/SecureML/.

Thanks again.

snwagh commented 4 years ago

Good question. This is yet to be implemented, and I see the comment in the code indicates as much. I won't have the time to implement this, but I can give you pointers, and you can submit a merge request once you have a working prototype.

You will need to specify the paths from which each party reads its inputs (shares of the training data, training labels, testing data, and testing labels). You can tweak the paths, but it would be better to use the reference code in the function preload_network (here) to read each party's share. The SecureNN codebase might have something useful for this: with some tweaking for RSS, it could be used to parse real data into simple shares (for instance, the (data, 0, 0) format in RSS).
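To make the (data, 0, 0) idea concrete, here is a minimal Python sketch of 3-party replicated secret sharing (RSS) over the ring of 32-bit integers. It is an illustration only, not code from this repo or from SecureNN: the function names are hypothetical, the 32-bit ring is assumed, and the convention that party i holds the pair (s_i, s_{i+1}) should be checked against what preload_network expects. With random=False it produces the trivial (data, 0, 0) sharing, which is handy for debugging but hides nothing; random=True splits the value into uniformly random additive shares.

```python
import secrets
from typing import List, Tuple

RING_BITS = 32  # assumed ring size
MOD = 1 << RING_BITS


def additive_shares(x: int, random: bool = True) -> List[int]:
    """Split x into three shares with s0 + s1 + s2 = x (mod 2^RING_BITS)."""
    x %= MOD
    if not random:
        return [x, 0, 0]  # trivial (data, 0, 0) sharing
    s0 = secrets.randbelow(MOD)
    s1 = secrets.randbelow(MOD)
    s2 = (x - s0 - s1) % MOD
    return [s0, s1, s2]


def rss_share(x: int, random: bool = True) -> List[Tuple[int, int]]:
    """Give party i the replicated pair (s_i, s_{i+1 mod 3}) (assumed convention)."""
    s = additive_shares(x, random)
    return [(s[i], s[(i + 1) % 3]) for i in range(3)]


def reconstruct(share_i: Tuple[int, int], share_next: Tuple[int, int]) -> int:
    """Parties i and i+1 together hold s_i, s_{i+1}, s_{i+2}: sum them mod 2^RING_BITS."""
    return (share_i[0] + share_i[1] + share_next[1]) % MOD
```

A data-preparation script could apply rss_share to every fixed-point-encoded input and label and write each party's pairs to that party's own file. Any two consecutive parties can reconstruct the value, while a single party's pair alone reveals nothing when random=True.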

The pre-trained models provided in the repo were generated using PyTorch (for instance, here). To actually train the model in C++, you will have to run the train function (after you fix the data importing). Once again, it won't have very high accuracy out of the box, but anything above 30% should show non-trivial learning and can be taken as an indication that things are working. With a bit more tweaking, you should be able to get higher accuracies.
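When moving weights trained in PyTorch over to the C++ side, each float eventually has to become a fixed-point ring element. Below is a minimal Python sketch of that encoding plus a one-line text dump of a layer. The 32-bit ring, 13 fractional bits, and file layout are assumptions, and whether the preload files store plain floats or pre-encoded integers is something to check in preload_network before using this. In PyTorch, a layer's flat list of floats can be obtained with tensor.detach().flatten().tolist().

```python
from typing import Iterable, List

BIT_WIDTH = 32  # assumed ring width
FRAC_BITS = 13  # assumed number of fractional bits


def float_to_fixed(v: float, bit_width: int = BIT_WIDTH, frac_bits: int = FRAC_BITS) -> int:
    """Scale by 2^frac_bits, round to nearest (ties to even), wrap negatives into the ring."""
    scaled = round(v * (1 << frac_bits))
    limit = 1 << (bit_width - 1)
    if not -limit <= scaled < limit:
        raise OverflowError(f"{v} does not fit in {bit_width}-bit fixed point")
    return scaled % (1 << bit_width)


def encode_layer(values: Iterable[float]) -> List[int]:
    """Encode every weight of one layer."""
    return [float_to_fixed(v) for v in values]


def to_text(encoded: List[int]) -> str:
    """Hypothetical file layout: one space-separated line per layer."""
    return " ".join(str(e) for e in encoded) + "\n"
```

Rounding to nearest rather than truncating keeps the per-weight encoding error within half of the least significant fixed-point step, and the explicit overflow check catches weights too large for the chosen precision instead of letting them silently wrap.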

To test the preloaded networks, just uncomment these lines (here), remove the training lines, and add the testing lines (here).

Finally, I will try to add the other pretrained data as well. If I don't do it within a week, please go ahead and reopen this issue.

CHEN-CONGCONG commented 4 years ago

Okay, thanks for the reply. I will try experimenting next.

snwagh commented 4 years ago

@MrChencc, I've added the other pre-trained data (here), this should help you reproduce the other results as well.

snwagh / falcon-public

Issue #4