ciciwell opened this issue 3 years ago
Hi, sorry about the delay in responding. Here are some inputs:

> `transform=transforms.ToTensor()` argument in the PyTorch code or another general reference). Your help was very much appreciated.
For the training of the "SecureML" neural network, I directly used the following functions in secondary.cpp.
`net->forward(); net->backward();`
When using the net->backward() function, I also directly called the computeDelta() function and the updateEquations() function.
However, I found that the funcDivision() function called inside computeDelta() can only divide a single value, and I am now trying to modify it.
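For reference, the elementwise semantics a generalized division would need can be sketched in plain Python over fixed-point values (a plaintext sketch only; the real funcDivision operates on secret shares, and FLOAT_PRECISION = 13 is assumed from the globals.h default):

```python
# Plaintext sketch of elementwise fixed-point division -- the behavior a
# vectorized funcDivision should reproduce over shares.
FLOAT_PRECISION = 13  # assumed default fractional bits

def encode(x):
    """Float -> fixed-point integer with FLOAT_PRECISION fractional bits."""
    return int(round(x * (1 << FLOAT_PRECISION)))

def decode(x):
    """Fixed-point integer -> float."""
    return x / (1 << FLOAT_PRECISION)

def fixed_point_divide(a, b):
    """Elementwise a / b over fixed-point vectors: (a << f) // b keeps
    the quotient at FLOAT_PRECISION fractional bits."""
    return [(x << FLOAT_PRECISION) // y for x, y in zip(a, b)]

a = [encode(v) for v in [1.0, 3.0, 2.5]]
b = [encode(v) for v in [2.0, 4.0, 0.5]]
q = [decode(v) for v in fixed_point_divide(a, b)]
# q is approximately [0.5, 0.75, 5.0]
```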
In addition, when the backward() function is executed, the learning-rate variable in globals.h does not seem to be used.
Can you tell me if I missed something, or what further changes need to be made?
Yeah, the train function in secondary.cpp should be the right one.
And you're right, the division function is yet to be implemented in full generality. Thanks for working on it, feel free to check in your edits.
Finally, about the learning rate: each layer should use the LOG_LEARNING_RATE variable in its truncation (I guess the learning rate variable is legacy).
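To illustrate why the learning rate lives inside the truncation: in fixed point, a power-of-two learning rate is just an extra right shift after the gradient multiply, so it folds into the truncation step. A sketch, assuming FLOAT_PRECISION = 13 and LOG_LEARNING_RATE = 5 (i.e. lr = 2^-5; both values are assumptions taken from the defaults):

```python
# Sketch: power-of-two learning rate applied as part of truncation.
FLOAT_PRECISION = 13     # assumed fractional bits
LOG_LEARNING_RATE = 5    # assumed: lr = 2**-5

def truncated_update(w, grad_times_x):
    # grad_times_x carries 2*FLOAT_PRECISION fractional bits after a
    # fixed-point multiply; one shift both restores the precision and
    # multiplies by the learning rate 2**-LOG_LEARNING_RATE.
    shift = FLOAT_PRECISION + LOG_LEARNING_RATE
    return w - (grad_times_x >> shift)

# w = 1.0, gradient-times-input = 0.5 in fixed point:
w = 1 << FLOAT_PRECISION
prod = (1 << FLOAT_PRECISION - 1) * (1 << FLOAT_PRECISION)  # 0.5 * 1.0
w_new = truncated_update(w, prod)
# w_new / 2**13 = 1 - 0.5 * 2**-5 = 0.984375
```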
This is all I can think of for now. My earlier suggestions for your third question (quoted below) should still be useful. If you have more issues/questions, feel free to open up this issue again.
3. Sure, training is not easy and I wish I had more hands to build more tools for a better training experience. Till then, here are a few suggestions:

* For one, there is a fair bit of parameter tuning in this right now, so the first important suggestion is to uncomment [this line](https://github.com/snwagh/falcon-public/blob/master/src/globals.h#L49), comment out the next one, and only run one or two forward/backward passes (you'll in all likelihood need multiple attempts before you want to throw all 10 epochs at it). The idea is that with this simple dataset, you should start to see an accuracy above 40% or so after only a few passes.
* The drop after the first forward pass indicates that the backprop might have an issue. The backprop is not thoroughly tested, but what I suspect is happening is that in backprop we need to [truncate by a large value](https://github.com/snwagh/falcon-public/blob/master/src/FCLayer.cpp#L116-L120) (roughly 13+5+7 = 25 bits). This might be too much for the 32-bit space, so try reducing the float precision and the batch size (and maybe the learning rate) a bit and see if that gives you something.
* Finally, without normalization, training is hard, particularly in MPC. You may be more sensitive to the initial weights, so make sure that part is implemented correctly.
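To make the truncation arithmetic mentioned above (13+5+7) concrete, here is the back-of-the-envelope headroom check in a 32-bit ring; the split into float precision, log learning rate, and log batch size is assumed from the defaults:

```python
# Headroom check for backprop truncation in a 32-bit ring.
FLOAT_PRECISION = 13   # fractional bits (assumed default)
LOG_LR = 5             # log2 of 1/learning-rate (assumed)
LOG_MINIBATCH = 7      # log2 of batch size, e.g. 128 (assumed)

truncation_bits = FLOAT_PRECISION + LOG_LR + LOG_MINIBATCH  # 25 bits
remaining = 32 - truncation_bits                            # 7 bits left
# Only 7 bits of integer/sign headroom survive the truncation, which is
# why overflow during backprop is very plausible in the 32-bit space.
```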
Hi snwagh, thank you very much for your continued help.
Now I am trying to use this codebase to train the "MiniONN" network over the MNIST dataset. In CNNLayer.cpp, the TODO content is not very clear to me; can you give me some hints? In addition, could you give some other suggestions about implementing MiniONN?
About that TODO, it must be something left over from one of the commits. I think I have completed that piece of code, but the only thing that is left is thorough testing of that component. So effectively: if you backprop through a Conv layer over plaintext (say through PyTorch), does the output of the MPC code agree with that of the PyTorch code? This has to be tested.
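One way to build that plaintext reference without the full PyTorch graph is a naive NumPy convolution whose gradient is verified by finite differences; the decoded MPC output should then match this reference up to truncation error. A hypothetical sketch (single channel, "valid" cross-correlation as CNNs use):

```python
import numpy as np

def conv2d(x, k):
    """Naive valid 2-D cross-correlation (what CNN 'conv' layers compute)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def grad_kernel(x, k, dout):
    """Analytic dL/dk: accumulate input patches weighted by upstream grad."""
    g = np.zeros_like(k)
    kh, kw = k.shape
    for i in range(dout.shape[0]):
        for j in range(dout.shape[1]):
            g += dout[i, j] * x[i:i + kh, j:j + kw]
    return g

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
k = rng.standard_normal((3, 3))
dout = np.ones_like(conv2d(x, k))   # loss L = sum of conv outputs
g = grad_kernel(x, k, dout)

# Finite-difference check of the analytic gradient.
eps, num = 1e-6, np.zeros_like(k)
for a in range(3):
    for b in range(3):
        kp = k.copy(); kp[a, b] += eps
        km = k.copy(); km[a, b] -= eps
        num[a, b] = (conv2d(x, kp).sum() - conv2d(x, km).sum()) / (2 * eps)
assert np.allclose(g, num, atol=1e-4)
```

The same pattern (analytic backward vs. finite differences, then MPC vs. plaintext) extends to the gradient with respect to the input.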
What do you mean by realization? Do you mean pointers on how to get it trained with good accuracy? I think the general principles will be the same for all the smaller networks. One thing you can do is run one forward/backward pass on the plaintext code and then compare that with one iteration of the MPC code. If they agree, even one epoch should give you about 95% accuracy (because that is what the plaintext gives, and my hunch is that the MPC approximations will not change it that much).
Thanks for the reply! I will test according to your suggestions next.
Now I am trying to use this codebase to train the "SecureML" neural network over the MNIST dataset, but I have encountered some problems. Could you please advise me on what I should do?
I fixed the data importing in the codebase, and the training and testing labels use the one-hot representation. Could you help me confirm that this input format is correct? In addition, do I need to perform any other preprocessing on the training data before importing it?
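For reference, a minimal sketch of the input format described above: one-hot label rows, and pixels scaled to [0, 1] (which is also what `transforms.ToTensor()` does for MNIST). Function names here are hypothetical:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """One row per sample, a single 1.0 at the label's index."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def scale_pixels(raw):
    """Map raw MNIST bytes in [0, 255] to floats in [0, 1]."""
    return np.asarray(raw, dtype=np.float64) / 255.0

y = one_hot([3, 0, 9])         # shape (3, 10), one 1.0 per row
x = scale_pixels([0, 128, 255])
```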
I used the He-initialization method to initialize the parameters (weights).
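He initialization draws weights from a zero-mean Gaussian with standard deviation sqrt(2 / fan_in), which keeps ReLU activations from shrinking or blowing up layer to layer. A quick sketch to sanity-check the initializer before encoding the weights into fixed point (names are hypothetical):

```python
import numpy as np

def he_init(fan_in, fan_out, seed=0):
    """He initialization: N(0, 2 / fan_in), suited to ReLU layers."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

# First FC layer of a 784-input MNIST network:
W = he_init(784, 128)
# Empirical std should sit near sqrt(2/784) ~ 0.0505.
```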
The following figure shows the training accuracy over 10 epochs, and the result is very unsatisfactory. Could you give me some suggestions?
In addition, can the functions backward(), computeDelta(), and updateEquations() in the codebase be used directly?