snwagh / falcon-public

Implementation of protocols in Falcon

How to improve training accuracy? #8

Open ciciwell opened 3 years ago

ciciwell commented 3 years ago

Now I am trying to use this codebase to train the "SecureML" neural network on the MNIST dataset, but I have run into some problems. Could you please advise me on what I should do?

  1. I fixed the data importing in the codebase, and the training and testing labels use the one-hot representation. Could you confirm that this input format is correct? In addition, do I need to perform any other preprocessing on the training data before importing it?

  2. I used the He initialization method to initialize the parameters (weights).

  3. The figure below shows the training accuracy over 10 epochs, and the result is very unsatisfactory. Could you give me some suggestions? [Screenshot from 2021-03-02 17-17-30]

  4. In addition, can the functions `backward`, `computeDelta`, and `updateEquations` in the codebase be used directly?

snwagh commented 3 years ago

Hi, sorry about the delay in responding. Here are some inputs:

  1. The input format is correct; labels should be passed as one-hot encodings. For training, you also need to normalize/transform the input values to lie between 0 and 1 (compare with the transform=transforms.ToTensor() argument in the PyTorch code, or another general reference).
  2. He initialization should be fine, though make sure you implement it correctly (i.e., the reconstructed weights should be He-initialized, not the shares, so the parties need to hold an RSS of the weights). It might be easier to start with an initialization that is the same for all layers, set according to He initialization for the first layer, but the former (per-layer) approach is more general and would be better.
  3. Sure, training is not easy, and I wish I had more hands to build better tooling for the training experience. Until then, here are a few suggestions:
    • For one, there is a fair bit of parameter tuning involved right now, so the first important suggestion is to uncomment this line, comment out the next one, and only run one or two forward/backward passes (you'll in all likelihood need multiple attempts before you want to throw all 10 epochs at it). The idea is that with this simple dataset you should start to see an accuracy above 40% or so after just a few passes.
    • The drop after the first forward pass indicates that the backprop might have an issue. The backprop is not thoroughly tested, but what I suspect is happening is that in backprop we need to truncate by a large value (roughly 13+5+7 = 25 bits). This might be too much for the 32-bit space, so try reducing the floating-point precision and the batch size (and maybe the learning rate) a bit and see if that helps.
    • Finally, without normalization, training is hard, particularly in MPC. Training may therefore be more sensitive to the initial weights, so make sure that part is implemented correctly.
  4. Can you describe how you're trying to call these functions? Generally, if they are used within the usual framework, you should not need direct access to them. From a C++ point of view, I'm sure you can find a workaround to use them, but I would wait to hear how you need to use them.
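The preprocessing and initialization points above can be sketched in plain Python. This is illustrative only: names like `he_init_shared` are made up for this sketch, and the real code uses Falcon's RSS machinery rather than the simple additive sharing shown here.

```python
import math
import random

def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot vector."""
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def normalize_pixels(image):
    """Scale raw MNIST pixel values (0-255) into [0, 1],
    mirroring torchvision's transforms.ToTensor()."""
    return [p / 255.0 for p in image]

def he_init_shared(fan_in, fan_out, num_parties=3):
    """Sample He-initialized weights, then additively share them so the
    *reconstructed* weights (not the individual shares) follow the
    He distribution N(0, sqrt(2 / fan_in))."""
    std = math.sqrt(2.0 / fan_in)
    weights = [random.gauss(0.0, std) for _ in range(fan_in * fan_out)]
    shares = []
    for w in weights:
        parts = [random.uniform(-1.0, 1.0) for _ in range(num_parties - 1)]
        parts.append(w - sum(parts))  # last share makes the shares sum to w
        shares.append(parts)
    return weights, shares

# Sanity check: the shares reconstruct the He-initialized weights.
weights, shares = he_init_shared(fan_in=784, fan_out=128)
assert all(abs(sum(s) - w) < 1e-9 for w, s in zip(weights, shares))
```

The key property is the final assertion: each party's share is individually random, but the shares reconstruct to He-initialized weights.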
ciciwell commented 3 years ago

Your help is very much appreciated. For training the "SecureML" network, I directly used the following functions in `secondary.cpp`: `net->forward()` and `net->backward()`. Within `net->backward()`, I also directly called `computeDelta()` and `updateEquations()`. However, I found that the `funcDivision()` function used in `computeDelta()` can only handle a single value, and I am now trying to modify it. In addition, when `backward()` is executed, the learning-rate variable in `globals.h` does not seem to be used. Can you tell me if I missed something, or what further changes need to be made?

snwagh commented 3 years ago

Yeah, the train function in `secondary.cpp` should be the right one.

And you're right, the division function is yet to be implemented in full generality. Thanks for working on it, feel free to check in your edits.

Finally, about the learning rate: each layer should use the `LOG_LEARNING_RATE` variable in its truncation (I believe the learning-rate variable is legacy).
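To illustrate the idea of folding a power-of-two learning rate into the truncation, here is a minimal fixed-point sketch in plain Python; the constant values are assumptions for this sketch, not necessarily the ones in globals.h.

```python
# Hypothetical constants mirroring globals.h; actual values may differ.
FLOAT_PRECISION = 13     # fractional bits of the fixed-point encoding
LOG_LEARNING_RATE = 5    # learning rate = 2**-5 = 0.03125

def to_fixed(x):
    """Encode a float as a fixed-point integer."""
    return int(round(x * (1 << FLOAT_PRECISION)))

def from_fixed(x):
    """Decode a fixed-point integer back to a float."""
    return x / (1 << FLOAT_PRECISION)

def sgd_step_fixed(weight_fx, grad_fx):
    """w -= lr * grad, with lr = 2**-LOG_LEARNING_RATE applied as an
    extra right shift: shifting the gradient right by LOG_LEARNING_RATE
    bits is the same as multiplying it by the learning rate."""
    return weight_fx - (grad_fx >> LOG_LEARNING_RATE)

w_new = from_fixed(sgd_step_fixed(to_fixed(0.5), to_fixed(0.25)))
# 0.5 - 0.25 * 2**-5 = 0.4921875
```

Because the learning rate is a power of two, no separate multiplication is needed; it rides along with the truncation that already happens after each fixed-point product.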

This is all I can think of for now. My earlier suggestions for your third question (quoted below) should be useful. If you have more issues/questions, feel free to open up this issue again.

3. Sure, training is not easy, and I wish I had more hands to build better tooling for the training experience. Until then, here are a few suggestions:

   * For one, there is a fair bit of parameter tuning involved right now, so the first important suggestion is to uncomment [this line](https://github.com/snwagh/falcon-public/blob/master/src/globals.h#L49), comment out the next one, and only run one or two forward/backward passes (you'll in all likelihood need multiple attempts before you want to throw all 10 epochs at it). The idea is that with this simple dataset you should start to see an accuracy above 40% or so after just a few passes.
   * The drop after the first forward pass indicates that the backprop might have an issue. The backprop is not thoroughly tested, but what I suspect is happening is that in backprop we need to [truncate by a large value](https://github.com/snwagh/falcon-public/blob/master/src/FCLayer.cpp#L116-L120) (roughly 13+5+7 = 25 bits). This might be too much for the 32-bit space, so try reducing the floating-point precision and the batch size (and maybe the learning rate) a bit and see if that helps.
   * Finally, without normalization, training is hard, particularly in MPC. Training may therefore be more sensitive to the initial weights, so make sure that part is implemented correctly.
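As a back-of-the-envelope check of the bit budget in the second bullet: the constants here are assumptions matching the 13+5+7 figure above, and may not be the exact values in src/globals.h.

```python
# Illustrative constants; the real values live in src/globals.h.
FLOAT_PRECISION = 13      # fractional bits of the fixed-point encoding
LOG_LEARNING_RATE = 5     # learning rate = 2**-5
LOG_MINI_BATCH = 7        # batch size = 2**7 = 128
WORD_BITS = 32            # ring size used by the protocol

# Bits truncated in one backprop update:
# precision rescaling + learning rate + batch averaging.
truncation_bits = FLOAT_PRECISION + LOG_LEARNING_RATE + LOG_MINI_BATCH
headroom = WORD_BITS - truncation_bits
print(f"truncating {truncation_bits} bits leaves {headroom} bits of headroom")

# Reducing precision and batch size recovers some headroom:
smaller = (FLOAT_PRECISION - 2) + LOG_LEARNING_RATE + (LOG_MINI_BATCH - 2)
print(f"with 11 fractional bits and batch size 32: {smaller} bits truncated")
```

With only a handful of bits left above the truncation, intermediate values easily overflow the 32-bit ring, which is why shrinking the precision and batch size can rescue training.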
ciciwell commented 3 years ago

Hi snwagh, thank you very much for your continued help.

Now I am trying to use this codebase to train "MiniONN" on the MNIST dataset. In `CNNLayer.cpp`, the TODO content is not very clear to me; can you give me some hints? In addition, regarding the implementation of MiniONN, could you give any other suggestions?

snwagh commented 3 years ago

About that TODO, it must be something left over from one of the commits. I think I have completed that piece of code; the only thing left is thorough testing of that component. Effectively, if you backprop through a Conv layer in plaintext (say through PyTorch), does the output of the MPC code agree with that of the PyTorch code? This has to be tested.

What do you mean by realization? Do you mean pointers on how to get it trained with good accuracy? I think the general principles will be the same for all the smaller networks. One thing you can do is run one forward/backward pass on the plaintext code and compare that with one iteration of the MPC code. If they agree, even one epoch should give you about 95% accuracy (because that is what the plaintext gives, and my hunch is that the MPC approximations will not change it much).
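This plaintext-vs-MPC comparison can be prototyped without any MPC at all, by checking that a fixed-point version of a layer (mimicking the encoding and rescaling, though not Falcon's probabilistic truncation) stays close to the float reference. A rough sketch with illustrative values:

```python
FLOAT_PRECISION = 13          # assumed fractional bits, as in the sketch above
SCALE = 1 << FLOAT_PRECISION

def to_fixed(x):
    return int(round(x * SCALE))

def truncate_product(p):
    """Rescale after a fixed-point multiply (exact shift here;
    Falcon uses probabilistic truncation instead)."""
    return p >> FLOAT_PRECISION

def plaintext_neuron(w, x):
    """Float reference: ReLU(w . x)."""
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)))

def fixed_neuron(w, x):
    """Fixed-point approximation of the same neuron."""
    acc = sum(truncate_product(to_fixed(wi) * to_fixed(xi))
              for wi, xi in zip(w, x))
    return max(0, acc) / SCALE

w = [0.3, -0.5, 0.8]
x = [0.6, 0.2, 0.9]
ref = plaintext_neuron(w, x)
approx = fixed_neuron(w, x)
assert abs(ref - approx) < 1e-3   # fixed-point error stays small
```

If the per-layer outputs of your MPC run track the plaintext values this closely, the remaining accuracy gap is almost certainly a training/tuning issue rather than a protocol bug.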

ciciwell commented 3 years ago

Thanks for the reply! I will test according to your suggestions next.

aanand300 commented 2 years ago

Hi @ciciwell, can you please help me figure out how you are able to get the rolling accuracy shown in the screenshot you posted above? I'm facing some trouble with the input data.