snwagh / falcon-public

Implementation of protocols in Falcon

Incomplete Implementation of getAccuracy() in NeuralNetwork.cpp #29

Open andeskyl opened 2 years ago

andeskyl commented 2 years ago

Hello, I am an undergraduate student working on privacy-preserving machine learning for my graduation thesis. While testing the code downloaded from this repo, I found that getAccuracy() in NeuralNetwork.cpp is not implemented, and it always shows an accuracy of 100%, which is quite weird. Can anyone kindly suggest how I can fix it? Thank you very much!

snwagh commented 2 years ago

Indeed, the function is not implemented, hence the bizarre accuracy numbers. I don't have the time to implement this myself, but here's a brief code logic for the simplest way to get this working (parts of which should already be implemented):
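Something along these lines would do, as an untested sketch: it assumes the secret-shared output scores and one-hot labels have already been opened into plaintext ring elements (via the reconstruction protocol), and the function name and signature are illustrative, not existing repo code.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

using myType = uint64_t;  // ring element type from global.h

// Count correct predictions from *reconstructed* (plaintext) outputs and
// one-hot labels, each of size batchSize * numClasses.
size_t countCorrect(const std::vector<myType> &outputs,
                    const std::vector<myType> &labels,
                    size_t batchSize, size_t numClasses)
{
    size_t correct = 0;
    for (size_t b = 0; b < batchSize; ++b)
    {
        size_t predicted = 0, truth = 0;
        for (size_t c = 1; c < numClasses; ++c)
        {
            // interpret ring elements as signed fixed point for the comparison
            if ((int64_t)outputs[b * numClasses + c] >
                (int64_t)outputs[b * numClasses + predicted])
                predicted = c;
            if ((int64_t)labels[b * numClasses + c] >
                (int64_t)labels[b * numClasses + truth])
                truth = c;
        }
        if (predicted == truth)
            correct++;
    }
    return correct;  // accuracy = correct / (double)batchSize
}
```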

Let me know if you need clarification on any of these. If you manage to complete this, open a pull request and I can merge it into this repo.

andeskyl commented 2 years ago

Thank you for your reply! I have implemented the getAccuracy() function and confirmed the correctness of my algorithm. However, I ran into another issue when testing the code with the SecureML model and the MNIST dataset. In particular, I parsed an MNIST dataset downloaded from the internet and modified the train() function in secondary.cpp to print the training and testing accuracy after each iteration. I found that both the training and testing accuracy are low and do not change after the first iteration, as shown in the figure below. In addition, the weights and biases of the FC layers of the network do not change even after updateEquation() is called. What could be causing this, and how can I fix it? Thanks a lot!

[screenshot: training result]

snwagh commented 2 years ago

So I think there must be an issue with the training. An accuracy of 9-11% indicates the model is outputting essentially random guesses (MNIST has 10 classes, so random guessing gives about 10%), so you need to do a bit of parameter tuning.

What is the learning rate you're using? Fixed-point precision? And the weight initialization?

andeskyl commented 2 years ago

I did not change any code in FCLayer.cpp or the hyperparameters provided in global.h (except that I changed NUM_ITERATIONS to 5). So LOG_LEARNING_RATE is 5, FLOAT_PRECISION is 13, LEARNING_RATE is (1 << (FLOAT_PRECISION - LOG_LEARNING_RATE)), and the weights and biases are initialized to all zeros.
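For reference, this is how those constants combine numerically (values exactly as quoted from global.h above):

```cpp
const int FLOAT_PRECISION   = 13;  // fixed-point scale factor: 2^13 = 8192
const int LOG_LEARNING_RATE = 5;   // learning rate 2^-5 = 0.03125
// Encoded in the ring: 0.03125 * 2^13 = 2^(13 - 5) = 256
const int LEARNING_RATE = (1 << (FLOAT_PRECISION - LOG_LEARNING_RATE));
```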

snwagh commented 2 years ago

The weight initialization makes a big difference. Biases set to 0 are fine. For the weights, ideally you would use Kaiming He initialization, but you can get away with a more "hacky" form of it, similar to the code already provided in FCLayer.cpp.

You have to get two things right. First, generate random "small" values (around 0.001). Second, ensure the RSS constraint is met: each pair of parties shares one of these random values, so the randomness has to be common randomness. (For ease, you can set two of the three RSS shares to zero, but the remaining non-zero share component has to be generated randomly as common randomness.)
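Roughly along these lines, as an untested sketch (the type aliases mirror global.h, but the helper itself and the party-numbering convention are illustrative, not existing repo code):

```cpp
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

using myType = uint64_t;
// Replicated secret sharing: x = x0 + x1 + x2, party i holds (x_i, x_{i+1}).
using RSSVectorMyType = std::vector<std::pair<myType, myType>>;

// Shares are chosen as (r, 0, 0), so the two parties holding r must derive
// the same value from common randomness -- here, a PRG seed they both know.
void initializeWeights(RSSVectorMyType &weights, size_t size,
                       uint64_t commonSeed, int partyNum, int floatPrecision)
{
    std::mt19937_64 prg(commonSeed);
    std::uniform_real_distribution<double> small(-0.001, 0.001);

    weights.resize(size);
    for (size_t i = 0; i < size; ++i)
    {
        // encode a small float in fixed point (two's complement handles negatives)
        myType r = (myType)(int64_t)(small(prg) * (1LL << floatPrecision));
        if      (partyNum == 0) weights[i] = {r, 0};  // holds (x0, x1)
        else if (partyNum == 1) weights[i] = {0, 0};  // holds (x1, x2)
        else                    weights[i] = {0, r};  // holds (x2, x0)
    }
}
```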

andeskyl commented 2 years ago

I have changed the weight initialization to random values, following the idea in FCLayer.cpp. Now the weights of the first two FC layers are updating throughout the iterations. However, the third FC layer is still not updating (oddly, the deltaWeight variable of the third FC layer of SecureML always remains all zeros). As a result, the training and testing accuracy is still very low and fluctuating.

I have also tried including more training and testing data, as well as different values of LOG_LEARNING_RATE in global.h, but neither improves the accuracy.

[screenshot: train]

snwagh commented 2 years ago

I think I know what's causing this. The 32-bit space is too small for the entire training. Try setting myType to uint64_t and increasing the fixed-point precision to about 20 bits.
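In global.h, the change would look something like this (the surrounding file may differ slightly):

```cpp
typedef uint64_t myType;         // was uint32_t: the 32-bit ring overflows during training
const int FLOAT_PRECISION = 20;  // was 13: more fractional bits for small gradients
```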

andeskyl commented 2 years ago

I changed myType to uint64_t and FLOAT_PRECISION to 20, and observed something like gradient explosion. I also tried adjusting LOG_LEARNING_RATE from 5 up to 19, but the weights and deltaWeights still take very large values.

[screenshot: train]

snwagh commented 2 years ago

Can you print all the weights and activations for the first forward and backward pass? The weights seem to have already overflowed. With 20 bits of fixed-point precision, any integer above ~1000 would result in an overflow. A LOG_LEARNING_RATE of 19 seems too high.
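As a rough rule of thumb (a back-of-the-envelope reading, so treat the exact cut-offs with care): a value x is stored as x * 2^f, and a product carries 2f fractional bits before truncation, so the usable integer range is approximately

```
single value:             |x|   < 2^(k - f - 1)     k = ring width, f = FLOAT_PRECISION
product (pre-truncation): |x*y| < 2^(k - 2f - 1)

k = 32, f = 20  ->  single value < 2^11 = 2048   (roughly the ~1000 figure)
k = 64, f = 20  ->  product headroom ~ 2^23 ≈ 8.4e6
```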

andeskyl commented 2 years ago

The figure I attached above uses LOG_MINI_BATCH 7 and LOG_LEARNING_RATE 5. Since the weights and deltaWeights are quite large, I will try to attach as much as possible. The figure below shows the first weights and deltaWeights with LOG_MINI_BATCH 3 and LOG_LEARNING_RATE 5.

[screenshots: weight, deltaWeight]

snwagh commented 2 years ago

I would recommend you don't print the entire sets; print only the first 10 input/output values (and other values) for each layer. Right now the weights seem fine, but deltaWeight doesn't look right, and it is hard to say why: (1) I don't know which layer the above variables are printed from (assuming it is the SecureML network: the first, second, or third FCLayer?). (2) I can't see the other inputs/outputs to this computation, particularly the delta calculation. (3) Finally, are you running with or without normalization? (If I remember correctly, you probably want the control flow to use this part of the code.)

So print an output (only of the FC layers, or, since it is a small network, all the layers including the ReLUs) in the following manner, containing the first 10 samples of:

Forward:

Backward:
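A print helper along these lines would do (illustrative sketch: it assumes the values have already been opened into plaintext with the reconstruction protocol, and it decodes the fixed-point encoding before printing):

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

using myType = uint64_t;  // ring element type from global.h

// Print the first n values of a reconstructed (plaintext) vector with a tag
// such as "FC1 forward input", undoing the 2^f fixed-point scale.
void printFirstN(const std::string &tag, const std::vector<myType> &vals,
                 size_t n, int floatPrecision)
{
    std::cout << tag << ": ";
    for (size_t i = 0; i < std::min(n, vals.size()); ++i)
        // interpret the ring element as signed, then rescale to a double
        std::cout << (double)(int64_t)vals[i] / (double)(1LL << floatPrecision) << " ";
    std::cout << std::endl;
}
```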

llCurious commented 2 years ago

Hey, I am also working on this part. Could @AndesPooh258 open a PR or post the link to your GitHub repo? Thanks a lot!!

andeskyl commented 2 years ago

> I would recommend you don't print the entire sets; print only the first 10 input/output values (and other values) for each layer. Right now the weights seem fine, but deltaWeight doesn't look right, and it is hard to say why: (1) I don't know which layer the above variables are printed from (assuming it is the SecureML network: the first, second, or third FCLayer?). (2) I can't see the other inputs/outputs to this computation, particularly the delta calculation. (3) Finally, are you running with or without normalization? (If I remember correctly, you probably want the control flow to use this part of the code.)

As I am currently dealing with multiple deadlines, I will do the testing for (1) and (2) as soon as those are behind me. For (3), I am currently setting WITH_NORMALIZATION to true.

> Hey, I am also working on this part. Could @AndesPooh258 open a PR or post the link to your GitHub repo? Thanks a lot!!

I have made a fork of this repo here.

andeskyl commented 2 years ago

Batch input (first 2000 inputs):

[screenshot: input]

Forward of FC layer 1 (first 200 elements of weights and activations):

[screenshot: forward FC 1]

Forward of FC layer 2 (first 200 elements of weights and activations):

[screenshot: forward FC 2]

Forward of FC layer 3 (first 200 elements of weights and activations):

[screenshot: forward FC 3]

Update equation of FC layer 1 (first 100 elements of weights, deltas, and deltaWeights):

[screenshot: update FC 1]

Update equation of FC layer 2 (first 100 elements of weights, deltas, and deltaWeights):

[screenshot: update FC 2]

Update equation of FC layer 3 (first 100 elements of weights, deltas, and deltaWeights):

[screenshot: update FC 3]

snwagh commented 2 years ago

Good chance the issue is caused by non-normalized inputs. Can you try after converting the inputs to floats between 0 and 1 (by default, MNIST pixel values are in the 0-255 range)?
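In MNISTParse.c, the change would be along these lines (sketch; the function and variable names are illustrative, not the file's actual ones):

```cpp
#include <cstdio>

// Write one MNIST image's pixels scaled into [0, 1] instead of raw 0-255 bytes.
void writeNormalizedImage(std::FILE *out, const unsigned char *pixels, int count)
{
    for (int i = 0; i < count; ++i)
        std::fprintf(out, "%f ", pixels[i] / 255.0f);  // 0-255 -> [0, 1]
    std::fprintf(out, "\n");
}
```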

andeskyl commented 2 years ago

I have modified MNISTParse.c to convert the inputs to floats between 0 and 1. Now the weights and activations of the first iteration look normal. However, the deltas and deltaWeights in updateEquation are still very large, so the weights become large after the first update.

Forward (first 200 elements of weights and activations):

[screenshot: forward]

Update equation (first 200 elements of weights and activations):

[screenshot: update equation]

snwagh commented 2 years ago

Right, can you now print the weights/input/output of each FCLayer? Until some work is done on automating this / building better tooling, we're unfortunately stuck with this "looks reasonable" style of debugging. To help you break the process down further, you can do the following: