zama-ai / concrete-ml

Concrete ML: Privacy Preserving ML framework using Fully Homomorphic Encryption (FHE), built on top of Concrete, with bindings to traditional ML frameworks.

Significant Accuracy Decrease After FHE Execution #875

Open Sarahfbb opened 1 month ago

Sarahfbb commented 1 month ago

Summary

What happened/what you expected to happen?

Description

We've observed significant accuracy discrepancies when running our model with different FHE settings. The original PyTorch model achieves 63% accuracy. With FHE disabled, the accuracy drops to 50%, and with FHE execution enabled, it further decreases to 32%. The compilation uses a dummy input of shape (1, 100) with random values (numpy.random.randn(1, 100).astype(numpy.float32)). Since the accuracy with FHE disabled matches the quantized model's accuracy, it suggests that the accuracy loss from 63% to 50% is likely due to quantization. However, the substantial drop to 32% when enabling FHE execution indicates a potential issue with the FHE implementation or configuration that requires further investigation.

Step by step procedure someone should follow to trigger the bug:

minimal POC to trigger the bug

```python
print("Minimal POC to reproduce the bug")
```

![Screenshot from 2024-09-19 20-42-32](https://github.com/user-attachments/assets/84920f60-fa89-4223-930a-a5021e71a6e6)
![Screenshot from 2024-09-19 20-42-49](https://github.com/user-attachments/assets/f638bb7e-5efc-4ae7-9c1c-09b1fc518f3f)
![Screenshot from 2024-09-19 20-42-58](https://github.com/user-attachments/assets/361cf709-5674-4732-bdc1-c767b23a04ae)

The screenshots above show my compilation process and the process of running the compiled model on encrypted data. I suspect the issue is related to FHE execution. I wrote this file based on this open-source example: https://github.com/zama-ai/concrete-ml/blob/main/docs/advanced_examples/ClientServer.ipynb

bcm-at-zama commented 1 month ago

Hello, could we have a GitHub repo to reproduce the problem, please? We need some code to reproduce. Thanks

jfrery commented 1 month ago

Hi @Sarahfbb,

As @bcm-at-zama says, it's much easier for us to help with some simple code to reproduce.

I will try to answer what I can from what you say:

The compilation uses a dummy input of shape (1, 100) with random values (numpy.random.randn(1, 100).astype(numpy.float32))

That is not ideal, since it won't represent the actual distribution of your input. This can lead to significant quantization error. Ideally, we prefer a small representative dataset here: basically, a few random points taken from the training set.
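To illustrate the point above, here is a toy numpy sketch (not Concrete ML's actual quantizer, and the data distributions are made up for illustration): a uniform quantizer whose range is calibrated on a single `randn` sample clips and distorts inputs that follow a different distribution, while calibrating on representative points keeps the error small.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, cal, n_bits=4):
    """Uniform quantization whose [min, max] range is calibrated on `cal`,
    then dequantized back to floats so we can measure the error."""
    lo, hi = cal.min(), cal.max()
    scale = (hi - lo) / (2**n_bits - 1)
    q = np.round((np.clip(x, lo, hi) - lo) / scale)
    return q * scale + lo

# Pretend the real features live in [0, 10); the dummy calibration
# sample is standard normal, like numpy.random.randn(1, 100).
real = rng.uniform(0, 10, size=(1000, 100)).astype(np.float32)
dummy_cal = rng.standard_normal((1, 100)).astype(np.float32)

err_dummy = np.abs(quantize(real, dummy_cal) - real).mean()
err_repr = np.abs(quantize(real, real[:100]) - real).mean()
print(err_dummy > 10 * err_repr)  # dummy calibration is far worse
```

The dummy sample's range tops out around 3, so most real values get clipped; a handful of training points would have captured the true range.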

Since the accuracy with FHE disabled matches the quantized model's accuracy, it suggests that the accuracy loss from 63% to 50% is likely due to quantization.

That is correct: that first drop comes from quantization. You can study the effect of quantization on your model by playing around with the n_bits parameter. I advise you to set rounding_threshold_bits = 6 and check different n_bits values.
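The intuition behind sweeping n_bits can be sketched with plain numpy (this is only the generic uniform-quantization principle, not Concrete ML's internals): each extra bit roughly halves the quantization error, so the accuracy/precision trade-off is easy to explore.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000).astype(np.float32)

def quant_error(x, n_bits):
    """Mean absolute error of a uniform quantizer with n_bits bits."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**n_bits - 1)
    xq = np.round((x - lo) / scale) * scale + lo
    return float(np.abs(xq - x).mean())

errors = {b: quant_error(x, b) for b in (2, 4, 6, 8)}
# Error drops monotonically as n_bits grows.
print(all(errors[a] > errors[b] for a, b in [(2, 4), (4, 6), (6, 8)]))
```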

However, the substantial drop to 32% when enabling FHE execution indicates a potential issue with the FHE implementation or configuration that requires further investigation.

That drop isn't expected, unless you have somehow changed the p_error value in the FHE configuration? Is this the accuracy you get with fhe="simulate"?
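A back-of-envelope sketch of why p_error matters (the p_error value and per-inference operation counts below are made-up illustrations, not measurements): if each encrypted table lookup fails independently with probability p, the chance that at least one fails in an inference is 1 - (1 - p)^k, which grows quickly with the number of operations k.

```python
# Illustrative per-operation failure probability (assumption, not a
# Concrete ML default) and hypothetical operation counts per inference.
p_error = 2**-10
for k in (1, 100, 1000):
    p_any = 1 - (1 - p_error) ** k
    print(f"{k} ops -> P(at least one error) = {p_any:.3f}")
```

So a misconfigured (raised) p_error can visibly degrade accuracy even though each individual operation is almost always correct.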

Sarahfbb commented 1 month ago

Hello, could we have a GitHub repo to reproduce the problem, please? We need some code to reproduce. Thanks

https://github.com/Sarahfbb/FHE/tree/main/S

Sarahfbb commented 1 month ago

Thank you so much for your reply. I've changed the visibility of the repository; here it is: https://github.com/Sarahfbb/FHE/tree/main/S. The workflow is: Extracted_features, S_training, S_qat_training, S_compilation, Batch_test. Btw, "S_qat_training" is the quantization step.

But I don't think it's simulation; I ran FHE mode directly.

jfrery commented 1 month ago

Thanks for the code.

I can see that you use a polynomial approximation for the activation function. If you did that on purpose to make the FHE runtime faster, then it's not going to work. Just using a simple torch activation function like relu or sigmoid will run fine.
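One way to see why a hand-rolled polynomial activation is risky (a numpy toy, independent of Concrete ML): a polynomial fitted to ReLU on a narrow range diverges badly as soon as pre-activations leave that range, whereas a true torch activation stays correct everywhere.

```python
import numpy as np

# Degree-4 least-squares fit to ReLU on [-1, 1].
xs = np.linspace(-1, 1, 201)
coeffs = np.polyfit(xs, np.maximum(xs, 0), deg=4)
poly = np.poly1d(coeffs)

# Accurate inside the fitted range...
inside = np.abs(poly(xs) - np.maximum(xs, 0)).max()
# ...wildly wrong outside it.
far = np.linspace(-5, 5, 201)
outside = np.abs(poly(far) - np.maximum(far, 0)).max()
print(inside < 0.1 < outside)
```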

I am not sure where you are evaluating the quantized model from Concrete ML. I see you are evaluating the quantized torch model built with Brevitas, so I think the 50% is what you got from that evaluation? Once you compile, you get a quantized_module; you can use quantized_module.forward(X, fhe="simulate") to get the predictions you should have using FHE.

Batch_test seems to be running the evaluation using the deployment API instead. So unless I am mistaken, the 63% vs 50% comparison is the torch model on fp32 data vs the Brevitas QAT torch model. Right?

Sarahfbb commented 1 month ago

Sure, I will try modifying the activation function as you advised, thanks a lot! As for the accuracies:

- 63% is the accuracy of "S_training.py"
- 50% is the accuracy of "FHE_Disable_Test.py" after "FHE_Disable_Compilation.py"
- 32% is the accuracy of "Batch_Test.py" after "S_compilation.py"