zama-ai / concrete-ml

Concrete ML: Privacy Preserving ML framework using Fully Homomorphic Encryption (FHE), built on top of Concrete, with bindings to traditional ML frameworks.

Significant Accuracy Decrease After FHE Execution #875

Open Sarahfbb opened 1 month ago

Sarahfbb commented 1 month ago

Summary

What happened/what you expected to happen?

Description

We've observed significant accuracy discrepancies when running our model with different FHE settings. The original PyTorch model achieves 63% accuracy. With FHE disabled, the accuracy drops to 50%, and with FHE execution enabled, it further decreases to 32%. The compilation uses a dummy input of shape (1, 100) with random values (numpy.random.randn(1, 100).astype(numpy.float32)). Since the accuracy with FHE disabled matches the quantized model's accuracy, it suggests that the accuracy loss from 63% to 50% is likely due to quantization. However, the substantial drop to 32% when enabling FHE execution indicates a potential issue with the FHE implementation or configuration that requires further investigation.

Step by step procedure someone should follow to trigger the bug:

minimal POC to trigger the bug

```python
print("Minimal POC to reproduce the bug")
```

![Screenshot from 2024-09-19 20-42-32](https://github.com/user-attachments/assets/84920f60-fa89-4223-930a-a5021e71a6e6)
![Screenshot from 2024-09-19 20-42-49](https://github.com/user-attachments/assets/f638bb7e-5efc-4ae7-9c1c-09b1fc518f3f)
![Screenshot from 2024-09-19 20-42-58](https://github.com/user-attachments/assets/361cf709-5674-4732-bdc1-c767b23a04ae)

The screenshots above show my compilation process and the process of running the compiled model on encrypted data. I suspect the issue is related to FHE execution. I wrote this file based on this open-source example: https://github.com/zama-ai/concrete-ml/blob/main/docs/advanced_examples/ClientServer.ipynb

bcm-at-zama commented 1 month ago

Hello, could we have a GitHub repo to reproduce the problem, please? We need some code to reproduce. Thanks

jfrery commented 1 month ago

Hi @Sarahfbb,

As @bcm-at-zama says, it's much easier for us to help with some simple code to reproduce.

I will try to answer what I can from what you say:

The compilation uses a dummy input of shape (1, 100) with random values (numpy.random.randn(1, 100).astype(numpy.float32))

That is not ideal, since it won't represent the actual distribution of your input. This can lead to significant quantization error. Ideally, we prefer a small representative dataset here: basically, a few random points taken from the training set.
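To illustrate the point above, here is a toy numpy sketch (not Concrete ML's actual quantizer, and the data distributions are made up for illustration): a uniform quantizer whose range is calibrated on a single `randn` sample clips and distorts inputs that follow a different distribution, while calibrating on representative points keeps the error small.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, cal, n_bits=4):
    """Uniform quantization whose [min, max] range is calibrated on `cal`,
    then dequantized back to floats so we can measure the error."""
    lo, hi = cal.min(), cal.max()
    scale = (hi - lo) / (2**n_bits - 1)
    q = np.round((np.clip(x, lo, hi) - lo) / scale)
    return q * scale + lo

# Pretend the real features live in [0, 10); the dummy calibration
# sample is standard normal, like numpy.random.randn(1, 100).
real = rng.uniform(0, 10, size=(1000, 100)).astype(np.float32)
dummy_cal = rng.standard_normal((1, 100)).astype(np.float32)

err_dummy = np.abs(quantize(real, dummy_cal) - real).mean()
err_repr = np.abs(quantize(real, real[:100]) - real).mean()
print(err_dummy > 10 * err_repr)  # dummy calibration is far worse
```

The dummy sample's range tops out around 3, so most real values get clipped; a handful of training points would have captured the true range.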

Since the accuracy with FHE disabled matches the quantized model's accuracy, it suggests that the accuracy loss from 63% to 50% is likely due to quantization.

That is correct: that first drop comes from quantization. You can study the effect of quantization on your model by playing around with the n_bits parameter. I advise you to set rounding_threshold_bits = 6 and check different n_bits values.
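The intuition behind sweeping n_bits can be sketched with plain numpy (this is only the generic uniform-quantization principle, not Concrete ML's internals): each extra bit roughly halves the quantization error, so the accuracy/precision trade-off is easy to explore.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000).astype(np.float32)

def quant_error(x, n_bits):
    """Mean absolute error of a uniform quantizer with n_bits bits."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**n_bits - 1)
    xq = np.round((x - lo) / scale) * scale + lo
    return float(np.abs(xq - x).mean())

errors = {b: quant_error(x, b) for b in (2, 4, 6, 8)}
# Error drops monotonically as n_bits grows.
print(all(errors[a] > errors[b] for a, b in [(2, 4), (4, 6), (6, 8)]))
```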

However, the substantial drop to 32% when enabling FHE execution indicates a potential issue with the FHE implementation or configuration that requires further investigation.

That drop isn't expected, unless you have somehow changed the p_error value in the FHE configuration? Is this the accuracy you get with fhe="simulate"?
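A back-of-envelope sketch of why p_error matters (the p_error value and per-inference operation counts below are made-up illustrations, not measurements): if each encrypted table lookup fails independently with probability p, the chance that at least one fails in an inference is 1 - (1 - p)^k, which grows quickly with the number of operations k.

```python
# Illustrative per-operation failure probability (assumption, not a
# Concrete ML default) and hypothetical operation counts per inference.
p_error = 2**-10
for k in (1, 100, 1000):
    p_any = 1 - (1 - p_error) ** k
    print(f"{k} ops -> P(at least one error) = {p_any:.3f}")
```

So a misconfigured (raised) p_error can visibly degrade accuracy even though each individual operation is almost always correct.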

Sarahfbb commented 1 month ago

Hello, could we have a GitHub repo to reproduce the problem, please? We need some code to reproduce. Thanks

https://github.com/Sarahfbb/FHE/tree/main/S

Sarahfbb commented 1 month ago

Thank you so much for your reply. I've changed the visibility of the repository; here it is: https://github.com/Sarahfbb/FHE/tree/main/S. The workflow is: Extracted_features, S_training, S_qat_training, S_compilation, Batch_test. Btw, "S_qat_training" is the quantization step.

But I don't think it's simulation; I ran FHE mode directly.

jfrery commented 1 month ago

Thanks for the code.

I can see that you use a polynomial approximation for the activation function. If you did that on purpose to make the FHE runtime faster, then it's not going to work. Just using a simple torch activation function like relu or sigmoid will run fine.
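One way to see why a hand-rolled polynomial activation is risky (a numpy toy, independent of Concrete ML): a polynomial fitted to ReLU on a narrow range diverges badly as soon as pre-activations leave that range, whereas a true torch activation stays correct everywhere.

```python
import numpy as np

# Degree-4 least-squares fit to ReLU on [-1, 1].
xs = np.linspace(-1, 1, 201)
coeffs = np.polyfit(xs, np.maximum(xs, 0), deg=4)
poly = np.poly1d(coeffs)

# Accurate inside the fitted range...
inside = np.abs(poly(xs) - np.maximum(xs, 0)).max()
# ...wildly wrong outside it.
far = np.linspace(-5, 5, 201)
outside = np.abs(poly(far) - np.maximum(far, 0)).max()
print(inside < 0.1 < outside)
```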

I am not sure where you are evaluating the quantized model from Concrete ML. I see you are evaluating the quantized torch model built with Brevitas, so I think the 50% is what you got from that evaluation? Once you compile, you get a quantized_module; you can use quantized_module.forward(X, fhe="simulate") to get the predictions you should have using FHE.

Batch_test seems to be running the evaluation using the deployment API instead. So unless I am mistaken, the 63% vs 50% comparison is the torch model on fp32 data vs the Brevitas QAT torch model. Right?

Sarahfbb commented 1 month ago

Sure, I will try modifying the activation function as you advised, thanks a lot! As for the accuracies:

- 63% is the accuracy of "S_training.py"
- 50% is the accuracy of "FHE_Disable_Test.py" after "FHE_Disable_Compilation.py"
- 32% is the accuracy of "Batch_Test.py" after "S_compilation.py"