thanks!
The underlying representation used by Concrete is variable-sized integers: the message space can contain up to 20-30 bits. However, the PBS is only efficient for integers up to 6-8 bits (it can go up to 16 bits, but it is slower). It is possible, through what we call "approximate rounding", to apply the PBS only to the desired number of MSBs of a high-bitwidth accumulator (e.g. the 6 MSBs of a 14-bit accumulator).
A PBS refreshes the noise but also applies a table lookup to the value it processes. Thus, when applying a PBS we get the ReLU evaluation for free.
Using only the MSBs of the accumulator works well because of quantization: quantizing a value implies dividing it by a scale factor. This division can be thought of as dividing by a power of two and by another, smaller scale factor. Dividing by a power of two is actually removing LSBs.
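To make this concrete, here is a cleartext sketch of the idea (plain Python/NumPy, not the Concrete ML API; the 14-bit/6-bit split is taken from the discussion above, everything else is an illustrative assumption): the accumulator is divided by a power of two, which drops LSBs, and the ReLU is then just a table lookup over the remaining 6-bit value, which is exactly what a single PBS applies.

import numpy as np

# Illustrative assumptions (not Concrete ML API): a 14-bit signed
# accumulator, of which only the 6 MSBs are fed to the PBS.
ACC_BITS = 14
MSB_BITS = 6
LSB_BITS = ACC_BITS - MSB_BITS   # 8 LSBs removed, i.e. division by 2**8

# The PBS applies a table lookup to the 6-bit value it processes, so the
# ReLU (and any leftover, non-power-of-two rescaling) can be baked into
# that table and comes "for free".
table = np.array([max(v, 0) for v in range(-2**(MSB_BITS - 1), 2**(MSB_BITS - 1))])

def relu_on_msbs(acc: int) -> int:
    """Cleartext model of: drop the LSBs, then one 6-bit PBS doing ReLU."""
    msbs = acc >> LSB_BITS                        # divide by 2**8 == remove the LSBs
    return int(table[msbs + 2**(MSB_BITS - 1)])   # table lookup == what the PBS does

print(relu_on_msbs(5000), relu_on_msbs(-5000))    # -> 19 0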
Thanks for the fast reply!
Dividing by a power of two is actually removing LSBs.
But won't removing the LSBs also cost some PBS?
There are two approaches to removing LSBs:
- exact rounding: uses as many 1-bit PBS as there are LSBs to remove
- approximate rounding: simply ignores the LSBs during the PBS, but it adds a small probability of an off-by-one error in the result of the PBS
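Here is a small cleartext sketch of the difference (plain Python with an assumed number of removed bits; it models the effect, not the Concrete implementation): exact rounding rounds away the dropped LSBs, while the approximate variant truncates them, so the kept MSBs can occasionally be off by one.

# Cleartext model of removing LSB_BITS low bits from an accumulator value.
LSB_BITS = 8  # assumed number of LSBs to remove

def exact_rounding(acc: int) -> int:
    """Round to the nearest multiple of 2**LSB_BITS before dropping the LSBs
    (in FHE this costs roughly one 1-bit PBS per removed bit)."""
    return (acc + (1 << (LSB_BITS - 1))) >> LSB_BITS

def approximate_rounding(acc: int) -> int:
    """Just ignore the LSBs (free during the PBS), at the price of a possible
    off-by-one compared to exact rounding."""
    return acc >> LSB_BITS

for acc in (5000, 5130, -5000):
    e, a = exact_rounding(acc), approximate_rounding(acc)
    print(acc, e, a, "off-by-one" if e != a else "same")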
Ah, that's the point, thanks! Have you tested the impact of such approximation errors? The final FHE accuracy of 95.8% is only evaluated with fhe="simulate".
FHE simulation takes into account any impact of the noise, so you can be confident that it represents FHE accuracy well. We also ran 100 samples in actual FHE to be sure; the accuracy was preserved.
I changed the following line
simulate_predictions = q_module.forward(data, fhe="simulate")
into
simulate_predictions = q_module.forward(data, fhe="execute")
in https://github.com/zama-ai/concrete-ml/blob/main/use_case_examples/white_paper_experiment/WhitePaperExperiments.ipynb
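For context, here is a minimal sketch of how the two modes can be compared (a sketch only: q_module and data are the notebook's objects, while target and the argmax-based accuracy are my own shorthand, not the notebook's exact code):

import numpy as np

# Assumes q_module (the compiled quantized module) and data (test inputs)
# as in the notebook, plus target: the ground-truth labels for data.
simulate_predictions = q_module.forward(data, fhe="simulate")
execute_predictions = q_module.forward(data, fhe="execute")

simulate_acc = np.mean(np.argmax(simulate_predictions, axis=1) == target)
execute_acc = np.mean(np.argmax(execute_predictions, axis=1) == target)
print(f"simulate: {simulate_acc:.3%}  execute: {execute_acc:.3%}")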
The results are as follows:
Running NN-20 on a 128-core machine:
Accuracy in fp32: 98.067% for the test set
Accuracy with FHE-simulation mode: 94.241% for the test set
FHE Latency on encrypted data: 2.197s per encrypted sample. Number of PBS: 2440

Running NN-50 on a 128-core machine:
Accuracy in fp32: 97.446% for the test set
Accuracy with FHE-simulation mode: 91.336% for the test set
FHE Latency on encrypted data: 5.574s per encrypted sample. Number of PBS: 5200
So I am wondering why the accuracy with fhe="execute" is noticeably lower than the reported simulation accuracy.
Update:
I also give the results of the unmodified simulation mode for reference:
simulate_predictions = q_module.forward(data, fhe="simulate")
The results are:
Running NN-20 on a 128-core machine:
Accuracy with FHE-simulation mode: 96.244% for the test set
FHE Latency on encrypted data: 6.562s per encrypted sample. Number of PBS: 2440

Running NN-50 on a 128-core machine:
Accuracy in fp32: 97.446% for the test set
Accuracy with FHE-simulation mode: 95.032% for the test set
FHE Latency on encrypted data: 15.127s per encrypted sample. Number of PBS: 5200
I saw this issue was solved in 1.7.0. The simulation mode now correctly reflects the accuracy loss. Thanks!
Great to see that it has been fixed with the new Concrete ML release. Indeed, in Concrete ML 1.6 we had identified an issue with approximate-mode simulation; it is fixed in 1.7, and you have been able to verify it.
If you see another accuracy difference between simulation and real FHE, please report it and we will investigate: it's not supposed to happen and will be treated as a bug.
Congratulations on your new results in https://www.zama.ai/post/making-fhe-faster-for-ml-beating-our-previous-paper-benchmarks-with-concrete-ml ! We wonder whether you could share more details about the underlying improvements.
For example, the printed number of PBS for NN-20 in https://github.com/zama-ai/concrete-ml/blob/main/use_case_examples/white_paper_experiment/WhitePaperExperiments.ipynb is 2440 = 784 + 18 x 92, which means only one PBS is required to implement a ReLU. This is counter-intuitive: since the model is quantized to 6 bits, the accumulator W*X would be around 14 bits, so how can a single PBS perform a ReLU on a 14-bit input?
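As a back-of-the-envelope check of the bit-widths in question (plain Python; the layer width of 92 neurons is read off the PBS count above and the 6-bit quantization from the discussion, everything else is a rough estimate, not Concrete ML's exact accounting):

import math

# Assumed quantization: 6-bit weights and 6-bit activations.
w_bits, a_bits = 6, 6
fan_in = 92  # neurons per hidden layer, as suggested by 2440 = 784 + 18 * 92

# Each product needs about w_bits + a_bits bits; summing fan_in of them
# adds roughly log2(fan_in) more bits in the worst case.
acc_bits = w_bits + a_bits + math.ceil(math.log2(fan_in))
print(acc_bits)  # ~19 in the worst case; with trained weight distributions,
                 # the accumulator typically lands around 14 bits, as discussed above.

# With approximate rounding, only the top 6 MSBs of this accumulator are fed
# to a single 6-bit PBS, which also evaluates the ReLU as a table lookup.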
Thanks.