thanks!
The underlying representation used by Concrete is variable-sized integers: the message space can contain up to 20-30 bits. However, the PBS is only efficient for integers up to 6-8 bits (it can go up to 16 bits, but it is slower). It is possible, through what we call "approximate rounding", to apply the PBS only to the desired number of MSBs of a high-bitwidth accumulator (e.g. the 6 MSBs of a 14-bit accumulator).
A PBS refreshes the noise but also applies a table lookup to the value it processes. Thus, when applying a PBS we get the ReLU evaluation for free.
Using only the MSBs of the accumulator works well because of quantization: quantizing a value implies dividing it by a scale factor. This division can be thought of as dividing by a power of two and by another, smaller scale factor. Dividing by a power of two is actually removing LSBs.
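To make this concrete, here is a cleartext sketch of the idea (plain Python/NumPy, not the Concrete ML API; the 14-bit/6-bit split is taken from the discussion above, everything else is an illustrative assumption): the accumulator is divided by a power of two, which drops LSBs, and the ReLU is then just a table lookup over the remaining 6-bit value, which is exactly what a single PBS applies.

import numpy as np

# Illustrative assumptions (not Concrete ML API): a 14-bit signed
# accumulator, of which only the 6 MSBs are fed to the PBS.
ACC_BITS = 14
MSB_BITS = 6
LSB_BITS = ACC_BITS - MSB_BITS   # 8 LSBs removed, i.e. division by 2**8

# The PBS applies a table lookup to the 6-bit value it processes, so the
# ReLU (and any leftover, non-power-of-two rescaling) can be baked into
# that table and comes "for free".
table = np.array([max(v, 0) for v in range(-2**(MSB_BITS - 1), 2**(MSB_BITS - 1))])

def relu_on_msbs(acc: int) -> int:
    """Cleartext model of: drop the LSBs, then one 6-bit PBS doing ReLU."""
    msbs = acc >> LSB_BITS                        # divide by 2**8 == remove the LSBs
    return int(table[msbs + 2**(MSB_BITS - 1)])   # table lookup == what the PBS does

print(relu_on_msbs(5000), relu_on_msbs(-5000))    # -> 19 0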
Thanks for the fast reply!
Dividing by a power of two is actually removing LSBs.
But won't removing the LSBs also cost some PBS?
There are two approaches to removing LSBs:
- exact rounding: uses as many 1-bit PBS as there are LSBs to remove
- approximate rounding: simply ignores the LSBs during the PBS, but it adds a small probability of an off-by-one error in the result of the PBS
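Here is a small cleartext sketch of the difference (plain Python with an assumed number of removed bits; it models the effect, not the Concrete implementation): exact rounding rounds away the dropped LSBs, while the approximate variant truncates them, so the kept MSBs can occasionally be off by one.

# Cleartext model of removing LSB_BITS low bits from an accumulator value.
LSB_BITS = 8  # assumed number of LSBs to remove

def exact_rounding(acc: int) -> int:
    """Round to the nearest multiple of 2**LSB_BITS before dropping the LSBs
    (in FHE this costs roughly one 1-bit PBS per removed bit)."""
    return (acc + (1 << (LSB_BITS - 1))) >> LSB_BITS

def approximate_rounding(acc: int) -> int:
    """Just ignore the LSBs (free during the PBS), at the price of a possible
    off-by-one compared to exact rounding."""
    return acc >> LSB_BITS

for acc in (5000, 5130, -5000):
    e, a = exact_rounding(acc), approximate_rounding(acc)
    print(acc, e, a, "off-by-one" if e != a else "same")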
Ah, that's the point, thanks! Have you tested the impact of such approximation errors? The final FHE accuracy of 95.8% is only evaluated with fhe="simulate".
FHE simulation takes into account any impact of the noise, so you can be confident that it represents FHE accuracy well. We also ran 100 samples in actual FHE to be sure; the accuracy was preserved.
I changed the following line
simulate_predictions = q_module.forward(data, fhe="simulate")
into
simulate_predictions = q_module.forward(data, fhe="execute")
in https://github.com/zama-ai/concrete-ml/blob/main/use_case_examples/white_paper_experiment/WhitePaperExperiments.ipynb
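For context, here is a minimal sketch of how the two modes can be compared (a sketch only: q_module and data are the notebook's objects, while target and the argmax-based accuracy are my own shorthand, not the notebook's exact code):

import numpy as np

# Assumes q_module (the compiled quantized module) and data (test inputs)
# as in the notebook, plus target: the ground-truth labels for data.
simulate_predictions = q_module.forward(data, fhe="simulate")
execute_predictions = q_module.forward(data, fhe="execute")

simulate_acc = np.mean(np.argmax(simulate_predictions, axis=1) == target)
execute_acc = np.mean(np.argmax(execute_predictions, axis=1) == target)
print(f"simulate: {simulate_acc:.3%}  execute: {execute_acc:.3%}")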
The results are as follows:
Running NN-20 on a 128-core machine:
Accuracy in fp32: 98.067% for the test set
Accuracy with FHE-simulation mode: 94.241% for the test set
FHE Latency on encrypted data: 2.197s per encrypted sample. Number of PBS: 2440

Running NN-50 on a 128-core machine:
Accuracy in fp32: 97.446% for the test set
Accuracy with FHE-simulation mode: 91.336% for the test set
FHE Latency on encrypted data: 5.574s per encrypted sample. Number of PBS: 5200
So I am wondering why the accuracy with fhe="execute" is noticeably lower than the reported simulation accuracy.
Update:
I also give the results of the unmodified simulation mode for reference:
simulate_predictions = q_module.forward(data, fhe="simulate")
The results are:
Running NN-20 on a 128-core machine:
Accuracy with FHE-simulation mode: 96.244% for the test set
FHE Latency on encrypted data: 6.562s per encrypted sample. Number of PBS: 2440

Running NN-50 on a 128-core machine:
Accuracy in fp32: 97.446% for the test set
Accuracy with FHE-simulation mode: 95.032% for the test set
FHE Latency on encrypted data: 15.127s per encrypted sample. Number of PBS: 5200
I saw this issue was solved in 1.7.0. The simulation mode now correctly reflects the accuracy loss. Thanks!
Great to see that it has been fixed with the new Concrete ML release. Indeed, in Concrete ML 1.6 we had identified an issue with approximate-mode simulation; it is fixed in 1.7, and you have been able to verify it.
If you see another accuracy difference between simulation and real FHE, please report it and we will investigate: it's not supposed to happen and will be treated as a bug.
Congratulations on your new results in https://www.zama.ai/post/making-fhe-faster-for-ml-beating-our-previous-paper-benchmarks-with-concrete-ml ! We wonder whether you could share more details about the underlying improvements.
For example, the printed number of PBS for NN-20 in https://github.com/zama-ai/concrete-ml/blob/main/use_case_examples/white_paper_experiment/WhitePaperExperiments.ipynb is 2440 = 784 + 18 x 92, which means only one PBS is required to implement a ReLU. This is counter-intuitive: since the model is quantized to 6 bits, the accumulator W*X would be around 14 bits, so how can a single PBS perform a ReLU on a 14-bit input?
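As a back-of-the-envelope check of the bit-widths in question (plain Python; the layer width of 92 neurons is read off the PBS count above and the 6-bit quantization from the discussion, everything else is a rough estimate, not Concrete ML's exact accounting):

import math

# Assumed quantization: 6-bit weights and 6-bit activations.
w_bits, a_bits = 6, 6
fan_in = 92  # neurons per hidden layer, as suggested by 2440 = 784 + 18 * 92

# Each product needs about w_bits + a_bits bits; summing fan_in of them
# adds roughly log2(fan_in) more bits in the worst case.
acc_bits = w_bits + a_bits + math.ceil(math.log2(fan_in))
print(acc_bits)  # ~19 in the worst case; with trained weight distributions,
                 # the accumulator typically lands around 14 bits, as discussed above.

# With approximate rounding, only the top 6 MSBs of this accumulator are fed
# to a single 6-bit PBS, which also evaluates the ReLU as a table lookup.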
Thanks.