secretflow / heu

A high-performance homomorphic encryption algorithm library.
https://www.secretflow.org.cn/docs/heu/en/
Apache License 2.0
90 stars 39 forks source link

Paillier ciphertext results different from plaintext calculation #52

Closed lidh15 closed 1 year ago

lidh15 commented 1 year ago

As heu.phe integrated intel IPCL, I compared intel's implementation with heu.phe. The following test https://github.com/intel/pailliercryptolib_python/blob/ff77e7ee3b3af402f8fc445672974ed0a5a57a6b/bench/bench_ipcl_python.py#L60 computed a ciphertext times a plaintext and then plus another plaintext. It takes a range of numbers into calculation, and let's take the last item in the array as an example, the function is simplified:

def BM_Add_CTPT(state):
    x = (63 + 11) * 5111.2834
    y = (32768 - 63) * 1.3872
    ct_x = pk.encrypt(x)
    ct_x = ct_x * x.astype(int) # because heu.phe doesn't support float multiply
    while state:
        ct_z = ct_x + y
    print(sk.decrypt(ct_z)[-1])

for heu.phe, I did:

from heu import phe
kit = phe.setup(phe.SchemaType.IPCL) # all schemas will return the same result
encrypt = kit.encryptor()
decrypt = kit.decryptor()
evaluate = kit.evaluator()
encode = phe.FloatEncoder(phe.SchemaType.IPCL)
x = (63 + 11) * 5111.2834
ct_x = encrypt.encrypt(encode.encode(x))
ct_x_2 = evaluate.mul(ct_x, int(x))
y = (32768 - 63) * 1.3872
pt_y = encode.encode(y)
ct_z = evaluate.add(ct_x_2, pt_y)
z = encode.decode(decrypt.decrypt(ct_z))

The results of ((63 + 11) * 5111.2834) * int((63 + 11) * 5111.2834) + (32768 - 63) * 1.3872 is clearly143061371616.53043, which is also returned by intel IPCL, but with heu.phe, the result is 143061371616.5304, the last digit is truncated. And similar accuracy losses are observed in lots of tests.

lidh15 commented 1 year ago

And of course, even if the results are accurate, intel's python implementation still has the advantage that it is born with float multiplication over heu.phe python implementation.

da-niao-dan commented 1 year ago

Hello,

Thank you for your question. Your issue with numerical accuracy is a common problem that arises when using floating-point arithmetic in homomorphic encryption. We recommend using the HEU library's big int encoder to achieve any degree of accuracy required.

To achieve the desired accuracy, we suggest using the following code:

kit = phe.setup(phe.SchemaType.IPCL)
encrypt = kit.encryptor()
decrypt = kit.decryptor()
evaluate = kit.evaluator()
scale = 10 ** 11
encode = phe.BigintEncoder(phe.SchemaType.IPCL)
x = (63 + 11) * 5111.2834
x_encoded = encode.encode(int(x * scale))
ct_x = encrypt.encrypt(x_encoded)
ct_x_2 = evaluate.mul(ct_x, int(x))
y = (32768 - 63) * 1.3872
pt_y = encode.encode(int(y * scale))
ct_z = evaluate.add(ct_x_2, pt_y)
z = encode.decode(decrypt.decrypt(ct_z)) / scale

To explain the reasoning behind using the big int encoder, HEU provides various encoders, and while the float encoder is easier to use, it has an accuracy limit. This is because most machine learning applications can afford some accuracy loss but require faster computations. Therefore, we choose to make the float encoder have an accuracy limit due to other considerations. However, we support any degree of accuracy using HEU's big int encoder.

A value of x * int(x) has 11 decimal places, and to avoid truncating the last decimal place digit, we recommend encoding with a scale of 10 11. Encoding with scale means encoding the integer x int(x) scale. Although a float encoder does the scaling process for the user, it has limited accuracy, and forcing it to encode with a scale of 10 11 may result in numerical overflow. You can try it yourself with phe.FloatEncoder(phe.SchemaType.IPCL, 10 ** 11).

Hence, when requiring very accurate results, the float encoder is not the best tool. Instead, we suggest using the big int encoder, but we must handle the scaling ourselves.

Thank you for the great example, and please let us know if you have any further questions.

usafchn commented 1 year ago

Hello, @lidh15

I think the result of ((63 + 11) * 5111.2834) * int((63 + 11) * 5111.2834) + (32768 - 63) * 1.3872 is 143061371616.5304, not 143061371616.53043. The result of IPCL and Python is wrong, but HEU is right.

In python, 74 * 5111.2834 gives you the answer 378234.97160000005. But the actual result is 378234.9716. An error in the binary decimal representation causes the final 0000005, eventually resulting in 143061371616.53043. On the other hand, if you evaluate the expression on paper using the school method, the final result should be 143061371616.5304

In [1]: 74 * 5111.2834
Out[1]: 378234.97160000005

In [2]: 378234.97160000005 * 378234 + 45368.376
Out[2]: 143061371616.53043

In [3]: 378234.9716 * 378234 + 45368.376
Out[3]: 143061371616.5304

The @da-niao-dan ‘s method gives you the same result as python because he calculates x = (63 + 11) * 5111.2834 on the Python side, and then sends x to HEU to encryption, so the error has been introduced when using Python to calculate x.

In [6]: x = (63 + 11) * 5111.2834

In [7]: print(x)
378234.97160000005

In [8]: scale = 10 ** 11

In [9]: print(scale)
100000000000

In [10]: print(int(x * scale))
37823497160000008
lidh15 commented 1 year ago

Oh yes, you are right, I over trusted python's float precision.