ibarrond / Pyfhel

PYthon For Homomorphic Encryption Libraries, perform encrypted computations such as sum, mult, scalar product or matrix multiplication in Python, with NumPy compatibility. Uses SEAL/PALISADE as backends, implemented using Cython.
https://pyfhel.readthedocs.io/
Apache License 2.0

How to encrypt images and visualize them? #83

Closed albjerto closed 3 years ago

albjerto commented 3 years ago

Hello,

I am working on my Master's thesis project, which involves encrypting images, and one of the encryption types chosen is homomorphic encryption. I have an HxW greyscale image in a numpy array of integers. Following the tutorial in the documentation, I first flatten the array to get a one-dimensional numpy array, then I create an empty array of the same size and fill it with the result of the encryption. See the code snippet below:

# instantiation
import numpy as np
from Pyfhel import Pyfhel, PyCtxt

HE = Pyfhel()
HE.contextGen(65538)
HE.keyGen()

.
.
.

# encryption part

flattened_img = plain_img.flatten()
array_gen = np.empty(len(flattened_img), dtype=PyCtxt)

for i in range(len(flattened_img)):
    array_gen[i] = HE.encryptInt(flattened_img[i])

The other encryption utilities I use return a bytearray. Since I want to visualize this image, I load the serialized data into a numpy array and reshape it to the size of the original image (ignoring any padding, which is handled beforehand):

np.frombuffer(encrypted_bytes, dtype=np.uint8).reshape(original_img_size)

What I am doing with Pyfhel is this:

bytearray(array_gen)

which, however, returns 8 times more bytes than the original image, so, when I load it, I do

np.frombuffer(encrypted_bytes, dtype=np.int64).astype(np.uint8).reshape(original_img_size)
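A possible explanation for the factor of 8, offered as an assumption rather than something confirmed in this thread: NumPy stores an array created with `dtype=PyCtxt` as an object array, so its buffer holds one pointer per element (8 bytes on a 64-bit build) and no ciphertext data at all. The sketch below uses a hypothetical `FakeCtxt` class as a stand-in for `PyCtxt`, so it runs without Pyfhel.

```python
import numpy as np

class FakeCtxt:
    """Hypothetical stand-in for PyCtxt; any Python object behaves the same here."""
    def __init__(self, value):
        self.value = value

pixels = np.arange(12, dtype=np.uint8)            # a flattened 3x4 "image"
array_gen = np.empty(len(pixels), dtype=object)   # assumption: dtype=PyCtxt is treated as object
for i, p in enumerate(pixels):
    array_gen[i] = FakeCtxt(int(p))

# bytearray() copies the array's buffer, which contains object pointers,
# not the contents of the objects they point to.
raw = bytearray(array_gen)

print(array_gen.dtype)
print(len(raw), len(pixels) * array_gen.itemsize)
```

If this is what happens, the image obtained from `np.frombuffer(..., dtype=np.int64)` shows memory addresses rather than encrypted pixels.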

I am aware of both the functions .to_bytes() of the PyCtxt class and of batch encryption. What is really confusing is that they both return 32828 bytes no matter if I’m calling it on a single integer or the whole batch.

Also, I saw in the examples in the repository that pickling is used. I tried following them, but I get this error:

Traceback (most recent call last):
  File "D:\alberto\corsi\Tesi\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-f2b2cc48d081>", line 51, in <module>
    arr1_pk = pickle.dumps(arr_gen1)
  File "stringsource", line 2, in Pyfhel.PyCtxt.PyCtxt.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__

So, after this immensely long preamble, I have some questions:

Thank you, Alberto

ibarrond commented 3 years ago

# array_gen must have the same 2-D shape as plain_img
for i in range(plain_img.shape[0]):
    for j in range(plain_img.shape[1]):
        array_gen[i][j] = HE.encryptInt(plain_img[i][j])


- *What exactly is returned from the .to_bytes() function?*
A serialized version of the ciphertext. As you might know, homomorphic encryption ciphertexts are, under the hood, made of polynomials (elements of a polynomial ring). To move the information contained in a ciphertext somewhere else (e.g., to a different machine), we convert these polynomials into a byte string (the raw binary content plus some indicators on how to read it). That's what `to_bytes` returns.
- *Why is the size of the ciphertext of an integer the same as that of the ciphertext of a batch?*
This is thanks to a technique called "batching". More info in [this answer](https://crypto.stackexchange.com/questions/86140/what-is-batching-in-homomorphic-encryption).
- *Are the additional fields of the PyCtxt class serialized as well?*
No. The Pyfhel instance used to create it is not serialized with it, and you need to later provide an indication on the type of encoding used (integer, batching, float) in order to deserialize.
- *How can I get the ciphertext alone?*
The ciphertext alone is what you get. However, this serialization does include some internal information coming from the underlying SEAL library. As of today, you cannot get only the polynomials that compose the core of a ciphertext in Pyfhel. If you wish to get it, we can work out a modification of Pyfhel to do so.
- *If I’m doing something wrong, how would you suggest I encrypt and serialize an image to visualize the encrypted result?*
I believe there is a misconception here: there is no point in visualizing the encrypted result. It looks just like random noise and, as you noticed earlier, the ciphertext is larger than the plaintext. This is not specific to Pyfhel or FHE; it applies to encryption schemes in general. Unless you are planning to break the security of the scheme, you won't get anything out of visualizing the encrypted image.
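On the size question, the constant 32828 bytes is consistent with a ciphertext whose size depends only on the encryption parameters and not on what was encrypted. As a back-of-the-envelope check, and assuming (this is not stated in the thread) a polynomial degree of 2048, two polynomials per fresh ciphertext, and one 64-bit word per coefficient, the arithmetic works out as follows:

```python
# Assumed parameters (not confirmed in the thread); adjust to your own context.
POLY_DEGREE = 2048       # coefficients per polynomial
POLYS_PER_CTXT = 2       # a freshly encrypted ciphertext holds two polynomials
BYTES_PER_COEFF = 8      # one 64-bit word per coefficient

def ciphertext_payload_bytes(degree=POLY_DEGREE, polys=POLYS_PER_CTXT,
                             word=BYTES_PER_COEFF):
    """Size of the raw polynomial data, independent of the encrypted value."""
    return degree * polys * word

OBSERVED = 32828         # size reported in the question
payload = ciphertext_payload_bytes()

print(payload)             # 32768
print(OBSERVED - payload)  # 60
```

Under these assumptions, the remaining 60 bytes would be the serialization metadata mentioned above, and the total is the same whether the ciphertext holds one integer or a full batch.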

If you nonetheless wish to "visualize" something, your current approach (reading the bytes as 64-bit integers and casting them down to 8 bits) is as good as any other artificial reduction of the serialized ciphertexts. Be aware that the resulting image carries no meaning.
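One equally arbitrary reduction is to fold the first H×W bytes of the serialized data into an image. The sketch below uses `os.urandom` as a stand-in for the output of `to_bytes()`, on the assumption that real ciphertext bytes look similarly random; `bytes_to_image` is a hypothetical helper, not part of Pyfhel.

```python
import os
import numpy as np

def bytes_to_image(blob, shape):
    """Fold the first H*W bytes of `blob` into a uint8 array of the given shape."""
    needed = shape[0] * shape[1]
    if len(blob) < needed:
        raise ValueError("not enough bytes to fill the image")
    return np.frombuffer(blob, dtype=np.uint8, count=needed).reshape(shape)

# Stand-in for serialized ciphertext bytes (assumption: same length as reported above).
blob = os.urandom(32828)
noise = bytes_to_image(blob, (64, 64))

print(noise.shape, noise.dtype)
```

The resulting array can be passed to any image viewer, for example matplotlib's `imshow(noise, cmap="gray")`, and should look like static.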

PS: Can you post the pickle error separately? It might be a bug worth correcting.

albjerto commented 3 years ago

Thank you @ibarrond, that provided a lot of clarifications. I posted the pickling error in a separate issue: https://github.com/ibarrond/Pyfhel/issues/84

ibarrond commented 3 years ago

Glad to hear! I'll close the issue with a summary:

How to encrypt images and visualize them? Encrypt each pixel individually (double for loop) or use a batch for each row/column. There is no meaningful way to visualize them: the ciphertext looks random, and you won't distinguish anything!
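The two strategies in this summary differ mainly in how many ciphertexts they produce. The sketch below takes the encryption routine as a parameter, so it runs with stand-in callables. With Pyfhel one would pass `HE.encryptInt` for the per-pixel case; the batched routine, and whether the context must be created with batching enabled, depends on the Pyfhel version and is left as an assumption. `FakeCtxt` is a hypothetical placeholder for a real ciphertext.

```python
import numpy as np

class FakeCtxt:
    """Hypothetical placeholder for a ciphertext; it just remembers its input."""
    def __init__(self, payload):
        self.payload = payload

def encrypt_per_pixel(img, encrypt_scalar):
    """One ciphertext per pixel; the result keeps the image's 2-D shape."""
    out = np.empty(img.shape, dtype=object)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = encrypt_scalar(int(img[i, j]))
    return out

def encrypt_per_row(img, encrypt_batch):
    """One ciphertext per row; each row must fit in the available batch slots."""
    out = np.empty(img.shape[0], dtype=object)
    for i in range(img.shape[0]):
        out[i] = encrypt_batch([int(v) for v in img[i]])
    return out

img = np.arange(12, dtype=np.uint8).reshape(3, 4)
per_pixel = encrypt_per_pixel(img, FakeCtxt)
per_row = encrypt_per_row(img, lambda row: FakeCtxt(tuple(row)))

print(per_pixel.size, per_row.size)   # 12 3
```

Since every ciphertext has the same serialized size, the per-row variant stores H ciphertexts instead of H×W, which matters quickly for real images.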