johanernst / khPRF-PSA

GNU General Public License v3.0
7 stars 0 forks source link

Can I ask a question about the code? #1

Open 1853582 opened 1 year ago

1853582 commented 1 year ago

Can I ask a question about the code?

  1. Why can we only calculate hash once to obtain λ Hash values

  2. Are there any requirements for the values of q and p here? In your comments, it was explained that 512 bits can be split into Z_q The 4 (128 bit) values of q, so q needs to be taken to the 128th power of 2. Would changing q result in an error?

johanernst commented 1 year ago

Thank you for your questions.

  1. I don't understand your first question. Can you try to rephrase it?

  2. Let me first explain potential effects of changing q that are bit more independent of the concrete implementation. Then I will point out some things that can happen if you change q in this specific implementation. 2.1. For the scheme to function properly, p must be large enough to contain the entire message space plus some extra space that will be used for error correction, because the pseudorandom function is only key-homomorphic. For security reasons q should be significant larger than p. For a bit more details you can have a look at the corresponding paper: https://petsymposium.org/2021/files/papers/issue4/popets-2021-0063.pdf (Section 4, in particular 4.2 and 4.3). And if you want a lot more details, you can have a look at the paper that introduced the learning-with-rounding (LWR) problem: https://eprint.iacr.org/2011/401.pdf It is convenient to choose q as a power of 2 (e.g. 2^128), because then you can simply use 128 bit of a hash value, when you need a random value in Z_q. However there is no theoretical reason why q should not be e.g. 2^127. 2.2. If you change only q (i.e. the variable key_mod) in this implementation, then you will probably produce an error or a security problem. You would at least have to change the variable output_length in prf.go accordingly (when q=2^n, then you have to use n bits of the SHA3 hash value to create the variable integer_hash_value in the function "hash"). On the other hand I think you could change p (i.e. the variable message_mod) (at least to some extent) without breaking the implementation.

Warning: The code in this repository is only meant as proof of concept for scientific purposes (e.g. it was not written with side channel attacks in mind). Also am not an expert on the exact effects that different choices of p and q have on the security of the underlying learning-with-rounding (LWR) problem. So I would strongly advice you against solely relying on this code or my above answers in a real world implementation where the security of actual data is at stake.

1853582 commented 1 year ago

Thank you for your detailed answer. If I want to combine your homomorphic pseudo-random function with Shamir algorithm, each aggregator uses a different key. If this key is modulo 2 to the 128th power, it may not be very convenient to apply. Can we use large prime numbers for the modulus of key or must we perform operations on the finite field GF (2 ^ 128).

1853582 commented 1 year ago

I read from your paper: “Note that if q does not divide the size of the output space of H′, extra analysis is needed to make sure that the mod q operation does not induce any bias. However in our case we choose q as power of 2, whereby it divides the size of the output space of H′. In our implementation we used SHA3-512 as H′.” Can you further explain the consequences of deviation caused by module q operation? Is there a solution to this type of problem please?

johanernst commented 1 year ago

For security reasons the in LWR needs to be uniformly random in (Z_q)^n. In other words, in each iteration of the loop, the variable in the code has to be a uniformly random number between 0 and q-1. If q=2^128, this is easy to ensure. Just take the first 128 bits from the output of the hash function H. Now lets say that q is a prime number between 2^127 and 2^128. Then you cannot simply, take the first 128 bits of the hash value, because this number could be larger than q. You cannot simply take the first 127 bit of the hash value either, because then your result will be between 0 and (2^127)-1 instead of between 0 and q-1. A (somewhat flawed) solution would be to take the first 128 bit of the hash value and then compute mod q, in order to get a number between 0 and q-1. The problem here is that the result will not be uniformly distributed anymore. This is what is meant in the paragraph from the paper, which you quoted. Let me give you an example with small numbers. The hash value is 3 bit long, i.e. it is a number between 0 and 7. Lets say you want q=5, so you take the hash value and compute mod 5. Depending on the hash value you will get the following results:

Hash value result after computing mod 5 0 0 1 1 2 2 3 3 4 4 5 0 6 1 7 2

The numbers 0,1,2 appear twice in the result, while 3 and 4 only appear once. Thus, 0,1,2 are twice as likely as 3 and 4 (this is the "bias" from your quote), which is not a uniform distribution anymore.

One potential solution to this problem is that you choose a prime number q that is very very close to 2^128. Then the bias is very small (the "statistical difference" to a uniform distribution is very small). More concretely: I typed "what is the closest prime number to 2^128" in wolfram-alpha and it gave q=340282366920938463463374607431768211507 as answer (please check if this is actually a prime number). This is very close to 2^128, in fact q - 2^128 = 51. So if you simply take the first 128 bit of the hash value, then this is very close to a uniform distribution over Z_q. You don't even have to compute modulo. The "statistical distance" between the distribution that you would produce and a uniform distribution over Z_q should be less than 2^-122, so I think that this should be ok from a security perspective.

An alternative solution could be to take a prime number q that is slightly smaller than 2^128 and if your hash value is larger than q, through it away and compute a new one. This should produce an (exact) uniform distribution, but may have other caveats.

Use this at your own risk! I have spent a bit more than 1 hour thinking about the question and writing the answer, so its probably best to thoroughly think it through yourself and double check with other people.

Also I am pretty sure that other people have thought about that problem for a longer time than I did, so there is probably more information (and potentially a better solution) somewhere on the internet.

1853582 commented 1 year ago

Thank you for your answer. I will continue to think about it.

1853582 commented 1 year ago

Hello, I have some thoughts about this paper:

  1. In the Setup and Key Management section, you mentioned that it is necessary to assign a secret key to each client, and then obtain the total key through the NIKE method and send it back to the server. This part of the communication seems a bit heavy, and I was wondering if it could be replaced with the Shamir method to distribute the key. Each client's secret key is ki=LiSi, and then S=L1S1+L2S2++ LnSn calculates the total secret key.

  2. You mentioned that encrypting a label can only encrypt one number. If the object I encrypt is not a number but a vector, it will result in longer encryption time. Is there a way to solve this problem? My current idea is to calculate the hash in advance and assemble it into a vector H=[H (1),..., H (n)]. Then, I can use the tensor library to calculate H · K and perform modular operations.

Do you think these two points are feasible?

1853582 commented 1 year ago

I am a software engineer and currently the project requires the use of quantum resistant key homomorphic pseudo random numbers. Sorry to bother you, I asked you so many questions.

1853582 commented 1 year ago

Hello,I also want to ask is there a way to obtain multiple pseudo random numbers at once?

johanernst commented 1 year ago
  1. The NIKE is necessary so that each pair of clients has a shared secret key. Each client then uses this key to blind their PRF secret key in a way that the server can only learn the sum of the keys. This is done in order to avoid having a central party creating and distributing the PRF secret keys. I do not see how Shamir's secret sharing would avoid the central party. But I also don't understand what exactly your idea is.

  2. I guess you can even compute the entire PRF value(s) in advance. Then encryption is only adding the pre-computed PRF value(s) to the message.