google / heir

A compiler for homomorphic encryption
https://heir.dev/
Apache License 2.0
354 stars 51 forks source link

Write an initial lowering for NTT/INTT using a standard algorithm #543

Closed j2kun closed 7 months ago

j2kun commented 8 months ago

Depends on https://github.com/google/heir/issues/182

I'm not an expert in NTT algorithms, but here are a few references:

inbelic commented 8 months ago

I have started to work on a naive textbook implementation of the NTT op lowering, following here. One issue that arises is computing the 2n-th primitive root of unity. Presumably, this is something we would want to avoid due to its computational cost when lowering the operation.

Either, we could have some type of cached results for the prevalent values (Dilithium: 2n-th = 1753, nth = 1753^2, etc.) and fallback on computation if they are not found, or, I would propose that the primitive root of unity could be added as an attribute to a polynomial with a ring. Since it is computationally efficient to check if it is correct during verification, and it would allow for the user to specify a particular one if there are multiple.

WDYT? @j2kun @AlexanderViand-Intel

j2kun commented 8 months ago

I think we can likely hard-code them, but looking to @AlexanderViand-Intel for prior art since I think he did that in HECO already. In particular, I believe we would need one primitive root for every 32-bit or 64-bit prime we use for the smaller side of an RNS decomposition. I'm guessing 50 would be enough to support RNS with typical sizes of Q (50*32 = 1600 bits), so maybe we could generate a static DenseMap with 100 32-bit primes mapping to their primitive roots of unity, and similarly 100 64-bit primes. And check in whatever script does the work so we can tweak it as we get the need for more primes.

If we want to generate them on the fly, we would likely need to use one of the C++ libraries like NTL or FLINT to generate them, and add it as a dependency. I can do this if we decide to go this route.

j2kun commented 8 months ago

primitive root of unity could be added as an attribute to a polynomial with a ring

I think adding it as an extra field on RingAttr would be a good choice, and we could make it optional and materialize it from the hard-coded data at parse time.

j2kun commented 8 months ago

Note to self, this sympy function can compute the roots: https://docs.sympy.org/latest/modules/ntheory.html#sympy.ntheory.residue_ntheory.nthroot_mod

j2kun commented 8 months ago

@WoutLegiest mentioned they have a lot of experience in managing the twiddle factors in his lab. Ping :)

inbelic commented 8 months ago

I think we can likely hard-code them, but looking to @AlexanderViand-Intel for prior art since I think he did that in HECO already. In particular, I believe we would need one primitive root for every 32-bit or 64-bit prime we use for the smaller side of an RNS decomposition. I'm guessing 50 would be enough to support RNS with typical sizes of Q (50*32 = 1600 bits), so maybe we could generate a static DenseMap with 100 32-bit primes mapping to their primitive roots of unity, and similarly 100 64-bit primes. And check in whatever script does the work so we can tweak it as we get the need for more primes.

Nice, that makes sense to me. I will hard-code them for an initial approach with a script as mentioned.

Note to self, this sympy function can compute the roots:

Note for the note: it does not compute primitive roots

AlexanderViand-Intel commented 8 months ago

I agree that, at least initially, it makes sense to keep the metadata generation out of the compiler code/runtime. However, as we support more varied approaches to modular ring arithmetic, I think we might see an explosion of, e.g., different NTT designs requiring slightly different twiddle factors? @WoutLegiest (and other HW wizards) probably knows more?

WoutLegiest commented 8 months ago

Gone summon a die hard HW wizards @axytho, what is your opinion?

axytho commented 8 months ago

I think the approach you're suggesting here is good: hardcode the twiddle factors for the most commonly used moduli in the respective schemes, but allow the user to set their own moduli AND twiddle factor(s).

It's correct that you only need to store one primitive root per modulus. (Optimally, store the 2n-th root). However, there are 2^{2n} possible 2n-th roots, so it's possible someone else might choose a different 2n-th root for say, generating the bootstrapping key. Then the bootstrapping will give the wrong result.

We actually had this problem in the ZPRIZE competition: The competition rules specified a modulus (part of the zero knowledge scheme used, so makes sense) but also implicitely specified a twiddle factor by the test vectors that were used to test our output. We got the organizers to agree to change the test vectors to our twiddle factor as it allowed for more efficient hardware.

People might make software-based optimizations depending on twiddle factors (we've thought of this), but this is especially the case in hardware: we can really optimize the hardware by choosing the right twiddle factor, and even if it's not that important, FPGA bitstreams of the bootstrapping step are likely to be delivered with a certain twiddle hardcoded which might not be easy to change.