Open AlexanderViand-Intel opened 1 year ago
Let's make this issue a bit more specific for anyone reading this issue who isn't familiar with the details.
Cf. the Dilithium spec, section 1.2 "Implementation Considerations" (page 5) of https://pq-crystals.org/dilithium/data/dilithium-specification-round3-20210208.pdf
They use Z_q[X]/(X^256 + 1)
and do a matrix-vector multiplication where the matrix has size 6x5. They use the "fully-splitting NTT algorithm" so that the polynomials can be multiplied componentwise after being transformed.
This ticket should be split into more manageable pieces. I think they might be:
If you agree I can go ahead and file those issues, and this ticket can be switched perhaps to adding the end-to-end test of a Dilithium signature.
Let me also add two FHE-specific references that are likely to be useful (in addition to more general work on fast NTT/modular arithmetic implementation):
Intel HEXL: Accelerating Homomorphic Encryption with Intel AVX512-IFMA52 which gives some details on how to exploit AVX512 for HE
HEaaN.MLIR: An Optimizing Compiler for Fast Ring-Based Homomorphic Encryption which has done exactly this work of implementing fast FHE-friendly polynomial ring arithmetic in MLIR, which is unfortunately not open-source but might still be a good reference.
This ticket should be split into more manageable pieces. I think they might be [...] If you agree I can go ahead and file those issues, and this ticket can be switched perhaps to adding the end-to-end test of a Dilithium signature.
Yes, I think that'd be a good idea! I'm guessing you'll create an umbrella issue to track all those issues?
This ticket should be split into more manageable pieces. I think they might be:
- Define appropriate IR constructs to identify polynomial representations in NTT/FFT/coefficient domains. I believe we discussed using the tensor encoding attribute to store something like "this tensor is an NTT-domain polynomial with parameters ..."
- Write a pass that converts "naive" polynomial multiplications into combinations of NTT + element-wise mul + INTT
- Write canonicalizations that collapse unnecessary NTT/INTT
- Write an initial lowering for NTT/INTT using a standard algorithm
- Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
- Write optimized NTT/INTT lowerings for special hardware (e.g., AVX)
Since this was written, the following are completed (though may not be fully polished)
Since this was written, the following are completed (though may not be fully polished)
- [x] Support lowering a linalg.generic of polymuls (should already be supported by @AlexanderViand-Intel's recent work)
I'm not sure I actually implement lowerings for linalg.generic
(the -convert-elementwise-to-affine -polynomial-to-standard
llowering chain uses affine.for
instead), but if there's interest by someone to pick up this PR, I'm more than happy to add support for linalg.generic
, too :)
As the title says.
We considered the Dilithium signature scheme and the Kyber KEM as initial "end-to-end" examples for
polynomial
and this lowering. While Dilithium shouldn't be an issue, Kyber's ring doesn't support full-depth NTT, so the lowering would have to be aware of this (See https://electricdusk.com/ntt.html for a nice explanation of the issue).