Open sduyyy opened 1 year ago
In addition, what is the input range?
Hi, if I remember correctly, literature shows that signed arithmetic is faster for this size of modular arithmetic. For example, there are instructions smmulr in Armv7E-M and vpmulhrsw in AVX2. Both of them operate on signed inputs, and there is no unsigned counterparts.
However, I'm not sure if this is the reason for the changes. Maybe others working Kyber already can answer this.
I made an implementation in pure Kotlin. I even added Barrett and Montgomery reductions, but I still don't know why. All my functions work in the unsigned space only. Maybe it's worth it to dig through the commit history.
In the history implementation of barrett_reduce function, the output is in {0,...,q} congruent to a modulo q. But in Commits on Nov 20, 2020, the output of barrett_reduce is in {-(q-1)/2,...,(q-1)/2} congruent to a modulo q. Why is it modified like this?