mmcloughlin opened 5 years ago
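The P-256 prime, written as four 64-bit limbs (most significant first) and labelled with the constant names used in the `crypto/elliptic` assembly, is:

| Limb 3 | Limb 2 | Limb 1 | Limb 0 |
|---|---|---|---|
| `ffffffff00000001` | `0000000000000000` | `00000000ffffffff` | `ffffffffffffffff` |
| `p256const1` | `0` | `p256const0` | `-1` |

In closed form, $p = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$.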
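Since the two low limbs together equal $2^{96} - 1$, the `crypto/elliptic` assembly linked below needs only one multiply (by `p256const1`) and computes the product as

$$p \cdot u = \text{p256const1} \cdot u \cdot 2^{192} + 2^{96} \cdot u - u$$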
Note that the OpenSSL version does it without any multiplies:
```
# Reduction iteration is normally performed by accumulating
# result of multiplication of modulus by "magic" digit [and
# omitting least significant word, which is guaranteed to
# be 0], but thanks to special form of modulus and "magic"
# digit being equal to least significant word, it can be
# performed with additions and subtractions alone. Indeed:
#
#        ffff.0001.0000.0000.0000.ffff.ffff.ffff
# *                                         abcd
# + xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.abcd
#
# Now observing that ff..ff*x = (2^n-1)*x = 2^n*x-x, we
# rewrite above as:
#
#   xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.abcd
# + abcd.0000.abcd.0000.0000.abcd.0000.0000.0000
# -      abcd.0000.0000.0000.0000.0000.0000.abcd
#
# or marking redundant operations:
#
#   xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.----
# + abcd.0000.abcd.0000.0000.abcd.----.----.----
# -      abcd.----.----.----.----.----.----.----
```
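The same idea carries over to 64-bit limbs. Writing $\text{p256const1} = 2^{64} - 2^{32} + 1$ gives $p \cdot u = u \cdot 2^{256} - u \cdot 2^{224} + u \cdot 2^{192} + u \cdot 2^{96} - u$, so one reduction step needs only shifts, additions and subtractions. The Go sketch below is an adaptation of the OpenSSL comment to 64-bit limbs, not a port of OpenSSL's code, and `reduceStep`, `refStep`, `p256` and `limbsToBig` are hypothetical names. It takes a five-limb accumulator whose low limb is the Montgomery digit $u$ and returns $(\mathit{acc} + p \cdot u) / 2^{64}$. The $-u$ term cancels the low limb exactly, so that limb is simply dropped, as in the `----` columns above.

```go
package main

import (
	"fmt"
	"math/big"
	"math/bits"
)

// reduceStep returns (acc + p*u) / 2^64 with u = acc[0], using only shifts,
// additions and subtractions (hypothetical 64-bit-limb adaptation):
//
//	p*u = u*2^256 - u*2^224 + u*2^192 + u*2^96 - u
//
// acc[0] - u = 0 exactly, so the low limb is dropped and every other term
// moves down by 64 bits.
func reduceStep(acc [5]uint64) [5]uint64 {
	u := acc[0]
	r := [5]uint64{acc[1], acc[2], acc[3], acc[4], 0} // (acc - u) / 2^64
	var c, b uint64
	// + u*2^32 (from u*2^96): limbs 0 and 1, carry up to limb 4.
	r[0], c = bits.Add64(r[0], u<<32, 0)
	r[1], c = bits.Add64(r[1], u>>32, c)
	r[2], c = bits.Add64(r[2], 0, c)
	r[3], c = bits.Add64(r[3], 0, c)
	r[4] += c
	// + u*2^128 (from u*2^192): limb 2.
	r[2], c = bits.Add64(r[2], u, 0)
	r[3], c = bits.Add64(r[3], 0, c)
	r[4] += c
	// + u*2^192 (from u*2^256): limb 3.
	r[3], c = bits.Add64(r[3], u, 0)
	r[4] += c
	// - u*2^160 (from u*2^224): limbs 2 and 3. The true result is
	// non-negative, so no borrow escapes limb 4.
	r[2], b = bits.Sub64(r[2], u<<32, 0)
	r[3], b = bits.Sub64(r[3], u>>32, b)
	r[4] -= b
	return r
}

// p256 returns 2^256 - 2^224 + 2^192 + 2^96 - 1 (math/big reference).
func p256() *big.Int {
	one := big.NewInt(1)
	p := new(big.Int).Lsh(one, 256)
	p.Sub(p, new(big.Int).Lsh(one, 224))
	p.Add(p, new(big.Int).Lsh(one, 192))
	p.Add(p, new(big.Int).Lsh(one, 96))
	return p.Sub(p, one)
}

// limbsToBig assembles little-endian 64-bit limbs into a big.Int.
func limbsToBig(r []uint64) *big.Int {
	x := new(big.Int)
	for i := len(r) - 1; i >= 0; i-- {
		x.Lsh(x, 64).Or(x, new(big.Int).SetUint64(r[i]))
	}
	return x
}

// refStep computes (acc + p*acc[0]) / 2^64 with math/big for comparison.
func refStep(acc [5]uint64) *big.Int {
	s := new(big.Int).Mul(p256(), new(big.Int).SetUint64(acc[0]))
	s.Add(s, limbsToBig(acc[:]))
	return s.Rsh(s, 64)
}

func main() {
	acc := [5]uint64{0x0123456789abcdef, 1, 2, 3, 4}
	r := reduceStep(acc)
	fmt.Println(limbsToBig(r[:]).Cmp(refStep(acc)) == 0) // true
}
```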
The `crypto/elliptic` implementation of P-256 exploits the structure of the prime to minimize the number of multiplies required to compute `p*u` during Montgomery reduction. Consider implementing the same technique. See for example: https://github.com/golang/go/blob/05e77d41914d247a1e7caf37d7125ccaa5a53505/src/crypto/elliptic/p256_asm_amd64.s#L1594-L1604