mmcloughlin / ec3

Elliptic Curve Cryptography Compiler: an incomplete experiment in code-generation for elliptic curves in Go
BSD 3-Clause "New" or "Revised" License
56 stars 6 forks source link

asm/fp/mont: exploit structure of prime #95

Open mmcloughlin opened 5 years ago

mmcloughlin commented 5 years ago

The crypto/elliptic implementation of P-256 exploits structure of the prime to minimize the number of multiplies required in computing p*u during Montgomery reduction. Consider implementing the same technique. See for example:

    // First reduction step
    MOVQ acc0, mul0
    MOVQ acc0, hlp
    SHLQ $32, acc0
    MULQ p256const1<>(SB)
    SHRQ $32, hlp
    ADDQ acc0, acc1
    ADCQ hlp, acc2
    ADCQ mul0, acc3
    ADCQ $0, mul1
    MOVQ mul1, acc0

https://github.com/golang/go/blob/05e77d41914d247a1e7caf37d7125ccaa5a53505/src/crypto/elliptic/p256_asm_amd64.s#L1594-L1604

mmcloughlin commented 5 years ago

P-256 prime is:

ffffffff00000001 0000000000000000 00000000ffffffff ffffffffffffffff
      p256const1                0       p256const0               -1

The assembly above performs the multiply as

p*u = p256const1 * u * 2^192 + 2^96 * u - u
mmcloughlin commented 5 years ago

Note the OpenSSL version does it completely without multiplies:

    # Reduction iteration is normally performed by accumulating
    # result of multiplication of modulus by "magic" digit [and
    # omitting least significant word, which is guaranteed to
    # be 0], but thanks to special form of modulus and "magic"
    # digit being equal to least significant word, it can be
    # performed with additions and subtractions alone. Indeed:
    #
    #        ffff.0001.0000.0000.0000.ffff.ffff.ffff
    # *                                         abcd
    # + xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.abcd
    #
    # Now observing that ff..ff*x = (2^n-1)*x = 2^n*x-x, we
    # rewrite above as:
    #
    #   xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.abcd
    # + abcd.0000.abcd.0000.0000.abcd.0000.0000.0000
    # -      abcd.0000.0000.0000.0000.0000.0000.abcd
    #
    # or marking redundant operations:
    #
    #   xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.----
    # + abcd.0000.abcd.0000.0000.abcd.----.----.----
    # -      abcd.----.----.----.----.----.----.----

https://github.com/openssl/openssl/blob/8dc57d76c99dffd91e88622e2ca2b4bd7de5e1aa/crypto/ec/asm/ecp_nistz256-x86.pl#L932-L954