Montgomery arithmetic doesn't seem to be constant-time

mratsim commented 4 years ago

Looking into montgomery.hpp there is a conditional substraction that depends on input secret data in all procedures:

https://github.com/niekbouman/ctbignum/blob/a532f24b3f7cb3c052fae3a02807bb0b64063883/include/ctbignum/montgomery.hpp#L63-L64

https://github.com/niekbouman/ctbignum/blob/a532f24b3f7cb3c052fae3a02807bb0b64063883/include/ctbignum/montgomery.hpp#L115-L116

https://github.com/niekbouman/ctbignum/blob/a532f24b3f7cb3c052fae3a02807bb0b64063883/include/ctbignum/montgomery.hpp#L161-L162

https://github.com/niekbouman/ctbignum/blob/a532f24b3f7cb3c052fae3a02807bb0b64063883/include/ctbignum/montgomery.hpp#L209-L210

Did I miss something?

niekbouman commented 4 years ago

Dear @mratsim,

That is indeed the case, you are not missing something. Certain functions in the library are constant-time, but certainly not all of them! There might be a confusion in the naming of the library. The ct in ctbignum originated from "compile-time".

If you need a constant-time modular reduction (where the modulus is fixed), you could look at invariant_div.hpp This is using the Granlund--Montgomery invariant division technique.

If, instead, you seek constant-time Montgomery arithmetic (i.e., keeping numbers in the Montgomery representation most of the time), then I advise you to have a look at papers by, among others, Joppe Bos, for example this one by Joppe Bos and Peter Montgomery. IACR ePrint 2017/1057 You would then need the 'redundant representation' technique to get rid of the conditional subtraction as described in that paper. (BTW I never implemented such constant-time Montgomery arithmetic routine myself)

I hope this helps.

Off topic: I recently looked at your Nim projects like weave, awesome!!

mratsim commented 4 years ago

I see, thanks for the heads up.

On Contant-time Montgomery Multiplication

I was looking into constant-time Montgomery Multiplication and managed to implement it yesterday (not tested on multiLimbs yet):

https://github.com/mratsim/constantine/blob/f6b229b1/constantine/math/bigints_raw.nim#L341-L377

func montyMul*(
       r: BigIntViewMut, a, b: distinct BigIntViewAny,
       M: BigIntViewConst, montyMagic: Word) =
  ## Compute r <- a*b (mod M) in the Montgomery domain
  ## `montyMagic` = -1/M (mod Word). Our words are 2^31 or 2^63
  ##
  ## This resets r to zero before processing. Use {.noInit.}
  ## to avoid duplicating with Nim zero-init policy
  # i.e. c'R <- a'R b'R * R^-1 (mod M) in the natural domain
  # as in the Montgomery domain all numbers are scaled by R

  checkValidModulus(M)
  checkOddModulus(M)
  checkMatchingBitlengths(r, M)
  checkMatchingBitlengths(a, M)
  checkMatchingBitlengths(b, M)

  let nLen = M.numLimbs()
  setZero(r)

  var r_hi = Zero   # represents the high word that is used in intermediate computation before reduction mod M
  for i in 0 ..< nLen:

    let zi = (r[0] + wordMul(a[i], b[0])).wordMul(montyMagic)
    var carry = Zero

    for j in 0 ..< nLen:
      let z = DoubleWord(r[j]) + unsafeExtPrecMul(a[i], b[j]) +
              unsafeExtPrecMul(zi, M[j]) + DoubleWord(carry)
      carry = Word(z shr WordBitSize)
      if j != 0:
        r[j-1] = Word(z) and MaxWord

    r_hi += carry
    r[^1] = r_hi and MaxWord
    r_hi = r_hi shr WordBitSize

  # If the extra word is not zero or if r-M does not borrow (i.e. r > M)
  # Then substract M
  discard r.sub(M, r_hi.isNonZero() or not r.sub(M, CtFalse))

the r.sub call is special, it takes a control parameter as the last input to conditionally do or not do a substraction (it always return the carry/borrow though) this way I can do the last conditional substraction in constant time.

On compile-time

Making it work at compile-time in my library would also be possible. The input types of montyMul are pointers but if they were changed to plain objects, Nim could use them at compile-time (it requires much less ceremony than C++ templates for that matter). The reason I did not do that is because of generic monomorphization which I think your library might suffer of:

For example you will have an instantiation of each proc for all the M, N, Modulus static parameter here: https://github.com/niekbouman/ctbignum/blob/83c63116469a97c07cb9dbed24ed379dd5b711ee/include/ctbignum/addition.hpp#L110-L170
as you forward everything to either add_same, mod_add or add_without_carry that are parametrized to a single N, the code explosion is still limited to all the N supported. If it's 1~3 modulus/size it's probably reasonable but I wanted a library that could support lots of cryptographic curve while being usable on restricted devices hence code size is a concern
Not an issue in Nim but C++ compilation is really slowed done by templates instantiation

So I choose to have a dual implementation, low-level is type-erased, with all runtime evaluation but high-level is using integer generics for bits (and number of words is derived from there) and enum generics for Curve (and modulus is compile-time derived from there).

Weave

Thanks for the kind words regarding Weave.

You might be interested in my talk from weaks ago about the design of high-performance multithreading runtime: https://fosdem.org/2020/schedule/event/nimultralowoverheadruntime/

Closing this since the misunderstanding on "ct" was clarified.

niekbouman / ctbignum