status-im / nim-blscurve

Nim implementation of BLS signature scheme (Boneh-Lynn-Shacham) over Barreto-Lynn-Scott (BLS) curve BLS12-381
Apache License 2.0
26 stars 11 forks source link

Carry/Borrow in bigint addition/substraction #1

Closed mratsim closed 5 years ago

mratsim commented 6 years ago

There is no carry/borrow in the code. Apparently this is enabled by field arithmetic.

https://github.com/status-im/nim-milagro-crypto/blob/4add8c3441802b9962c966d023b629dcfb207640/src/generated/big_384_29.c#L357-L369

Citing the help - http://docs.milagro.io/en/amcl/milagro-crypto-library-white-paper.html#42-addition-and-subtraction:

4.2 Addition and Subtraction The existence of a word excess means for example that multiple field elements can be added together digit by digit, without processing of carries, before overflow can occur.

Only occasionally will there be a requirement to normalize these extended values, that is to force them back into the original format. Note that this is independent of the modulus.

The existence of a field excess means that, independent of the word excess, multiple field elements can be added together before it is required to reduce the sum with respect to the modulus. In the literature this is referred to as lazy, or delayed, reduction. In fact we allow the modulus to be as small as 254 bits, which obviously increases the field excess.

Note that these two mechanisms associated with the word excess and the field excess (often confused in the literature) operate largely independently of each other.

AMCL has no support for negative numbers. Therefore subtraction will be implemented as field negation followed by addition. Negation is performed using the method described as Option 1 in [aranha−karabina−longa−gebotys−lopez]. Basically the number of the active bits in the field excess of the number to be negated is determined, the modulus is shifted left by this amount plus one, and the value to be negated is subtracted from this value.

Note that because of the "plus 1", this will always produce a positive result at the cost of eating a bit into the field excess. image small 256-bit number representation

Normalization of extended numbers requires the word excess of each digit to be shifted right by the number of base bits, and added to the next digit, working right to left. Note that when numbers are subtracted digit-by-digit individual digits may become negative. However since we are avoiding using the sign bit, due to the magic of 2's complement arithmetic, this all works fine without any conditional branches.

Reduction of unreduced BIG numbers is carried out using a simple shift-compare-and-subtract of the modulus, with one subtraction needed on average half of the time for every active bit in the field excess. Hopefully such reductions will rarely be required, as they are slow and involve unpredictable program branches.

Since the length of field elements is fixed at compile time, it is expected that the compiler will unroll most of the time-critical loops. In any case the conditional branch required at the foot of a fixed-size loop can be accurately predicted by modern hardware.

The problem now is to decide when to normalize and when to reduce numbers to avoid the possibility of overflow. There are two ways of doing this. One is to monitor the excesses at run-time and act when the threat of overflow arises. The second is to do a careful analysis of the code and insert normalization and reduction code at points where the possibility of overflow may arise, based on a static worst-case analysis.

The field excess En of a number n is easily captured by a simple masking and shifting of the top word. If two normalized numbers a and b are to be added then the excess of their sum will be at worst Ea+Eb+1. As long as this is less than 2FE where FE is the field excess, then we are fine. Otherwise both numbers should be reduced prior to the addition.

In AMCL these checks are performed at run-time. However, as we shall see, in practice these reductions are very rarely required. So the if statement used to control them is highly predictable. Observe that even in the worst case, for a 16-bit implementation, the excess is a generous FE=4, and so many elements can be added or subtracted before reduction is required.

The worst case word excess for the result of a calculation is harder to calculate at run time, as it would require inspection of every digit of every BIG. This would slow computation down to an unacceptable extent. Therefore in this case we use static analysis and insert normalization code where we know it might be needed. This process was supported by special debugging code that warned of places where overflow was possible, based on a simple worst-case analysis.