Open ebfull opened 5 years ago
I can't remember what paper I first saw the deferred Montgomery reduction in. I will look for that.
It's a pretty straight forward concept though. If you're computing e.g. a * b + c * d
, you can perform the multiplications, add the products together, and then perform the reduction on the result. This can be done efficiently up to a certain point due to the extra 3 most significant bits that are free.
The other related tricks (not performing normalization after the reduction, or after additions, etc.) don't seem to help enough in practice IMO.
Would you please attach implementation or paper references for both deferred montgomerry reduction and fp operations if there is one? @ebfull