Open mmcloughlin opened 5 years ago
The observation is that, by symmetry, squaring does not need as many partial products as a general multiply. That is, writing x = SUM_i x_i * 2^{bi} with b-bit limbs x_i, we have
x^2 = 2 * SUM_{i < j} x_i * x_j * 2^{b(i+j)} + SUM_i x_i^2 * 2^{2bi}
Status of work-in-progress implementation:

| | ec3 | go |
| --- | --- | --- |
| sqr | 66 | 60 |
| mod | 90 | 59 |
Montgomery reduction is far too expensive. Probably for two reasons:
p
#95
As a stopgap, #32 implements squaring by calling multiply. We should have dedicated assembly for squaring that saves on 64-bit multiplies.