Profile the implementation of P256 and P384 sign/verify operations, and evaluate the performance degradation caused by the lack of extensible modular instructions for P384.
Sub-tasks
[x] Document loading of curve domain parameters and constants #6177
Evaluation results for P384 multiplication
(copied from #3192)
A 384-bit multiplication currently takes 152 cycles, compared with 37 cycles for 256 bits.
This uses the fully unrolled multiplication.
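As a rough point of comparison, assume a schoolbook multiply over 64-bit limbs (OTBN's `BN.MULQACC` multiplies 64-bit quarter-words; the issue does not state how the kernel splits operands). The number of limb products then grows quadratically with operand width:

$$
\left(\frac{256}{64}\right)^2 = 16, \qquad \left(\frac{384}{64}\right)^2 = 36, \qquad \frac{36}{16} = 2.25 \quad\text{vs. measured}\quad \frac{152}{37} \approx 4.1
$$

Under that assumption, the limb-product count alone would predict about a 2.25x slowdown, so the measured ~4.1x suggests the unrolled 384-bit kernel spends a substantial share of its cycles outside the limb products themselves (a guess, not a profiling result).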
There is room for optimization here:
For a generic implementation (no specific curve), we would implement Karatsuba for the multiplication kernel.
For the P384-specific implementation, we would either move to Solinas reduction or use an optimized Barrett implementation (exploiting the special form of the pre-computed Barrett parameter for P384). This could use Karatsuba on top. Solinas should be faster, but I am not sure yet how efficiently it can be implemented on OTBN.
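Karatsuba replaces one of the four half-size products with a few additions. With a = a1·2^h + a0 and b = b1·2^h + b0, the cross term a0·b1 + a1·b0 equals (a0 + a1)(b0 + b1) − a0·b0 − a1·b1. A minimal Python sketch on arbitrary-precision integers, assuming a hypothetical 64-bit base case that stands in for the hardware multiplier (`karatsuba` and `BASE_BITS` are illustrative names, not OTBN code):

```python
import secrets

# Hypothetical base-case width, standing in for a 64x64-bit hardware multiplier.
BASE_BITS = 64


def karatsuba(a: int, b: int) -> int:
    """Multiply non-negative integers a and b with recursive Karatsuba."""
    n = max(a.bit_length(), b.bit_length())
    if n <= BASE_BITS:
        # Base case: both operands fit the assumed native multiplier width.
        return a * b
    h = n // 2
    mask = (1 << h) - 1
    a1, a0 = a >> h, a & mask
    b1, b0 = b >> h, b & mask
    z0 = karatsuba(a0, b0)  # product of the low halves
    z2 = karatsuba(a1, b1)  # product of the high halves
    # One extra product yields the cross term: (a0+a1)(b0+b1) - z0 - z2 = a0*b1 + a1*b0
    z1 = karatsuba(a0 + a1, b0 + b1) - z0 - z2
    return (z2 << (2 * h)) + (z1 << h) + z0


if __name__ == "__main__":
    x = secrets.randbits(384)
    y = secrets.randbits(384)
    assert karatsuba(x, y) == x * y
    print("ok")
```

On OTBN the split would presumably follow limb or register boundaries rather than `bit_length()`, and the extra carry bit of (a0 + a1) needs explicit handling; Python's big integers hide both.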
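For the Barrett option, a textbook base-2 Barrett reduction modulo the P384 field prime p = 2^384 − 2^128 − 2^96 + 2^32 − 1 is sketched below. It computes μ = ⌊2^768 / p⌋ generically. For P384, writing p = 2^384 − d with d = 2^128 + 2^96 − 2^32 + 1 gives (2^384 + d)(2^384 − d) = 2^768 − d², and d² < p, so μ = 2^384 + d. That sparse constant is presumably the "special form" the issue refers to. The names (`P384`, `MU`, `barrett_reduce`) are illustrative.

```python
import secrets

# NIST P-384 field prime: p = 2^384 - 2^128 - 2^96 + 2^32 - 1
P384 = 2**384 - 2**128 - 2**96 + 2**32 - 1
K = 384  # bit length of P384

# Precomputed Barrett constant mu = floor(2^(2K) / p); for P384 it equals 2^384 + 2^128 + 2^96 - 2^32 + 1.
MU = (1 << (2 * K)) // P384


def barrett_reduce(x: int) -> int:
    """Return x mod P384 for 0 <= x < 2^768, e.g. a product of two field elements."""
    assert 0 <= x < (1 << (2 * K))
    q = ((x >> (K - 1)) * MU) >> (K + 1)  # estimate of floor(x / p), never too large
    r = x - q * P384                      # 0 <= r < 3p
    while r >= P384:                      # at most two correction steps
        r -= P384
    return r


if __name__ == "__main__":
    a = secrets.randbelow(P384)
    b = secrets.randbelow(P384)
    assert barrett_reduce(a * b) == (a * b) % P384
    print("ok")
```

Because μ is sparse, the product (x >> 383)·μ reduces to a shift plus a few shifted additions, which is where a P384-specific Barrett could save multiplications. The correction loop is data-dependent; a side-channel-hardened version would use a fixed number of conditional subtractions.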
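The Solinas option relies on the sparse form of p. Since p = 2^384 − 2^128 − 2^96 + 2^32 − 1, we have 2^384 ≡ 2^128 + 2^96 − 2^32 + 1 (mod p), so the high part of a product can be folded back into the low part using only shifts and additions. The sketch below shows that principle as a simplified bit-level fold (`fold_reduce` and `FOLD` are illustrative names). It is not the published word-level fast-reduction routine that an OTBN implementation would more likely follow.

```python
import secrets

# NIST P-384 field prime: p = 2^384 - 2^128 - 2^96 + 2^32 - 1
P384 = 2**384 - 2**128 - 2**96 + 2**32 - 1
MASK384 = (1 << 384) - 1
# 2^384 - p, i.e. 2^384 is congruent to FOLD modulo p.
FOLD = 2**128 + 2**96 - 2**32 + 1


def fold_reduce(x: int) -> int:
    """Return x mod P384 for x >= 0 using shifts, masks, additions and subtractions only."""
    assert x >= 0
    while x >> 384:
        # Split x = hi * 2^384 + lo and replace 2^384 by FOLD; x strictly decreases.
        hi, lo = x >> 384, x & MASK384
        x = lo + (hi << 128) + (hi << 96) - (hi << 32) + hi
    # Now x < 2^384 < 2p, so one conditional subtraction finishes the reduction.
    if x >= P384:
        x -= P384
    return x


if __name__ == "__main__":
    a = secrets.randbelow(P384)
    b = secrets.randbelow(P384)
    assert fold_reduce(a * b) == (a * b) % P384
    print("ok")
```

The data-dependent loop is only for clarity; a constant-time implementation would unroll a fixed number of folds and conditional subtractions.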
(Split out from https://github.com/lowRISC/opentitan/issues/2856)