Open mcy opened 2 years ago
These look fun! When thinking seriously about these, we'd need to work out why we're adding them. A couple of quick thoughts:
n
and n | 1
(like you were suggesting for the multi-word instructions)@GregAC
Multi-word add
It could also still be useful for unsaturated limbs to have something that doesn't need a carry between adds -- e.g. I have 4 limbs but none of them are storing the full 256 bits, so an instruction could add the 4 pairs of corresponding limbs but have no need to carry between them (that would be done later as a multi-bit carry).
@felixmiller, any instructions you would like to see?
Rupert: a lot of these have come up in discussions with Jade about things that would simplify or improve the performance of our assembly. I think she and Felix can provide concrete use examples to help pare down what's worth spending area and encoding budget on.
Re. SHA2 instructions we should take SCA considerations into account. This requires careful consideration at the interface level, and it is possible we may want to opt for a micro code implementation instead of following the RISC-V specification.
Marking as P2 as this is not currently part of the current iteration plan.
Re. SHA2 instructions we should take SCA considerations into account. This requires careful consideration at the interface level, and it is possible we may want to opt for a micro code implementation instead of following the RISC-V specification.
Yea, we should be very careful here with SHA2 instructions. We should discuss the exact use case too.
Two things we were discussing in the past:
Carry-less multiplication Nothing that we desperately need right now, but we probably want to support this in the (very) long term to enable polynomial arithmetic.
Composable (multi-limb) pseudo modular addm subm
The addm
and subm
instructions (pseudo modular add/sub) come in very handy for any 256-bit elliptic curve implementations. For any multi-limb implementation this blows up to 3 steps: (1) Arithmetic operation (sub/add), (2) (speculative) modular correction, (3) final select. All this resulting in 1 machine instruction per step and limb (e.g. 6 instructions for P-384). Nice would be a composable multi-limb addm/subm that would take one instruction per limb. The big issue here is probably the register to store the (then larger) modulus.
@moidx, @jadephilipoom, and I have been discussing future extensions we'd like to see in the OTBN ISA in future revisions. This issue is intended to record those in a central place, but they are in no way plan-of-record or intended to make it into the timeline for the current revision.
(Previously: https://github.com/lowRISC/opentitan/issues/5058).
An incomplete list of things we would like to see:
wX
registers into lanes.xperm4/8
: treat the first word as an array and the second word as an array of indices, and use it to perform N lookups into the first array. Useful for constant-time LUTs.