Open mkannwischer opened 5 months ago
Regarding the pipeline models, I have already done some analysis on those. M7 (it was even cited by a paper linked from one of the keccak files): https://github.com/jnk0le/random/tree/master/pipeline%20cycle%20test#cortex-m7 and M4: https://github.com/jnk0le/random/tree/master/pipeline%20cycle%20test#cortex-m3-and-m4
@jnk0le Your document is very interesting and useful, thanks a lot. I've got a question. You write:
e.g. following snippet doesn't stall:
add.w r0, r2
eor.w r6, r0, r6, ror #22
I get that the add runs on the "early ALU" to be able to 0-latency fwd to eor. But doesn't the inline shift for the eor also have to run on the early ALU, in the same cycle?
r0 is forwarded as the non-shifted operand (i.e. there is no false dependency from a skewed operand), so it can be forwarded to the late ALU. (Inline-shifted operand2 works similarly to CM85, except that non-shifting ones are special-cased to not use the shifter.)
The inline shift of the second op clobbers the shifter in the "early" stage (aka EX1 in ARM nomenclature), so the first instruction can't be a shifting one.
@jnk0le Thanks -- but doesn't this mean that in EX1 we use the shifter (for EOR) and the adder (for add) at the same time?
Yes, there is one shifter and 2 AGUs, (presumably) one of which can execute add/sub/mov from the older slot.
(ie. there is no false dependency by skewed operand)
The ldr result is, however, subject to this kind of false dependency.
@jnk0le agree, it's probably the AGU here. It doesn't work with EOR instead of ADD anymore, confirming that.
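For reference, the two cases discussed above can be put side by side as a sketch. The first pair is the snippet quoted from the document. The second is the EOR-instead-of-ADD variant mentioned in the last comment; its exact registers are an assumption for illustration, since the tested code was not posted. The comments only restate what is reported in this thread and are not independent measurements.

```asm
@ Pair 1 (quoted from the document): reported not to stall on Cortex-M7.
@ The add can (presumably) execute on an AGU in the early stage, and r0 is
@ the non-shifted operand of the eor, so it can be forwarded to the late ALU
@ while the shifter handles "r6, ror #22" in EX1.
add.w r0, r2
eor.w r6, r0, r6, ror #22

@ Pair 2 (assumed form of the variant tested above): reported to no longer
@ work without a stall. An eor is not among the add/sub/mov operations
@ attributed to the AGU, which is consistent with the AGU explanation.
eor.w r0, r2
eor.w r6, r0, r6, ror #22
```

Timing each pair in a loop with a cycle counter is one way to reproduce the difference; the loop and counter setup are left out here because they depend on the target board.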
Continuation of #55