riscvarchive / riscv-code-size-reduction

https://jira.riscv.org/browse/RVG-122
150 stars 34 forks source link

content of ra after fully executing popret[z] #197

Closed jnk0le closed 1 year ago

jnk0le commented 1 year ago

But I can also imagine the cores that might want to finish popret[z] a cycle earlier and e.g. somehow load stacked ra directly to pc in parallel with addi.

There is also possibility of abuse of observed behaviour, so I think it needs to be explicitly stated what can happen to ra after popret[z].

tariqkurd-repo commented 1 year ago

you can implement whatever you like in your microarchitecture, providing it is interrupt safe.

jnk0le commented 1 year ago

Checked cortex m0 and its pop {pc} doesn't modify the lr.

Reading the ARMv7m doc:

Pop Multiple Registers loads a subset (or possibly all) of the general-purpose registers R0-R12 and the PC or the LR from the stack

If the PC is specified in the register list, the instruction causes a branch to the address (data) loaded into the PC

I have the impression that it performs loads as explicitly stated in instruction: load to LR OR to PC not both. (guess that could be due to legacy baggage) Hence it's equivalent to:

content of ra prior to executing popret[z]

Back to riscv:

you can implement whatever you like in your microarchitecture, providing it is interrupt safe.

For POPRET once the stack pointer adjustment has been committed the ret must execute

Appears to software as: [...] lw ra, 12(sp) [...] ret

Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, (move zero into a0,) return to ra.

This instruction pops (loads) the registers in reg_list from stack memory, adjusts the stack pointer by stack_adj, moves zero into a0 and then returns to ra

Well, I guess it actually could be interpreted so loosely:

But, will it be considered Zcmp compliant still? How about the verification stuff like golden models, compliance/regression tests etc.? (those will probably implement a straightforward interpretation) There is also dangerous precedence in implementing things, basing on this kind of spec interpretations.

Also the "abuse of observed behaviour" could happen in both scenarios:

I think that explicitly stating if uarch is allowed to implement any of those approaches or "must load to ra", is a backwards compatible change (wrt. hardware designs) and could be easily handled by puplic review process.