t-crest / patmos

Patmos is a time-predictable VLIW processor, and the processor for the T-CREST project
http://patmos.compute.dtu.dk
BSD 2-Clause "Simplified" License

handbook: destination register is undefined during the load delay slot #50

Closed Emoun closed 5 years ago

Emoun commented 5 years ago

In the Patmos handbook, the description of typed loads states:

The value of the destination register is undefined during this load delay slot.

This has some implications that I think we do not want. Say we have some code that needs to load 2 values and add them to an existing value in a register. A naive implementation could look like:

lwc $r1 = [$r2]
lwc $r1 = [$r3]
add $r4 = $r4, $r1
add $r4 = $r4, $r1

However, the above wording would render this wrong: at the third instruction the value of $r1 is undefined, since it's the destination register of the previous load. A correct implementation would have to be:

lwc $r1 = [$r2]
lwc $r5 = [$r3]
add $r4 = $r4, $r1
add $r4 = $r4, $r5

This requires an additional register: $r1 is effectively unavailable for the second load. If we don't have enough registers available, a nop would be needed after the second load.

In my opinion, the value of the destination register should be unaffected by the load until after the delay slot. Then we can always reuse registers in successive loads. I also think this is the behavior most would expect.
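To make the difference between the two readings concrete, here is a minimal sketch in Python. It is a toy model, not the Patmos implementation: the tuple encoding of instructions, the `run` function, the `UNDEFINED` sentinel, and the register and memory contents are all assumptions made for illustration. It assumes exactly one load delay slot and ignores stalls, interrupts, bundles, and predicates.

```python
UNDEFINED = object()  # hypothetical sentinel standing in for "undefined value"


def add(a, b):
    """Addition that propagates the undefined marker."""
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a + b


def run(program, regs, mem, delay_slot_undefined):
    """Execute ('lwc', rd, ra) and ('add', rd, rs1, rs2) tuples in order.

    A load's result becomes visible only after one delay slot. With
    delay_slot_undefined=True the destination register is undefined in that
    slot (handbook wording); with False it keeps its old value (proposal).
    """
    regs = dict(regs)
    pending = None  # (rd, value) of the load issued by the previous instruction
    for instr in program:
        arriving, pending = pending, None
        poisoned = None
        if instr[0] == "lwc":
            _, rd, ra = instr
            if regs[ra] is UNDEFINED:
                raise ValueError("load from an undefined address")
            pending = (rd, mem[regs[ra]])
            poisoned = rd if delay_slot_undefined else None
        elif instr[0] == "add":
            _, rd, rs1, rs2 = instr
            regs[rd] = add(regs[rs1], regs[rs2])
        else:
            raise ValueError(f"unknown instruction {instr!r}")
        # The previous load writes back after this instruction read its operands.
        if arriving is not None:
            regs[arriving[0]] = arriving[1]
        # Under the handbook wording, this load's destination is now undefined.
        if poisoned is not None:
            regs[poisoned] = UNDEFINED
    if pending is not None:  # flush a load issued by the last instruction
        regs[pending[0]] = pending[1]
    return regs


# Illustrative initial state (made-up values).
REGS = {"r1": 0, "r2": 100, "r3": 200, "r4": 1, "r5": 0}
MEM = {100: 5, 200: 7}

NAIVE = [
    ("lwc", "r1", "r2"),
    ("lwc", "r1", "r3"),
    ("add", "r4", "r4", "r1"),
    ("add", "r4", "r4", "r1"),
]
TWO_REGS = [
    ("lwc", "r1", "r2"),
    ("lwc", "r5", "r3"),
    ("add", "r4", "r4", "r1"),
    ("add", "r4", "r4", "r5"),
]

if __name__ == "__main__":
    for name, prog in (("naive", NAIVE), ("two registers", TWO_REGS)):
        for undefined in (True, False):
            r4 = run(prog, REGS, MEM, undefined)["r4"]
            shown = "undefined" if r4 is UNDEFINED else r4
            print(f"{name}, delay_slot_undefined={undefined}: r4 = {shown}")
```

In this model the naive sequence only yields a defined sum when the destination register keeps its old value during the delay slot, while the two-register sequence yields the same sum under either reading.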

I have already brought this to the attention of @schoeberl, so this issue is mostly to ensure we don't forget. Also, I don't know how Patmos currently implements the loads or whether a change would be needed to conform to my proposal.

jeuneS2 commented 5 years ago

While it would certainly be possible to implement this behavior, it is not straightforward to get it right once the details are considered:

a) Either load may or may not stall. The writeback of the first load would need to be delayed if the second load stalls.
b) Adding logic to delay forwarding as needed may hurt fmax and thus needs to be placed carefully.
c) Interrupts need to be suppressed in the load delay slot.

schoeberl commented 5 years ago

We will keep the implementation as it is.