riscv-software-src / riscv-perf-model

Example RISC-V Out-of-Order/Superscalar Processor Performance Core and MSS Model
Apache License 2.0
117 stars 43 forks source link

Random branch misprediction/flushing has a race condition #153

Closed klingaard closed 5 months ago

klingaard commented 5 months ago

Flushing from the ExecutionPipe, unfortunately, does not work. There is a race condition where ROB could retire a flushing branch before the flush manager has a chance to flush it. I have a patch for this.

danbone commented 5 months ago

I'm sure I saw this case when I originally tested it, and I split the completion of instruction in the executepipe to happen after detecting the misprediction.

https://github.com/riscv-software-src/riscv-perf-model/blob/bf5ae7e8adbd453a8aee2bc659a3c4559fc1b12a/core/ExecutePipe.cpp#L98-L109

Unless the timings have changed, then I don't see how it could happen?

cycle N (TICK): misprediction -> flushmanager -> sends flush cycle N+1 (FLUSH) : flush packet arrives in ROB (instructions following the mispredicted branch are flushed) cycle N+1(TICK): instruction completes

https://github.com/riscv-software-src/riscv-perf-model/blob/bf5ae7e8adbd453a8aee2bc659a3c4559fc1b12a/core/ExecutePipe.hpp#L87-L88

https://github.com/riscv-software-src/riscv-perf-model/blob/bf5ae7e8adbd453a8aee2bc659a3c4559fc1b12a/core/FlushManager.hpp#L121-L125

https://github.com/riscv-software-src/riscv-perf-model/blob/bf5ae7e8adbd453a8aee2bc659a3c4559fc1b12a/core/ROB.hpp#L107-L109

Could it be that there is a flush input port set with a >1 cycle delay on another unit?

I also noticed that the last parameter here should be set to true, as it is a zero cycle port?

https://github.com/riscv-software-src/riscv-perf-model/blob/bf5ae7e8adbd453a8aee2bc659a3c4559fc1b12a/core/FlushManager.hpp#L124-L125

klingaard commented 5 months ago

Yeah, in a design I'm modelling, timing changed. :smile: This is what happens for me...

Scenario: The branch is at the bottom of the ROB waiting to execute with a few other newer instructions after it.

cycle N (TICK): branch executed/completed -> misprediction -> flushmanager -> sends flush
                ROB retires branch and the newer instructions
cycle N+1 (FLUSH) : flush packet arrives in ROB (instructions following the mispredicted branch are flushed)
cycle N+1(TICK): instruction completes

Since (as you observed), the inport on the ROB is delayed a cycle, the flush came in after the ROB retired the branch and its buddies. The machine restarted and refetched those newer instructions that already retired. :boom:

I like what you're trying to accomplish -- restart the frontend down the right path ASAP. This works fine if the branch isn't in the retirement window. I guess one "fix" is the branch's completion is delayed a cycle to line up with the flush, but the flush timing must always be in alignment with the branch resolution. :thinking: Technically, the ROB should be the last to know of a middle-machine flush. It's not holding things up.

I also noticed that the last parameter here should be set to true, as it is a zero cycle port?

Maybe, but that will put the flushing in the Tick phase which is outta whack with the standard flow. This means you'll have to set up a LOT of precedence rules to ensure things like retireInstructions is called after flushes.

So I think for now this fix will work (albeit not what you intended). I think the flush upper can be for middle machine components and is instigated by the branch unit while the ROB is the only one that actually instigates a lower flush. Thoughts?

danbone commented 5 months ago

Yeah, I did feel that it was a bit flimsy initially. What do you propose would be a better way? Establish some form of handshake between the branch unit and the flush manager to delay completing the branch until after the flush?

Maybe, but that will put the flushing in the Tick phase which is outta whack with the standard flow. This means you'll have to set up a LOT of precedence rules to ensure things like retireInstructions is called after flushes.

Bit confused by this. The flush arbitration, and output ports are currently scheduled on Tick. The flush input ports on the other units are scheduled on Flush.

So I think for now this fix will work (albeit not what you intended). I think the flush upper can be for middle machine components and is instigated by the branch unit while the ROB is the only one that actually instigates a lower flush. Thoughts?

What are you referring to with flush upper/lower?

klingaard commented 5 months ago

Yeah, I did feel that it was a bit flimsy initially. What do you propose would be a better way? Establish some form of handshake between the branch unit and the flush manager to delay completing the branch until after the flush?

I guess if we go back to the way it was, the branch unit could still mark the branch as mispredicted, the ROB would simply stop retiring once it detects that, and the ROB would wait for the flush. The ROB could set state waiting_for_mispredicted_flush_ to prevent further retirement. That way the flush could come at any time.

Bit confused by this. The flush arbitration, and output ports are currently scheduled on Tick. The flush input ports on the other units are scheduled on Flush.

That's true, but you can't send a flush, 0-cycle, out during any phase > the flush phase. In your example, the current phase is Tick, which is executing the branch instruction and detecting a misprediction. If you send a 0-cycle flush at that moment you're asking the model to "go back in time" and perform a flush. It's possible that the ROB already executed -- or it might not have! Either way, the behavior is now nondeterministic.

What are you referring to with flush upper/lower?

The code you referred to in FlushManager... https://github.com/riscv-software-src/riscv-perf-model/blob/bf5ae7e8adbd453a8aee2bc659a3c4559fc1b12a/core/FlushManager.hpp#L124-L125