openpower-cores / a2i


How to deal with wrong target address prediction of link stack #41

Open Grubby-CPU opened 3 years ago

Grubby-CPU commented 3 years ago

Hi Guys,

Based on my understanding, the link stack can give a wrong prediction of the branch target address. How does A2 detect this case and flush the pipeline? I did not find any related logic in iuq_bp.vhdl.

Many thanks

openpowerwtf commented 3 years ago

In what cases do you think it gives the wrong prediction? Stack overflow? Other cases where stack is messed up relative to stream? There would have to be a 'final arbiter' later in the pipe to validate the target address.

Looking around iuq_bp.vhdl....seems like this is the address creation and the valid...

iu5_redirect_ifar_d(EFF_IFAR'left to 61)        <= iu4_lnk(EFF_IFAR'left to 61) when iu4_bclr = '1' else
                                                   iu4_bta(EFF_IFAR'left to 61);

iu5_redirect_tid_d(0 to 3)                      <= iu4_redirect_tid(0 to 3) and not iu4_flush_tid(0 to 3);

-- came from this...

iu3_br_pred(0 to 3)             <= iu3_br_val(0 to 3) and
                                   (iu3_br_hard(0 to 3) or
                                   (iu3_hint_val(0 to 3) and iu3_hint(0 to 3)) or
                                   (iu3_br_dynamic(0 to 3) and iu3_br_hist0(0 to 3)) or
                                   (iu3_br_static(0 to 3)));

-- which depends on 'predecode bits'
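As a reading aid, the iu3_br_pred equation above can be restated as a small bit-mask model. This is only a sketch: each signal is treated as a 4-bit integer, the function name `br_pred` is made up for illustration, and what each of the four bit positions represents is not stated in this thread, so the model makes no assumption about it.

```python
MASK = 0xF  # the signals in the snippet are 4 bits wide (0 to 3)


def br_pred(br_val, br_hard, hint_val, hint, br_dynamic, br_hist0, br_static):
    """Mirror of the iu3_br_pred expression using integer bit masks.

    A valid branch is predicted taken if it is a 'hard' branch, or it has a
    valid hint that says taken, or it is dynamic with history bit 0 set, or
    it is static.
    """
    taken = (
        br_hard
        | (hint_val & hint)
        | (br_dynamic & br_hist0)
        | br_static
    )
    return br_val & taken & MASK


# Example masks (arbitrary values chosen for illustration only).
all_valid = br_pred(0b1111, 0b1000, 0b0000, 0b0000, 0b0110, 0b0100, 0b0001)
some_valid = br_pred(0b0101, 0b1000, 0b0000, 0b0000, 0b0110, 0b0100, 0b0001)
```

Note that br_val gates everything: a position that does not hold a valid branch can never be predicted taken, whatever the predecode-derived inputs say.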

From Manual [2.9 Branch Processing] - good info; appears that XU does the final check on target address...

Branch Conditional to Link Register

Incoming BCLR instructions obtain a BTA from the branch predictor's link stack. The link stack is a LIFO buffer designed to keep track of nested subroutines. It holds a list of potential LINK register contents, which are maintained based on subroutine calls and returns. A subroutine call is defined as any taken branch where instruction field LK = '1'. When a subroutine call is detected, the NIA (incremented IFAR) is pushed onto the top of the link stack because this is the location to which the subroutine will return. A subroutine return is defined as a taken branch conditional to LR (BCLR) where instruction field BH = '00' (while this is kept as a condition for a subroutine return, it is generally assumed that all BCLR instructions are intended as subroutine returns). When a subroutine return is detected, a previously stored NIA is popped off the top of the link stack, and used as a BTA for the current BCLR instruction. In the event of nested subroutines, multiple consecutive calls are followed by multiple consecutive returns, with the LIFO structure of the link stack keeping everything ordered properly.

The link stack is isolated and replicated per thread to maintain proper instruction flow in and out of the buffer. Each stack is four entries deep, and wide enough to accommodate the entire IFAR (potentially 62 bits). A pointer is used to define the top of the stack.
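The push/pop behaviour described in that excerpt can be sketched as a small behavioural model. This is not the A2I RTL: the class and method names are hypothetical, addresses are modelled as byte addresses with NIA = IFAR + 4, and the choice that the pointer indexes the most recently pushed entry is an assumption.

```python
class LinkStack:
    """Behavioural model of one thread's 4-entry link stack (illustrative only)."""

    DEPTH = 4

    def __init__(self):
        self.entries = [0] * self.DEPTH
        self.top = 0  # assumed: index of the most recently pushed entry

    def on_call(self, ifar):
        """Taken branch with LK = 1: push the NIA, i.e. the return address."""
        self.top = (self.top + 1) % self.DEPTH
        self.entries[self.top] = ifar + 4

    def on_return(self):
        """Taken BCLR with BH = 00: pop the entry and use it as the predicted BTA."""
        bta = self.entries[self.top]
        self.top = (self.top - 1) % self.DEPTH
        return bta


stack = LinkStack()
stack.on_call(0x1000)      # outer call, should return to 0x1004
stack.on_call(0x2000)      # nested call, should return to 0x2004
inner = stack.on_return()  # prediction for the inner return
outer = stack.on_return()  # prediction for the outer return
```

With well-nested calls and returns, the predictions come back in LIFO order, which is the property the manual relies on.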

Misalignment

In the event of a stack misalignment, the stack must be realigned. Misalignment occurs when the branch direction for a subroutine call/return is predicted incorrectly and the stack pointer is consequently moved to the wrong location. Realignment of the stack pointer relies on the use of a shadow pointer. The shadow pointer is governed by the same rules as the stack pointer, except that it acts on resolved branches instead of predicted branches. This guarantees that the value of the shadow pointer is always correct (even though the data is too old to be useful to the branch predictor under normal circumstances). Any time the execution unit flushes (whether due to a branch misprediction or not), the stack pointer is overwritten with the value of the shadow pointer. The shadow pointer becomes valid for predictions at this point because all branch instructions that have not yet been resolved by the execution unit will be flushed with the rest of the pipeline. In the special case that a subroutine call was predicted not taken, then resolved taken, simple realignment is not sufficient. The top of the realigned stack must also be updated with the subroutine call's NIA.
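A rough model of the shadow-pointer realignment may make the two cases easier to follow. Everything here is an assumption made for illustration (class name, method names, and the pointer convention); it only mirrors the rules quoted from the manual: the speculative pointer moves on predicted branches, the shadow pointer moves on resolved branches, and a flush copies shadow into the speculative pointer.

```python
class ShadowedLinkStack:
    """Illustrative link stack with a speculative and a shadow pointer."""

    DEPTH = 4

    def __init__(self):
        self.entries = [0] * self.DEPTH
        self.ptr = 0     # moved by predicted calls/returns
        self.shadow = 0  # moved by resolved calls/returns

    def predict_call(self, nia):
        self.ptr = (self.ptr + 1) % self.DEPTH
        self.entries[self.ptr] = nia

    def predict_return(self):
        bta = self.entries[self.ptr]
        self.ptr = (self.ptr - 1) % self.DEPTH
        return bta

    def resolve_call(self):
        self.shadow = (self.shadow + 1) % self.DEPTH

    def resolve_return(self):
        self.shadow = (self.shadow - 1) % self.DEPTH

    def flush(self, missed_call_nia=None):
        """Realign on any flush; patch the top entry for a missed call."""
        self.ptr = self.shadow
        if missed_call_nia is not None:
            self.entries[self.ptr] = missed_call_nia


ls = ShadowedLinkStack()

# Case 1: a return is predicted taken but resolves not taken.
ls.predict_call(0x1004)
ls.resolve_call()
ls.predict_return()            # speculative pop moves ptr to the wrong place
ls.flush()                     # misprediction flush restores ptr from shadow
after_realign = ls.predict_return()
ls.resolve_return()

# Case 2: a call is predicted not taken (no push) but resolves taken.
ls.resolve_call()
ls.flush(missed_call_nia=0x2004)
after_patch = ls.predict_return()
```

In the second case, realigning the pointer alone would leave a stale entry at the top of the stack, which is why the manual says the call's NIA must also be written.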

Overflow

Because the link stack is only four entries deep, the logic can only handle four nested subroutines before overflowing. In the case of an overflow, the stack pointer wraps and continues storing NIAs, overwriting existing data in the oldest locations. In this way, the link stack is always able to return BTAs for the four most recent nested subroutine calls. If the nesting has gone deeper than this, the link stack returns garbage for anything less recent. This is unavoidable. A deeper stack could reduce the impact at the expense of area.
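To see the overflow case concretely, here is a minimal sketch with five nested calls on a four-entry circular stack. The helper names `push` and `pop`, the byte-address model (NIA = IFAR + 4), and the pointer convention are assumptions for illustration, not taken from the RTL.

```python
DEPTH = 4
entries = [0] * DEPTH
top = 0  # assumed: index of the most recently pushed entry


def push(nia):
    """Store a return address, wrapping and overwriting the oldest slot."""
    global top
    top = (top + 1) % DEPTH
    entries[top] = nia


def pop():
    """Return the predicted BTA and step the pointer back, wrapping if needed."""
    global top
    bta = entries[top]
    top = (top - 1) % DEPTH
    return bta


calls = [0x1000, 0x2000, 0x3000, 0x4000, 0x5000]  # five nested calls
for ifar in calls:
    push(ifar + 4)

actual = [ifar + 4 for ifar in reversed(calls)]   # true return order
predicted = [pop() for _ in calls]
mispredicted = [p != a for p, a in zip(predicted, actual)]
```

In this model the four most recent returns are predicted correctly, and the fifth (outermost) return reads a slot that was overwritten by the fifth call, so its predicted BTA is wrong.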

Corruption

It should be noted that there is a danger of BTA corruption in the case of BCLR instructions, due to either stack misalignment or overflow conditions. The XU must compare the predicted BTA against the executed BTA for all BCLRs and flag a misprediction if they fail to match.
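That last check is what answers the original question, so a hedged sketch of it may help. The function below is hypothetical and is not derived from the XU source; it only illustrates the comparison the manual describes. The one architectural fact it relies on is that the BCLR target is the LR value with the two low-order bits forced to zero.

```python
def resolve_bclr(predicted_bta, link_register):
    """Compare the link-stack prediction with the architected BCLR target.

    Returns (mispredicted, redirect_address). On a mismatch, the pipeline
    would be flushed and fetch redirected to redirect_address.
    """
    executed_bta = link_register & ~0x3  # target is LR with low two bits cleared
    mispredicted = predicted_bta != executed_bta
    return mispredicted, executed_bta


# Prediction matches the real return address: no flush needed.
ok = resolve_bclr(predicted_bta=0x1004, link_register=0x1004)

# Stale entry after an overflow: the check flags a misprediction.
bad = resolve_bclr(predicted_bta=0x5004, link_register=0x1004)
```

So the branch predictor is free to hand out a wrong target; correctness rests on this later comparison, and a flush from it also triggers the shadow-pointer realignment described under Misalignment.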