riscv-software-src / riscv-perf-model

Example RISC-V Out-of-Order/Superscalar Processor Performance Core and MSS Model
Apache License 2.0
133 stars 56 forks source link

Branch Prediction API #139

Closed arupc-vmicro closed 6 months ago

arupc-vmicro commented 8 months ago

A code introducing branch prediction API. Still a draft PR as I am working through a compilation issue.

Please feel free to review and provide feedback.

klingaard commented 8 months ago

@bdutro you might have some input...

danbone commented 8 months ago

Hi Arup, following up on our discussion last night. Could you help me understand how the API will provide configurability, and support speculative state updates?

For configurability, how do we pass in runtime parameters. Furthermore how can we enable reuse of predictor components. A simple example could be that we have two different branch predictors, one is a bimodal table (similar to what you have implemented), the other is a global history predictor such as gshare (i.e. a branch history table that indexed through a hash of the global history and the PC). Both predictors have different parameters, and you would likely want to swap them in and out but use the same BTB (and RAS/indirect predictor).

Like the Cache model allows users to change the address decoder, do we need similar methods for changing the components within the branch predictor i.e. the conditional branch predictor, the indirect target predictor, BTB and return address stack.

The other point is about supporting predictors which have internal speculative state. For example a global history predictor has the global history register which needs to be recovered following a flush. Similarly some predictors require meta data about the prediction to be available for the update, TAGE requires the provider table, and alternate prediction for example.

Finally we did speak about how to fit a block based BTB, like you've implemented, into a trace execution model. Branch predictors work in addresses, using trace is difficult because we have to pair instructions with their predictions based on their address. How do we handle malformed trace, or trace that has unexpected change of flow (i.e. interrupt/exception).

EDIT: I think we also need to consider how to collect statistics from the predictor.

danbone commented 8 months ago

Knute mentioned looking at other simulators. I've used gem5, but I'm also aware of champsim.

gem5 o3 model The branch predictor is an instance within the fetch unit. It contains the BTB, RAS, conditional and indirect target predictors. However the BTB, indirect target predictor and RAS are all objects within the branch unit, which can be swapped out, or disabled entirely.

Fetch
    ├── BranchPredictor
    │   ├── btb
    │   ├── ras
    │   ├── indirectBranchPred
    │   └── IndirectPredictor

As for the API you can read the documentation here for details but I'll summarize.

The interface with Fetch is just 3 methods: predict, squash(flush) and update. Predict is called for each instruction fetched, the instruction itself is passed through the API. The predictor relies on a sequence number (seqNum) to map predictions to squashes/updates, as the updates/history is stored within the class.

The BTB/RAS/Indirect predictor objects can be swapped out in the build configuration as long as they're a subclass of the original implementation.

The conditional predictor on the other hand, must be a subclass of the branch predictor unit itself. Many leave the base predict, squash and update implementations, and instead override the methods referenced in those. Mainly the lookup, updateHistories, update (internal one), and squash.

*Both referenced by predict.

ChampSim In ChampSim (looking at the ooo_cpu) it separates the BTB and BranchPredictor into two separate classes, though the BTB contains the RAS. The ooo_cpu does a prediction for each instruction fetched (as far as I can tell). It also validates the predictions against the trace, mispredictions stall the pipeline until their resolved.

Both the BTB and BranchPredictor take the PC to produce a prediction. The update methods take the PC, target, direction and branch type.

Olympia I think the makePrediction/doUpdate methods are fine, given how generic they are. But I do think more thought is needed on how to integrate these, providing parameterization and enable statistic collection.

It might be better to have separate objects for the conditional predictor, indirect target, BTB and RAS. With another API defined for each of them, and then a BranchPredict implementation which orchestrates predictions, updates, flushes across those objects.

That single implementation will most likely provide enough flexibility for many use cases. We also need to determine how it fits into the pipeline. My thoughts are that allow the predictor to maintain it's own state for history/pending updates, and like gem5, an ID is used to tag updates to instructions.

arupc-vmicro commented 8 months ago

For configurability, how do we pass in runtime parameters. Furthermore how can we enable reuse of predictor components. A simple example could be that we have two different branch predictors, one is a bimodal table (similar to what you have implemented), the other is a global history predictor such as gshare (i.e. a branch history table that indexed through a hash of the global history and the PC). Both predictors have different parameters, and you would likely want to swap them in and out but use the same BTB (and RAS/indirect predictor).

Addressing this part of question/comment: We can do this using ResourceFactories, which are used to configure the tree nodes. As an example, please see files core/CPUFactories.hpp. While sparta units such fetch, decode use a default factory, the unit execute uses a custom factory defined in core/Execute.cpp, which goes through the core extension yaml and creates execution units.

Extending that idea, we can do the following:

arupc-vmicro commented 8 months ago

Like the Cache model allows users to change the address decoder, do we need similar methods for changing the components within the branch predictor i.e. the conditional branch predictor, the indirect target predictor, BTB and return address stack.

IMO, the API should be generic enough to support simplest of the predictors, which may not have different sub-predictors as as long as API does not prevent or create undue burden on implementing more complicated predictors with sub-predictors.

The other point is about supporting predictors which have internal speculative state. For example a global history predictor has the global history register which needs to be recovered following a flush. Similarly some predictors require meta data about the prediction to be available for the update, TAGE requires the provider table, and alternate prediction for example.

Can we not use the updatePredictor method with appropriate input (sub-classed from DefaultUpdate or defined separately) to restore/change predictor states, as needed?

Finally we did speak about how to fit a block based BTB, like you've implemented, into a trace execution model. Branch predictors work in addresses, using trace is difficult because we have to pair instructions with their predictions based on their address. How do we handle malformed trace, or trace that has unexpected change of flow (i.e. interrupt/exception).

We do have to add proper mechanism to handle traces. I plan to address this later.

Also, similar to Sparta's cache API, the user of the API can maintain stats, as needed.

arupc-vmicro commented 8 months ago

I think the makePrediction/doUpdate methods are fine, given how generic they are. But I do think more thought is needed on how to integrate these, providing parameterization and enable statistic collection.

I agree and issue #143 will give us the opportunity to work on these aspects.

It might be better to have separate objects for the conditional predictor, indirect target, BTB and RAS. With another API defined for each of them, and then a BranchPredict implementation which orchestrates predictions, updates, flushes across those objects.

I am happy to build on top of the currently proposed API, once reviewed and finalized, to create sub-predictor-specific APIs, if that's needed.

My suggestion would be that we work towards implementing #143 using the proposed branch prediction API and Sparta subunits to create, parameterize sub-predictors and combine them. As we work through it, we can discuss how can we keep interfaces generic enough to be APIs

danbone commented 8 months ago

On a different note do we need to discuss the directory structure going forward for when others implement their own predictors? Do we need a separate branchpred directory under core?

arupc-vmicro commented 7 months ago

Please put everything in separate namespace

Have added branch prediction IF and SimplePredictor in the Olympia namespace, for now, in line with what we have currently for LSU, Dispatch etc. I will capture this request in a separate issue to create separate namespaces for different units of Olympia

arupc-vmicro commented 7 months ago

On a different note do we need to discuss the directory structure going forward for when others implement their own predictors? Do we need a separate branchpred directory under core?

Let us address this in a new PR when we add other branch predictors.

arupc-vmicro commented 7 months ago

Addressed review comments. Please take a look and approve, if looks good. I need at least one approval for merge.

arupc-vmicro commented 6 months ago

As a reminder, if there is no further comment, please approve so that we can merge this.