openpower-cores / a2i

How to map the branch prediction algorithm to the RTL code #14

Open Grubby-CPU opened 3 years ago

Grubby-CPU commented 3 years ago

I read the branch prediction section, which describes a common technique and is easy to follow. However, when I tried to map that description onto the RTL in iuq_bp.vhdl, I could not find the correspondence.

For example, according to Figure D-3 in A2_BGQ.pdf, the BP table has 1024 entries and is indexed by IFAR(50:59). The code below generates only an 8-bit address, which confuses me.

    iu1_bh_ti0gs1_rd_addr(0 to 7) <= (ic_bp_iu1_ifar(52 to 55) xor iu1_gshare(0 to 3)) & ic_bp_iu1_ifar(56 to 59);
    iu1_bh_ti1gs1_rd_addr(0 to 7) <= iu1_tid_enc(0 to 1) & (ic_bp_iu1_ifar(54 to 57) xor iu1_gshare(0 to 3)) & ic_bp_iu1_ifar(58 to 59);
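To make the bit slicing in these two assignments easier to follow, here is a minimal Python behavioral sketch (not RTL). It assumes the IFAR is a 62-bit vector numbered 0 to 61 with bit 0 as the most significant bit, and that `iu1_gshare` and `iu1_tid_enc` are plain 4-bit and 2-bit values; the helper `field` and the function names are hypothetical and exist only for illustration.

```python
IFAR_WIDTH = 62  # assumption: IFAR bits are numbered 0..61, bit 0 = MSB


def field(value, first, last, width=IFAR_WIDTH):
    """Extract bits first..last (inclusive), big-endian bit numbering."""
    nbits = last - first + 1
    shift = (width - 1) - last
    return (value >> shift) & ((1 << nbits) - 1)


def rd_addr_ti0gs1(ifar, gshare):
    """(ifar(52 to 55) xor gshare(0 to 3)) & ifar(56 to 59) -> 8 bits."""
    hi = field(ifar, 52, 55) ^ (gshare & 0xF)
    lo = field(ifar, 56, 59)
    return (hi << 4) | lo


def rd_addr_ti1gs1(ifar, gshare, tid_enc):
    """tid_enc(0 to 1) & (ifar(54 to 57) xor gshare(0 to 3)) & ifar(58 to 59)."""
    mid = field(ifar, 54, 57) ^ (gshare & 0xF)
    lo = field(ifar, 58, 59)
    return ((tid_enc & 0x3) << 6) | (mid << 2) | lo


if __name__ == "__main__":
    # ifar(52:59) = 1010_0110, ifar(60:61) = 10
    ifar = (0b10100110 << 2) | 0b10
    print(hex(rd_addr_ti0gs1(ifar, 0b0011)))
    print(hex(rd_addr_ti1gs1(ifar, 0b0011, 0b01)))
```

Both functions always return a value below 256, which matches the 8-bit width of the read address; IFAR(50:51) and IFAR(60:61) do not take part in either address.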

I traced the signal "iu1_bh_ti0gs1_rd_addr" and found that it is an input to tri_bht.vhdl. "ary_r_data" seems to be the output of the BP table, but what is "data_out", which depends on INIT_MASK and "r_addr_q(0)"? The code below also confuses me.

    data_out(0 to 7) <=
        gate(ary_r_data(0 to 7) xor (INIT_MASK(0 to 1) & INIT_MASK(0 to 1) & INIT_MASK(0 to 1) & INIT_MASK(0 to 1)), r_addr_q(0) = '0') or
        gate(ary_r_data(8 to 15) xor (INIT_MASK(0 to 1) & INIT_MASK(0 to 1) & INIT_MASK(0 to 1) & INIT_MASK(0 to 1)), r_addr_q(0) = '1');

After the read, the data is processed again in iuq_bp.vhdl:

    with ic_bp_iu3_ifar(60 to 61) select
    iu3_0_br_hist <= iu3_3_bh_rd_data(0 to 1) when "11",
                     iu3_2_bh_rd_data(0 to 1) when "10",
                     iu3_1_bh_rd_data(0 to 1) when "01",
                     iu3_0_bh_rd_data(0 to 1) when others;

    with ic_bp_iu3_ifar(60 to 61) select
    iu3_1_br_hist <= iu3_3_bh_rd_data(0 to 1) when "10",
                     iu3_2_bh_rd_data(0 to 1) when "01",
                     iu3_1_bh_rd_data(0 to 1) when others;

    with ic_bp_iu3_ifar(60 to 61) select
    iu3_2_br_hist <= iu3_3_bh_rd_data(0 to 1) when "01",
                     iu3_2_bh_rd_data(0 to 1) when others;

    iu3_3_br_hist <= iu3_3_bh_rd_data(0 to 1);

Can someone explain how to read this code from an architectural point of view? Any hints are welcome.

openpowerwtf commented 3 years ago

As a start...

tri_bht uses the array tri_128x16_1r1w_1 (128x16, 1 read, 1 write = 2K bits). There are four 2-bit data outputs of tri_bht. It uses ra(1:7) to read its array, then selects hi/lo data using ra(0). So it acts like a 256x8b access (four entries read at once).
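The read path described above can be sketched as a small Python behavioral model (not RTL). It assumes the array is a list of 128 16-bit rows, that bit 0 is the most significant bit of both the address and the row, and that the four 2-bit outputs are simply data_out(0:1), (2:3), (4:5) and (6:7). The function name `bht_read` is hypothetical. The thread does not say what INIT_MASK is for, so the sketch only reproduces the XOR.

```python
def bht_read(array, r_addr, init_mask):
    """Model one tri_bht read: 8-bit address in, four 2-bit counters out."""
    row = array[r_addr & 0x7F]        # ra(1:7) selects one of 128 rows
    half_sel = (r_addr >> 7) & 0x1    # ra(0) selects the hi or lo byte
    if half_sel == 0:
        byte = (row >> 8) & 0xFF      # ary_r_data(0 to 7)
    else:
        byte = row & 0xFF             # ary_r_data(8 to 15)
    # INIT_MASK(0 to 1) replicated four times, XORed onto the byte
    byte ^= (init_mask & 0x3) * 0x55
    # Split into four 2-bit fields, leftmost field first
    return [(byte >> shift) & 0x3 for shift in (6, 4, 2, 0)]


if __name__ == "__main__":
    array = [0] * 128
    array[5] = 0xE41B  # hi byte 11_10_01_00, lo byte 00_01_10_11
    print(bht_read(array, 0x05, 0b00))  # hi byte of row 5
    print(bht_read(array, 0x85, 0b00))  # lo byte of row 5
    print(bht_read(array, 0x05, 0b10))  # hi byte with the mask applied
```

Seen this way, the 128x16 array behaves like a 256-entry table of 8-bit words, with each word holding four 2-bit counters.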

ifar(60:61) are doing the 'Branch History Rotate' in the figure, which is a partial left-shift.
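The selects quoted in the question can be sketched in Python to show why the shift is only partial. This is a behavioral model under the assumption that `rd_data` holds the four 2-bit values iu3_0_bh_rd_data through iu3_3_bh_rd_data in order; the function name `rotate_hist` is hypothetical.

```python
def rotate_hist(rd_data, ifar_60_61):
    """Model the iu3_*_br_hist selects driven by ifar(60 to 61).

    Slot i takes rd_data[i + n] while that index is still in range;
    otherwise it keeps its own unshifted entry rd_data[i], which is
    the "when others" branch of each select.
    """
    n = ifar_60_61 & 0x3
    return [rd_data[i + n] if i + n <= 3 else rd_data[i] for i in range(4)]


if __name__ == "__main__":
    rd = ["rd0", "rd1", "rd2", "rd3"]
    for n in range(4):
        print(format(n, "02b"), rotate_hist(rd, n))
```

For ifar(60:61) = "10" the model returns rd2, rd3, rd2, rd3: the first two slots are shifted, and the last two keep their original entries. The thread does not state whether those unshifted slots are consumed downstream.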

I think the figure implies 4 arrays, which would be 1K x 2b x 4 = 8Kb. I suspect that is wrong (probably changed at some point), and ifar(50:51) are not part of the selection; the address is created like you show, using 52:59. The text says:

The BHT consists of 1024 2-bit counters.
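As a quick sanity check, the sizes mentioned in this comment can be compared with a few lines of Python. The numbers come from the thread; the variable names are only illustrative.

```python
ROWS, ROW_BITS, COUNTER_BITS = 128, 16, 2

array_bits = ROWS * ROW_BITS                # one tri_128x16_1r1w_1 array
counters = array_bits // COUNTER_BITS       # 2-bit counters that fit in it
counters_per_read = 8 // COUNTER_BITS       # one 8-bit half-row per read
read_addresses = 1 << 8                     # 8-bit read address
figure_bits = 1024 * COUNTER_BITS * 4       # four 1K x 2b arrays, as the figure implies

print(array_bits, counters, read_addresses * counters_per_read, figure_bits)
```

A single 2K-bit array holds exactly 1024 2-bit counters, and 256 addresses times four counters per read covers the same 1024 entries, which is consistent with the quoted text rather than with the 8Kb reading of the figure.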

Grubby-CPU commented 3 years ago

Thanks! The code is much clearer now. I think the 'Branch History Rotate' handles the case where ifar(60:61) is not aligned, right? For example, if ifar(60:61) is 2'b10, then "iu3_0_br_hist" is "iu3_2_bh_rd_data(0 to 1)", "iu3_1_br_hist" is "iu3_3_bh_rd_data(0 to 1)", "iu3_2_br_hist" is "iu3_2_bh_rd_data(0 to 1)" and "iu3_3_br_hist" is "iu3_3_bh_rd_data(0 to 1)".

What does this mean from an architectural point of view? It seems to me that iu3_0_br_hist and iu3_1_br_hist get the right BHT entries, but iu3_2_br_hist and iu3_3_br_hist get the wrong ones?