Closed zhaoxiahust closed 3 years ago
I think you are correct - ex3_l2_op is just L/S ops of interest.
This looks like the qualified version to allow LMQ insertion:
ld_m_val <= (ex3_l_s_q_val and ex3_load_instr and not ld_queue_full and not ld_q_seq_wrap) or
pe_recov_ld_val_l2;
And this is the LMQ update/valid:
l_q_wrt_en(i) <= ld_m_val and ((not ld_rel_val_l2(i) and b and not pe_recov_ld_val_l2) or
(ex7_loadmiss_qentry(i) and pe_recov_ld_val_l2));
ld_rel_val_d(i) <= l_q_wrt_en(i) or
(ld_rel_val_l2(i) and not reset_lmq_entry(i));
Do you agree? This logic appears to have been rewritten for timing (probably because of flushes being included).
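To make the three equations above easier to poke at, here is a minimal Python sketch that transcribes them one-to-one as boolean functions. The function names simply mirror the signal names; the bare `b` term is kept as an opaque input because its definition is not shown in the snippet, and the per-entry index `(i)` is dropped, so this models a single LMQ entry only. It is a reading aid, not the RTL.

```python
def ld_m_val(ex3_l_s_q_val, ex3_load_instr, ld_queue_full,
             ld_q_seq_wrap, pe_recov_ld_val_l2):
    # Qualified "load may be inserted into the LMQ" term.
    return ((ex3_l_s_q_val and ex3_load_instr
             and not ld_queue_full and not ld_q_seq_wrap)
            or pe_recov_ld_val_l2)


def l_q_wrt_en(ld_m_val_, ld_rel_val_l2, b, pe_recov_ld_val_l2,
               ex7_loadmiss_qentry):
    # Write enable for one entry: a free entry on a normal insert,
    # or the ex7 entry during parity-error recovery.
    # `b` is an opaque input here; its meaning is not given in the snippet.
    return ld_m_val_ and (
        (not ld_rel_val_l2 and b and not pe_recov_ld_val_l2)
        or (ex7_loadmiss_qentry and pe_recov_ld_val_l2))


def ld_rel_val_d(l_q_wrt_en_, ld_rel_val_l2, reset_lmq_entry):
    # Next-state of the entry valid: set on write, hold until reset.
    return l_q_wrt_en_ or (ld_rel_val_l2 and not reset_lmq_entry)


# Normal load, queue not full, entry currently free, b asserted.
m = ld_m_val(True, True, False, False, False)
w = l_q_wrt_en(m, False, True, False, False)
print(m, w, ld_rel_val_d(w, False, False))
```

As quoted, none of the three equations takes an L1 hit/miss input.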
Hi openpower-cores,
Thanks for the quick reply. I found the logic related to l_q_wrt_en(i) and ld_rel_val_d(i) before. However, if ex3_l2_op is just L/S ops, I could not find the relation between "l_q_wrt_en(i) <= ld_m_val and ((not ld_rel_val_l2(i) and b and not pe_recov_ld_val_l2) or (ex7_loadmiss_qentry(i) and pe_recov_ld_val_l2));" and the L1 cache load miss.
As I read the current logic, a load is inserted into the LMQ whether or not it hits in the L1. Can you give me any hints about how A2I prevents a load from being inserted into the LMQ if it hits in the L1?
Many thanks Xia
@zhaoxiahust Good questions - see if you believe this 😀
I think the load is inserted, but I don't think the entry is eligible for selection for L2 request without 'ld_entry_val_l2'. 'ex3_drop_ld_req' looks like the indicator from L1 for whether an L2 request should be allowed. In ex4, it qualifies things like the address compares and blocks setting of ld_entry_val_l2.
I didn't look, but assume 'drop_ld_req' includes the L1 parity check, since hit+perr becomes an L1 invalidate/miss.
Hi openpower-cores,
I really appreciate your patience but I am not totally convinced.
In xuq_lsu_dc_cntrl.vhdl file.
ex3DropLd : ex3_drop_ld_req_b <= not ((ex3_hit and ex3_drop_cacheable) or ex3_drop_touch_int);
ex3_drop_ld_req <= not ex3_drop_ld_req_b;
It seems to me that if ex3_hit=0, then ex3_drop_ld_req_b=1, and so ex3_drop_ld_req=0, which is the input signal ex3_drop_ld_req of the "xuq_lsu_l2cmdq" module.
In xuq_lsu_l2cmdq.vhdl file, if ex3_drop_ld_req=1, as shown in the below code, ex4_flush_load will not be affected, right? Thus, ld_entry_val_d(i) can still be set as 1 normally.
ex4_flush_load <= (ex7_ld_par_err or ex8_ld_par_err_l2 or ex4_drop_ld_req or l_m_fnd_stg or my_ex4_flush_l2) and not recov_ignr_flush_d;
ld_entry_val_d(i) <= (ex4_loadmiss_qentry(i) and not ex4_flush_load) or
(ld_entry_val_l2(i) and not (load_sent and l_q_rd_en(i)) and
not(ex5_loadmiss_qentry(i) and (ex7_ld_par_err or ex5_flush_load_local)) and
not(ex6_loadmiss_qentry(i) and (ex7_ld_par_err or ex6_flush_l2)));
Did I miss anything?
Many thanks Xia
What you typed at the start is correct. Don't drop entry if miss.
hit=0 (miss) -> ex3_drop_ld_req=0
But if ex4_drop_ld_req=1, ex4_flush_load will be active and block the valid from being set. Or are you asking about ex3 vs ex4? I think ex4_loadmiss_qentry enables setting the valid in ex4.
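The hit -> drop mapping can be checked with a small truth table. The sketch below transcribes the two ex3DropLd lines quoted from xuq_lsu_dc_cntrl.vhdl into Python. Treating a plain cacheable load as `ex3_drop_cacheable=1` and `ex3_drop_touch_int=0` is an assumption made for illustration; the thread does not define those two signals.

```python
def ex3_drop_ld_req(ex3_hit, ex3_drop_cacheable, ex3_drop_touch_int):
    # Active-low intermediate, exactly as in the quoted VHDL.
    ex3_drop_ld_req_b = not ((ex3_hit and ex3_drop_cacheable)
                             or ex3_drop_touch_int)
    return not ex3_drop_ld_req_b


# Assumed settings for an ordinary cacheable load (illustrative only).
for hit in (False, True):
    drop = ex3_drop_ld_req(hit, ex3_drop_cacheable=True,
                           ex3_drop_touch_int=False)
    print(f"ex3_hit={int(hit)} -> ex3_drop_ld_req={int(drop)}")
```

Under those assumptions a miss leaves the request alone and a hit asserts the drop.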
Hi openpowerwtf,
I am sorry I did not list the full code trace for "ex4_drop_ld_req", since I thought it was clear from the xuq_lsu_l2cmdq.vhdl file. I may have missed some details that are important.
latch_ex4_drop_ld_req : tri_rlmreg_p
generic map (width => 1, init => 0, expand_type => expand_type)
port map (nclk => nclk,
act => '1',
forcee => func_sl_force,
d_mode => d_mode_dc,
delay_lclkr => delay_lclkr_dc,
mpw1_b => mpw1_dc_b,
mpw2_b => mpw2_dc_b,
thold_b => func_sl_thold_0_b,
sg => sg_0,
vd => vdd,
gd => gnd,
scin => siv(ex4_drop_ld_req_offset to ex4_drop_ld_req_offset),
scout => sov(ex4_drop_ld_req_offset to ex4_drop_ld_req_offset),
din(0) => ex3_drop_ld_req,
dout(0) => ex4_drop_ld_req);
The above code seems to me if ex3_drop_ld_req=0 ( i.e., hit=0, the L1 miss case), ex4_drop_ld_req will be 0 after one cycle. In this case, ex4_flush_load will not be active. Thus, ex4_flush_load cannot block the valid from being set.
I remember ex4_loadmiss_qentry is related to ex3_loadmiss_qentry which comes from l_q_wrt_en. It enables setting the valid in ex4 but it will not be affected by L1 cache miss either.
That's why I am still not clear how A2I implements only sending the L1 cache miss to the L2 cache.
Many thanks Xia
I don't see any details you've missed. flush_load blocks entry valid, and flush_load is set by L1 hit through drop_ld_req. Without entry valid, there is no L2 request.
You describe L1 miss, for which you need to do the L2 load:
The above code seems to me if ex3_drop_ld_req=0 ( i.e., hit=0, the L1 miss case), ex4_drop_ld_req will be 0 after one cycle. In this case, ex4_flush_load will not be active. Thus, ex4_flush_load cannot block the valid from being set.
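Both cases can be put side by side with a short Python sketch of the path from the ex3 drop indicator to the set term of ld_entry_val_d. The one-cycle latch is modelled as a plain assignment, all other flush sources default to 0, and only the first product term of ld_entry_val_d (the set term) is modelled; the hold term is left out. The function name `entry_valid_set` is made up for this example.

```python
def entry_valid_set(ex3_drop_ld_req, ex4_loadmiss_qentry=True,
                    ex7_ld_par_err=False, ex8_ld_par_err_l2=False,
                    l_m_fnd_stg=False, my_ex4_flush_l2=False,
                    recov_ignr_flush_d=False):
    # latch_ex4_drop_ld_req: ex4 sees the ex3 value one cycle later.
    ex4_drop_ld_req = ex3_drop_ld_req
    ex4_flush_load = ((ex7_ld_par_err or ex8_ld_par_err_l2
                       or ex4_drop_ld_req or l_m_fnd_stg
                       or my_ex4_flush_l2)
                      and not recov_ignr_flush_d)
    # Set term of ld_entry_val_d only (hold term not modelled).
    return ex4_loadmiss_qentry and not ex4_flush_load


print("L1 miss (drop=0): entry valid set =", entry_valid_set(False))
print("L1 hit  (drop=1): entry valid set =", entry_valid_set(True))
```

With no other flush active, the miss case sets the entry valid and the hit case does not, so only the miss becomes eligible for an L2 request.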
Wow, I got it! Thanks! I had confused myself.
I knew you were close, just tangled up in code. 🤣 If you're going through hell, keep going!
The LMQ RTL is not that clear because of timing rewrites and negative logic, etc. And it's complicated because of special-case ops, flushes, parity error handling, etc.
Hi openpowerwtf, I fully understand the code now, but one question just came to mind. Currently, a load is first inserted into the LMQ, and only afterwards is it determined whether it is a load miss or a load hit. Why not insert the load into the LMQ only after knowing that it is a load miss? Any opinions about this?
Many thanks
Very likely it was done to make timing. The other possibility is to align pipe stages. Unless the L2 req could be faster by a cycle, it didn't matter.
Hi guys,
It seems to me that the input signal "ex3_l_s_q_val" of the "xuq_lsu_l2cmdq" module could be used to indicate that a load miss has happened. In that case, "xuq_lsu_l2cmdq" could use this signal to decide whether to insert the current load into the LMQ. However, after tracing the source of "ex3_l_s_q_val" in xuq_lsu_dc_cntrl.vhdl, I could not find any relation between it and a load miss. Below is the source code in xuq_lsu_dc_cntrl.vhdl.
ex3_l2_op_d <= (l2_ctype or is_mem_bar_op or ex2_msgsnd_instr_q or ex2_mtspr_trace_q or ex2_dci_instr_q or ex2_ici_instr_q) and not ex2_stg_flush;
l_s_q_val <= ex3_l2_op_q;
Did I miss anything? If the signal "ex3_l_s_q_val" has no relation with load miss, how can xuq_lsu_l2cmdq decide to insert the current load into the LMQ or not?
Thanks for your help in advance! Let's enjoy the beauty of A2I!
Cheers, Xia