About the "ex3_l_s_q_val" signal in "xuq_lsu_l2cmdq" module

zhaoxiahust commented 3 years ago

Hi guys,

It seems to me that the input signal "ex3_l_s_q_val" of the "xuq_lsu_l2cmdq" module indicates that a load miss has occurred, so that "xuq_lsu_l2cmdq" can use it to decide whether to insert the current load into the LMQ. However, after tracing the source of "ex3_l_s_q_val" in xuq_lsu_dc_cntrl.vhdl, I could not find any relation to a load miss. Below is the source code in xuq_lsu_dc_cntrl.vhdl.

ex3_l2_op_d <= (l2_ctype or is_mem_bar_op or ex2_msgsnd_instr_q or ex2_mtspr_trace_q or ex2_dci_instr_q or ex2_ici_instr_q) and not ex2_stg_flush;

l_s_q_val <= ex3_l2_op_q;

Did I miss anything? If the signal "ex3_l_s_q_val" has no relation with load miss, how can xuq_lsu_l2cmdq decide to insert the current load into the LMQ or not?

Thanks for your help in advance! Let's enjoy the beauty of A2I!

Cheers, Xia

openpowerwtf commented 3 years ago

I think you are correct - ex3_l2_op is just L/S ops of interest.

This looks like the qualified version to allow LMQ insertion:

ld_m_val <= (ex3_l_s_q_val and ex3_load_instr and not ld_queue_full and not ld_q_seq_wrap) or
            pe_recov_ld_val_l2;

And this is the LMQ update/valid:

         l_q_wrt_en(i) <= ld_m_val and ((not ld_rel_val_l2(i) and b and not pe_recov_ld_val_l2) or
                                   (ex7_loadmiss_qentry(i)     and     pe_recov_ld_val_l2));
         ld_rel_val_d(i) <= l_q_wrt_en(i) or
                                   (ld_rel_val_l2(i) and not reset_lmq_entry(i));

Do you agree? This logic appears to have been rewritten for timing (probably because of flushes being included).

zhaoxiahust commented 3 years ago

Hi openpower-cores,

Thanks for the quick reply. I found the logic related to l_q_wrt_en(i) and ld_rel_val_d(i) before. However, if ex3_l2_op is just L/S ops, I could not find the relation between "l_q_wrt_en(i) <= ld_m_val and ((not ld_rel_val_l2(i) and b and not pe_recov_ld_val_l2) or (ex7_loadmiss_qentry(i) and pe_recov_ld_val_l2));" and the L1 cache load miss.

With the current logic, it seems to me that a load will be inserted into the LMQ whether it hits in the L1 or not. Can you give me any hints about how A2I prevents a load from being inserted into the LMQ if it hits in the L1?

Many thanks Xia

openpowerwtf commented 3 years ago

@zhaoxiahust Good questions - see if you believe this 😀

I think the load is inserted, but I don't think the entry is eligible for selection for L2 request without 'ld_entry_val_l2'. 'ex3_drop_ld_req' looks like the indicator from L1 for whether an L2 request should be allowed. In ex4, it qualifies things like the address compares and blocks setting of ld_entry_val_l2.

I didn't look, but assume 'drop_ld_req' includes the L1 parity check, since hit+perr becomes an L1 invalidate/miss.

zhaoxiahust commented 3 years ago

Hi openpower-cores,

I really appreciate your patience but I am not totally convinced.

In xuq_lsu_dc_cntrl.vhdl file.

ex3DropLd     : ex3_drop_ld_req_b    <= not ((ex3_hit and ex3_drop_cacheable) or ex3_drop_touch_int);
ex3_drop_ld_req    <= not ex3_drop_ld_req_b;

It seems to me that if ex3_hit=0, then ex3_drop_ld_req_b=1, and so ex3_drop_ld_req=0, which is the input signal ex3_drop_ld_req of the "xuq_lsu_l2cmdq" module.

In xuq_lsu_l2cmdq.vhdl file, if ex3_drop_ld_req=1, as shown in the below code, ex4_flush_load will not be affected, right? Thus, ld_entry_val_d(i) can still be set as 1 normally.

  ex4_flush_load  <= (ex7_ld_par_err or ex8_ld_par_err_l2 or ex4_drop_ld_req or l_m_fnd_stg or my_ex4_flush_l2) and not recov_ignr_flush_d;

   ld_entry_val_d(i) <= (ex4_loadmiss_qentry(i) and not ex4_flush_load) or
                        (ld_entry_val_l2(i) and not (load_sent and l_q_rd_en(i)) and
                              not(ex5_loadmiss_qentry(i) and (ex7_ld_par_err or ex5_flush_load_local)) and
                              not(ex6_loadmiss_qentry(i) and (ex7_ld_par_err or ex6_flush_l2)));

Did I miss anything?

Many thanks Xia

openpowerwtf commented 3 years ago

What you typed at the start is correct. Don't drop entry if miss. hit=0 (miss) -> ex3_drop_ld_req=0

But if ex4_drop_ld_req=1, ex4_flush_load will be active and block the valid from being set. Or are you asking about ex3 vs ex4? I think ex4_loadmiss_qentry enables setting the valid in ex4.
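
For readers following along, the hit/miss behaviour of the two drop equations quoted above can be checked with a small self-checking testbench. This is only a sketch: the testbench name and stimulus are made up for illustration, and it assumes ex3_drop_cacheable is '1' for an ordinary cacheable load and ex3_drop_touch_int is '1' only for a touch that should be dropped, which the thread does not confirm.

```vhdl
-- Illustrative testbench only; not part of the A2I sources.
library ieee;
use ieee.std_logic_1164.all;

entity drop_ld_req_tb is
end entity drop_ld_req_tb;

architecture sim of drop_ld_req_tb is
  signal ex3_hit            : std_logic := '0';
  signal ex3_drop_cacheable : std_logic := '0';  -- assumed: '1' for a cacheable load
  signal ex3_drop_touch_int : std_logic := '0';  -- assumed: '1' for a droppable touch
  signal ex3_drop_ld_req_b  : std_logic;
  signal ex3_drop_ld_req    : std_logic;
begin
  -- The two equations quoted from xuq_lsu_dc_cntrl.vhdl in this thread.
  ex3_drop_ld_req_b <= not ((ex3_hit and ex3_drop_cacheable) or ex3_drop_touch_int);
  ex3_drop_ld_req   <= not ex3_drop_ld_req_b;

  check : process
  begin
    -- L1 miss on a cacheable load: the request must be kept.
    ex3_hit <= '0'; ex3_drop_cacheable <= '1'; ex3_drop_touch_int <= '0';
    wait for 1 ns;
    assert ex3_drop_ld_req = '0'
      report "miss: request should not be dropped" severity failure;

    -- L1 hit on a cacheable load: the request is dropped.
    ex3_hit <= '1';
    wait for 1 ns;
    assert ex3_drop_ld_req = '1'
      report "hit: request should be dropped" severity failure;

    -- Touch drop term alone also drops the request, even on a miss.
    ex3_hit <= '0'; ex3_drop_touch_int <= '1';
    wait for 1 ns;
    assert ex3_drop_ld_req = '1'
      report "touch: request should be dropped" severity failure;

    report "drop_ld_req checks passed" severity note;
    wait;
  end process check;
end architecture sim;
```

Any VHDL simulator should be able to run this; the point is only that hit=0 gives ex3_drop_ld_req=0 and hit=1 (cacheable) gives ex3_drop_ld_req=1.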

zhaoxiahust commented 3 years ago

Hi openpowerwtf,

I am sorry I did not list the full code trace for "ex4_drop_ld_req", since I thought it was clear from the xuq_lsu_l2cmdq.vhdl file. I guess I might have missed some important details.

latch_ex4_drop_ld_req : tri_rlmreg_p
  generic map (width => 1, init => 0, expand_type => expand_type)
  port map (nclk    => nclk,
            act     => '1',
            forcee => func_sl_force,
            d_mode  => d_mode_dc,
            delay_lclkr => delay_lclkr_dc,
            mpw1_b  => mpw1_dc_b,
            mpw2_b  => mpw2_dc_b,
            thold_b => func_sl_thold_0_b,
            sg      => sg_0,
            vd      => vdd,
            gd      => gnd,
            scin    => siv(ex4_drop_ld_req_offset to ex4_drop_ld_req_offset),
            scout   => sov(ex4_drop_ld_req_offset to ex4_drop_ld_req_offset),
            din(0)  => ex3_drop_ld_req,
            dout(0) => ex4_drop_ld_req);

The above code seems to me if ex3_drop_ld_req=0 ( i.e., hit=0, the L1 miss case), ex4_drop_ld_req will be 0 after one cycle. In this case, ex4_flush_load will not be active. Thus, ex4_flush_load cannot block the valid from being set.

I remember ex4_loadmiss_qentry is related to ex3_loadmiss_qentry which comes from l_q_wrt_en. It enables setting the valid in ex4 but it will not be affected by L1 cache miss either.

That's why I am still not clear how A2I implements only sending the L1 cache miss to the L2 cache.

Many thanks Xia

openpowerwtf commented 3 years ago

I don't see any details you've missed. flush_load blocks entry valid, and flush_load is set by L1 hit through drop_ld_req. Without entry valid, there is no L2 request.

You describe L1 miss, for which you need to do the L2 load:

> The above code seems to me if ex3_drop_ld_req=0 ( i.e., hit=0, the L1 miss case), ex4_drop_ld_req will be 0 after one cycle. In this case, ex4_flush_load will not be active. Thus, ex4_flush_load cannot block the valid from being set.
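
The chain summarized here (drop_ld_req latched from ex3 to ex4, feeding flush_load, which gates the entry valid) can be sketched as a reduced, clocked model. This is a simplification under stated assumptions, not the A2I RTL: it uses a single LMQ entry, keeps only the ex4_drop_ld_req term of ex4_flush_load, replaces tri_rlmreg_p with plain rising-edge registers, and omits every clear term of ld_entry_val_d. The testbench name and stimulus are hypothetical.

```vhdl
-- Reduced single-entry sketch; not the A2I RTL.
library ieee;
use ieee.std_logic_1164.all;

entity lmq_valid_sketch_tb is
end entity lmq_valid_sketch_tb;

architecture sim of lmq_valid_sketch_tb is
  signal clk                 : std_logic := '0';
  signal ex3_drop_ld_req     : std_logic := '0';
  signal ex4_drop_ld_req     : std_logic := '0';
  signal ex3_loadmiss_qentry : std_logic := '0';  -- entry written in ex3
  signal ex4_loadmiss_qentry : std_logic := '0';
  signal ex4_flush_load      : std_logic;
  signal ld_entry_val_d      : std_logic;
  signal ld_entry_val_l2     : std_logic := '0';
begin
  -- Only the drop term is modelled; parity and flush terms are omitted.
  ex4_flush_load <= ex4_drop_ld_req;
  -- Set term as in the thread; the hold term has its clear conditions omitted.
  ld_entry_val_d <= (ex4_loadmiss_qentry and not ex4_flush_load) or ld_entry_val_l2;

  -- Plain registers standing in for the tri_rlmreg_p latches.
  regs : process (clk)
  begin
    if rising_edge(clk) then
      ex4_drop_ld_req     <= ex3_drop_ld_req;
      ex4_loadmiss_qentry <= ex3_loadmiss_qentry;
      ld_entry_val_l2     <= ld_entry_val_d;
    end if;
  end process regs;

  stim : process
    procedure tick is
    begin
      clk <= '1'; wait for 5 ns;
      clk <= '0'; wait for 5 ns;
    end procedure tick;
  begin
    -- L1 hit: entry is written in ex3, but the drop blocks the valid.
    ex3_loadmiss_qentry <= '1'; ex3_drop_ld_req <= '1';
    wait for 1 ns; tick;
    ex3_loadmiss_qentry <= '0'; ex3_drop_ld_req <= '0';
    wait for 1 ns; tick;
    assert ld_entry_val_l2 = '0'
      report "hit: entry valid should stay clear" severity failure;

    -- L1 miss: no drop, so the valid is set and an L2 request can follow.
    ex3_loadmiss_qentry <= '1'; ex3_drop_ld_req <= '0';
    wait for 1 ns; tick;
    ex3_loadmiss_qentry <= '0';
    wait for 1 ns; tick;
    assert ld_entry_val_l2 = '1'
      report "miss: entry valid should be set" severity failure;

    report "lmq valid sketch checks passed" severity note;
    wait;
  end process stim;
end architecture sim;
```

In both cases the load reaches the queue write in ex3; only the miss ends up with ld_entry_val_l2 set, which matches the explanation that an entry without its valid bit is never selected for an L2 request.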

zhaoxiahust commented 3 years ago

Wow, I got it! Thanks! I had confused myself.

openpowerwtf commented 3 years ago

I knew you were close, just tangled up in code. 🤣 If you're going through hell, keep going!

The LMQ RTL is not that clear because of timing rewrites and negative logic, etc. And it's complicated because of special-case ops, flushes, parity error handling, etc.

zhaoxiahust commented 3 years ago

Hi openpowerwtf, I fully understand the code now, but one question just came to my mind. Currently, we first insert a load into the LMQ and only then decide whether it is a load miss or a load hit. Why not insert the load into the LMQ only after knowing it is a load miss? Any opinions about this?

Many thanks

openpowerwtf commented 3 years ago

Very likely it was done to make timing. The other possibility is to align pipe stages. Unless the L2 req could be faster by a cycle, it didn't matter.

openpower-cores / a2i

About the "ex3_l_s_q_val" signal in "xuq_lsu_l2cmdq" module #11