lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0

[test-triage] chip_same_csr_outstanding #14204

Closed msfschaffner closed 2 years ago

msfschaffner commented 2 years ago

Hierarchy of regression failure

Chip Level

Failure Description

UVM_ERROR @ us: (cip_base_vseq.sv:250) [chip_common_vseq] Check failed masked_data == exp_data ( [] vs []) addr read out mismatch has 1 failures:

New error message: UVM_ERROR @ 5596.731437 us: (tl_monitor.sv:179) uvm_test_top.env.m_tl_agent_chip_reg_block.monitor [uvm_test_top.env.m_tl_agent_chip_reg_block.monitor] Number of pending a_req exceeds limit 4

Steps to Reproduce

GH revision: 9c0b24ddb

util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_same_csr_outstanding --build-seed 2483310614 --fixed-seed 850130150

Tests with similar or related failures

chip_csr_aliasing chip_csr_rw

johngt commented 2 years ago

According to the nightlies, the last time this was showing a <100% pass rate was https://reports.opentitan.org/hw/top_earlgrey/dv/2022.08.24_15.46.18/report.html, with a 40% success rate.

It appears that this is resolved in all nightlies after that. I'm not sure which PR resolved it. This is a candidate for closing once it is tagged with the PR that resolved it.

Edit: It looks like this was resolved through the update to riscv-dv, per https://github.com/lowRISC/ibex/commit/4990aa268428e51a1c65d2d467f0b3bda2371925.

Tagging @weicaiyang in case something else resolved this chip-level test. If this test was resolved by the riscv-dv upgrade, please just close out the issue; otherwise, please comment with the PR that resolved it and close it out. TIA

weicaiyang commented 2 years ago

Thanks @johngt for looking at this issue. Since this test has passed in the past 4 nightlies, I think we can close it. It was probably fixed when we fixed some other issue. I'm not sure if it's because of the riscv-dv update, but that's very unlikely since we don't involve riscv-dv at the chip level.

engdoreis commented 2 years ago

This test is not passing consistently, so I think it's worth investigating.

| Tests | latest | 2022.10.22 | 2022.10.21 | 2022.10.17/2 | 2022.10.17/1 | 2022.10.15 | 2022.10.14 | 2022.10.12/2 | 2022.10.12/1 | 2022.10.11 | Suite |
|---|---|---|---|---|---|---|---|---|---|---|---|
| chip_same_csr_outstanding | 95.00 | 95.00 | 100.00 | 95.00 | 100.00 | 100.00 | 100.00 | 95.00 | 95.00 | 100.00 | tl_d_outstanding_access, tl_d_partial_access |

Failure Description

  UVM_ERROR @ 7290.395290 us: (tl_monitor.sv:179) uvm_test_top.env.m_tl_agent_chip_reg_block.monitor [uvm_test_top.env.m_tl_agent_chip_reg_block.monitor] Number of pending a_req exceeds limit 9
  UVM_INFO @ 7290.395290 us: (uvm_report_catcher.svh:705) [UVM/REPORT/CATCHER]
  --- UVM Report catcher Summary ---

Steps to Reproduce

GH revision: 9c0b24ddb

util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_same_csr_outstanding --build-seed 2483310614 --fixed-seed 850130150

weicaiyang commented 2 years ago

@engdoreis Thanks for opening this. I changed the max outstanding number to 8 recently. It looks like the xbar may take 9 outstanding items. I need to dump a waveform and confirm it with the designer. If you haven't looked into this, I can take it.

weicaiyang commented 2 years ago

@eunchan @tjaychen

Looks like the ibex TL data port (tb.dut.top_earlgrey.u_rv_core_ibex.tl_adapter_host_d_ibex.tl_o/tl_i) can actually take 9 outstanding items. It sounds like the async FIFOs in the xbar take 8 items and the peripheral takes 1 item. Can you help confirm whether this is expected? If so, I can make a PR to increase the max outstanding to 9.

I reproduced the failure here.

/mnt/disks/filestores/opentitan-shared/users/weicai/scratch/HEAD/chip_earlgrey_asic-sim-vcs/0.chip_same_csr_outstanding/latest

tjaychen commented 2 years ago

hmm.. I'm not sure this was a conscious choice. It's more that the async FIFOs are able to absorb a certain number of transactions to cover the clock-crossing latency. I don't think it absorbs 8 though (I thought it was either 2 or 4), so we'd have to take a look to see what's happening there.

eunchan commented 2 years ago

I checked the generated code. As of now, the TL async socket has request and response FIFOs of depth 4 each, so a total of 8 requests (4 + 4) can be outstanding inside the xbar.

At the time, the async FIFO depth had to be greater than 2, so the depth was set to 4. I think the current async FIFO can support a depth of 2, so we could revise the xbar and reduce the number of outstanding requests as well.
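
For illustration, the accounting behind the 9 outstanding items can be written out as follows (a back-of-the-envelope sketch; the localparam names are made up for illustration, not the actual xbar generics):

  // Depths taken from the discussion above; names are illustrative only.
  localparam int unsigned AsyncReqFifoDepth = 4;  // req FIFO depth in the TL async socket
  localparam int unsigned AsyncRspFifoDepth = 4;  // rsp FIFO depth in the TL async socket
  localparam int unsigned PeriphOutstanding = 1;  // one request held at the peripheral

  // 4 + 4 transactions buffered inside the xbar, plus 1 accepted by the device.
  localparam int unsigned MaxObservedOutstanding =
      AsyncReqFifoDepth + AsyncRspFifoDepth + PeriphOutstanding;  // = 9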

tjaychen commented 2 years ago

hmm... I took a peek at the timestamp @weicaiyang pointed me to where the test was erroring, and at least the tlul async FIFO's response and request FIFOs all had write depths of 0.

I think my larger question is: what exactly are we testing? Because usually when I think of outstanding capacity, I think about a host's ability to continually issue before it sees a response. So that would usually be a fixed number on the host side. Or we could think of it as a device's ability to keep accepting transactions before it responds (in the case of a memory controller). But I'm not quite sure what we intend to do here with the fabric. The fabric should certainly be able to support at least the number of outstanding transactions a host can issue, but it's not clear what placing an upper limit on it means.

Is the test trying to say the fabric should be able to support "at least" a certain number? Because that would make sense to me, but making sure it stays below a certain number... I'm not sure what that means.

weicaiyang commented 2 years ago

> hmm... I took a peek at the timestamp @weicaiyang pointed me to where the test was erroring, and at least the tlul async FIFO's response and request FIFOs all had write depths of 0.
>
> I think my larger question is: what exactly are we testing? Because usually when I think of outstanding capacity, I think about a host's ability to continually issue before it sees a response. So that would usually be a fixed number on the host side. Or we could think of it as a device's ability to keep accepting transactions before it responds (in the case of a memory controller). But I'm not quite sure what we intend to do here with the fabric. The fabric should certainly be able to support at least the number of outstanding transactions a host can issue, but it's not clear what placing an upper limit on it means.
>
> Is the test trying to say the fabric should be able to support "at least" a certain number? Because that would make sense to me, but making sure it stays below a certain number... I'm not sure what that means.

Let me clarify. This is for a covergroup that comes from the TL agent. Since we hijack the ibex data port with a TL agent, this coverage is added automatically. Now we see our CSR tests can occasionally hit 9 outstanding items. At the timestamp of the error, it queues 9 outstanding reqs with 9 different source IDs. These reqs have been accepted on the A channel, but the D channel responses haven't completed.

We could either set the right max outstanding number or exclude this coverage. Since ibex can only support 2 outstanding items, this limit doesn't really matter, but it may be worth understanding why the fabric supports far more than the host can send. As @eunchan mentioned, this may be an area we could optimize.
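
For reference, a minimal sketch of the kind of coverage in question (names are assumptions, not the actual tl_agent covergroup); the sampled value is the number of requests accepted on the A channel that have not yet completed on the D channel:

  // Illustrative covergroup: bins the number of outstanding requests observed
  // on the hijacked ibex data port, up to a configured maximum.
  covergroup outstanding_items_cg (int unsigned max_outstanding)
      with function sample (int unsigned num_pending);
    cp_num_pending: coverpoint num_pending {
      bins pending[] = {[0 : max_outstanding]};
    }
  endgroup

  // Usage sketch: construct with the expected ceiling and sample on every
  // A channel accept and D channel completion, e.g. cg = new(9); cg.sample(n);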

engdoreis commented 2 years ago

Thanks @weicaiyang for working on this.

tjaychen commented 2 years ago

@weicaiyang I might have to check in with you later. I'm not sure if I'm looking at the wrong thing. When I looked at the failing timestamp, it was claiming that none of the FIFOs had any entries, so I might be looking at the wrong place.

tjaychen commented 2 years ago

Thanks @weicaiyang and @eunchan for explaining. I finally see what's happening. It definitely makes sense to reduce the FIFO depth, especially on the response side. I think our core in general does not issue new transactions until it sees a response to the previous one (with the exception of a few cases that do not go to peripherals).

But we should probably double check with @GregAC to be sure before changing.

tjaychen commented 2 years ago

Alright! I checked in with @GregAC. On the ibex data side, we only issue 1 transaction at a time unless it's a misaligned access. The instruction side can issue more, but that would never go through the async FIFO path anyway.

So I think I will follow @weicaiyang's / @eunchan's suggestion of shrinking the async FIFO depth.

weicaiyang commented 2 years ago

SG, thanks @tjaychen

weicaiyang commented 2 years ago

This test is still failing. Let me take a look.

weicaiyang commented 2 years ago

Two observations:

  1. I set the max_outstanding_num to 3, but the flash_ctrl mem adapter can support 2 outstanding reqs, so we need to change this value to at least 4.
  2. There are many async FIFOs with depth = 2; should we reduce them to 1? If not, we may need to increase the max_outstanding_num to 6. @tjaychen Any thoughts? (screenshot: Screen Shot 2022-11-01 at 2 10 30 PM)

waves: /mnt/disks/filestores/opentitan-shared/users/weicai/scratch/top_km/chip_earlgrey_asic-sim-vcs/0.chip_same_csr_outstanding/latest

The UVM_ERROR shows it hits 5 outstanding items.

Sorry, I didn't test my update thoroughly, and now this affects many CSR tests.

Please ignore this error when it shows up in other tests.

UVM_ERROR @ 5596.731437 us: (tl_monitor.sv:179) uvm_test_top.env.m_tl_agent_chip_reg_block.monitor [uvm_test_top.env.m_tl_agent_chip_reg_block.monitor] Number of pending a_req exceeds limit 4
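
For context, the error above is produced by a check of this general shape (a simplified sketch, not the actual tl_monitor.sv code; the class and method names are made up):

  `include "uvm_macros.svh"
  import uvm_pkg::*;

  // Simplified model of the limit check: count requests accepted on the A
  // channel, decrement on D channel completions, and flag an error when the
  // count exceeds the configured ceiling.
  class pending_req_checker extends uvm_component;
    `uvm_component_utils(pending_req_checker)

    int unsigned max_pending = 4;  // ceiling under discussion in this issue
    int unsigned num_pending = 0;

    function new (string name, uvm_component parent);
      super.new(name, parent);
    endfunction

    // Call when a request handshake completes on the A channel.
    function void a_channel_accepted();
      num_pending++;
      if (num_pending > max_pending) begin
        `uvm_error(get_full_name(),
                   $sformatf("Number of pending a_req exceeds limit %0d", max_pending))
      end
    endfunction

    // Call when the matching response completes on the D channel.
    function void d_channel_completed();
      if (num_pending > 0) num_pending--;
    endfunction
  endclass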

tjaychen commented 2 years ago

These are actually sync FIFOs inside socketm1 and socket1n. @eunchan and I went through this some time ago and reduced a lot of the depths to 0 already.

Did we miss some of them?

weicaiyang commented 2 years ago

> These are actually sync FIFOs inside socketm1 and socket1n. @eunchan and I went through this some time ago and reduced a lot of the depths to 0 already.
>
> Did we miss some of them?

This one in socket m1 is still using 2. So is this one in socket n1.

tjaychen commented 2 years ago

I thought all the n1s were instantiated with 0s? Is that not the case? For the m1 it does make sense to default it to 1.

weicaiyang commented 2 years ago

> I thought all the n1s were instantiated with 0s? Is that not the case? For the m1 it does make sense to default it to 1.

I'm afraid not. Not all the parameters are set in xbar_main. Please look at the lines I pointed to in the links below: those still use the default parameters, which set the depth to 2, and that depth is passed down to a FIFO.

This one in socket m1 is still using 2. So is this one in socket n1.

tjaychen commented 2 years ago

hm? But those just link to the socket1n module. Is there an instantiation of socket1n where you do not see it being set to 0? For socketm1 I think that was indeed the case, but we can probably change the default.

eunchan commented 2 years ago

xbar_main.sv and xbar_peri.sv always instantiate those modules with depth 0, as far as I know. So they are fine.

eunchan commented 2 years ago

Let me correct that. I checked xbar_main.sv. As Tim mentioned, socket_m1 has a D-side FIFO (depth of 2).

eunchan commented 2 years ago

My comments above are incorrect. Weicai will add a comment here summarizing the offline discussion.

weicaiyang commented 2 years ago

As discussed with @eunchan offline, perhaps we need to update max_outstanding to 6.

flash mem can take 2 outstanding reqs, and it goes through this m1 socket, which can take 2 items in each of its req and rsp FIFOs:

  // Note: the D-side req/rsp FIFO depths are not overridden here, so they keep
  // their default depth (2 each, per the discussion above).
  tlul_socket_m1 #(
    .HReqDepth (12'h0),
    .HRspDepth (12'h0),
    .DReqPass  (1'b0),
    .DRspPass  (1'b0),
    .M         (3)
  ) u_sm1_31 (
    ....

The other m1 sockets don't have more than 2 items in each req/rsp FIFO, and the other IPs don't support more than 2 outstanding reqs.

The socket 1n that the ibex D port goes through doesn't have an extra FIFO. So we will have max_outstanding = 6 for the ibex D port.

@tjaychen does this make sense to you?

EDIT: flash mem isn't a regular RW memory (it doesn't simply return the value we program), so the CSR tests don't exercise it at all. The SRAM can also take 2 outstanding reqs, but its m1 socket doesn't have a FIFO. Hence, max_outstanding should be 5.

weicaiyang commented 2 years ago

Thanks to Tim, this should be fixed by #15918. The xbar can hold a maximum of 2 items now, plus another one in the peripheral, so the total is 3, which is what we set on the DV side.
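
As a rough illustration of the resulting DV-side setting (the config path and field name are assumptions, not the actual chip env code):

  // Hypothetical: the TL agent on the hijacked ibex data port is told to expect
  // at most 2 items buffered in the xbar plus 1 held by the peripheral.
  cfg.m_tl_agent_cfg.max_outstanding_req = 2 /* xbar */ + 1 /* peripheral */;  // = 3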

engdoreis commented 2 years ago

Closing this issue, as the tests have already passed 4 regressions:

| # | Tests | 2022-11-06 07:11:02 | 2022-11-05 07:11:23 | 2022-11-04 07:12:05 | 2022-11-03 07:11:23 | 2022-11-02 07:03:26 | 2022-11-01 07:10:18 | 2022-10-27 07:09:48 | 2022-10-26 07:06:46 | 2022-10-25 07:06:33 | 2022-10-24 07:05:24 | Suite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | chip_csr_aliasing | 100 | 100 | 100 | 60 | 80 | 60 | 100 | 100 | 100 | 100 | csr_aliasing, regwen_csr_and_corresponding_lockable_csr, tl_d_outstanding_access, tl_d_partial_access |
| 5 | chip_csr_rw | 100 | 100 | 100 | 100 | 95 | 65 | 100 | 100 | 100 | 100 | csr_rw, regwen_csr_and_corresponding_lockable_csr, tl_d_outstanding_access, tl_d_partial_access |
| 20 | chip_same_csr_outstanding | 100 | 100 | 100 | 100 | 70 | 70 | 100 | 95 | 100 | 100 | tl_d_outstanding_access, tl_d_partial_access |