openhwgroup / cv-hpdcache

RTL sources of the High-Performance L1 Dcache (HPDcache) for OpenHW CV cores

[Question] How are the error responses treated #31

Open khandelwaltanuj opened 1 month ago

khandelwaltanuj commented 1 month ago

Hello,

I would like to understand how error responses are treated. This is what I am observing:

  1. In the case where memory responds to a READ with an error, all subsequent loads that hit at the same address get an error response.
  2. In the case where memory responds to a READ with an error, the store that follows at the same address gets a response without an error.

Is this correct?

Thanks and Regards Tanuj Khandelwal

cfuguet commented 1 month ago

Hello @khandelwaltanuj,

Regarding the second case:

In the case where memory responds to a READ with an error, the store that follows at the same address gets a response without an error.

The behaviour of the cache depends on the write policy used for the store. Let me explain: when there is a read miss, if the response has the error flag set, then the cache does not refill the cacheline and the response is dropped. Then, if there is a store after that on the same cacheline there will be a write miss.

The HPDcache implements a write-non-allocate policy for write-through stores, and a write-allocate policy for write-back stores. In the first case, after a write miss, the cache writes the store data into the write-buffer and responds right away to the core (with no error). In the second case, the cache first reads the cacheline from the memory, then:

We can say that write-through stores are acknowledged asynchronously to the core, so it is not possible to know at response time whether the store will result in an error. On the other hand, write-back stores are synchronous, so it is possible to report a possible error to the core in the response.
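
The difference between the two policies on a write miss can be sketched as a small behavioural model. This is only an illustration of the explanation above, not the RTL: the names `WritePolicy`, `handle_write_miss` and `mem_read_ok` are hypothetical, and the assumption that a failed write-allocate read leaves the line unallocated is mine.

```python
from enum import Enum


class WritePolicy(Enum):
    WT = "write-through"
    WB = "write-back"


def handle_write_miss(policy, mem_read_ok):
    """Return (line_allocated, error_in_response) for a store that misses.

    mem_read_ok is a hypothetical callable standing in for the memory read
    issued by a write-allocate miss; it returns True when the read succeeds.
    """
    if policy is WritePolicy.WT:
        # Write-non-allocate: the data goes to the write buffer and the core
        # is acknowledged right away, before memory has seen the write, so
        # no error can be reported in the response.
        return (False, False)
    # Write-allocate: the cacheline is read first, so a read error is known
    # before the response is sent to the core.
    ok = mem_read_ok()
    # Assumption: on a read error the line is not allocated.
    return (ok, not ok)
```

With this model, a write-through miss never reports an error in its response, while a write-back miss reports one exactly when its own memory read fails.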

I hope it is clear.

Cheers,

César

khandelwaltanuj commented 1 month ago

Hi César ,

Thanks for the response.

I assume that, in the case where a load follows a load, if the first load returns an error, the cache will not make a new memory access for the second load. It will just send another error?

Regards Tanuj

cfuguet commented 1 month ago

No, it will again try the access to the memory. This is for two reasons:

  1. It would be too expensive to save the state of every accessed cacheline.
  2. Some errors are transient (e.g. error detection on data transmission from the DRAM controller). A given read could return an error at a given time, but succeed on a subsequent attempt. In such cases, it would be wrong to tag the address as bad indefinitely. Moreover, some of these transient errors are not even related to a given address.
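
The two points above can be illustrated with a toy model in which the cache keeps no per-line error state, so a load after a failed refill simply misses again. This is a sketch under my own assumptions; `TinyCache` and `make_transient_memory` are hypothetical names, not part of the HPDcache sources.

```python
class TinyCache:
    """Toy model: a line is either present or absent; no error state is kept."""

    def __init__(self, memory_read):
        # memory_read is a hypothetical callable: addr -> (data, error)
        self.memory_read = memory_read
        self.lines = {}
        self.memory_accesses = 0

    def load(self, addr):
        """Return (data, error) for a load at addr."""
        if addr in self.lines:
            return self.lines[addr], False
        self.memory_accesses += 1
        data, error = self.memory_read(addr)
        if error:
            # The refill is dropped, so the next load misses again and
            # retries the memory access.
            return None, True
        self.lines[addr] = data
        return data, False


def make_transient_memory():
    """Memory model whose first read fails and whose later reads succeed."""
    calls = {"n": 0}

    def read(addr):
        calls["n"] += 1
        return 0xAB, calls["n"] == 1

    return read


cache = TinyCache(make_transient_memory())
first = cache.load(0x40)   # transient error: nothing is cached
second = cache.load(0x40)  # misses again, retries memory, succeeds
```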

khandelwaltanuj commented 1 month ago

Hello

Thanks for your response.

Regards Tanuj

khandelwaltanuj commented 1 month ago

Hello @cfuguet

Can you please look into the following scenario:

I have a LOAD (NEED_RSP=0, TID=25) followed by a STORE (NEED_RSP=1, TID=26). Both are write-through. The memory replies to the load with an error, and we observe that the cache's response to the STORE (TID=26) carries an error. I believe it is not correct for the STORE to respond with an error. Can you please take a look at the following part of the log? If you think there is an issue here, I can share the test.

UVM_INFO @ 358311500 ps [SB HPDCACHE REQ 4] OP=HPDCACHE_REQ_LOAD SID=4(x), TID=25(x), ADDR=cc8eef1b801e(x) SET=0(d), TAG=3323bbc6e(x), WORD=3(x) DATA=2f975367c81873b607a455d88a559909(x) BE=8000(x) SIZE=1(x) NEED_RSP=0(x) PHYS_IDX=0(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_WT

UVM_INFO @ 358313000 ps [SB MEM REQ] ID=0(x), ADDR=cc8eef1b8000(x) SET=0(d), TAG=3323bbc6e(x), WORD=0(x) SIZE=6(d) LEN=0(d), CMD=HPDCACHE_MEM_READ ATOMIC=HPDCACHE_MEM_ATOMIC_ADD CACHEABLE=1(x)

UVM_INFO @ 358313500 ps [SB HPDCACHE REQ 4] OP=HPDCACHE_REQ_STORE SID=4(x), TID=26(x), ADDR=cc8eef1b8018(x) SET=0(d), TAG=3323bbc6e(x), WORD=3(x) DATA=b05fd77fa3cea49fdb552465dfbda0c4(x) BE=d700(x) SIZE=3(x) NEED_RSP=1(x) PHYS_IDX=1(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_WT

UVM_INFO @ 358321000 ps [SB MEM READ RSP] ID=0(x), SET=0(d), TAG=3323bbc6e(x), WORD=0(x) ERROR=1(x), LAST=1(x) DATA=9359450d0c53ec359aabc8853128668f92cc8abc2e460c833b516b320623d9aec73eeb02fd3d2c4db9c1dc886deab60bb8948865d41fd2658c37f79cfbbf1800(x)

UVM_INFO @ 358326500 ps [SB HPDCACHE RSP 4] RSP SID=4(x), TID=26(x), ADDR=cc8eef1b8018(x) SET=0(d), TAG=3323bbc6e(x), DATA=0(x) ERROR=1(x)

UVM_ERROR @ 358326500 ps: uvm_test_top.env.m_hpdcache_sb [SB HPDCACHE ERROR ERROR] SET=0(d), TAG=3323bbc6e(x), Expected : 0(b), RECIEVED : 1(b)

Thanks and Regards Tanuj Khandelwal

cfuguet commented 1 month ago

Hello @khandelwaltanuj,

Yes, thank you. I will take a look into it.

I think I know where the problem comes from. It is a side-effect of the modifications to respond with an error in case of a write miss with error response from the memory.

When a read misses and, while waiting for its response, there is a write on the same cacheline, the write is put on hold in the Replay Table (RTAB). When the read error response arrives at the cache, the miss handler tags the write with an error, so when this write is replayed, the cache responds immediately with an error to the core.

I need to change the condition for tagging the pending write with an error: it should only be done when it is a write-back write miss. Otherwise, the write can be replayed normally.

I will do the modification and let you know,

Thanks,

César

khandelwaltanuj commented 1 month ago

Thanks @cfuguet

You mean to say in the case where the STORE request (TID=26) was write back, the cache would reply with an error=1 because the previous load on the same entry has an Error=1 ?

Regards Tanuj

cfuguet commented 1 month ago

No, if the write TID=26 is write-back, it should trigger a read to the memory because it will miss in the cache (the previous load TID=25 was an error), then the cache will respond with an error to the write if the read to the memory responds with an error. But this is unrelated to the previous load TID=25.

cfuguet commented 1 month ago

@khandelwaltanuj, this issue is now fixed. Let me know if you are able to validate it on your side.

Thanks

khandelwaltanuj commented 1 month ago

Hi César,

As this issue was opened by me, I would prefer to verify the fix and close it myself.

Thanks a lot Regards Tanuj

cfuguet commented 1 month ago

Ok, that's fair.

I used the GitHub mechanism that lets a pull request automatically close related issues...

But let's keep it open until you validate the fix on your side.

khandelwaltanuj commented 1 month ago

Hello @cfuguet

I have the following scenario in one of my tests: there are multiple stores with write_policy_auto followed by a store with write_policy_wb. I have cfg_default_wb_i == 0 and the following parameters set: wtEn : 1, wbEn : 1

In the following scenario, I see a read with ID=27 (the first one) with an error response (ERROR=1). I am not able to understand which request is causing this read request: is it the write with the write-back policy or the write with the auto policy?

If it is the write-back store that is causing this read, then the UVM ERROR is probably due to an issue in the scoreboard; otherwise it may come from an issue in the design. Can you please take a look at it?

UVM_INFO @ 387331500 ps:[SB HPDCACHE REQ 0] OP=HPDCACHE_REQ_LOAD SID=0(x), TID=66(x), ADDR=2bd96baed9d2(x) SET=103(d), TAG=af65aebb(x), WORD=2(x) DATA=f5493511c5c6d6df15a130f639126aa0(x) BE=8(x) SIZE=1(x) NEED_RSP=0(x) PHYS_IDX=1(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_AUTO

UVM_INFO @ 387486000 ps:[SB MEM WRITE RSP(SOME OLD REQUEST)] ID=0(x), SET=103(d), TAG=af65aebb(x), WORD=0(x) ERROR=0(x), ATOMIC=0(x)

UVM_INFO @ 387574500 ps:[SB HPDCACHE REQ 0] OP=HPDCACHE_REQ_STORE SID=0(x), TID=2c(x), ADDR=2bd96baed9d0(x) SET=103(d), TAG=af65aebb(x), WORD=2(x) DATA=eeea323220bf50ef0732a61af82b429e(x) BE=40(x) SIZE=3(x) NEED_RSP=0(x) PHYS_IDX=1(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_AUTO

UVM_INFO @ 387577000 ps:[SB MEM REQ] ID=3(x), ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x), WORD=0(x) SIZE=6(d) LEN=0(d), CMD=HPDCACHE_MEM_WRITE ATOMIC=HPDCACHE_MEM_ATOMIC_ADD CACHEABLE=1(x)

UVM_INFO @ 387579000 ps:[SB MEM EXT REQ] ID=3(x), ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x), WORD=0(x) Data=3200000000000000000000000000000000000000000000(x) BE=400000(x) SIZE=x(d) LEN=x(d), CMD=HPDCACHE_MEM_WRITE ATOMIC=HPDCACHE_MEM_ATOMIC_SMAX CACHEABLE=1(x)

UVM_INFO @ 387600500 ps:[SB HPDCACHE REQ 0] OP=HPDCACHE_REQ_STORE SID=0(x), TID=62(x), ADDR=2bd96baed9d0(x) SET=103(d), TAG=af65aebb(x), WORD=2(x) DATA=6ca7487bed7c02ebbff6f68a2c07df92(x) BE=b4(x) SIZE=3(x) NEED_RSP=0(x) PHYS_IDX=1(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_WB

UVM_INFO @ 387624000 ps:[SB MEM WRITE RSP] ID=3(x), SET=103(d), TAG=af65aebb(x), WORD=0(x) ERROR=0(x), ATOMIC=0(x)

UVM_INFO @ 387629000 ps:[SB MEM REQ] ID=27(x), ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x), WORD=0(x) SIZE=6(d) LEN=0(d), CMD=HPDCACHE_MEM_READ ATOMIC=HPDCACHE_MEM_ATOMIC_ADD CACHEABLE=1(x)

UVM_INFO @ 387682000 ps:[SB MEM READ RSP] ID=27(x), SET=103(d), TAG=af65aebb(x), WORD=0(x) ERROR=1(x), LAST=1(x) DATA=509582ff84d2cb7e67bee1423e7843d2c9b7ebdf4af2406cf068ea7cac233d8c9c08dfddb2724fc069324eec5096056d0c699f768ea18e852a546a5f73b991e5(x)

UVM_INFO @ 387821500 ps:[SB HPDCACHE REQ 4] OP=HPDCACHE_REQ_LOAD SID=4(x), TID=c(x), ADDR=2bd96baed9d2(x) SET=103(d), TAG=af65aebb(x), WORD=2(x) DATA=95f064238655a1d13594099c21c698be(x) BE=4(x) SIZE=0(x) NEED_RSP=1(x) PHYS_IDX=0(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_WT

UVM_INFO @ 387823000 ps:[SB MEM REQ] ID=27(x), ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x), WORD=0(x) SIZE=6(d) LEN=0(d), CMD=HPDCACHE_MEM_READ ATOMIC=HPDCACHE_MEM_ATOMIC_ADD CACHEABLE=1(x)

UVM_INFO @ 387855000 ps:[SB MEM READ RSP] ID=27(x), SET=103(d), TAG=af65aebb(x), WORD=0(x) ERROR=0(x), LAST=1(x) DATA=509582ff84d2cb7e67bee1423e7843d2c9b7ebdf4af2406cf068ea7cac233d8c9c08dfddb2724fc069324eec5096056d0c699f768ea18e852a546a5f73b991e5(x)

UVM_INFO @ 387857500 ps:[SB HPDCACHE RSP 4] RSP SID=4(x), TID=c(x), ADDR=2bd96baed9d2(x) SET=103(d), TAG=af65aebb(x), DATA=9c08dfddb2724fc069324eec5096056d(x) ERROR=0(x)

UVM_INFO @ 387857500 ps: uvm_test_top.env.m_hpdcache_sb [SB HPDCACHE LOAD/AMO RSP] OP=HPDCACHE_REQ_LOAD ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x) Offset=18(d) WORD=1(d) DATA=9c08dfddb2724fc0bf32f68a5007056d(x) ERROR=0(x) ERROR=0(x)

UVM_ERROR @ 387857500 ps: uvm_test_top.env.m_hpdcache_sb [SB HPDCACHE DATA ERROR] ADDR=2bd96baed9d2(x), SET=103(d), TAG=af65aebb(x) BYTE=2(d) ACC DATA=96(x) EXP DATA=7(x)
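
For readers following the log, a small helper can confirm that the requests above all target the same cacheline, which is why the address alone does not identify the request that triggered the read with ID=27. The cache geometry below (64-byte lines, 8 set-index bits) is an assumption inferred from the ADDR, SET and TAG values printed in the logs, not taken from the HPDcache configuration, and `decode` is a hypothetical name.

```python
# Geometry inferred from the log values (an assumption, not read from the
# HPDcache parameters): 64-byte cachelines and 256 sets.
OFFSET_BITS = 6
SET_BITS = 8


def decode(addr):
    """Split an address into (tag, set index, line-aligned address)."""
    line_addr = addr & ~((1 << OFFSET_BITS) - 1)
    set_index = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, line_addr


# The load (ADDR=2bd96baed9d2) and both stores (ADDR=2bd96baed9d0) decode to
# the same tag, set and line address as the MEM REQ at 2bd96baed9c0.
load_line = decode(0x2BD96BAED9D2)
store_line = decode(0x2BD96BAED9D0)
```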

Regards Tanuj Khandelwal

khandelwaltanuj commented 3 weeks ago

Hi @cfuguet

Any update on this one please ?

Thanks and Regards Tanuj Khandelwal

cfuguet commented 1 week ago

Hello @khandelwaltanuj,

I do not yet have access to QuestaSim on my side, so I cannot replay the test. In any case, the STORE with TID=62 uses the WB mode, so it can trigger a MEM_READ in case of a miss. This is probably what is happening here.

César

khandelwaltanuj commented 1 week ago

Hi @cfuguet

Do you have access to any industrial simulator like VCS or something else? I can try to switch to that simulator if we both have access.

Regards Tanuj Khandelwal

cfuguet commented 1 week ago

Unfortunately, for the moment I'm only able to use Verilator... but I will soon have access to commercial tools again.