sergeykhbr / riscv_vhdl

Portable RISC-V System-on-Chip implementation: RTL, debugger and simulators
http://sergeykhbr.github.io/riscv_vhdl/
Apache License 2.0

Potential ICache Bug #19

Closed: mateoconlechuga closed this issue 6 years ago

mateoconlechuga commented 6 years ago

It seems that the cache can request two read addresses in sequence without waiting for the first one to come back. I often see this occur on returns and jumps, causing the processor to enter an undefined state.

In other words, ARVALID is held high for two clocks with a different address on each clock, resulting in two requests. As far as I can determine, this is caused by r.double_req in icache.vhd.

I would be happy to provide more information if necessary.

sergeykhbr commented 6 years ago

The latest version is actively tested, including many hours of FPGA tests. It is always possible that some bug slipped through, but that should be an exceptionally rare case.

I suppose there is some misunderstanding. A memory transaction is considered accepted only when AR_VALID = 1 && AR_READY = 1. There is a possible situation where the fetch module changes the read request address while AR_READY = 0.
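
To make the handshake rule concrete, here is a minimal master-side sketch (an illustration only, not the actual fetch/icache code; the entity and port names are invented). The request is latched, the address is held stable while ARVALID is high, and the transaction counts as accepted only in a cycle where ARVALID and ARREADY are both 1.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical master-side handshake sketch (names are made up).
    entity ar_handshake_sketch is
      port (
        clk       : in  std_logic;
        nrst      : in  std_logic;                      -- active-low reset
        req_valid : in  std_logic;                      -- fetch wants a read
        req_addr  : in  std_logic_vector(31 downto 0);  -- requested address
        ar_ready  : in  std_logic;                      -- from the interconnect
        ar_valid  : out std_logic;
        ar_addr   : out std_logic_vector(31 downto 0)
      );
    end entity;

    architecture rtl of ar_handshake_sketch is
      signal r_valid : std_logic;
      signal r_addr  : std_logic_vector(31 downto 0);
    begin
      ar_valid <= r_valid;
      ar_addr  <= r_addr;

      process (clk, nrst)
      begin
        if nrst = '0' then
          r_valid <= '0';
          r_addr  <= (others => '0');
        elsif rising_edge(clk) then
          if r_valid = '1' and ar_ready = '1' then
            -- Handshake completed this cycle: the transaction is accepted.
            r_valid <= '0';
          elsif r_valid = '0' and req_valid = '1' then
            -- Latch the address and hold it stable until ARREADY is seen.
            r_valid <= '1';
            r_addr  <= req_addr;
          end if;
        end if;
      end process;
    end architecture;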

mateoconlechuga commented 6 years ago

Yes, AR_VALID and AR_READY are both held high on consecutive clocks. Here's the hardware capture of the AXI bus, along with the corresponding code.

Where the yellow vertical line is, there are two AXI requests (screenshot from 2018-07-31 15-53-07):

000000001000000a <StartHello>:
    1000000a:   1141                    addi    sp,sp,-16
    1000000c:   45b9                    li  a1,14
    1000000e:   00007517            auipc   a0,0x7
    10000012:   8ca50513            addi    a0,a0,-1846 # 100068d8 <__errno+0xc>
    10000016:   e406                    sd  ra,8(sp)
    10000018:   016000ef            jal ra,1000002e <PrintUart>
    1000001c:   60a2                    ld  ra,8(sp)
    1000001e:   45b9                    li  a1,14
    10000020:   00007517            auipc   a0,0x7
    10000024:   8b850513            addi    a0,a0,-1864 # 100068d8 <__errno+0xc>
    10000028:   0141                    addi    sp,sp,16
    1000002a:   0040006f            j   1000002e <PrintUart>

000000001000002e <PrintUart>:
    1000002e:   87aa                    mv  a5,a0
    10000030:   400006b7            lui a3,0x40000
    10000034:   40a7873b            subw    a4,a5,a0
    10000038:   00b74363            blt a4,a1,1000003e <PrintUart+0x10>
    1000003c:   8082                    ret
    1000003e:   4698                    lw  a4,8(a3)
    10000040:   8b21                    andi    a4,a4,8
    10000042:   ff75                    bnez    a4,1000003e <PrintUart+0x10>
    10000044:   0007c703            lbu a4,0(a5)
    10000048:   0785                    addi    a5,a5,1
    1000004a:   c2d8                    sw  a4,4(a3)
    1000004c:   b7e5                    j   10000034 <PrintUart+0x6>

From icache.vhd, these lines: https://github.com/sergeykhbr/riscv_vhdl/blob/master/rtl/riverlib/cache/icache.vhd#L261-L263 appear to cause this, because instruction address 1000002e has lower bits "110". That sets r.double_req, which then issues a second request before the first one completes.

I'm not sure if my reasoning is correct.
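
To illustrate that reasoning, here is a rough combinational sketch of how such a double request could arise. This is not the code from icache.vhd, only my reading of it; the names and the exact condition are assumptions.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical illustration of the suspected double-request condition.
    entity double_req_sketch is
      port (
        req_addr   : in  std_logic_vector(31 downto 0);  -- fetch address
        double_req : out std_logic;                      -- second read needed
        addr_req0  : out std_logic_vector(31 downto 0);  -- first read address
        addr_req1  : out std_logic_vector(31 downto 0)   -- second read address
      );
    end entity;

    architecture rtl of double_req_sketch is
      signal line_addr : std_logic_vector(31 downto 0);
    begin
      -- Base address of the 8-byte word containing the requested PC.
      line_addr <= req_addr(31 downto 3) & "000";

      -- An address with low bits "110" (e.g. 1000002e) places a 4-byte
      -- instruction at byte offset 6 of an 8-byte word, so its upper half
      -- lives in the next word and a second read has to be issued.
      double_req <= '1' when req_addr(2 downto 1) = "11" else '0';

      addr_req0 <= line_addr;
      addr_req1 <= std_logic_vector(unsigned(line_addr) + 8);
    end architecture;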

sergeykhbr commented 6 years ago

That doesn't look good. But I noticed that you're using a different system. In my opinion, the interconnect has to lower the AR_READY signal right after the first transaction is accepted.

There is another possible cause of such behavior: an interconnect that implements buffered ports and accepts several outstanding transactions.

mateoconlechuga commented 6 years ago

Yes, the interconnect is able to accept several transactions (but isn't that part of the AXI standard?). Anyway, I guess I can try lowering AR_READY when a transaction gets accepted and raising it again when the response comes back?
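
Roughly, that interconnect-side idea might look like the sketch below (my own assumption of how it could be wired, not the actual interconnect code; all names are invented): ARREADY is deasserted once a read request is accepted and only reasserted after the read response has returned.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical slave-side gate limiting the bus to one outstanding read.
    entity ar_ready_gate_sketch is
      port (
        clk      : in  std_logic;
        nrst     : in  std_logic;   -- active-low reset
        ar_valid : in  std_logic;   -- read request from the cache
        r_valid  : in  std_logic;   -- read data beat returned to the cache
        r_last   : in  std_logic;   -- last beat of the burst
        ar_ready : out std_logic
      );
    end entity;

    architecture rtl of ar_ready_gate_sketch is
      signal busy : std_logic;
    begin
      ar_ready <= not busy;

      process (clk, nrst)
      begin
        if nrst = '0' then
          busy <= '0';
        elsif rising_edge(clk) then
          if busy = '0' and ar_valid = '1' then
            busy <= '1';   -- request accepted: block further requests
          elsif busy = '1' and r_valid = '1' and r_last = '1' then
            busy <= '0';   -- response completed: accept the next request
          end if;
        end if;
      end process;
    end architecture;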

sergeykhbr commented 6 years ago

I agree that you have found a problem that occurs with a buffered interconnect whose queue length is greater than one, and it should be fixed somehow. But it is a somewhat different use case that I cannot reproduce right now. It would be very nice if you could provide a solution.

In my experience, a fix for such problems may have undesirable side effects in long tests.

sergeykhbr commented 6 years ago

Can you try the following patch? It should prevent the caches from asserting a new bus request before the previous response has been received.

rtl_cache_buffered_out_patch.txt
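
The attached patch is not reproduced here; as a rough cache-side sketch of the stated idea (invented names, not the actual patch), a pending flag could block a new ARVALID until the previous read response has come back.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical master-side gate: hold off new requests while a response
    -- is still outstanding.
    entity req_gate_sketch is
      port (
        clk       : in  std_logic;
        nrst      : in  std_logic;   -- active-low reset
        cache_req : in  std_logic;   -- cache wants to issue a read
        ar_ready  : in  std_logic;
        r_valid   : in  std_logic;   -- response data valid
        r_last    : in  std_logic;   -- last beat of the response
        ar_valid  : out std_logic
      );
    end entity;

    architecture rtl of req_gate_sketch is
      signal pending : std_logic;
    begin
      -- Only drive ARVALID when no previous transaction is still outstanding.
      ar_valid <= cache_req and not pending;

      process (clk, nrst)
      begin
        if nrst = '0' then
          pending <= '0';
        elsif rising_edge(clk) then
          if pending = '0' and cache_req = '1' and ar_ready = '1' then
            pending <= '1';   -- request accepted: wait for the response
          elsif pending = '1' and r_valid = '1' and r_last = '1' then
            pending <= '0';   -- response received: allow the next request
          end if;
        end if;
      end process;
    end architecture;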