sequential mtspr calls fail

stffrdhrn commented 3 years ago

I have converted or1k-tests to use inline calls to mtspr() and mfspr(). This causes failures in the following two cases in or1k-mmu.

See: https://github.com/openrisc/or1k-tests/commit/32f0ba26bd581ca8cf7d3b4c46a53e2509acdcce

static int itlb_match_test (int way, int set)
{
...

  // Flush these areas in case the cache doesn't write them through immediately
  mtspr(OR1K_SPR_DCACHE_DCBFR_ADDR, ta - 8);
  mtspr(OR1K_SPR_DCACHE_DCBFR_ADDR, ta);
(CASE 1 - assembly stores ta to stack at this time)

  mtspr(OR1K_SPR_DCACHE_DCBFR_ADDR, ta + PAGE_SIZE - 8);
  mtspr(OR1K_SPR_DCACHE_DCBFR_ADDR, ta + PAGE_SIZE);

  /* Shifting one in ITLBMR */
  add = (PAGE_SIZE*itlb_sets);
  // Space we'll access and expect the MMU to translate our requests
  ea = add + (set*PAGE_SIZE);

  /* Enable IMMU */
  immu_enable ();

  while (add != 0x00000000) {
(CASE 2)
    mtspr (OR1K_SPR_IMMU_ITLBW_MR_ADDR(way, set), ea | OR1K_SPR_IMMU_ITLBW_MR_V_MASK);
    mtspr (OR1K_SPR_IMMU_ITLBW_TR_ADDR(way, set), ta | ITLB_PR_NOLIMIT);

Case 1 - this happens with 1 nop after mtspr.

The first case is where an mtspr is followed by an l.sw. The mtspr causes an invalidate to be sent to the dcache at the same time as the write operation, and the write fails.

Case 2 - this happens with 0 nops after mtspr.

An mtspr followed immediately by another mtspr causes the second write, to OR1K_SPR_IMMU_ITLBW_TR_ADDR, to be lost.

Workaround

Adding 2 l.nops after each mtspr fixes the issue. In case 1 below, the l.sw came before the l.nop due to some optimizations from gcc; to fix that, the asm for l.mtspr contains the nops inline, i.e. l.mtspr %0, %1, 0; l.nop; l.nop. The traces below are without that fix, which exposes the CPU issue. The code should still work without the nops.
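
For reference, a minimal sketch of what such a padded wrapper could look like. This is not the exact code from or1k-tests: the name mtspr_padded, the "memory" clobber, and the host fallback array fake_spr are assumptions made for illustration, and the __or1k__ check assumes the predefined macro of the or1k gcc port. The point is that the two l.nops live in the same asm statement as the l.mtspr, so the compiler cannot schedule an l.sw between them.

```c
#include <stdint.h>

#if defined(__or1k__)
/* Write an SPR and pad with two l.nops inside one asm statement.
 * Keeping the nops in the same statement stops gcc from moving a
 * store in between them. The "memory" clobber is an extra assumption:
 * it also keeps memory accesses from being reordered across the write. */
static inline void mtspr_padded(uint32_t spr, uint32_t value)
{
    __asm__ __volatile__(
        "l.mtspr %0, %1, 0\n\t"
        "l.nop\n\t"
        "l.nop"
        :
        : "r"(spr), "r"(value)
        : "memory");
}
#else
/* Hypothetical host fallback so the sketch compiles off-target:
 * SPR writes land in a plain array indexed by the 16-bit SPR address. */
static uint32_t fake_spr[1u << 16];

static inline void mtspr_padded(uint32_t spr, uint32_t value)
{
    fake_spr[spr & 0xffffu] = value;
}
#endif
```

With this, the flush sequence would read mtspr_padded(OR1K_SPR_DCACHE_DCBFR_ADDR, ta - 8); and so on, using the same arguments as the mtspr() calls above.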

Trace case1 - dcache spr mix with l.sw fails

    2ff8:       9e b9 20 00     l.addi r21,r25,8192
    2ffc:       aa 20 18 02     l.ori r17,r0,0x1802
    3000:       c0 11 d8 00     l.mtspr r17,r27,0x0  <--- mtspr(OR1K_SPR_DCACHE_DCBFR_ADDR, ta - 8);
    3004:       15 00 00 00     l.nop 0x0
    3008:       c0 11 c8 00     l.mtspr r17,r25,0x0   <--- mtspr(OR1K_SPR_DCACHE_DCBFR_ADDR, ta);
    300c:       d4 01 c8 24     l.sw 36(r1),r25          <--- This store fails to go to dcache (save ta)
    3010:       15 00 00 00     l.nop 0x0
    3014:       c0 11 b8 00     l.mtspr r17,r23,0x0
    3018:       15 00 00 00     l.nop 0x0
    301c:       c0 11 a8 00     l.mtspr r17,r21,0x0
    3020:       15 00 00 00     l.nop 0x0
    3024:       18 80 00 00     l.movhi r4,0x0
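
To double check that the words in the trace really are writes to the dcache block flush register, the l.mtspr encoding can be decoded by hand. Below is a small host-side sketch, assuming the field layout from the OpenRISC 1000 architecture manual (major opcode 0x30, rA in bits 20-16, rB in bits 15-11, K split across bits 25-21 and 10-0) and an SPR address made of a 5-bit group above an 11-bit index. The struct and function names are made up for this example.

```c
#include <stdint.h>

/* Fields of an "l.mtspr rA,rB,K" instruction word. */
struct mtspr_fields {
    unsigned ra; /* register holding the SPR address base */
    unsigned rb; /* register holding the value to write */
    unsigned k;  /* 16-bit immediate, ORed with rA to form the SPR address */
};

/* Returns 1 and fills *out if word is an l.mtspr, otherwise returns 0. */
static int decode_mtspr(uint32_t word, struct mtspr_fields *out)
{
    if ((word >> 26) != 0x30u)
        return 0;
    out->ra = (word >> 16) & 0x1fu;
    out->rb = (word >> 11) & 0x1fu;
    /* K[15:11] sits in bits 25-21, K[10:0] in bits 10-0. */
    out->k = (((word >> 21) & 0x1fu) << 11) | (word & 0x7ffu);
    return 1;
}

/* Split a 16-bit SPR address into its group and register index. */
static unsigned spr_group(uint32_t spr) { return (spr >> 11) & 0x1fu; }
static unsigned spr_index(uint32_t spr) { return spr & 0x7ffu; }
```

Decoding the word at 0x3000 (c0 11 d8 00) gives rA = 17, rB = 27, K = 0, matching the disassembly. r17 was loaded with 0x1802, which splits into group 3 (data cache) and index 2.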

Trace case2 - immu spr fails

When an l.mtspr writing to the IMMU is followed by another l.mtspr writing to the IMMU, the second SPR write may fail.

    24e0:       aa 94 08 00     l.ori r20,r20,0x800
    24e4:       aa 52 08 00     l.ori r18,r18,0x800
    24e8:       aa 10 03 c0     l.ori r16,r16,0x3c0
    24ec:       9f 51 ff ff     l.addi r26,r17,-1
    24f0:       aa 3e 00 01     l.ori r17,r30,0x1
    24f4:       c0 14 88 00     l.mtspr r20,r17,0x0  <--- This write to IMMUMR goes through
    24f8:       c0 12 80 00     l.mtspr r18,r16,0x0  <--- This write to IMMUTR never happens
    24fc:       d4 02 00 44     l.sw 68(r2),r0
    2500:       d4 02 00 48     l.sw 72(r2),r0
    2504:       d4 02 00 1c     l.sw 28(r2),r0
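
A quick way to find sequences like the two traces above in a binary is to scan the instruction words for any l.mtspr that is not followed by the expected number of l.nops. This is only a sketch for checking test binaries on the host, not part of or1k-tests; the function names are hypothetical, and it assumes the l.mtspr major opcode 0x30 and that l.nop words have 0x15 in their top byte, as in the traces.

```c
#include <stddef.h>
#include <stdint.h>

static int is_mtspr(uint32_t w) { return (w >> 26) == 0x30u; }
static int is_nop(uint32_t w)   { return (w >> 24) == 0x15u; }

/* Count the l.mtspr words in code[0..n) that are not followed by at
 * least `pad` consecutive l.nop words. An l.mtspr too close to the end
 * of the buffer to see `pad` words after it is counted as unpadded. */
static size_t count_unpadded_mtspr(const uint32_t *code, size_t n, size_t pad)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (!is_mtspr(code[i]))
            continue;
        for (size_t j = 1; j <= pad; j++) {
            if (i + j >= n || !is_nop(code[i + j])) {
                hits++;
                break;
            }
        }
    }
    return hits;
}
```

Fed the ten words of the case 1 trace starting at 0x3000, this reports 1 unpadded l.mtspr for pad = 1 (the one at 0x3008 followed by the l.sw) and 4 for pad = 2. For the case 2 words at 0x24f4 and 0x24f8, both l.mtspr are reported for any pad of 1 or more.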

stffrdhrn commented 3 years ago

Find the test newlib binaries and vcd files attached.

The file *mtspr.sw* is case 1; 0x300c is the pc address where the issue happens.
The file *mtspr.mtspr* is case 2; 0x2fb4 is the pc address where the issue happens.

stffrdhrn commented 3 years ago

I was able to fix case 2 with patch https://github.com/stffrdhrn/mor1kx/commit/8d7b044e0675817bae0546579e66cf894c372259

The test still fails due to case 1.

Case 2 - mtspr / mtspr succeeds

We can see itlb_trans_we goes high, indicating a successful write. (Screenshot from 2021-06-25 02-23-57)

Case 1 - store still fails

We can see that after the invalidate the cpu_req_i signal is not high during the WRITE phase, causing the dcache to miss the write. (Screenshot from 2021-06-27 21-07-18)

stffrdhrn commented 3 years ago

To test this I am using fusesoc with iverilog, with a fusesoc.conf as follows:

[library.elf-loader]
location = /home/shorne/work/openrisc/local-cores/elf-loader
sync-uri = /home/shorne/work/openrisc/local-cores/elf-loader/
sync-type = local
auto-sync = true

[library.intgen]
location = fusesoc_libraries/intgen
sync-uri = https://github.com/stffrdhrn/intgen.git
sync-type = git
auto-sync = true

[library.mor1kx-generic]
location = /home/shorne/work/openrisc/local-cores/mor1kx-generic
sync-uri = /home/shorne/work/openrisc/local-cores/mor1kx-generic
sync-type = local
auto-sync = true

[library.mor1kx]
location = /home/shorne/work/openrisc/local-cores/mor1kx
sync-uri = /home/shorne/work/openrisc/local-cores/mor1kx
sync-type = local
auto-sync = true

Then running fusesoc:

fusesoc run --target mor1kx_tb mor1kx-generic --elf_load /home/shorne/work/openrisc/or1k-tests/native/build/or1k/or1k-mmu --vcd

The or1k-mmu version is the same as or1k-mmu.fail.0x2fb4.mtspr.mtspr.elf.gz uploaded earlier.

stffrdhrn commented 3 years ago

The investigation so far:

 PC 0x2b10   store FFFF FFFE into the dcache at way location 0x7ed
 PC 0x2f1c   store 0041 6000 into the dcache at way location 0x7ed - this fails
 PC 0x2f68   load FFFF FFFE from way location 0x7ed - this is written to the IMMU and causes a failure

openrisc / mor1kx