
fix instruction synchronization bug on a real RISC-V processor #67

Open sashimi-yzh opened 3 years ago

sashimi-yzh commented 3 years ago

@kaashoek Hi, I think the bug reported in #5 is still a problem. Let me try to explain it in detail.

This bug is related to the instruction cache (ICache) in hardware.

  1. Process A loads its code into a new physical page P0 in xv6-riscv/kernel/exec.c. Assume that P0 is not in the ICache before the code is loaded. Note that, from the hardware's perspective, loading code is just like an invocation of memcpy(): it involves only ordinary load and store instructions, so it does not bring the code into the ICache. On a processor with a data cache (DCache), the freshly written code may even sit only in the DCache.
  2. A executes its code. Since P0 is not in the ICache, instruction fetch misses, and the ICache fills in the correct code from the DCache or from memory.
  3. A exits, and P0 is reclaimed.
  4. Now a new process B is loaded, and B gets exactly the same physical page P0 for its code. Again, the newly loaded code is not in the ICache after loading.
  5. B executes its code. Now disaster strikes: when B's instruction fetch probes the ICache, it may hit, since A's code is still in the ICache! This causes B to execute the wrong code! (A minimal sketch of this hazard follows the list.)
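
To make the hazard concrete outside the kernel, here is a minimal user-level sketch. This is an illustration, not xv6 code: it assumes a single-hart RV64 Linux system where mmap() can return writable and executable memory, and that a local FENCE.I is sufficient (on a multi-hart machine, every hart that may fetch the new code needs the synchronization).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
  /* Machine code for: li a0, 42 ; ret */
  uint32_t code[] = { 0x02a00513, 0x00008067 };

  void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (buf == MAP_FAILED)
    return 1;

  /* Ordinary stores: the new text may sit only in the DCache. */
  memcpy(buf, code, sizeof(code));

  /* Synchronize this hart's instruction fetches with the stores
     above; without it, the fetch below may hit stale ICache lines. */
  asm volatile("fence.i");

  int r = ((int (*)(void))buf)();  /* instruction fetch from buf */
  printf("%d\n", r);               /* prints 42 */
  return 0;
}
```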

The key to avoiding such a disaster is to update the ICache every time new code is loaded. The update can be performed either in hardware or in software.
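
In software, the update is a single instruction. Below is a minimal sketch in the style of the existing sfence_vma() helper in kernel/riscv.h; the helper name fence_i() and its call site are my assumptions for illustration, not necessarily what the patch in #5 does.

```c
// Hypothetical addition to kernel/riscv.h, next to sfence_vma():
// synchronize this hart's instruction fetches with all of its prior
// stores, so freshly written program text is fetched correctly.
static inline void
fence_i()
{
  asm volatile("fence.i");
}
```

A natural call site would be in exec(), after all program segments have been copied in and before the process can fetch from the new pages.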

Therefore, the fix for the bug above is FENCE.I. The discussion of memory barriers at the end of section 9.3 in the textbook is about FENCE, and it is irrelevant to this bug: a memory barrier orders ordinary loads and stores, but not instruction fetches.

We encountered this bug when trying to run xv6 on a simple in-order RISC-V processor designed by undergraduates. The processor contains several buffers and caches. The table below shows how each of the three instructions discussed above affects them (F = flush, K = keep).

|            | ICache | DCache | TLB | BTB |
|------------|--------|--------|-----|-----|
| SFENCE.VMA | K      | K      | F   | F   |
| FENCE      | K      | K      | K   | K   |
| FENCE.I    | F      | K      | K   | F   |

Since the TLB and BTB are indexed by virtual address, they must be flushed by SFENCE.VMA. The ICache and DCache are simply implemented as PIPT, so SFENCE.VMA does not flush them. FENCE.I flushes the ICache and the BTB, since both are related to instruction fetch. Because the processor is in-order, the ordering required by FENCE is naturally satisfied, so FENCE can be implemented as a nop and flushes nothing. This processor successfully boots Linux and Debian, but fails to run xv6 without the patch from #5 that fixes this bug.

The bug is not exposed in QEMU. In QEMU, all buffers and caches related to instruction fetch are virtually indexed, so they are flushed by SFENCE.VMA, which xv6 always executes on a context switch. At that point, QEMU's code cache (a key component of its JIT) is flushed as well. Compared with our simple RISC-V processor, the main difference is the processor's PIPT ICache, which is affected by neither SFENCE.VMA nor FENCE.

Further discussion is welcome. :)

BigBrotherJu commented 5 months ago

I'm trying to run xv6 on a T-Head C906, which has a VIPT ICache and DCache. The changes introduced in #5 do not seem to be enough to boot xv6: the UART outputs `init: star` and then just hangs.