@kaashoek Hi, I think the bug reported by #5 is still a problem. Let me explain it in detail.
This bug is related to instruction cache (ICache) in hardware.
Process A loads its code into a new physical page P0 in xv6-riscv/kernel/exec.c. Assume P0 is not in the ICache before the code is loaded. Note that from the hardware's perspective, loading code is just like a call to memcpy(): it involves only ordinary load and store instructions, so it does not bring the code into the ICache. On a processor with a data cache (DCache), the newly written code may even sit only in the DCache.
A executes its code. Since P0 is not in the ICache, this causes an ICache miss, and the ICache fetches the correct code from the DCache or from memory.
A exits, and P0 is reclaimed.
Now a new process B is loaded, and B gets exactly the same physical page P0 for its code. Remember that, as before, the loaded code is not in the ICache after loading.
B executes its code. Now disaster strikes: when B accesses the ICache, it may get a hit, because A's code is still in the ICache! This causes B to execute the wrong code!
The key to avoiding this disaster is to update the ICache every time new code is loaded. The update can be performed either in hardware or in software.
In the hardware method, the hardware examines the address of every store instruction to see whether the ICache holds the same cache block; if so, the ICache invalidates that block. This guarantees the ICache will not hit on stale code the next time it accesses that address. x86 works this way.
In the software method, we execute special instructions to explicitly update the state of the ICache. The RISC-V manual calls this operation synchronization. RISC-V has three instructions of this kind.
SFENCE.VMA. This instruction updates the state of hardware components related to virtual memory, such as the TLB. Note that SFENCE.VMA is NOT guaranteed to update the ICache; that depends on the hardware implementation.
If the ICache is implemented as PIPT (physically indexed, physically tagged), it has nothing to do with virtual memory and need not be updated by SFENCE.VMA.
Even if the ICache is implemented as VIPT (virtually indexed, physically tagged), it still need not be updated by SFENCE.VMA, because the index field of an address stays the same across address translation.
But if the ICache is implemented as VIVT (virtually indexed, virtually tagged), it must be updated by SFENCE.VMA, because the tag comes from the virtual address, and the virtual-to-physical mapping may change after SFENCE.VMA.
FENCE. This instruction orders the visibility of store instructions that precede it: every load instruction on another CPU issued after the FENCE must see the results of stores before the FENCE. Note that FENCE is defined to guarantee visibility only to load instructions, NOT to instruction fetches. Therefore executing FENCE is not guaranteed to update the ICache.
FENCE.I (different from FENCE). According to the RISC-V manual,

> Currently, this instruction is the only standard mechanism to ensure that stores visible to a hart will also be visible to its instruction fetches.
Therefore, the fix for the bug above is FENCE.I. The discussion of memory barriers at the end of section 9.3 of the textbook is about FENCE, so it does not address this bug: memory barriers concern ordinary load instructions, not instruction fetches.
We ran into this bug while trying to run xv6 on a simple in-order RISC-V processor designed by undergraduates. The processor contains several buffers and caches. The table below shows how the three instructions discussed above affect them (F = flush, K = keep).
|            | ICache | DCache | TLB | BTB |
|------------|--------|--------|-----|-----|
| SFENCE.VMA | K      | K      | F   | F   |
| FENCE      | K      | K      | K   | K   |
| FENCE.I    | F      | K      | K   | F   |
Since the TLB and BTB are indexed by virtual address, they must be flushed by SFENCE.VMA. The ICache and DCache are simply implemented as PIPT, so SFENCE.VMA does not flush them. FENCE.I flushes the ICache and BTB, since both relate to instructions. Because the processor is in-order, the behavior required by FENCE is naturally satisfied, so FENCE can be implemented as a nop that flushes nothing. This processor successfully boots Linux and Debian, but fails to run xv6 unless patch #5 is applied to fix this bug.
The bug is not exposed in QEMU, because in QEMU all buffers and caches related to instructions are virtually indexed and are flushed by SFENCE.VMA. xv6 always executes SFENCE.VMA on a context switch, and at that point QEMU's code cache (a key component of its JIT) is also flushed. Compared to our simple RISC-V processor, the main difference is that the processor has a PIPT ICache, which is affected by neither SFENCE.VMA nor FENCE.

Further discussion is welcome. :)
I'm trying to run xv6 on a T-Head C906, which has a VIPT ICache and DCache. It seems the changes introduced in #5 are not enough to boot xv6: the UART outputs init: star and then just hangs.