Open mingyuan-xia opened 2 months ago
Note: fence.i
is kind of equivalent to ARM's isb
. ARM's toIR emits an Imbe_Fence node upon isb
so I did the same for fence.i
.
ref: https://github.com/petrpavlu/valgrind-riscv64/blob/riscv64-linux/VEX/priv/guest_arm_toIR.c#L16012
Yet, I am not totally sure, as fence.i
is stronger than Imbe_Fence's semantic specified in VEX IR.
Additionally, a quick search shows that the unprivileged fence.i
is not common in userspace programs, even for a JIT like OpenJDK, see: https://github.com/openjdk/jdk/pull/9770
In the ARM backend, the Imbe_Fence
node corresponds to three barrier instructions, including an ISB
instruction to ensure instruction synchronization, similar to what fence.i
did:
This means that the ARM backend has strengthened the semantics of the Imbe_Fence
node.
However, in the RISC-V backend, the Imbe_Fence
node corresponds to a single fence
instruction for all types of device I/O and memory accesses.
We might consider doing something similar, or perhaps introduce a new Imbe_Fence_I?
node with correct semantics.
Additionally, a quick search shows that the unprivileged
fence.i
is not common in userspace programs, even for a JIT like OpenJDK, see: openjdk/jdk#9770
Right, I wouldn't expect to see FENCE.I used in userspace applications. The RISC-V Instruction Set Manual Volume I: Unprivileged Architecture has a note on it:
Because FENCE.I only orders stores with a hart’s own instruction fetches, application code should only rely upon FENCE.I if the application thread will not be migrated to a different hart. The EEI can provide mechanisms for efficient multiprocessor instruction-stream synchronization.
In the ARM backend, the
Imbe_Fence
node corresponds to three barrier instructions, including anISB
instruction to ensure instruction synchronization, similar to whatfence.i
did: [...] We might consider doing something similar, or perhaps introduce a newImbe_Fence_I?
node with correct semantics.
I think that from the perspective of the VCPU that Valgrind implements, FENCE.I would effectively need to result in discarding all translations. Something as Ijk_InvalICache
with guest_CMSTART
and guest_CMLEN
covering the whole address range.
SMC (self-modifying code, somewhat seen in malware/injected code/thunk) is one more use case of fence.i
, which changes the code near PC. But again, I am not convinced to support such cases with great efforts (changing backend/incurring great runtime overhead for other cases).
Based on the very rare use cases listed, I would recommend emitting an Imbe_Fence
with a warning. Consider to run instead of crash on some random fence.i
-s emitted by an old compiler/JIT, like the OpenJDK above-mentioned. After all, fence.i
is a standardized instruction allowable in userspace. Would like to hear your opinion on this @petrpavlu.
I'm somewhat in favor of implementing this instruction properly if it is added. Turning it into Ijk_InvalICache
should be fairly simple, without a need for any backend changes.
Userspace programs shouldn't really need to use this instruction, but instead need to invoke the riscv_flush_icache()
syscall. IMO the instruction is then not worth optimizing for and I wouldn't worry about the penalty incurred by Ijk_InvalICache
.
19 partially
sorry for the inconsistent commit log, should be:
riscv64: Support fence.i
The project-level README is left unmodified.