RISC-V - Make updates to PTE accessed and dirty bits atomic

michaeljclark commented 6 years ago

The RISC-V Privileged Specification requires that updates to PTE accessed and dirty bits are atomic. There is a small race window in the current code between reading the PTE and updating it. The only reasonable way to make updates atomic is to use atomic_cmpxchg. To use platform atomics, we need to translate the guest physical address to a host address.

In the rare case that the atomic_cmpxchg of the PTE fails, it means the PTE is stale, and we must restart the walk. The PTE may longer be valid, the permissions may have changed or the PTE may even point to another physical page. The simplist approach to handling a conflict is to re-walk the page tables for the current translation. This is safe and fast as the likelihood the PTE is stale is rare.

Maliciously crafted supervisor code could attempt to limit forward progress by hammering the PTE with updates, but it is likely the page table walker will eventually win the race. Frequent atomic update failures may exhibit as a pathological side-effect of a specially crafted supervisor page-walker denial of service attack. In the uncontended common case, translation overhead is the same as stl_phys/stq_phys.

This code uses the QEMU address space translation API to lookup the memory region in the same manner as ldl_phys/ ldq_phys but elides operations if the PTEs are in IO space or non-writable RAM. This seems to be a reasonable behaviour for the typical case where PTEs are in RAM, and the atypical case where PTEs are in ROM or an IO device, where it is not possible for the updates to occur or be atomic.

michaeljclark commented 6 years ago

@sorear @palmer-dabbelt This is an alternative to PR https://github.com/riscv/riscv-qemu/pull/103

I've tested it. You could try something like this.

diff --git a/target/riscv/helper.c b/target/riscv/helper.c
index a607f6a..d4efff4 100644
--- a/target/riscv/helper.c
+++ b/target/riscv/helper.c
@@ -213,6 +213,7 @@ restart:
                     target_ulong old_pte =
                         atomic_cmpxchg(pte_pa, pte, updated_pte);
                     if (old_pte != pte) {
+                        printf("rare\n");
                         goto restart;
                     } else {
                         pte = updated_pte;

I don't think we'll see Linux hit the contended PTE case, nevertheless, I believe this is a correct fix. This change suppresses updates to PTEs in IO space or ROM. I might like to raise a clarification as to what should happen if a PTE is in ROM with the AD bits clear (the spec advises that the AD bits should be set by the supervisor to suppress updates, however, a supervisor could misconfigure PTEs in ROM or unusually point PTEs into IO space).

We use the public guest physical to memory region API (address_space_translate) and (memory_access_is_direct) so that we can perform a regular atomic_cmpxchg as defined in ./docs/devel/atomics.txt. This is the same codepath as is used in stl_phys/stq_physso the performance will be the same.

The previous code had some hacks (static variables or somesuch) to prevent PTEs outside RAM. This code is clean, as it detects the case where PTEs are not in RAM, and just elides the PTE AD bit updates.

It has no measurable impact on performance as the uncontended case is similar to the present, except that we use atomic_cmpxchg instead of a stl_phys/stq_phys. It takes a different approach to PR https://github.com/riscv/riscv-qemu/pull/103 and instead makes PTE updates atomic. In the unlikely contended case, the page walker restarts the walk, which I believe is the most sensible approach, after researching the topic and looking into how other paging implementations handle concurrency and contention. i.e. in the rare case of an invalid PTE, we just restart the page table walk.

Interested in your feedback. That said, it's likely rare. It would be nice to have a test for this...

michaeljclark commented 6 years ago

This does require 64-bit atomic_cmpxchg on 32-bit hosts. This should be okay on x86 assuming QEMU maps atomic_cmpxchg on long long to CMPXCHG8B. I will test boot RV64 Linux on i386 qemu (it may crash and burn there... it worked last time I tested RV64 on i386 so hopefully we haven't regressed that)...

michaeljclark commented 6 years ago

Okay i'll merge it. It also fixes i386 build by making mip uint32_t so we can close PR https://github.com/riscv/riscv-qemu/pull/106

Tested on i386, x86_64 Linux and x86_64 macOS

riscvarchive / riscv-qemu

RISC-V - Make updates to PTE accessed and dirty bits atomic #105