rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.83k stars 12.77k forks source link

Volatile reads and writes on aarch64 sometimes generate instructions not suitable for MMIO in protected VMs #131894

Open qwandor opened 1 month ago

qwandor commented 1 month ago

core::ptr::write_volatile and core::ptr::read_volatile are documented as being intended to act on I/O memory, i.e. for MMIO. These are indeed widely used by many crates providing drivers for MMIO devices across the ecosystem.

When running in a virtual machine, MMIO must be emulated by the hypervisor. This is done (on aarch64 at least) by having the MMIO region unmapped in the stage 2 page table, which results in a data abort to the hypervisor when the VM attempts to read or write the MMIO region. The hypervisor then decodes the exception syndrome register (esr_el2) and uses the fault address register (far_el2) to determine which MMIO address is being accessed and perform the appropriate operation in response.

Unfortunately, rustc sometimes compiles core::ptr::write_volatile on aarch64 to something like str w9, [x0], #4. We've seen this happen particularly since Rust 1.78, but it may be possible with earlier Rust versions too. The problem with this is that this post-addressing mode is performing register writeback (in this case, incrementing x0 by 4), and so doesn't set the exception syndrome register. This prevents the hypervisor from emulating the MMIO access, as it has no way of decoding the instruction syndrome or finding the faulting address.

In an unprotected VM (e.g. regular KVM), it is possible for the VMM to work around this by reading the guest VM's memory to find the relevant instruction, decoding the instruction manually, and finding the MMIO address that way. This has a performance overhead and adds extra complexity. In the case of a protected VM where the host doesn't have access to the guest VM's memory (e.g. protected KVM), this is not possible as the VMM is not able to read the guest VM's memory and so cannot do instruction decoding. There is thus no way to emulate these attempted MMIO accesses in a protected VM on aarch64.

The net result of this is that instructions which perform register writeback (e.g. post-increment addressing modes) are not suitable for MMIO in aarch64 VMs. This is arguably a flaw in the aarch64 architecture, but as that's not feasible to fix at this point it must be fixed in the compiler instead. rustc should therefore avoid generating such instructions for volatile_read and volatile_write calls.

The only alternative I can see to fixing this in rustc is for every crate which performs MMIO to use inline assembly rather than volatile_read / volatile_write, but that is not a very feasible or scalable solution.

Noratrieb commented 1 month ago

Does clang make this work correctly with C volatile? If yes, how does rustc's assembly and LLVM IR differ from clang?

workingjubilee commented 1 month ago

Unfortunately, rustc sometimes compiles core::ptr::write_volatile on aarch64 to something like str w9, [x0], #4

I suppose we sometimes have a reproducer then?

workingjubilee commented 1 month ago

The only alternative I can see to fixing this in rustc is for every crate which performs MMIO to use inline assembly rather than volatile_read / volatile_write, but that is not a very feasible or scalable solution.

I wouldn't be so quick to assume this is not feasible or preferable, because if you want a very specific instruction, this might in fact be the preferred solution.

The reason why is this ain't gonna lower to a single str in any addressing mode:

ptr.cast::<[u8; 1024]>().write_volatile(bytes);

It can't. That is why I am very interested in seeing what Rust source code actually causes this to be a problem.

maurer commented 1 month ago

From @Noratrieb

Does clang make this work correctly with C volatile? If yes, how does rustc's assembly and LLVM IR differ from clang?

We have observed this eissue with clang as well in another context - this started happening b/c LLVM got smarter, and clang and rustc both had failing VMs as a result. Our solution in both cases so far has been to put an arch-specialized method into the code which does explicit inline asm to address this (arch-specific header for C, a #[cfg]'d trait implementation for Rust).

From @workingjubilee

The reason why is this ain't gonna lower to a single str in any addressing mode:

The issue is not that it should lower to a single str, the problem is showing up in using str variants that have updates to additional registers - @qwandor 's example shows it doing a writeback to x0 after the store to update the pointer. While it's true that this is still following normal volatile semantics (the write is not being elided), the usual aarch64 VM MMIO implementation can't handle these (see the syndrome register docs linked earlier). This means that if someone tries to write an MMIO driver using write_volatile, it may mysteriously fail on ARM if it tries to generate a more efficient loop, because the hypervisor won't be able to read the fault address out of the syndrome register.

I see a few possible solutions to this:

  1. Make write_volatile only generate MMIO compatible operations. On aarch64, this means restricting writeback. There might be other similar restrictions on other architectures.
  2. Add a write_mmio method that is like write_volatile, but with the architecture-specific restrictions above. Given that it's for MMIO, it might even make sense to restrict it to word sizes for the relevant architecture, but that's not relevant to this specific issue.
  3. As option 2, but do it in a support crate rather than in stdlib and implement via inline asm on archs that need a special restriction.
  4. What we're doing now - each crate or mini-ecosystem that cares about this will encounter this and resolve it themselves when relevant.
workingjubilee commented 1 month ago

I mean, the Rust compiler could be the one emitting the inline asm when it monomorphizes a word-sized-or-less write_volatile instruction, for all it matters here.

Noratrieb commented 1 month ago

Given that clang has the same issue and rustc just compiles it to load volatile, this should be addressed on the LLVM side if possible.

workingjubilee commented 1 month ago

@qwandor @maurer Can this be reported as an LLVM bug, then, so we can find out if they want to fix it?

the8472 commented 1 month ago

If LLVM doesn't do it, is this pattern even blessed by anyone? Does GCC guarantee that volatile writes are virtualizable?

workingjubilee commented 1 month ago

Well, the C language's specification for volatile accesses sure doesn't say a single goddamn thing about MMIO, and even less about syndrome registers and hypervisors. I imagine some hardware-specific compilers might guarantee this, but in general C compilers tend to outright miscompile volatile, nevermind guarantee the access is virtualizable.

As for gcc, its C frontend actually specifically requires you to write asm volatile in some cases to even guarantee your asm appears.

The Linux Kernel advises against using volatile, even for MMIO, but advises using specific accessor functions... so, equivalent to some kind of Aarch64MmioPtr.

workingjubilee commented 1 month ago

Minimal repro: https://godbolt.org/z/crGo6xzev