linux-nvme / nvme-cli

NVMe management command line interface.
https://nvmexpress.org
GNU General Public License v2.0
1.41k stars 637 forks source link

Mark pointers in 32bit MMIO as volatile #2392

Closed tachyonwill closed 1 week ago

tachyonwill commented 1 week ago

We observed a situation where the compiler combined the reads of two distinct calls to mmio_read32 into a single ldp instruction on ARM. This caused crashes on VMs where KVM could not support that instruction for MMIO. Recent refactors to stdout_ctrl_registers fixed the instance of the crash that we observed but we should fix the mmio helpers to prevent future occurences.

ikegami-t commented 1 week ago

If possible let me confirm below. Can you show the situation log observed or the code combined by the compiler? Also why the instance of the crash fixed by the refactoring?

tachyonwill commented 1 week ago

https://github.com/linux-nvme/nvme-cli/blob/v2.8/nvme-print-stdout.c#L1444 The reading of NSSR and CSTS were being combined into a single ldp instruction. The code is now a loop that makes it harder for the compiler to optimize but in theory it could still be optimized the same way as before. Ideally, there should be architecture specific assembly for MMIO so as to keep the compiler from picking unsupported instructions, but volatile is the next best thing.

oupton commented 1 week ago

Even if volatile defeats the merging of loads from MMIO, it doesn't necessarily guarantee that the compiler uses a 'safe' instruction. For example, pre- and post-incremented loads/stores (where the register containing the addr gets incremented) are entirely legal for volatile accesses, but do not work for MMIO under virtualization.

ikegami-t commented 1 week ago

Sorry still I could not understand the crash reason by the unsupport instruction. Is it possible to be just caused by the compiler bug and it was able to be just avoided the bug behavir itself by the volatile at that time but not complete or correct fix?

    csts = mmio_read32(bar + NVME_REG_CSTS);
    nssr = mmio_read32(bar + NVME_REG_NSSR);
oupton commented 1 week ago

Sent out #2397 as an example of how to fix this completely, no preference between @tachyonwill just adopting that in their PR or accepting mine.

This is not a compiler bug, the compiler is allowed to use any load/store instruction it chooses for the memory access. It is a bug in the CPU architecture + hypervisor (KVM) that necessitates a workaround anywhere that's doing MMIO. When KVM detects the guest has done an unsupported MMIO instruction, it'll either terminate the VM or send an external abort.

Using the volatile keyword has the effect of ruling out an entire class of 'bad' instructions (LDP/STP), but does not preclude the compiler from using pre- or post-incremented load/store instructions.

igaw commented 1 week ago

I've merged #2397. Closing this PR. Thanks anyway for your PR.