seL4 / seL4

The seL4 microkernel
https://sel4.systems
Other
4.74k stars 672 forks source link

Emulation with `KernelRiscvUseClintMtime=ON` fails for QEMU >= 8.1.0 #1258

Closed nspin closed 4 months ago

nspin commented 4 months ago

I've observed that emulating seL4 on RISCV{32,64} platforms (at least virt, spike, and hifive) using QEMU fails with QEMU >= 8.1.0. In particular, PMP rejects accesses to the CLINT MTIME register. OpenSBI configures the MTIMER regions without any SBI_DOMAIN_MEMREGION_SU_* permissions [1]. seL4 was able to access the MTIME register on older versions of QEMU, even with the same version of OpenSBI, up until this commit, which appears to fix a bug in PMP policy enforcement:

https://github.com/qemu/qemu/commit/a574b27af4fe340684ca9f20560a1b01905e4364

Is anyone who knows more about all of this able to shed light on how KernelRiscvUseClintMtime=ON was able to work on platforms which use OpenSBI PLATFORM=generic in the first place, given that PMP configuration code I'm seeing?

Is KernelRiscvUseClintMtime=ON suitable for all of the platforms it is currently set for?

Tagging @Ivan-Velickovic, who drove the PR adding KernelRiscvUseClintMtime (https://github.com/seL4/seL4/pull/1076).

[1] https://github.com/riscv-software-src/opensbi/blob/d962db280725b03a0340e05a07e4c85c93f35bc5/lib/utils/timer/aclint_mtimer.c#L225-L260

axel-h commented 4 months ago

I think it is a platform specific choice if access is allowed there - or if things are implemented this way at all behind the scenes of rdtime and the CSRs time/mtime. KernelRiscvUseClintMtime came in a a faster way to do things based on how many platforms technically implement things, copying what SiFive implemented. I think it should not be used unless one knows that a platform support this, and SBI allows this, and timer reads (or even timer compare updates) must be as fast as possible. Otherwise stick with the standard RISC-V methods and hope platforms implement it in a efficient way instead of having M-Mode trap and emulate it.

Ivan-Velickovic commented 4 months ago

@nspin more details here https://github.com/seL4/seL4/issues/1094.

I tried to resolve the general issue but didn’t come to a conclusion. It seems clear now though that it should probably be an opt-in option.

I will make a PR in the next couple days.

nspin commented 4 months ago

Do we know which platforms it works for?

By the way, this is the OpenSBI patch, between v0.9 and v1.0, which breaks it: https://github.com/riscv-software-src/opensbi/commit/8b569803475e130b85987b97fdd47c19e900bf82

Ivan-Velickovic commented 4 months ago

I think right now it’s set for all RISC-V platforms supported by seL4. They all have a CLINT with mtime.

The seL4 projects like sel4test pin a version of OpenSBI that does not restrict S-mode access to the CLINT which is why sel4test still works.

Any setup that uses an OpenSBI version with that patch would be affected.

nspin commented 4 months ago

Ah, yes, now I understand how this has been working at all. The manifests use OpenSBI 0.9 [1].

I tried to resolve the general issue but didn’t come to a conclusion. It seems clear now though that it should probably be an opt-in option.

I think this is a good idea. I think the default should be compatible with up-to-date versions of OpenSBI (that is, KernelRiscvUseClintMtime=OFF). Projects which intentionally use an old version of OpenSBI or patch it to enable this optimization could opt-in.

[1] e.g. https://github.com/seL4/sel4test-manifest/blob/master/common.xml#L39