rust-osdev / bootloader

An experimental pure-Rust x86 bootloader
Apache License 2.0
1.31k stars 204 forks source link

Debug why `Rsdp::search_for_on_bios` is optimized to `ud2` #425

Open phil-opp opened 4 months ago

phil-opp commented 4 months ago

See https://github.com/rust-osdev/bootloader/pull/420

We worked around the issue for now by preventing inlining of the map_physical_region function, but we should figure out the root cause of this.

Freax13 commented 4 months ago

This only occurs when using rsdp 2.0.0 and seems to be fixed by rsdp 2.0.1. I've tracked the fix down to this commit: https://github.com/rust-osdev/acpi/commit/a0db984096d49a499117760f913ea86ce5e9b604, but I'm not entirely sure why this fixes the bug.

At first glance, I thought this might be because the added iterator complexity slows down the inlining a bit: search_for_on_bios doesn't get fully inlined and its implementation is split into two parts, one setting up the addresses and one doing the work; perhaps the lack of inlining was preventing LLVM from seeing UB. To work around that, I removed the .step(16) call, which resulted in all functions being fully inlined. This didn't break the code though, it wasn't optimized down to ud2.

AFAICT https://github.com/rust-osdev/acpi/commit/a0db984096d49a499117760f913ea86ce5e9b604 doesn't fix anything that LLVM should care about. It prevents some out-of-bounds accesses, but for memory that's not managed by LLVM, so LLVM shouldn't be able to optimize out the accesses. Even if LLVM felt that optimizing out the later out-of-bounds accesses was legal, the rsdp could still be in the first address slot and LLVM should never allow that to be optimized out.

In conclusion, even though https://github.com/rust-osdev/acpi/commit/a0db984096d49a499117760f913ea86ce5e9b604 fixes this bug, I'm not yet convinced that this didn't just happen by accident.

phil-opp commented 3 months ago

We should retry to reproduce this with the latest nightlies. There was another LLVM update in https://github.com/rust-lang/rust/pull/121395, which fixed some miscompilations.

Freax13 commented 3 months ago

I can still reproduce this on 1.78.0-nightly (tested with 090f0309fb41c2ff19b1a79b5567cecc3ca084df).

toothbrush7777777 commented 2 months ago

Shouldn't this raw pointer dereference in rdsp::search_for_on_bios be a volatile read, so the compiler doesn't optimise it out?

https://github.com/rust-osdev/acpi/blob/35f9fba0aed2b00cfd16d0560a991c705666f704/rsdp/src/lib.rs#L106

Freax13 commented 2 months ago

Can you elaborate on why the compiler would be allowed to optimize out the non-volatile read?

toothbrush7777777 commented 2 months ago

Can you elaborate on why the compiler would be allowed to optimize out the non-volatile read?

Yes, I'm not sure that constructing a slice using slice::from_raw_parts and then reading from a raw pointer constructed from the slice's address with a non-volatile read without side-effects guarantees that the optimiser will keep that read.

Optimising compilers, and particularly LLVM, try to reduce the number of unnecessary memory accesses to improve performance. They will often remove memory accesses that they consider to have no side-effects. Using read_volatile() guarantees that the optimiser will not remove the memory access, even if it considers that there is no side-effect to reading that memory.

Freax13 commented 2 months ago

I'd argue that side effects from the reads are irrelevant here. What matters are the values that are read. Unless LLVM can somehow statically determine the value at the addresses without doing the reads, I don't see why it would be allowed to optimize out the reads. AFAICT the memory accesses are necessary here because the values are not known until runtime.

toothbrush7777777 commented 2 months ago

True, but LLVM 18 appears to be (mis)optimising this function away, including the memory accesses. I was wondering if that would fix it.