rust-osdev / x86_64

Library to program x86_64 hardware.
https://docs.rs/x86_64
Apache License 2.0

AHCI IRQ is mishandled as a `#NP` exception #388

Closed kennystrawnmusic closed 2 years ago

kennystrawnmusic commented 2 years ago

I've had a long-running, puzzling problem getting an AHCI driver (modeled after Redox's, but differing greatly from it) properly set up: the instant the first disk is identified, a `#NP` fault occurs.

To say that this is strange is an understatement: that type of fault is extremely rare in the modern APIC/UEFI/paging world, where segmentation is a thing of the past. Today it's usually the result of a missing IDT entry, so I thought "maybe I should use AML to properly map the interrupt handler to the correct IDT entry", but even doing exactly that was of no use. The same goes for disabling the 8259 before enabling the APIC: notice the exhaustive down-and-dirty port manipulation going on there to make sure the i8259 is disabled and off the hook.

This left me scratching my head for a while until I came across this Reddit thread from a year ago, which brought up a very important point in response to someone else running into this exact same hurdle: if an IDT isn't configured properly, IRQs can end up being mishandled as exceptions. AHCI is typically IRQ 11 in QEMU, and the `#NP` fault is exception 11. So something in the IDT implementation provided by this crate needs to be looked at to see how this IRQ and some others are routed, because getting such a fault immediately after the first disk is detected shouldn't be happening.
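The suspected collision can be sketched in plain Rust. This is only an illustration of the numbering, not code from this crate: `IRQ_BASE`, `classify`, and `irq_to_vector` are hypothetical names, and the base of 32 is an assumption (the first vector not reserved for CPU exceptions).

```rust
/// Vectors 0..=31 are reserved by the CPU for exceptions (#NP is vector 11).
const EXCEPTION_VECTORS: u8 = 32;

/// Hypothetical base vector for hardware IRQs (assumption: 32, the first
/// vector that is not reserved for exceptions).
const IRQ_BASE: u8 = 32;

#[derive(Debug, PartialEq, Eq)]
enum VectorKind {
    /// The CPU treats this vector as an exception.
    CpuException,
    /// The vector is free for external interrupts.
    External,
}

/// Classifies an IDT vector number by the range it falls in.
fn classify(vector: u8) -> VectorKind {
    if vector < EXCEPTION_VECTORS {
        VectorKind::CpuException
    } else {
        VectorKind::External
    }
}

/// Maps an IRQ line to an IDT vector above the exception range.
/// Returns `None` if the result would not fit in a `u8`.
fn irq_to_vector(irq: u8) -> Option<u8> {
    IRQ_BASE.checked_add(irq)
}
```

Under these assumptions, delivering IRQ 11 unshifted lands on vector 11, which `classify` reports as a CPU exception, while `irq_to_vector(11)` yields vector 43, which is outside the exception range.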

Freax13 commented 2 years ago

Try running QEMU with `-d int`.

Also, at https://github.com/kennystrawnmusic/cryptos/blob/4881369fc1d27c9df97ac2eab6fdcef14e8df4ed/src/apic_impl.rs#L118 you should use `set_vector` instead of `set_dest`.
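For context on why the two setters are not interchangeable, here is a minimal sketch of an I/O APIC redirection table entry. `RedirectionEntry` is a hypothetical, simplified model written for illustration, not the type used in the linked code. It covers only the two fields in question: bits 0 to 7 hold the IDT vector, and bits 56 to 63 hold the destination APIC ID (in physical destination mode).

```rust
/// Hypothetical, simplified model of a 64-bit I/O APIC redirection table entry.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct RedirectionEntry(u64);

impl RedirectionEntry {
    /// Bits 0..=7: the IDT vector that is raised on the target CPU.
    fn set_vector(&mut self, vector: u8) {
        self.0 = (self.0 & !0xFF_u64) | u64::from(vector);
    }

    /// Bits 56..=63: the APIC ID of the CPU that receives the interrupt.
    fn set_dest(&mut self, apic_id: u8) {
        self.0 = (self.0 & !(0xFF_u64 << 56)) | (u64::from(apic_id) << 56);
    }

    /// Reads back the vector field.
    fn vector(&self) -> u8 {
        (self.0 & 0xFF) as u8
    }

    /// Reads back the destination field.
    fn dest(&self) -> u8 {
        (self.0 >> 56) as u8
    }
}
```

In this model, writing the intended vector number through `set_dest` leaves the vector field at 0 and instead selects a (probably nonexistent) destination CPU, so the interrupt never reaches the handler registered for that vector.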

kennystrawnmusic commented 2 years ago

Here's my `-d int` output redirected to a file: https://github.com/kennystrawnmusic/cryptos/blob/master/d-int.txt

I also just tried `.set_vector`, and still nothing.

Freax13 commented 2 years ago

https://github.com/kennystrawnmusic/cryptos/blob/e81704bc3be33985635460a49757ca6ca82d313f/d-int.txt#L2893

The CPU threw an invalid opcode exception which you don't have an exception handler for.

kennystrawnmusic commented 2 years ago

Aha! So something in there is causing a `#UD` exception which I don't have a handler for, and the lack of a handler for it is in turn causing the `#NP`. I'll try adding one to see if that makes a difference.
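One way to confirm this kind of chain is to decode the error code that accompanies the `#NP`. The sketch below follows the architectural selector error code layout (bit 0: external event, bit 1: the index refers to the IDT, bit 2: LDT rather than GDT, bits 3 to 15: index). The type and function names are hypothetical, and the value `0x32` in the usage note is a constructed example, not a value quoted from the log above.

```rust
/// Which descriptor table the faulting index refers to.
#[derive(Debug, PartialEq, Eq)]
enum Table {
    Gdt,
    Idt,
    Ldt,
}

/// Decoded form of a selector error code, as pushed for #NP, #GP, #TS and #SS.
#[derive(Debug, PartialEq, Eq)]
struct SelectorError {
    /// Bit 0: the fault was triggered by an event external to the program.
    external: bool,
    /// Bits 1 and 2: the table the index points into.
    table: Table,
    /// Bits 3..=15: for the IDT, this is the interrupt vector number.
    index: u16,
}

/// Splits a raw selector error code into its fields.
fn decode_selector_error(code: u64) -> SelectorError {
    let external = code & 0b1 != 0;
    let table = if code & 0b10 != 0 {
        // The IDT bit takes precedence; the TI bit is only meaningful without it.
        Table::Idt
    } else if code & 0b100 != 0 {
        Table::Ldt
    } else {
        Table::Gdt
    };
    let index = ((code >> 3) & 0x1FFF) as u16;
    SelectorError { external, table, index }
}
```

For instance, `decode_selector_error(0x32)` reports the IDT with index 6, which is the invalid opcode vector: a `#NP` carrying that code would mean the CPU tried to deliver `#UD` and found the IDT entry marked not present.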

kennystrawnmusic commented 2 years ago

Now that I have a `#UD` handler, I'm getting that exception instead, so I'm closing this. Now it's time to figure out what's causing that exception.

kennystrawnmusic commented 2 years ago

Oops, didn't see your comment in which you beat me to that.