nepx / halfix

x86 PC emulator that runs both natively and in the browser, via WebAssembly
https://nepx.github.io/halfix-demo/
GNU General Public License v3.0

Windows 8 BSOD #1

Open nepx opened 4 years ago

nepx commented 4 years ago

Windows 8 doesn't boot. It BSODs (or rather GSODs -- gray screen of death) right after the Windows logo disappears.

[Image: Screenshot from 2020-01-01 17-57-54]

It's because of a Kernel Data Inpage Error, which, according to this source, indicates that Windows 8 couldn't read from the pagefile.

I think it's a hardware issue, specifically pertaining to how hard disk information is read/written. It's not an IDE DMA issue -- I created a Windows 8 disk image that uses the READ/WRITE MULTIPLE commands, and it still BSODs this way.

Multi-sector reads/writes to the disk image haven't been extensively tested. I think there's a subtle bug where cross-block writes don't work properly. The only operating system that uses them is OS/2 (which works fine).

It's possible that the BSOD is due to a cross-block bug. Windows XP had trouble booting in the Emscripten version because single-sector reads at the end of a block were corrupting the next block. OS/2 sometimes has trouble booting on my Android phone because of multi-block reads.
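To make the suspected failure mode concrete, here is a minimal sketch of a multi-sector read that has to split its transfer at block boundaries. It assumes the disk image is stored in fixed-size blocks; the 256-sector block size, `block_fetch_fn`, `flat_fetch`, and `read_sectors` are all hypothetical names chosen for illustration, not Halfix's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define SECTORS_PER_BLOCK 256u /* assumed block size, for illustration only */

/* Hypothetical callback: returns a pointer to the data of one image block. */
typedef uint8_t *(*block_fetch_fn)(uint32_t block_index, void *opaque);

/* Demo helper: treats `opaque` as one flat in-memory image. */
static uint8_t *flat_fetch(uint32_t block_index, void *opaque)
{
    return (uint8_t *)opaque
         + (size_t)block_index * SECTORS_PER_BLOCK * SECTOR_SIZE;
}

/* Copy `count` sectors starting at `lba` into `dst`, splitting the
 * transfer wherever it crosses a block boundary. */
static void read_sectors(block_fetch_fn fetch, void *opaque,
                         uint32_t lba, uint32_t count, uint8_t *dst)
{
    while (count > 0) {
        uint32_t block  = lba / SECTORS_PER_BLOCK;
        uint32_t offset = lba % SECTORS_PER_BLOCK;  /* sector index inside block */
        uint32_t run    = SECTORS_PER_BLOCK - offset; /* sectors left in block */
        if (run > count)
            run = count;

        uint8_t *src = fetch(block, opaque);
        assert(src != NULL);
        memcpy(dst, src + (size_t)offset * SECTOR_SIZE,
               (size_t)run * SECTOR_SIZE);

        dst   += (size_t)run * SECTOR_SIZE;
        lba   += run;
        count -= run;
    }
}
```

A 4-sector read starting at sector 254 exercises the boundary case: two sectors come from the end of block 0 and two from the start of block 1. Getting `run` or `offset` wrong by one sector would produce exactly the kind of neighbouring-block corruption described above.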

More investigation is required.

nepx commented 4 years ago

It definitely looks like a hard drive issue. During normal boot, VESA is switched off after a while, a cursor renders, and the Gray Screen of Death appears.

Setting MAX_MULTIPLE_SECTORS to zero, which effectively disables READ/WRITE MULTIPLE, keeps VESA on, but the crash still happens.

Interestingly, despite forcing READ/WRITE MULTIPLE to fail, Windows 8 still uses the command.

The problem starts around here:

[I/O] writeb: port=0x01f6 data=0xa0
[IDE] Chose master drive on primary
[I/O] writeb: port=0x01f6 data=0xa0
[IDE] Chose master drive on primary
[I/O] writeb: port=0x01f1 data=0x00
[I/O] writeb: port=0x01f2 data=0x10
[IDE] -1
[I/O] writeb: port=0x01f3 data=0x00
[I/O] writeb: port=0x01f4 data=0x00
[I/O] writeb: port=0x01f5 data=0x00
[I/O] writeb: port=0x01f7 data=0xc6
[IDE] Command: SET MULTIPLE MODE (16)
[IDE] SET MULTIPLE MODE command failed <-- key line
[PIC] Raising IRQ 14
[APIC] Received bus message: vector=91 type=3 trigger=0
[APIC] Sending interrupt 91
[I/O] readb: port=0xc002 res=0x26
[I/O] writeb: port=0xc002 data=0x04
[APIC] EOI'ed: 91 Next highest: ffffffff
[I/O] writeb: port=0x01f6 data=0xb0
[IDE] Chose slave drive on primary
[I/O] writeb: port=0x01f6 data=0xb0
[IDE] Chose slave drive on primary
[I/O] writeb: port=0x01f1 data=0x03
[I/O] writeb: port=0x01f2 data=0x0b
[IDE] -1
[I/O] writeb: port=0x01f3 data=0x00
[I/O] writeb: port=0x01f4 data=0x00
[I/O] writeb: port=0x01f5 data=0x00
[I/O] writeb: port=0x01f7 data=0xef
[IDE] Command: SET FEATURES [idx=03]
[PIC] Raising IRQ 14
[APIC] Received bus message: vector=91 type=3 trigger=0
[APIC] Sending interrupt 91
[I/O] readb: port=0xc002 res=0x26
[I/O] writeb: port=0xc002 data=0x04
[APIC] EOI'ed: 91 Next highest: ffffffff
[I/O] writeb: port=0x0176 data=0xa0
[I/O] writeb: port=0x0376 data=0x06
[IDE] Device Control Register: 06
[I/O] writeb: port=0x0376 data=0x00
[IDE] Device Control Register: 00
[IDE] Reset controller id=1
[PIC] Raising IRQ 0
[I/O] readb: port=0x0177 res=0x00
[I/O] writeb: port=0x0176 data=0xa0
[I/O] readb: port=0x0177 res=0x00

After a few hundred page faults and #NM exceptions, the following READ MULTIPLE command is executed:

[I/O] writeb: port=0x01f6 data=0xa0
[IDE] Chose master drive on primary
[I/O] writeb: port=0x01f6 data=0xe0
[IDE] Chose master drive on primary
[I/O] writeb: port=0x01f1 data=0x00
[I/O] writeb: port=0x01f2 data=0x10
[IDE] -1
[I/O] writeb: port=0x01f3 data=0xa0
[I/O] writeb: port=0x01f4 data=0x28
[I/O] writeb: port=0x01f5 data=0xb7
[I/O] writeb: port=0x01f7 data=0xc4
[IDE] Command: READ MULTIPLE [w/o LBA]
[IDE] READ MULTIPLE failed
[PIC] Raising IRQ 14
[APIC] Received bus message: vector=91 type=3 trigger=0
[APIC] Sending interrupt 91
[I/O] readb: port=0xc002 res=0x26
[I/O] writeb: port=0xc002 data=0x04
[I/O] readb: port=0x01f1 res=0x04
[APIC] EOI'ed: 91 Next highest: ffffffff

Note that this command is never retried, even though the IDE controller aborts it: the error register read (port 0x01f1) returns 0x04, which is the ABRT bit. All the other reads are done correctly.
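The register writes in the failed READ MULTIPLE can be decoded by hand. The sketch below does so under the assumption that the 0xe0 device write is taken at face value (bit 6 set selects 28-bit LBA addressing); the struct and function names are hypothetical, and only the register values come from the log.

```c
#include <assert.h>
#include <stdint.h>

/* Task-file registers on the primary channel, ports 0x1f1..0x1f7. */
struct ata_taskfile {
    uint8_t features;     /* 0x1f1 */
    uint8_t sector_count; /* 0x1f2 */
    uint8_t lba_low;      /* 0x1f3 */
    uint8_t lba_mid;      /* 0x1f4 */
    uint8_t lba_high;     /* 0x1f5 */
    uint8_t device;       /* 0x1f6 */
    uint8_t command;      /* 0x1f7 */
};

/* Assemble the 28-bit LBA: bits 0-23 from the LBA registers,
 * bits 24-27 from the low nibble of the device register. */
static uint32_t taskfile_lba28(const struct ata_taskfile *tf)
{
    assert(tf->device & 0x40); /* bit 6 must be set for LBA addressing */
    return (uint32_t)tf->lba_low
         | (uint32_t)tf->lba_mid << 8
         | (uint32_t)tf->lba_high << 16
         | (uint32_t)(tf->device & 0x0f) << 24;
}

/* In 28-bit commands a sector count of 0 means 256 sectors. */
static unsigned taskfile_sector_count(const struct ata_taskfile *tf)
{
    return tf->sector_count ? tf->sector_count : 256u;
}

/* Values copied from the failed READ MULTIPLE in the log above. */
static const struct ata_taskfile failed_read = {
    .features = 0x00, .sector_count = 0x10,
    .lba_low = 0xa0, .lba_mid = 0x28, .lba_high = 0xb7,
    .device = 0xe0, .command = 0xc4,
};
```

Under that assumption, `taskfile_lba28(&failed_read)` gives 0x00b728a0 and `taskfile_sector_count(&failed_read)` gives 16, so the guest asked for 16 sectors starting at that LBA.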

A few failed disk reads later, it crashes with a BSOD (this time, the screen is blue), further indicating that the READ MULTIPLE command is probably the reason behind this crash.

[Image: BSOD]

So far, I have determined the following:

nepx commented 4 years ago

Windows 8 doesn't seem to do any kind of error checking on multi-sector reads. It assumes the read succeeded, even if it returned garbage, transferred zero bytes, or did something else entirely.

I created a build of Bochs that disabled multi-sector reads completely, and Windows 8 crashed with a similar BSOD, indicating that READ MULTIPLE is most likely at fault. Or maybe it's a problem with WRITE MULTIPLE. I tried installing Windows without multi-sector support, but it hung during installation.

Why do multi-sector reads work in OS/2 but not in Windows 8?