joncampbell123 / dosbox-x

DOSBox-X fork of the DOSBox project
GNU General Public License v2.0

PC-98 Peret em Heru locks #1162

Closed. meunierd closed this issue 5 years ago.

meunierd commented 5 years ago

Describe the bug
The game consistently locks up within the first few minutes.

To Reproduce
Steps to reproduce the behavior:

  1. Start playing.
  2. Tap through the dialogue (<space>). It usually freezes before you gain control of your character; if not, it freezes within the first couple of minutes.
  3. When it finally locks up, this is what I see in the console:
...
LOG: A1 port attempt to write FONT ROM char 0x56
LOG: A1 port attempt to write FONT ROM char 0x56
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: PC-98 INT 18 AH=42h CH=0xC0
LOG: PC-98 INT 18 AH=42h CH=0xC0
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun
LOG: 8251 warning: RX overrun

Environment (please complete the following information):

[sdl]
# output: What video system to use for output.
#   Possible values: surface, overlay, opengl, openglnb, openglhq, ddraw, direct3d.
# autolock: Mouse will automatically lock, if you click on the screen. (Press CTRL-F10 to unlock)
# sensitivity: Mouse sensitivity.
# mouse_emulation: When is mouse emulated ?
#   integration: when not locked
#   locked: when locked
#   always: every time
#   never: at no time
#   If disabled, the mouse position in DOSBox-X is exactly where the host OS reports it.
#   When using a high DPI mouse, the emulation of mouse movement can noticeably reduce the
#   sensitiveness of your device, i.e. the mouse is slower but more precise.
#   Possible values: integration, locked, always, never.
output=overlay
autolock=true
sensitivity=50
mouse_emulation=never

[dosbox]
# title: Additional text to place in the title bar of the window
# machine: The type of machine DOSBox tries to emulate.
#   Possible values: hercules, cga, cga_mono, cga_rgb, cga_composite, cga_composite2, tandy, pcjr, ega, vgaonly, svga_s3, svga_et3000, svga_et4000, svga_paradise, vesa_nolfb, vesa_oldvbe, amstrad, pc98, pc9801, pc9821, fm_towns, mcga, mda.
title=Peret em Heru - For the Prisoners
machine=pc98
cascade interrupt ignore in service=true # does it with and without this option

[cpu]
cycles=max

[render]
# scaler: Scaler used to enlarge/enhance low resolution modes. If 'forced' is appended,
#   then the scaler will be used even if the result might not be desired.
#   Possible values: none, normal2x, normal3x, normal4x, normal5x, advmame2x, advmame3x, advinterp2x, advinterp3x, hq2x, hq3x, 2xsai, super2xsai, supereagle, tv2x, tv3x, rgb2x, rgb3x, scan2x, scan3x, hardware_none, hardware2x, hardware3x, hardware4x, hardware5x, xbrz, xbrz_bilinear.
scaler=normal2x forced

[dos]
# xms: Enable XMS support.
# ems: Enable EMS support. The default (=true) provides the best
#   compatibility but certain applications may run better with
#   other choices, or require EMS support to be disabled (=false)
#   to work at all.
#   Possible values: true, emsboard, emm386, false.
# umb: Enable UMB support.
xms=false
ems=false
umb=false

[autoexec]
# Lines in this section will be run at startup.
# You can put your MOUNT lines here.
imgmount 2 "Peret em Heru - For the Prisoners.hdi" -t hdd -fs none
BOOT -l c



Additional context
This is happening both with and without the fan translation. I can email you an HDI if you're having trouble tracking it down.
joncampbell123 commented 5 years ago

I'm having trouble tracking it down. The collection I have to test with is only whatever was on the Internet Archive at the time I started on PC-98 emulation.

joncampbell123 commented 5 years ago

Thank you. I see it run normally, then suddenly jump into an undefined (FF) region of memory.

joncampbell123 commented 5 years ago

I found I can avoid the crash if the cycle count is set to a value around 100000.

I can also avoid the crash if I IMGMOUNT the HDI image to drive A: and run the game directly instead of booting the HDI image, but then there is no sound for some reason.

joncampbell123 commented 5 years ago

It's still playable with a high cycle count if you use the dynamic core.

It seems to run without crashing at 70000 cycles.

joncampbell123 commented 5 years ago

Since the game is said to have come out in 1998, it's entirely possible it was only tested on 1998-era Pentium hardware. Does this game crash the same way on anything lower-end, like a 486?

joncampbell123 commented 5 years ago

A log of the CPU execution shows that when it crashes, it gets very involved with the FM interrupt, then eventually IRETs to a random junk address.

joncampbell123 commented 5 years ago

If I start the opening sequence or game and then immediately use the debugger to mask IRQ 12 (FM interrupt) to prevent it from working, the game does not crash. Of course, FM music is "hung" as well but the game continues running anyway.

joncampbell123 commented 5 years ago

The game seems to require cputype=pentium and a minimum cycle count of 65000 to run with FM music enabled without crashing.

meunierd commented 5 years ago

For me, with cputype=pentium and cycles=100000, it crashes later, but it still crashes consistently.

yksoft1 commented 5 years ago

This game is built with Dante98: RPG Maker II, which was released in 1996 and requires a 386 or better CPU, so it shouldn't need 100000+ cycles at all.

joncampbell123 commented 5 years ago

Since the issue seems to center around the FM music, I am looking into the MUSIC.COM program right now.

Apparently the interface is through INT 48h.

When you run the game by IMGMOUNTing the HDI file to A: and running it without booting, FM music does not work, but the game does not crash.

Apparently the reason FM music doesn't work that way is that MUSIC.COM assumes it is already installed if the segment value of the INT 48h vector is anything other than 0060h. DOSBox-X points unknown INTs at handlers in the BIOS (segment F000h), so MUSIC.COM doesn't load.

joncampbell123 commented 5 years ago

If I patch that check out of MUSIC.COM so that it loads anyway in that case, then the FM music works and the game eventually crashes.

joncampbell123 commented 5 years ago

The fault seems to be within the FM music driver. Something causes it to corrupt memory and crash the game. It seems to be CPU speed dependent.

joncampbell123 commented 5 years ago

MUSIC.COM's IRQ 12 ISR appears to switch stacks before processing the interrupt and switch back before returning. The crash appears to be a corner case where it does NOT switch the stack back after processing, so the IRET that should return to the program returns to a random address instead.

joncampbell123 commented 5 years ago

Actually, it's a re-entrancy problem. The game uses the vsync interrupt (IRQ 2) to call the driver once per vsync. Most of the IRQ 12 FM interrupt handling is done with interrupts disabled; however, there is one point in that ISR where interrupts are enabled just for the instant between two adjacent instructions, like this:

STI
CLI

The game crashes if the vsync interrupt is handled during that brief window.

The crash comes from the stack switching performed by MUSIC.COM's FM interrupt handler, which is not re-entrant. The same stack switching also happens in the callback invoked by the vsync interrupt, so the saved stack address is overwritten and the wrong one is restored.

joncampbell123 commented 5 years ago

You can fix the crashing by masking IRQ 2 from the debugger:

PIC MASKIRQ 2

This does mean, though, that if the game relies on counting vsyncs, it will fail at some point.

EDIT: You have to do it after starting the game. IRQ 2 is normally masked at emulator/system startup; the game unmasks it.

joncampbell123 commented 5 years ago

I can test this later, but does anyone know whether a 386/486/Pentium will service interrupts if STI is immediately followed by CLI in the instruction stream?

joncampbell123 commented 5 years ago

According to a test on a 486 laptop (Intel 486DX/50MHz), the CPU will not service interrupts at the STI given this sequence:

STI
CLI

Apparently the window for handling interrupts is too short for the CPU to service any pending interrupts.

That explains how this re-entrancy bug could go unnoticed on real hardware.

It happens in DOSBox-X, however, because the normal core is set up to service pending interrupts immediately when processing STI.

joncampbell123 commented 5 years ago

By the way, I can easily imagine this game having problems when run in virtual 8086 mode under EMM386.EXE: STI/CLI in virtual 8086 mode trigger a General Protection Fault, and the time EMM386.EXE takes to emulate STI/CLI for the VM is long enough for the CPU to process an interrupt, given this problem.

joncampbell123 commented 5 years ago

The latest commit changes the normal core so that it no longer services interrupts immediately on STI; an STI + CLI pair therefore no longer causes interrupt processing.

Peret has now been running for 5 minutes without crashing, with FM music playing.

joncampbell123 commented 5 years ago

Also, the tutorial mode has an interesting easter egg. See it? :)

[image: guest os_000]

sikthehedgehog commented 5 years ago

Maybe x86 always waits one more instruction after STI before acknowledging interrupts? I know there's a similar hack with MOV SS, <...> so you can change the stack safely (which can't be done in a single instruction, as you need to change both SS and SP). This doesn't seem directly related, but I wouldn't be surprised, and x86 wouldn't be the only CPU with a similar quirk.

I admit I don't know why anybody would do STI immediately followed by CLI, however.

sikthehedgehog commented 5 years ago

OK, seems I'm right:

https://www.felixcloutier.com/x86/sti

If IF = 0, maskable hardware interrupts remain inhibited on the instruction boundary following an execution of STI. (The delayed effect of this instruction is provided to allow interrupts to be enabled just before returning from a procedure or subroutine. For instance, if an STI instruction is followed by an RET instruction, the RET instruction is allowed to execute before external interrupts are recognized. No interrupts can be recognized if an execution of CLI immediately follow such an execution of STI.) The inhibition ends after delivery of another event (e.g., exception) or the execution of the next instruction.

http://faydoc.tripod.com/cpu/sti.htm

Set interrupt flag; external, maskable interrupts enabled at the end of the next instruction

joncampbell123 commented 5 years ago

@sikthehedgehog In the PC-98 MS-DOS world you'll find plenty of code doing weird things, some of it right up there with the weird things early IBM PC demoscene prods like to do.

Here's some off the top of my head:

meunierd commented 5 years ago

I can confirm the latest master works for me! Thanks for looking into this.

meunierd commented 5 years ago

This seems similar to the behaviour mentioned in the readme for Night Slave, which only worked on older versions of DOSBox-X. I'll see if this fixes it (warning: it's got H content).

joncampbell123 commented 5 years ago

It also seemed to fix lockups in one particular demoscene prod, Saga by Dust, which until now required the cs_equ_ds IRQ hack to avoid locking up in certain parts.

joncampbell123 commented 5 years ago

I'm glad you mentioned Night Slave: the CPU_Cycles = 4 hack seems to trigger exactly the conditions under which Night Slave causes DOSBox-X to hang (the normal core never exits the loop), so a new commit was added that instead emulates the STI+CLI condition by peeking ahead at the next opcode.

meunierd commented 5 years ago

On the latest master, I get this for Night Slave:

...
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: A1 port attempt to write FONT ROM char 0x57
LOG: PC-98 INT 18 AH=42h CH=0x80

This is right after Starting. Neat that it and Saga by Dust use the same starscape effect!

Can confirm that Peret em Heru still works though.

joncampbell123 commented 5 years ago

Night Slave works on my end; it just takes a while to get through the starfield effect and into the game. Starting a game should work, just as it does on my end.

Saga by Dust, if it hangs, tends to hang on the second part (with the scroller and 3D sphere objects) because some race condition eventually causes the Sound Blaster driver to try to operate the sound card with DS set to VGA memory (A000h), which of course is not valid.

meunierd commented 5 years ago

Maybe it's just the fan translation of Night Slave that's broken? You can find a pre-patched HDI of it on the Retronomicon Games Patreon.


joncampbell123 commented 5 years ago

@meunierd I'm saying Night Slave does work, no problems. It's not a fan translation, obviously.

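[image: guest os_000] https://user-images.githubusercontent.com/6245486/60772318-7850f700-a0a9-11e9-8761-9d9666307d74.png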

meunierd commented 5 years ago

Are you using the HD or FD release from Neo Kobe? Would you mind sharing your conf? I still keep locking up for some reason.


joncampbell123 commented 5 years ago

I am running the game from an HDI image (hard disk).

dosbox.conf.txt (https://github.com/joncampbell123/dosbox-x/files/3369910/dosbox.conf.txt)

EDIT: I am using the normal core. The dynamic core has not been updated to obey the STI+CLI rule.

meunierd commented 5 years ago

That works, thanks! Definitely something to do with my config.


joncampbell123 commented 5 years ago

It may be the "cascade interrupt ignore in service" option.

I'm not entirely sure how to approach it yet, but PC-98 seems to use the 8259 slightly differently from the IBM PC: it allows the slave PIC to fire interrupts even if the ISR does not acknowledge the cascade interrupt. On the IBM PC, failing to acknowledge the cascade interrupt would prevent IRQ 8-15 from working.