jsgroth / jgenesis

Sega Genesis / Sega CD / SNES / Master System / Game Gear emulator
MIT License

Errors with Krikzz's Mega-CD Verificator #105

Open benderscruffy opened 2 weeks ago

benderscruffy commented 2 weeks ago

mcd-verificator.zip

jsgroth commented 2 weeks ago

I used this when I was first working on Sega CD and it helped fix bugs in a bunch of games that depend on obscure behaviors, but I obviously didn't get all of the tests passing.

Current results with a disc in the drive: mcd-verificator

VAR tests error 02, IRQ test error 09, and REG 8030 error 07 are all timing errors that seem to indicate that the Genesis 68000 is running slightly too fast relative to the Sega CD hardware. All three of these pass if I change the SCD oscillator frequency from 50 MHz to 50.52 MHz, but I don't think that's right.

Word RAM error 20 is caused by not emulating some of the word RAM access behaviors. Based on this test, if the sub CPU accesses word RAM while it's in 2M mode and owned by the main CPU, the sub CPU should halt until the main CPU returns word RAM. This is currently not emulated at all.

CDC flags error 22 is because of incorrect behavior when CDC DMA is set to transfer data to one of the CPUs directly via the CDC host data register. The CDC interrupt should fire when the final word is moved to the host data register, not when the CPU reads it.
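
A minimal sketch of the interrupt timing described above, in Rust. The `HostDataTransfer` type, its fields, and the prefetch-style `fill` step are all hypothetical names and simplifications for illustration, not the emulator's actual code or the real CDC register layout. The only behavior taken from the discussion is that the interrupt fires when the final word is moved into the host data register rather than when the CPU reads it.

```rust
/// Hypothetical model of a CDC transfer through the host data register.
struct HostDataTransfer {
    source: Vec<u16>,       // words waiting in the CDC buffer (assumed layout)
    next: usize,            // index of the next word to move
    host_data: Option<u16>, // word currently latched in the host data register
    interrupt_fired: bool,  // set once the transfer-end interrupt is raised
}

impl HostDataTransfer {
    fn new(source: Vec<u16>) -> Self {
        Self { source, next: 0, host_data: None, interrupt_fired: false }
    }

    /// Move the next word into the host data register if the register is empty.
    fn fill(&mut self) {
        if self.host_data.is_none() && self.next < self.source.len() {
            self.host_data = Some(self.source[self.next]);
            self.next += 1;
            if self.next == self.source.len() {
                // The final word was just latched: raise the interrupt now,
                // before the CPU has read that word.
                self.interrupt_fired = true;
            }
        }
    }

    /// CPU read of the host data register; the register is refilled afterwards.
    fn cpu_read(&mut self) -> Option<u16> {
        let word = self.host_data.take();
        self.fill();
        word
    }
}
```

With a two-word transfer, the flag in this model is already set after the CPU reads the first word, because that read causes the second (final) word to be latched.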

CDC DMA2 error 04 looks like it's testing an obscure behavior where CDC DMA to PRG RAM will only ever transfer an even number of bytes, so if the DMA length is odd then it will transfer one less byte than requested.
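
The even-length behavior can be sketched in one line of Rust. `prg_ram_dma_bytes` is a hypothetical helper, and it assumes the explanation above is right: the transfer moves whole 16-bit words, so an odd request loses its last byte.

```rust
/// Hypothetical helper (not from the emulator's source): bytes actually moved
/// by a CDC DMA to PRG RAM, assuming transfers happen in whole 16-bit words.
fn prg_ram_dma_bytes(requested_len: u32) -> u32 {
    // Clearing bit 0 rounds an odd length down to the previous even length.
    requested_len & !1
}
```

Under that assumption a 2047-byte request moves 2046 bytes, while a 2048-byte request is unaffected.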

Finally, CDC DMA3 error 04 is caused by the EDT bit in the CDC status register not being set correctly when one of the CPUs is reading words via the CDC host data register and there is only one word left to read. This is closely related to the CDC flags error.

The CDC DMA tests check lots of other things after where they're currently failing so there will probably be other issues after fixing the current errors.

jsgroth commented 2 weeks ago

Stalling the main CPU for 2 out of every 128 mclk cycles fixes the VAR tests but doesn't fix the IRQ test or the REG 8030 test. It causes them to fail in the other direction, where it looks like the main CPU is running very slightly too slow compared to the Sega CD hardware:

[image: mcd-verificator]

Expected value ranges are 224-226 for the IRQ test and 1286-1288 for the 8030 test. The IRQ test counts how many timer interrupts occur with interval=1 in the time that it takes the main CPU to generate 1024 software interrupts for the sub CPU, and the 8030 test counts how many iterations of a simple loop pass before a timer interrupt triggers with interval=255.

I suspect the refresh delay is slightly different compared to the standalone Genesis when the main CPU is executing out of Sega CD BIOS ROM, but this aspect of the hardware is not documented at all and is very hard to test. All three of these tests pass if I stall the main CPU for 2 out of every 172 mclk cycles instead of 2/128:

[image: mcd-verificator]
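
For a sense of scale, 2/128 costs the main CPU 1.5625% of its cycles and 2/172 costs about 1.16%, while the 50 MHz to 50.52 MHz oscillator change mentioned earlier speeds up the Sega CD side by 1.04%. The Rust sketch below models the periodic stall with a hypothetical `RefreshStall` type. Placing the stalled cycles at the start of each period is an assumption made for simplicity; the discussion does not say where in the period they fall.

```rust
/// Hypothetical model of a periodic stall: the main CPU loses `stall`
/// out of every `period` master clock (mclk) cycles.
struct RefreshStall {
    period: u64,
    stall: u64,
    position: u64, // current offset within the period, in mclk cycles
}

impl RefreshStall {
    fn new(stall: u64, period: u64) -> Self {
        assert!(stall < period, "stall must be shorter than the period");
        Self { period, stall, position: 0 }
    }

    /// Advance by `mclk_cycles` and return how many of them the CPU can use.
    fn advance(&mut self, mclk_cycles: u64) -> u64 {
        let mut usable = 0;
        for _ in 0..mclk_cycles {
            // Assumption: the first `stall` cycles of each period are lost.
            if self.position >= self.stall {
                usable += 1;
            }
            self.position = (self.position + 1) % self.period;
        }
        usable
    }
}

/// Percentage of mclk cycles lost to the stall.
fn slowdown_percent(stall: u64, period: u64) -> f64 {
    100.0 * stall as f64 / period as f64
}
```

The per-cycle loop is written for clarity rather than speed; a real scheduler would more likely compute the lost cycles in closed form.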

jsgroth commented 1 week ago

IRQ error 0E is caused by a timing/synchronization issue between the two CPUs. This test has the main CPU set the software interrupt flag 3 times in rapid succession, and it only expects the sub CPU to execute its INT2 handler twice. Because of how execution between the two CPUs is interleaved right now, the sub CPU ends up handling all 3 interrupts, which is wrong.

Part of the reason this is so tight is that the main CPU routine has this C code:

    ga->IFL2 = 1;
    ga->IFL2 = 1;
    ga->IFL2 = 1;

Which compiles to this:

    move.l ($0196A4), a1
    move.b #1, (a1)
    move.l ($0196A4), a1
    move.b #1, (a1)
    move.l ($0196A4), a1
    move.b #1, (a1)

Those MOVE.L instructions are very slow and give the sub CPU a lot of time to handle the interrupt and execute its very short INT2 routine, which is just this:

    move.w #2, ($8026)
    add.w #1, ($8028)
    rte

Right now the sub CPU ends up acknowledging the second interrupt 7 CPU cycles before the main CPU sets the flag for the third time.

This suggests that when the 68000 handles an interrupt, it doesn't begin to acknowledge the interrupt until 10 cycles into its 44-cycle interrupt handling process. My 68000 implementation doesn't currently support any level of sub-instruction timing, but buffering and delaying the sub CPU interrupt acknowledge by 10 cycles fixes this test: mcd-verificator
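
A rough Rust sketch of the buffered acknowledge follows. `Int2Line` and its methods are hypothetical names, and the model puts both CPUs on a single timeline counted in sub CPU cycles, which is a simplification. It only illustrates why delaying the acknowledge lets a flag write that lands just after the interrupt starts be absorbed instead of triggering another handler run.

```rust
/// Hypothetical model of the sub CPU's INT2 pending flag with a delayed acknowledge.
struct Int2Line {
    delay: u64,          // cycles between starting interrupt handling and the acknowledge
    pending: bool,       // software interrupt flag as seen by the sub CPU
    ack_at: Option<u64>, // cycle at which a buffered acknowledge takes effect
}

impl Int2Line {
    fn new(delay: u64) -> Self {
        Self { delay, pending: false, ack_at: None }
    }

    /// Apply the buffered acknowledge once its time has come.
    fn apply_due_ack(&mut self, now: u64) {
        if let Some(at) = self.ack_at {
            if now >= at {
                self.pending = false;
                self.ack_at = None;
            }
        }
    }

    /// Main CPU sets the software interrupt flag.
    fn raise(&mut self, now: u64) {
        self.apply_due_ack(now);
        self.pending = true;
    }

    /// Sub CPU starts handling the interrupt; the acknowledge is buffered.
    fn begin_handling(&mut self, now: u64) {
        self.ack_at = Some(now + self.delay);
    }

    /// Whether an interrupt is still pending at cycle `now`.
    fn is_pending(&mut self, now: u64) -> bool {
        self.apply_due_ack(now);
        self.pending
    }
}
```

With `delay = 10`, a flag write 7 cycles after handling begins is cleared by the acknowledge that follows it. With `delay = 0`, the same write arrives after the acknowledge and leaves a third interrupt pending.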

jsgroth commented 1 week ago

The word RAM and CDC flags tests now pass: mcd-verificator

Word RAM 20 was fixed by halting the sub CPU if it accesses word RAM in 2M mode while it's owned by the main CPU. The sub CPU remains halted until the main CPU writes DMNA=1. This also fixes #101.

Actual hardware halts the CPU mid-instruction, but that's very difficult to emulate efficiently with my 68000 implementation, so instead I halt it immediately after the instruction that accessed word RAM. If that instruction wrote to word RAM then the write is buffered until the sub CPU is unhalted. This isn't completely accurate but it's good enough to pass the test and to fix Marko's Magic Football.
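
The halt-after-instruction approach with a buffered write could look roughly like this in Rust. `WordRam2M`, `WordRamOwner`, and the method names are hypothetical, byte-wide writes are used only to keep the sketch short, and the ownership change is treated as instantaneous.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WordRamOwner {
    Main,
    Sub,
}

/// Hypothetical model of word RAM in 2M mode.
struct WordRam2M {
    owner: WordRamOwner,
    data: Vec<u8>,
    sub_halted: bool,
    buffered_write: Option<(usize, u8)>, // write held back while the sub CPU is halted
}

impl WordRam2M {
    fn new(size: usize) -> Self {
        Self {
            owner: WordRamOwner::Main,
            data: vec![0; size],
            sub_halted: false,
            buffered_write: None,
        }
    }

    /// Sub CPU write. If the main CPU owns word RAM, buffer the write and
    /// halt the sub CPU once the current instruction has finished.
    fn sub_write(&mut self, addr: usize, value: u8) {
        if self.owner == WordRamOwner::Sub {
            self.data[addr] = value;
        } else {
            self.buffered_write = Some((addr, value));
            self.sub_halted = true;
        }
    }

    /// Main CPU writes DMNA=1: hand word RAM to the sub CPU, apply any
    /// buffered write, and let the sub CPU run again.
    fn main_set_dmna(&mut self) {
        self.owner = WordRamOwner::Sub;
        if let Some((addr, value)) = self.buffered_write.take() {
            self.data[addr] = value;
        }
        self.sub_halted = false;
    }
}
```

A CPU loop using this model would check `sub_halted` before executing each sub CPU instruction and skip execution while it is set.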

There were a number of bugs revealed by the CDC flags tests:

jsgroth commented 1 week ago

All tests now pass: mcd-verificator

The various CDC DMA issues were:

I don't love emulating memory refresh delay as stalling the main CPU for 2 out of every 172 mclk cycles, but it shouldn't hurt anything since it's less of a delay than games are likely to encounter (and there was no delay emulation at all before).

benderscruffy commented 1 week ago

great job