hoglet67 / PiTubeDirect

Bare-metal Raspberry Pi project that attaches to the Acorn TUBE interface and emulates many BBC Micro Co Processors
GNU General Public License v3.0
188 stars 23 forks source link

Acornsoft Chess crashes on higher difficulties #75

Closed glvas closed 4 years ago

glvas commented 4 years ago

M128 - Internal ReCo Mini second processor - External Pi Zero, latest Fer-De-Lance image, default core.

Nitpicking, but on my setup, Acornsoft Chess, when set to autoplay and difficulty levels above 5, will eventually (in 2-3 minutes) corrupt the screen and crash, or freeze. Ctrl-Break will sometimes reset and other times will result in screen corruption, with full powercycling needed to reset the machine. Internal ReCo Mini second processor has no issues with this scenario.

I tried using either Kjell's or Ken Lowe's on-port tube interface and result is the same. Also tried with a small heatsink on the raspberry pi processor with no effect.

Not a big issue by any stretch, but maybe an indication of something bigger.

hoglet67 commented 4 years ago

Thanks for submitting this issue.

Can I ask what speed you are running the ReCo 6502 Mini at (when using Chess)?

I'm able to reproduce it from the SSD image version included in the STH archive as disk 41:

*DIN 41
*CHESS
A
6
<return>

For the other developers, here are some screen captures: capture3

capture4

capture5

What ever is going on, it's causing the Pi to crash (with the debug kernel you no longer see reset events)

I'll do a bit more digging....

glvas commented 4 years ago

Hi, I am running it at its top speed of 16MHz.

hoglet67 commented 4 years ago

I've bisected this issue, and it appears to be introduced with this commit: 79a72a4b

The issue is the change to the PLP macro, used in PLP and RTI:

Old code:

.macro  PLP
        SPOP    r0
        mrs     r2, CPSR
        and     flags, flags, #0xffffff00
        orr     flags, flags, r0        // preserve the other bits in flags, e.g. the slowdown bit
        and     r1, r0, #N_FLAG6502
        bic     r2, #(N_FLAG + Z_FLAG + C_FLAG)
        orr     r2, r2, r1, lsl #24
        and     r1, r0, #Z_FLAG6502+C_FLAG6502
        bic     flags, flags, #N_FLAG6502+X_FLAG6502+B_FLAG6502+Z_FLAG6502+C_FLAG6502
        orr     r2, r2, r1, lsl #29
        //msr     CPSR_flg, r2
.endm

New code:

.macro  PLP
        SPOP    r1
        ldr     r0, =tube_irq           // prefetch Tube IRQ flags
        and     flags, flags, #0xffffff00
        orr     flags, flags, r1        // preserve the other bits in flags, e.g. the slowdown bit
        and     r2, r1, #N_FLAG6502
        and     r1, r1, #Z_FLAG6502+C_FLAG6502
        ldr     r0, [r0]
        mov     r2, R2, lsl #24
        bic     flags, flags, #N_FLAG6502+X_FLAG6502+B_FLAG6502+Z_FLAG6502+C_FLAG6502
        orr     r2, r2, r1, lsl #29
        //   Need to update the ARM flags later with msr     CPSR_flg, r2
.endm

In the old code, r2 was initialized from the CPSR, meaning that bits 0..7 (the ARM mode and interrupt bits) are included. Thus it's safe to restore the flags using msr CPSR, r2 or msr CPSR_flg, r2.

(restore here means copying the the 6502 N, Z and C flags into the ARM N, Z and C flags)

In the new code, r2 contains only the ARM N, Z and C flags; all other bits are zero. So (as the comment says) the flags must be restored using msr CPSR_flg, r2. If msr CPSR, r2 is used, the ARM mode and interrupt bits will be set to zero. A mode of 00000 is an illegal mode, so bad things will probably happen!

Unfortunately, there is one path in the code where msr CPSR, r2 is used:

.macro CHECK_IRQ
        tst     flags, #I_FLAG6502        // Test whether interrupts are enabled (bit 2 = 0)
        eoreq   r0, r0, #1
        tsteq   r0, #1                    // Test for IRQ
        beq     handle_irq_adjust_regPC
        msr     CPSR_flg, r2              // Restore the 6502 flags
.endm
...
handle_irq_adjust_regPC:
        sub     regPC, regPC ,#1
handle_irq:
        msr     CPSR, r2                // Restore the 6502 flags and re-enable ARM interrupts

Now, the PLP macro followed by the CHECK_IRQ macro is used by RTI and PLP.

So I think this bug occurs only if:

All other paths to handle_irq are preceeded by a mrs r2, CPSR so the mode bits are not clobbered.

It's hard to fix this in the PLP macro, as you soon exceed the 16 ARM instructions allows per 6502 instruction slot.

One possible fix is to add three instructions to handle_irq_adjust_regPC:

handle_irq_adjust_regPC:
        sub     regPC, regPC ,#1
        msr     CPSR_flg, r2            // Safely restore the 6502 flags and prepare to
        mrs     r2, CPSR                // re-enable ARM interrupts without affecting the MODE bits
        bic     r2,r2, #(CPSR_FIQ_INHIBIT | CPSR_IRQ_INHIBIT)

This will have a negligable impact on performance, and this code is only executed when an interrupt needs to be serviced following RTI or PLP.

Interestingly, I've also managed to reproduce this issue in the Dorman 6502 tests, by first enabling the display events:

*FX 14,4
*D6502

You have to hit R many times. but eventually the system will mess up and crash.

With the fix in place, this doesn't happen.

I'll push a version of the fix to gecko-dev after I have verified it a bit more.

hoglet67 commented 4 years ago

Here's a game now running to completion.

capture6