Revisit the 'mystery constant' in M6532.cxx

DirtyHairy commented 7 years ago

According to commit history, the 'mystery constant' from Z26 was reintroduced to fix a hang in Mr. Roboto, while a simple comparision timer >= 0 is all that should be necessary.

  if(!(timer & 0x40000))
  {
    // Return at 'divide by TIMxT' interval rate
    return (timer >> myIntervalShift) & 0xff;
  }

While working on fixing the remaining bugs in the 6502.ts RIOT implementation, I have made an interesting observation: while I promptly got the hang after the intro screen in 6502.ts (it is fixed now), I cannot reproduce it in Stella by changing the code to timer >= 0 anymore!

Analysis suggests that RSYNC actually was the culprit and that our new RSYNC fixes the issues, while the old code above is more of a band aid to avoid the side effects :smirk: What happens is the following: after the title screen, Mr. Roboto writes to TIM64T, then goes through the canonical write-zero-to-everything-in-page-zero loop, supposedly from the original Bezerk code. In this init loop, RSYNC is anadvertedly hit and, if implemented correclty, fast-forwards the TIA playfield counter. Further down the lane, the code does a WSYNC, and the number of cycles that is skipped depends on the preceeding RSYNC. The hang is then caused by a loop that polls INTIM until it is zero. At this point, the counter will have wrapped and be counting down each cycle --- if timing is off, it will never read zero, and the game hangs :8ball:

The old implementation gets out of this situation by switching back to counting in TIM64T mode after bit 18 of timer flips again --- in this mode, the zero value is held for 64 cycles, which is long enough to escape the loop. With the new RSYNC simulation, timing works out in the right way, and the loop exits naturally.

@sa666666 , do you have any other examples of ROMs that failed with timer > 0? If my assumptions are correct, these should work now, too, and we can remove the cludge (which I strongly suspect differs from real hardware after bit 18 flips for the second time).

sa666666 commented 7 years ago

I'm sorry, but I don't remember any of the other ROMs that required the old code. If we want to test this, then I suggest making the change and having it part of pre5. Then if we get regressions we can test the change on the reported ROMs.

In any event, your analysis makes sense to me. I really like it when fixing a bug in one part of the system fixes a kludge somewhere else. It means that the emulation is coming together, and each part of the system is depending on the other parts correctly.

DirtyHairy commented 7 years ago

:smile: OK, I have remove the kludge in https://github.com/stella-emu/stella/commit/10245ea1023d4819f7a365d14ae2b8e7cf98865e --- let's see what comes of it...

sa666666 commented 7 years ago

This constant was present in two other places too; they are fixed in https://github.com/stella-emu/stella/commit/2a2797cbafe28825755df5aeb1ccbc672673502b.

sa666666 commented 7 years ago

Acid Drop fails because of this change, so I'm reopening this. Though it may be an issue in the ROM, not the change itself.

acid_drop.zip

I suggest to leave the change in for pre5, to see if we can find any other ROMs that fail.

sa666666 commented 7 years ago

Another one that's broken (after pressing Fire/Space):

panda_chase.zip

DirtyHairy commented 7 years ago

Very interesting. I can confirm that both run on real hardware, and at least Panda Chase (I took a closer look at that one) fails like Mr. Robot by going in and endless loop polling INTIM for zero while the counter wraps over and over again. I see three possiblities: an aspect of RIOT behavior that we do not know about yet, a timing glitch somewhere that causes the mismatch (one clock would suffice) or a write cycle to the timer that is not emulated (i.e. a side effect of a seemingly unrelated instruction).

For future reference analysis, some observations on Panda Chase:

The relevant timer write is a STA to TIM64T at address 0xF041
There is no RSYNC or WSYNC after the final write to TIM64T (before the hang)
I have attached a trace captured by hacking 6502.ts below. The first instruction is the final write to TIM64T, and the last instruction is the first INTIM of the infinite loop. The breakpoint at 0xF0A1 is the check on the fire button, so setting a breakpoint at 0xF0A5 allows to reproduce the same trace.

If nothing else helps, it might be worthwhile to hack the ROM to print the first INTIM value in the "loop of doom" :smirk: The fact that this loop is part of a common codepath is a complication, but creating an 8k ROM that switches to an instrumented version of the code in bank 0 somewhere around 0xF0A5 might do the trick.

    $F041:   STA $0296
    $F044:   LDA $0282
    $F047:   LSR
    $F048:   BCS $2D ; -> $F077
    $F077:   LDX #$00
    $F079:   LSR
    $F07A:   BCS $1F ; -> $F09B
    $F09B:   STX $DA
    $F09D:   LDA $E9
    $F09F:   BMI $3E ; -> $F0DF
(B) $F0A1:   LDA $3C
    $F0A3:   BMI $1A ; -> $F0BF
    $F0A5:   LDA $EB
    $F0A7:   BPL -$5B ; -> $F04E
    $F04E:   DEC $EB
    $F050:   LDA $C6
    $F052:   STA $EF
    $F054:   LDX #$9C
    $F056:   LDA #$00
    $F058:   STA $4E,X
    $F05A:   DEX
    $F05B:   BNE -$05 ; -> $F058

[loop cut out]

    $F05D:   LDA $EF
    $F05F:   STA $C6
    $F061:   JSR $F069
    $F069:   CLC
    $F06A:   ADC #$01
    $F06C:   CMP #$0A
    $F06E:   BCC $04 ; -> $F074
    $F074:   STA $AA
    $F076:   RTS
    $F064:   JSR $F83B
    $F83B:   LDX #$24
    $F83D:   LDA #$78
    $F83F:   STA $92
    $F841:   LDA #$48
    $F843:   STA $93
    $F845:   RTS
    $F067:   BPL $63 ; -> $F0CC
    $F0CC:   LDA #$00
    $F0CE:   STA $19
    $F0D0:   STA $1A
    $F0D2:   JSR $F424
    $F424:   LDX #$00
    $F426:   LDY $C4
    $F428:   LDA $FC18,Y
    $F42B:   STA $9E,X
    $F42D:   INY
    $F42E:   INY
    $F42F:   INY
    $F430:   INY
    $F431:   LDA $FC18,Y
    $F434:   STA $98,X
    $F436:   INY
    $F437:   INY
    $F438:   INY
    $F439:   INY
    $F43A:   LDA $FC18,Y
    $F43D:   STA $AC,X
    $F43F:   INX
    $F440:   CPX #$02
    $F442:   BEQ $08 ; -> $F44C
    $F444:   CLC
    $F445:   LDA #$0C
    $F447:   ADC $C4
    $F449:   TAY
    $F44A:   BNE -$24 ; -> $F428
    $F428:   LDA $FC18,Y
    $F42B:   STA $9E,X
    $F42D:   INY
    $F42E:   INY
    $F42F:   INY
    $F430:   INY
    $F431:   LDA $FC18,Y
    $F434:   STA $98,X
    $F436:   INY
    $F437:   INY
    $F438:   INY
    $F439:   INY
    $F43A:   LDA $FC18,Y
    $F43D:   STA $AC,X
    $F43F:   INX
    $F440:   CPX #$02
    $F442:   BEQ $08 ; -> $F44C
    $F44C:   RTS
    $F0D5:   LDX #$01
    $F0D7:   STX $0A
    $F0D9:   INX
    $F0DA:   STX $A7
    $F0DC:   JMP $F550
    $F550:   LDX #$0A
    $F552:   LDY #$00
    $F554:   LDA $E5
    $F556:   BPL $02 ; -> $F55A
    $F55A:   JSR $F560
    $F560:   LDA $00A8,Y
    $F563:   AND #$F0
    $F565:   LSR
    $F566:   ADC $FC32
    $F569:   STA $AE,X
    $F56B:   LDA #$FC
    $F56D:   STA $AF,X
    $F56F:   DEX
    $F570:   DEX
    $F571:   LDA $00A8,Y
    $F574:   AND #$0F
    $F576:   ASL
    $F577:   ASL
    $F578:   ASL
    $F579:   ADC $FC32
    $F57C:   STA $AE,X
    $F57E:   LDA #$FC
    $F580:   STA $AF,X
    $F582:   INY
    $F583:   DEX
    $F584:   DEX
    $F585:   BPL -$27 ; -> $F560
    $F560:   LDA $00A8,Y
    $F563:   AND #$F0
    $F565:   LSR
    $F566:   ADC $FC32
    $F569:   STA $AE,X
    $F56B:   LDA #$FC
    $F56D:   STA $AF,X
    $F56F:   DEX
    $F570:   DEX
    $F571:   LDA $00A8,Y
    $F574:   AND #$0F
    $F576:   ASL
    $F577:   ASL
    $F578:   ASL
    $F579:   ADC $FC32
    $F57C:   STA $AE,X
    $F57E:   LDA #$FC
    $F580:   STA $AF,X
    $F582:   INY
    $F583:   DEX
    $F584:   DEX
    $F585:   BPL -$27 ; -> $F560
    $F560:   LDA $00A8,Y
    $F563:   AND #$F0
    $F565:   LSR
    $F566:   ADC $FC32
    $F569:   STA $AE,X
    $F56B:   LDA #$FC
    $F56D:   STA $AF,X
    $F56F:   DEX
    $F570:   DEX
    $F571:   LDA $00A8,Y
    $F574:   AND #$0F
    $F576:   ASL
    $F577:   ASL
    $F578:   ASL
    $F579:   ADC $FC32
    $F57C:   STA $AE,X
    $F57E:   LDA #$FC
    $F580:   STA $AF,X
    $F582:   INY
    $F583:   DEX
    $F584:   DEX
    $F585:   BPL -$27 ; -> $F560
    $F587:   LDX #$08
    $F589:   LDY $FC3C
    $F58C:   LDA $B0,X
    $F58E:   CMP $FC32
    $F591:   BNE $06 ; -> $F599
    $F593:   STY $B0,X
    $F595:   DEX
    $F596:   DEX
    $F597:   BPL -$0D ; -> $F58C
    $F58C:   LDA $B0,X
    $F58E:   CMP $FC32
    $F591:   BNE $06 ; -> $F599
    $F593:   STY $B0,X
    $F595:   DEX
    $F596:   DEX
    $F597:   BPL -$0D ; -> $F58C
    $F58C:   LDA $B0,X
    $F58E:   CMP $FC32
    $F591:   BNE $06 ; -> $F599
    $F593:   STY $B0,X
    $F595:   DEX
    $F596:   DEX
    $F597:   BPL -$0D ; -> $F58C
    $F58C:   LDA $B0,X
    $F58E:   CMP $FC32
    $F591:   BNE $06 ; -> $F599
    $F593:   STY $B0,X
    $F595:   DEX
    $F596:   DEX
    $F597:   BPL -$0D ; -> $F58C
    $F58C:   LDA $B0,X
    $F58E:   CMP $FC32
    $F591:   BNE $06 ; -> $F599
    $F593:   STY $B0,X
    $F595:   DEX
    $F596:   DEX
    $F597:   BPL -$0D ; -> $F58C
    $F599:   RTS
    $F55D:   JMP $F59A
    $F59A:   LDA #$FF
    $F59C:   STA $2C
(B) $F59E:   LDX $0284

thrust26 commented 7 years ago

Acid Drop gets stuck in a timer loop, which includes WSYNC. So the timer is checked every 76 cycles. If an odd number is read, it can never read 0. Panda Chase is similar, here the loop is 12 cycles long.

Probably the timer is off by just 1 cycle.

thrust26 commented 7 years ago

In Stella 4.7.3, the timer switches back to 64T after a (large) number of time underflows. That's why the games work.

sa666666 commented 7 years ago

Yes, that's probably the effect of the "!(timer & 0x40000)" condition. But I'm thinking this isn't correct behaviour, which is why the code was changed to "timer >= 0".

thrust26 commented 7 years ago

There is a noticeable delay after pressing fire in Panda Chase before the picture is displayed again. Does this happen on real hardware too?

DirtyHairy commented 7 years ago

Does this happen on real hardware too?

It doesn't feel like it, the transition seems instant to me on real hardware, while the delay in Stella is evident. The switch back to 64T is definitiely an effect of the old condition ("!(timer & 0x40000)) and the reason why those ROMs work in old Stella after all. I, too, think that this is not correct behavior, but more of an "overflow with benefits" :smirk:

In Mr. Roboto, the decisive change is the RSYNC implementation that affects the phase between CPU and TIA and, via WSYNC, in turn affects the phase between CPU and timer. As for Panda Chase, there is no RSYNC and no WSYNC involved.

sa666666 commented 7 years ago

There are several variations of 'Panda Chase' that also exhibit the same issue. Not surprising, really, since they're basically the same ROM with perhaps only colours and/or logo changed.

thrust26 commented 7 years ago

I wrote a simple test program to understand what's really going on on real hardware. But as of now the results are more than confusing: After the (64T) timer has run down to 0, it continues to count down every 64 cycles. This is obviously completely different to the documented and well tested behavior. Since the test code works perfectly in Stella, there must be a weird bug in my test code that I yet have to hunt down and fix. Maybe I can identify some undocumented function that way.

thrust26 commented 7 years ago

Now I am officially lost.

Attached you find the result of my test program and the code itself (the timer check logic is in GameCalc subroutine). In the middle column of the output you can see the delta timer value at the top and the following 40 timer values (in intervals of 13 cycles). The left and right playfields show the end result of the loop counters (running from 255 to 0). The timer is initialized with 1 and then the code waits three scan lines to be sure there is an underflow. So in theory the timer should run in intervals of 1 cycle when the test code starts.

At the top you see the output of 4.7.3. The loop ends before the loop counters reach zero and the timer counts in intervals of 64 cycles (decreasing every 4-5 values). This is as expected. testtimer_4_7_3

Below the final output of 5.0 pre5. The loop ends when loop counters reach zero and the timer counts in intervals of 1 cycles (decreasing by 13 every value). This is as expected too. testtimer_5_0_pre5

The last picture shows the (faked by setting the initial timer to $ad instead of 1) output on my TV. The loop ends immediately, with a delta of 13 cycles and the timer counts in intervals of 64 cycles. This would mean, that the timer does NOT count in 1 cycle intervals after reaching zero. And that's where I am lost. Any ideas??? testtimer_2600

BTW: Adding up to 256 lines of WSYNC before checking the timer doesn't change the general behavior at all. The timer still counts in 64 cycle intervals.

TimerTest_001.zip

sa666666 commented 7 years ago

Should I be seeing the same output as you for these test runs? Because I'm not getting the same output.

thrust26 commented 7 years ago

Yes.

What do you get?

sa666666 commented 7 years ago

Second screen from real console (light-sixer):

Second screen from Stella 4.7.3: 2-473

Second screen from latest git (5.0.0-pre5 + last few commits): 2-latest_git

So at least compared to my system, neither Stella 4.7.3 nor the latest code is completely correct.

EDIT: When I revert the change "timer >=0" to "!(timer & 0x40000)" in the latest code, the results match 4.7.3 behaviour. So nothing else is changing the output; it is simply incorrect in 4.7.3 too.

sa666666 commented 7 years ago

Output from Stellarator is closest to the real thing (but the middle column has some pixels rearranged).

Output from z26 matches Stella 4.7.3.

Output from MESS matches Stella git.

I don't know what to say about this :disappointed:

thrust26 commented 7 years ago

Yea, there is a delay of about 2 seconds (don't ask me why) before the test is done. So ignore the first screen, please. I should have mentioned that, sorry.

For the second screens, there seems to be a small difference in the results (the code I posted seems not to be 100% identical with the one I used for my tests, but the general behavior is identical.

The question is: Why does the timer on the real console count in steps of 64 cycles?

sa666666 commented 7 years ago

From Stellerator: screenshot_20170310_173441

sa666666 commented 7 years ago

From Javatari: screenshot_20170310_173718

thrust26 commented 7 years ago

Bingo! That's like the real console!

sa666666 commented 7 years ago

Javatari matches the real console the closest, with only the ends of the last 4 lines in the middle column differing from my console.

I can't answer the question of why it's happening, but at least we can look at the source for Stellerator and Javatari to see what they do differently.

sa666666 commented 7 years ago

Acid Drop and Panda Chase also work correctly (and don't lock after pressing Fire) in Javatari, but still lock in Stellerator. So output-wise, Stellerator is very close, but Javatari is almost exact in both output and the ROMs actually working.

It would be nice to see some more output from real consoles. I only have one hooked up right now; @DirtyHairy, what does the ROM look like for you?

timer.bin.zip

thrust26 commented 7 years ago

The initial read value on the real console and in Javatari are both $E6. But the internal counter in the real console is ~32 cycles higher, so the 64 cycle timer is decreased 2-3 rows later.

thrust26 commented 7 years ago

The Javatari code in Pia.js looks pretty straightforward as far as I can tell. And the comment says: // If timer underflows, return to 1 cycle interval per specification

But somehow it emulates differently.

ale-79 commented 7 years ago

Look at the attached "KIM HINTS" doc for the Kim-1. The Kim-1 uses a 6530 ROM-RAM-I/O-TIMER (RRIOT), and I believe that the timer part works the same as the 6532 in the 2600. According to this document (page 4), if you read the timer from the "TIM64T" addresses ($296 and $29E in the case of the 2600) after it has counted past zero, the divide ratio is restored to its previously programmed value, the interrupt option is disabled and the timer keeps its current count.

kim1_hints.pdf EDIT Actually, there's no reason this shouldn't apply to the other "read timer" addresses ($284,$286,$28c,$28e,$294,$29c).

thrust26 commented 7 years ago

Good catch! That would be this code from Javatari:

    var readFromINTIM = function() {
        INSTAT &= 0xbf; // Resets bit 6 (Overflow since last INTIM read)
        // If fastDecrement was active (currentTimerInterval == 1), interval always returns to set value after read per specification
        if (currentTimerInterval === 1)
            timerCount = currentTimerInterval = lastSetTimerInterval;
    };

It starts making sense...

Maybe changing timerCount is wrong here.

DirtyHairy commented 7 years ago

Seems I am late to the party :smirk: This is a very interesting find, and it is definitely not part of the datasheets I have been looking at.

sa666666 commented 7 years ago

I've never heard of this before either. I used both the Rockwell R6532 and Commodore NMOS 6532 datasheets, and neither mention anything like this. Hmm, always something new to learn about the 2600 :smiley:

DirtyHairy commented 7 years ago

I have made an experiment and implemented this behavior into 6502.ts; check out the current build of stellerator. The improvement is obvious, though not quite there yet. Unsurprisingly, Panda Chase and Acid Drop work with this change :smirk:

fix1

The surplus lines from Javatari are there, and there is a gap missing in the first line, but there are five blocks in the lowest right column (as on real hardware, and opposed to Javatari). I have to go to bed now, but I'll revisit this later this weekend when we're back home and I can check my Jr.

thrust26 commented 7 years ago

Don't focus on the bottom value too much. The top value is slightly too high ($E7 instead of $E6). And values below are all slightly too low. That's why the bottom row is $9F instead of $A0.

BTW: Are you from Germany too?

DirtyHairy commented 7 years ago

OK, that makes sense. I'll guess I'll later be taking a look at the source to understand a bit better what I am actually seeing here 😊

Indeed I am - from Würzburg, to be precise 😉

thrust26 commented 7 years ago

After sorting out the problem, I simplified my test program. Now it uses TIM8T and simply reads the timer every 7 cycles. That way we can exactly identify if the internal counter is correct too.

Attached the (faked) output of my console and the test code with ROM. testtimer_2600_8 The internal timer of the latest Stellerator is slightly off. It is 4 cycles too low, maybe that is related to the timer divide ratio...? stellarator

TimerTest_002.zip

thrust26 commented 7 years ago

The same as above, but now using TIM64T (and more samples).

Real 2600: testtimer_2600_64 Here the internal timer of Stellerator is ~56 cycles too low: stellarator64 TimerTest_003.zip

thrust26 commented 7 years ago

BTW: I went through the 6532 documentation Stephen has mentioned above. With the new knowledge, you can find the information hidden there: "After the interrupt flag is set the internal clock begins counting down at the system clock rate...". And a few paragraphs later it says: "The interrupt flag will be reset whenever the Timer is accessed by a read or write."

😊

sa666666 commented 7 years ago

Hmm, there it is. I guess hindsight really is 20/20; it was there all along :smiley:

DirtyHairy commented 7 years ago

Woha, I have read those two lines, but I never would have deduced that :smile:

Actually, thanks to your tests, Thomas, I think I have nailed it. The current stellerator build seems to perfectly match all three tests to me (including my Jr.). I guess what is really going on in that chip is that, after wrapping and switching to increment-by-one, the configured divider still gets clocked and, after INTIM is read, resumes driving the counter in whatever state it is this point.

As the tests measure a beating pattern, it is hard to tell for sure the fix is true, but I am very confident that this actually is correct (the way to be sure would be to let the counter wrap, read INTIM and then measure the precise number of cycles it needs to wrap again). If we all agree, I'll port this behavior to Stella in the next few days.

thrust26 commented 7 years ago

I wrote a dead simple program to test what you describe in your last paragraph. I tested with 1T, 8T, 64T 1024T. For all four Stellerator works 100% like my console.

So your theory is spot on. And I think Javatari would work 100% correct too if it wouldn't assign the lastSetTimerInterval to timerCount.

Case closed! :)

DirtyHairy commented 7 years ago

05b10be reworks the RIOT implementation to match real hardware. @thrust26 , could you attach your test programs? The implementation was tricky, and it would be great to make sure that there is no +-1 bug lurking somwhere :smirk:

@sa666666 could you take a second look at the commit? The changes affect the debugger, and I am not completely sure that I have preserved the semantics of the "total clocks" and "INTIM clocks" watches.

thrust26 commented 7 years ago

Attached TimerTest_004.zip

sa666666 commented 7 years ago

I won't be able to look at this until this evening, or possibly even tomorrow morning. However, on first glance it concerns me that some of the const accessor methods are changed to non-const. The problem is that the accessor methods for intim() and timint() can be called multiple times from the debugger (ie, when a frame is redrawn, etc), but they should not change the internal state of the RIOT class. Removing the const seems to imply that state is changed.

But like I said, I haven't had a chance to properly review this yet ...

DirtyHairy commented 7 years ago

@sa666666 I removed the const because they do change state --- they fast-forward the RIOT state to reflect the current emulation time in terms of system clocks :smirk: Before my change, the internal state of the RIOT that relates to those accessors did not change with time (the change was calculated on the fly and discarded).

However, the methods are still idempotent and do not execute any side effects that would be triggered by reading from the bus, so calling them multiple times from the debugger isn't going to cause any undesired side effects.

sa666666 commented 7 years ago

OK, I did find some time to look at this after all. Everything looks fine, and the read-only methods are still read-only, even if not marked as const. I've done all the testing I can, and everything is working fine.

I updated the bracing in the new code to match what was in the rest of the M6532 class (sorry, just OCD kicking in I guess), and also updated the changelog. So unless anyone else has issues, I think this can be closed.

DirtyHairy commented 7 years ago

No worries, happens to my code at work all the time --- adhering to coding style is not my strong side apparently :smile: Thomas' testcases work fine with my changes, closing...

stella-emu / stella

Revisit the 'mystery constant' in M6532.cxx #97