Scanline counter in debugger updated too late

thrust26 commented 7 years ago

When you step through code, after writing to WSYNC, the counter updates after the next instruction.

BTW #1: Also Scan+1 points stops after the next instruction after WSYNC. (and even later in 4.7.3) IMO it should stop at CPU cycle 76.. (instruction ending after cycle 75). BTW: #2: Nice to have the length of the previous scanline displayed. When was that introduced? I missed that.

DirtyHairy commented 7 years ago

I just encountered the same effect and, after staring at the code and scratching my head in desparation, I found the reason. This was introduced some time ago when I fixed RDY behavior to match real hardware (#42). What happens on real hardware is that the write on the last cycle of the STA WSYNC pulls RDY low, but the CPU will actually on stop on the next read cycle --- and that's the first cycle of the next instruction. Stella now emulates this correctly, and what you see is the result: the jump to the next scanline only happens during the next instruction.

So, it's not a bug, but a feature and, although I don't like it, there is not much that can be done about it. A remark in the documentation about this is definitely indicated imho.

thrust26 commented 7 years ago

Ok, I see.

DirtyHairy commented 7 years ago

Actually, thinking about it in the morning, there is something we could do in order to hide this. After executing a block of instructions, we could just go ahead and stop the CPU if RDY is low --- we know that the next cycle is going to be a read cycle anyway.

However, I am not sure whether I like this any better: it would restore the old behavior, but it's a (albeit safe) hack and would obscure the way how hardware actually works. @thrust26 , @sa666666 , @ale-79, what do you think?

thrust26 commented 7 years ago

If I get you right, both versions are not 100% exact, because actually the CPU will stop in the middle of the next instruction (after the first read cycle). Correct?

Then, if we stop at the end of the instruction, it has been executed already. So the effect is displayed in the debugger already. IMO this is more wrong than stopping at the beginning.

DirtyHairy commented 7 years ago

If I get you right, both versions are not 100% exact, because actually the CPU will stop in the middle of the next instruction (after the first read cycle). Correct?

Not quite. If I interpret the manual correctly, an actual 6502 will stop at the beginning of a read cycle (if RDY is pulled low during phi1), before the actual bus access happens during phi2 (every cycle is composed of a phi1 and a phi2 half-cycle). At this point, the 6502 will relinquish the bus and stop. Once RDY is pulled high again, the CPU will resume operation in the next phi2 cycle and resume by reading from the bus.

This is pretty close to what the emulation is doing: after starting the cycle, but before incrementing the clock and the actual read, it checks whether RDY is low. In this case, the TIA is signalled, and the handler will subsequently fast forward the system clock to the start of the next line before the call returns and the actual read is executed (at this point, RDY is asserted again).

thrust26 commented 7 years ago

Sorry, but I don't get you (probably me). I don't understand why the first instruction after WSYNC would be executed in the debugger before it stops. If I get your explanation right, this does not happen on real hardware.

Maybe it makes more sense from a pure technical perspective (I am not very firm here), but for debugging it IMO makes no sense. If I press CTRL+L I just want to finish the current scanline and not see the effect of the first instruction of the next scanline already.

Anyway, maybe a change of definition helps, especially in code without WSYNC?

So that e.g. a STA RESP0 starting at cycle 75 should not be executed with CTRL+L. Except when the Scan cycle is already at 75 of course.

DirtyHairy commented 7 years ago

Maybe that's my bad, and I am not getting you and am trying to answer a different question :smirk: What I am trying to explain is the fact that, after stepping a STA WSYNC, the pixel clock will not wrap to zero but instead just increment by nine --- the effect of the WSYNC is only reflected by this value after the next instruction has executing (e.g. 6 after an immediate or implied opcode).

My idea was to restore the old behavior (STA WSYNC -> line clock zero) by concluding stepping with signalling the TIA if RDY is low --- this would be a hack, but technically okay as we know that each instruction starts with a ready cycle anyway. The next instruction would still be executed in the next step; just the "forward-clock-to-next-line" part would be moved at the end of the previous instruction. However, as I said, I am a bit dubious whether this is a good idea as the current behavior more closely matches what actual hardware does.

This only applied to stepping a WSYNC; stepping a whole line is unaffected as this is debugger logic, no RDY involved. Also, this has nothing to do with BTW 1 per se --- I haven't looked at this yet :smirk:

thrust26 commented 7 years ago

I really thought you were talking about BTW#1. After rereading with that in mind, I get your answers much better. :)

So its a matter of definition, isn't it? After WSYNC you are either at the end of the current scanline (Scan Cycle=76) or at the beginning of the next one (Scan Cycle = 0). That's how I see it from a debugger perspective.

As soon as Scan Cycle is showing the correct value after executing WSYNC (currently it just adds 3 cycles, the corrected value only shows after the next instruction), and the Scanline update fits(*) to this value, we should be fine.

(*) Scan Cycle = 0 -> Scanline++; or Scan Cycle = 76 -> Scanline

sa666666 commented 7 years ago

I will defer this decision to you guys (@DirtyHairy and @thrust26); whatever you decide is fine with me.

DirtyHairy commented 7 years ago

I really thought you were talking about BTW#1. After rereading with that in mind, I get your answers much better. :)

Hihi, communication can move in in strange ways.

So its a matter of definition, isn't it? After WSYNC you are either at the end of the current scanline (Scan Cycle=76) or at the beginning of the next one (Scan Cycle = 0). That's how I see it from a debugger perspective.

I would put it differently. After a STA WSYNC, nothing has changed, really, and the clock has just incremented by three cycles. It is only when the next instruction starts executing that the WSYNC actually does its thing and advances the clock to the beginning of the next line. So, for example, we have:

STA GRP0     ; scan cycle 15
STA WSYNC    ; scan cycle 18
LDA #0       ; scan cycle 2

DirtyHairy commented 7 years ago

Actually, thinking about it, this also relates to BTW#1: stepping the scanline will only stop when the pixel clock wraps, and this happens during the first cycle of the LDA in my example. As the debugger has a granularity of a single instruction, it stops after the LDA.

Considering this, I think I will just make the change I suggested. This will restore intuitive behavior, and if you interpret the debugger as "stopping after the first half-clock of the next opcode", it is even technically correct :smiling_imp:

thrust26 commented 7 years ago

OK, I get you (finally?). And while I now understand what exactly is going on, I don't like the consequences for the debugger. It's very irritating to see the next instruction advancing the clock that much (but only when it is preceded by WSYNC). I am pretty sure people will complain.

We got used to a different WSYNC behavior for ~20 years now. And that behavior (while technically not correct) is much easier to grab. Also I cannot see any benefit for the debugger from emulating WSYNC differently now. So I would very much prefer to stick to the old behavior.

DirtyHairy commented 7 years ago

The same conclusion, and almost the same second. I'll call that agreement :smirk:

thrust26 commented 7 years ago

/signed

ale-79 commented 7 years ago

In view of the explanation of how the hardware actually works, I like the new behaviour better, but I absolutely agree that it would cause complaints. Especially since the old behaviour matches better what is described in the "Stella Programmers Guide":

[...] Simply writting to WSYNC causes the microprocessor to halt until the electron beam reaches the right edge of the screen, then the microprocessor resumes operation at the beginning of the 68 color clocks for horizontal blanking. [..]

So I agree with the agreement.

ale-79 commented 7 years ago

It might be worth adding a note to the debugger documentation about this.

stella-emu / stella

Scanline counter in debugger updated too late #155