lgblgblgb / xemu

Emulations (running on Linux/Unix/Windows/macOS, utilizing SDL2) of some - mainly - 8 bit machines, including the Commodore LCD, Commodore 65, and the MEGA65 as well.
https://github.com/lgblgblgb/xemu/wiki
GNU General Public License v2.0

MEGA65: ROM v920391 sticks at initialization giving only blue screen #395

Closed lgblgblgb closed 5 months ago

lgblgblgb commented 8 months ago

Reported by @mteufel.

ROM v920390 works fine, but v920391 does not: it gives only a blue screen. Execution seems to get stuck at around $0003:$561B. @dansanderson checked this too, and it worked for him. However, it seems he used run-time ROM switching, while I specified the new ROM on the command line. So it seems that if an older ROM runs first, the new ROM then starts fine.

According to the address I saw, the execution sticks around this code fragment of the ROM:

              **********
5613          Get_TI_CIA:
              **********
              ;   Read timer A & B of CIA2 and convert to seconds

5613 20 cb 35     jsr start_timer ;make sure, that timer runs
5616 a9 ff        lda #$ff
5618 aa           tax
5619 a8           tay
561a 4b           taz

              ;Here we ensure TALO isn't about to underflow
              ;If it does while we're reading, we will get
              ;'tearing' results as we read the subsequent
              ;Timer registers. e.g. TAHI will have have
              ;decremented before we read it causing TI to
              ;run backwards for a tick.

561b 2c 04 dd 10$ bit CIA2_TALO   ;Check if bit 7 & 6 are clear
561e 70 02        bvs 20$         ; 6 is set, read timers
5620 10 f9        bpl 10$         ; 6 and 7 are clear; wait -- about to underflow

It is part of commit e65349574314ce9b411c8c8e42f3179cdc5f4042 in mega65-rom (from @johnwayner I believe) with the following description:

BASIC: SLEEP/TI: Don't hang on large SLEEP values (#58) Fixes: https://github.com/MEGA65/mega65-rom-public/issues/37

This commit was indeed made after v920390 but before v920391, so I guess this change causes the problem.
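
For readers less familiar with the 65xx `bit` instruction: it copies bit 7 of the operand into the N flag and bit 6 into the V flag. The following is a minimal Python sketch of the exit condition of the wait loop quoted above. The function name is made up for illustration, and the sketch models only the flag logic, not any timing.

```python
def loop_exits(talo: int) -> bool:
    """Model one pass of: bit CIA2_TALO / bvs 20$ / bpl 10$."""
    n = bool(talo & 0x80)  # bit copies bit 7 of the operand into N
    v = bool(talo & 0x40)  # bit copies bit 6 of the operand into V
    if v:
        return True   # bvs 20$ taken: go on and read the timers
    if not n:
        return False  # bpl 10$ taken: back to the bit instruction
    return True       # neither branch taken: fall through to 20$


# TALO values for which the loop keeps spinning
stuck = [t for t in range(256) if not loop_exits(t)]
print(f"loop spins while TALO is in ${stuck[0]:02X}..${stuck[-1]:02X}")
```

In this model the loop keeps spinning for every TALO value below $40, so a timer whose low byte never leaves that range would keep the CPU there indefinitely.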

Of course I assume that the commit is otherwise correct (i.e., it works without a problem on a real MEGA65), so this must be a CIA emulation problem on Xemu's side. Unfortunately I am not sure I can fix this: it is currently a design limitation that every emulation subsystem works only with scanline-level precision. Even the CIA emulation decrements its timers by a value corresponding to the time needed to emulate one scanline. It's not possible to change this without reworking the whole emulator.

Though I am not entirely sure why this isn't a problem when someone starts the new ROM later rather than at "power-on time".

johnwayner commented 8 months ago

Am I being dense? Was a version called 920392 created? I don't see a tag for it or any mention of it on Discord.

I am seeing the issue when I boot xemu with 920391 but not 920390. I use the menu ROM switching, but I don't have a default ROM set up -- so no "good" ROM to fix things up. If I use the menu to switch to 391 after booting 390, I get a READY prompt.

So I don't doubt that something in 391 is causing this hang. It's not obvious to me what in the start up is calling GET_TI or GET_TI_CIA, but it's totally possible.

It looks like the CIA2 timer isn't running (using the xemu matrix monitor) when booting straight to 391. Not sure why.

johnwayner commented 8 months ago

Using a local build off the master branch, I can confirm that the start_timer routine is getting called. The start bits (LSB on DD0E and DD0F) are set as well.

johnwayner commented 8 months ago

@lgblgblgb Not that this would cause this issue, but watching the values of the CIA2 timers in the xemu matrix monitor, $DD05 (Timer A, HI) seems to be going faster than $DD04 (Timer A, LO) -- which is backward from what I would expect. Timer B seems ok. Surely this isn't broken, right? Maybe it's somehow an illusion caused by the monitor's refresh rate and the speed of the timer? Or I need more coffee?

lgblgblgb commented 8 months ago

@johnwayner Hmm, it can be what you've said, or what I've mentioned before: CIA "tick" emulation can be done only once per scanline, so I apply as many ticks at once as correspond to the time of one scanline. To avoid ill effects, if the timer underflows, I do the task it requires (e.g. it may generate an IRQ), and the remaining part is carried over to the next step, so that periods other than integer multiples of that "time quantum" also work. However, this can cause some interesting effects, such as the timer seeming to move backwards compared to how it should. As for the monitor, it refreshes only once per frame, so there can be strange effects like the "stroboscopic" view of a spinning wheel that seems to rotate backwards.
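
The "many ticks at once, remainder carried over" idea described above can be sketched roughly as follows. This is not Xemu's actual code: the function name, the figure of 64 cycles per scanline, and the reload rule (one timer period lasts latch + 1 cycles) are assumptions chosen only for illustration.

```python
def tick_batch(counter: int, latch: int, cycles: int) -> tuple:
    """Advance a 16-bit down-counter by `cycles` in one step.

    Returns (new_counter, underflow_count).
    Assumes one timer period lasts latch + 1 cycles.
    """
    underflows = 0
    while cycles > counter:
        cycles -= counter + 1  # cycles consumed up to and including the underflow
        counter = latch        # reload from the latch
        underflows += 1        # an emulator would raise the IRQ etc. here
    # Whatever is left of the batch is applied to the reloaded counter,
    # so the remainder is not lost.
    return counter - cycles, underflows


CYCLES_PER_LINE = 64  # assumed value, for illustration only
counter, latch = 0xFFFF, 0xFFFF
seen = []
for _line in range(6):
    seen.append(counter & 0xFF)  # low byte as it would look during this scanline
    counter, _count = tick_batch(counter, latch, CYCLES_PER_LINE)
print([f"${b:02X}" for b in seen])
```

With these assumed numbers the low byte only ever moves in steps of 64, so anything that samples it (the CPU within a scanline, or a monitor refreshing once per frame) sees a coarse and possibly misleading sequence of values.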

As for the version number: yes, indeed, sorry, it seems I had an "off by one" error there :-O I've now edited the title and the description; sorry again for the confusion.

johnwayner commented 8 months ago

Here's what I know so far. Basic's hard_reset routine almost immediately calls init_storage. That routine uses rnd_zero which uses Get_TI_CIA which hangs because even though it calls start_timer, the initial values for the timer latches have not yet been set to 0xFFs -- because hard_reset does that later. If I move Clear_TI, which sets those latches, to be called before init_storage, we get to the READY prompt.

I haven't figured out what change caused this to be a problem -- although my change made it obvious by waiting forever. We were probably always seeding the random timer with a seed of 0. Also, I looked at the VHDL to try to understand why real hardware doesn't have this problem. The initial latch value is 1 for timer A and 0 for timer B, which differs slightly from xemu's 0 and 0. But it's not clear to me how that would avoid this issue. If I boot straight to the monitor, the timer is at 1 0 0 0 and not running. It seems like it's up to BASIC to get it going.

So I can fix this in the ROM, but I'd really like to fully answer the outstanding questions first.

lgblgblgb commented 8 months ago

@johnwayner Thanks for the details. Of course, if it works on mega65-core, then it must be Xemu's fault, but I love your details here, and they help me understand the problem as well. Unfortunately it's very hard to emulate the CIAs well inside Xemu because of the precision issue I've mentioned, so to be honest I always feel nervous about any issue involving CIA emulation.

lgblgblgb commented 8 months ago

By the way, just setting both TLAL and TLAH (Timer-LAtch-Low and Timer-LAtch-High) to 1 on Xemu CIA reset seems to fix the issue. However, at this point this is much more a guess than a proper fix of the core issue. Setting TLAL alone was not enough.

johnwayner commented 8 months ago

That makes sense as the ROM is waiting for bit 6 or 7 to be set in TALO. I'll have time tomorrow to try to figure out how this is even working on real hardware and hopefully that will allow us to figure out how xemu is different.
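
The reasoning above can be checked against an idealized model in which the timer counts down one step at a time, starts from the latch value, and reloads from the latch after reaching 0. This is offered as a sketch of one possible explanation, not a confirmed one: it does not model Xemu's scanline-batched implementation or the real hardware, and all names are hypothetical.

```python
def talo_values(latch: int, steps: int = 1024) -> set:
    """Low-byte values seen by an idealized 16-bit down-counter that
    starts at `latch` and reloads from it after reaching 0."""
    counter, seen = latch, set()
    for _ in range(steps):
        seen.add(counter & 0xFF)
        counter = latch if counter == 0 else counter - 1
    return seen


def can_leave_wait_loop(latch: int) -> bool:
    # The ROM loop exits once bit 6 or bit 7 of TALO is set.
    return any(v & 0xC0 for v in talo_values(latch))


for latch in (0x0000, 0x0001, 0x0101, 0xFFFF):
    print(f"latch ${latch:04X}: exit possible = {can_leave_wait_loop(latch)}")
```

In this model a latch of $0000 or $0001 keeps TALO at 0 or 1 forever, while $0101 lets it pass through $FF, which would be consistent with TLAL alone not being enough.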

lgblgblgb commented 5 months ago

I'm closing this for now; it seems my workaround works fine with newer ROMs.