stella-emu / stella

A multi-platform Atari 2600 Emulator
https://stella-emu.github.io
GNU General Public License v2.0
627 stars 112 forks source link

Stargunner (1983) (Telesys) won't boot #298

Closed acepow3r closed 6 years ago

acepow3r commented 6 years ago

Whatever the title says, Stargunner ROM (CRC32:FCA9C17B), stays at black screen. with Stella 5.1 win64. Maybe It has something to do with the settings, perhaps the autodetect ROM type? I tried to tweak some of them but didn't work...

ale-79 commented 6 years ago

Weird. It works if you enter the debugger and issue the "reset" command in the prompt.

sa666666 commented 6 years ago

Ahh, pounced before I got a chance to respond; I was about to write the same thing. :smiley:

Also if you start from the commandline and set the '-tv' property to something other than 'color' (even something that is invalid??). Indicates a different path is being taken when the TV property is set, and could also indicate that we should generate better error messages.

ale-79 commented 6 years ago

The reset vector is $0 in this rom! The interrupt vector is $f0d8, which should be the start address.

sa666666 commented 6 years ago

Yes, I noticed the reset vector being off. Question is, why did the ROM work before? And does it even work on real hardware??

Seems to be at least partly related to D3 of Switches (aka, the color/bw bit of SWCHB). If this bit is disabled, the ROM starts every time. If enabled, it never starts.

Also makes me think there should be a reset method in Switches.

sa666666 commented 6 years ago

ROM is working in all other emulators, so it cannot be a reset vector issue (unless all other emulators are doing something wrong).

sa666666 commented 6 years ago

Manually patching the ROM to fix the reset vector fixes the load issue, but still doesn't explain why it works in other emulators, and worked prior to 5.1. We need to test on real hardware to know for sure if this ROM is valid.

ale-79 commented 6 years ago

The rom works on real hardware (with an harmony, I don't have the real cart). Maybe is related to the initial value of TIA registers? Execution starts in TIA address space and it has to read a $00 (opcode for "BRK") in order to jump to the start routine at $f0d8.

sa666666 commented 6 years ago

I guess that leads to several other questions:

thrust26 commented 6 years ago

It worked in 5.0.2. Weird.

When the game jumps to 0000, it reads from TIA collision registers. Did we change something here?

thrust26 commented 6 years ago

Also if you start from the commandline and set the '-tv' property to something other than 'color' (even something that is invalid??).

That's completely odd and works with "Power-on options" too. I suppose the bug we find (hopefully soon) will be quite interesting.

sa666666 commented 6 years ago

I found this issue completely by accident. I meant to force NTSC or PAL for the ROM, but that property is 'format'. I mis-remembered it as 'tv', and found the issue from there :smile:

Anyway, being related to D3 of SWCHB could just be a side effect of something else. Maybe it's somehow using what's on the databus at that point, and it just happens to be sensitive to D3 of SWCHB??

sa666666 commented 6 years ago

Changing randomization of A/X/Y/PS also allows this ROM to occasionally start.

EDIT: To be clear, turning off randomization of the above makes startup fail every time. Turning it on allows the ROM to start, but not consistently.

thrust26 commented 6 years ago

Debugging shows that it reads 3F from 00 when failing and 00 when it works. So the unused TIA bits are either set or not.

What's correct? I think the bits are coming from the last address. Which is 0, so all bits should be 0, no?

sa666666 commented 6 years ago

Stranger and stranger. If you enable "drive unused TIA bit randomly", then when the ROM does start up, it has sound but no picture :confused:

As for what's correct, I have no idea. We had a similar discussion in the past, where we wondered if TIA bits are randomized or not. Perhaps there's even an issue for it.

thrust26 commented 6 years ago

The behavior is no surprise. With random bits, everything can happen at startup.

sa666666 commented 6 years ago

If lastDataBusValue is set to 0 in TIA::peek(), then the problem goes away. Of course we can't do that, since I think it was determined that using the databus value in peek is correct, and removing that would break other ROMs.

thrust26 commented 6 years ago

IMO the question is, why is lastDataBusValue sometimes != 0?

sa666666 commented 6 years ago

Relevant block of code from TIA::peek()

  // If pins are undriven, we start with the last databus value
  // Otherwise, there is some randomness injected into the mix
  // In either case, we start out with D7 and D6 disabled (the only
  // valid bits in a TIA read), and selectively enable them
  uInt8 lastDataBusValue =
    !myTIAPinsDriven ? mySystem->getDataBusState() : mySystem->getDataBusState(0xFF);

It will only be non-zero when there is something on the databus that is non-zero. So that's fairly obvious.

I guess you mean why is it sometime non-zero with this particular ROM? If that's what you're asking, then it might be related to randomizing other things. As I mentioned above, turning off randomization of the CPU regs makes it fail; turning it on makes it succeed sometimes.

thrust26 commented 6 years ago
void RiotDebug::saveOldState()
   ...
  myOldState.SWCHB_R = swchb();

This causes the problem because it sets myDataBusState.

sa666666 commented 6 years ago

Why is that even being called, if the debugger isn't being activated?

I suppose its call to TIA::peek is a problem; maybe it should reference the M6532 class directly?

thrust26 commented 6 years ago

I added a method call to saveOldState()in Debugger:initialize() to be able to track changes better when entering the debugger. If I remove that call the ROM works.

But I think the error is, that saving a state changes TIA internal states. That should not happen.

sa666666 commented 6 years ago

This is still only a symptom of a larger problem. Accessing swchb in the debugger should not have side effects, so it shouldn't hurt anything when you call saveOldState(). Now, whether is should be called is a separate issue. But it should be possible to call it and not break anything.

thrust26 commented 6 years ago

Yes, that's what I meant.

I briefly checked the saveOldState() methods in Debugger::saveOldState(), and there are quite a lot of peek() calls.

Maybe we could use myDataBusLocked here? Better lockBankswitchState(), I think this is missing.

sa666666 commented 6 years ago

I guess we initially passed all peeks through System so that what we see in the debugger is properly 'massaged' into whatever the end user will see.

The real overarching problem is that when the debugger does a peek, it needs to be a const method (aka, read-only), but when the emulation does a peek, it needs full emulation to happen. Similar to the logic in the various Cart::bank methods; if the debugger is enabled, then bankswitching is disabled, even if you strobe a hotspot that would normally cause it to bankswitch.

How to extend this idea to the TIA (and all other classes, really) is a problem.

thrust26 commented 6 years ago
void Debugger::saveState(int state)
{
  mySystem.clearDirtyPages();

  unlockBankswitchState();
  myOSystem.state().saveState(state);
  lockBankswitchState();
}

This is how it is usually done.

But I don't quite understand, because unlockBankswitchState() sets myDataBusLocked = false. And the check in System::peek() and poke() is:

#ifdef DEBUGGER_SUPPORT
  if(!myDataBusLocked)
#endif
    myDataBusState = result; // or value

Doesn't this do the opposite of what we need?

DirtyHairy commented 6 years ago

I just debugged the issue. As you found out, the culprit is

  #ifdef DEBUGGER_SUPPORT
    myDebugger = make_unique<Debugger>(*this, *myConsole);
    myDebugger->initialize();
    myConsole->attachDebugger(*myDebugger);
  #endif

in OSystem::createConsole which causes a flurry of debugger peeks. As you noticed, those have side effects on the data bus, and this is the underlying issue. I propose that we refactor System::peek and System::poke into two versions: one with side effects, and one without side effects.

thrust26 commented 6 years ago

myDataBusState seems to be the only critical value, no?

DirtyHairy commented 6 years ago

Precisely.

DirtyHairy commented 6 years ago

Let's split System::peek into System::purePeek (private and inline, no side effect), System::peek (calls purePeek and updates myDataBusState) and System::peekDebug (calls purePeek). If we use peekDebug in the debugger, the problem will be solved for good.

However, I have to go now, so I cannot implement this before the weekend... If anyone has a better idea or wants to beat me to it... feel free :smirk:

sa666666 commented 6 years ago

There are several issues here:

DirtyHairy commented 6 years ago

If course, we can also leave out peekDebug and just call purePeek, that's a matter of taste.

thrust26 commented 6 years ago

@sa666666 How about the other methods using the same pattern? loadstate, step, trace...

IMO we could just use a reversed pattern for e.g. saveOldState() and saveState() to fix everything.

Else we have to create a LOOOT of redundant code for the debugger.

sa666666 commented 6 years ago

@thrust26, no, those are correct. While in the debugger, bankswitching is locked/disabled. When we step/trace/etc, we need to reactivate bankswitching, so we unlock the ability to bankswitch, do the work, then re-lock it again.

For saving, it never should be unlocked in the first place, so the calls to unlock/relock shouldn't be there. Think of those calls as a mutex protecting bankswitch code from happening. unlock allows bankswitching to happen, lock disallows it again.

We'd still need it for loadState, though, since that can potentially cause a bankswitch. It's very similar to why in Serializer, loadState is non-const and saveState is const.

thrust26 commented 6 years ago

Except for loadState() I suppose.

So why do we not use the existing lock in saveOldState()? Isn't the lock designed exactly for this?

I knew the bug would be interesting. :smile:

DirtyHairy commented 6 years ago

An extension of my suggestion: In 6502.ts, I have two code completely separate code paths for memory access: read / write (with side effects, for emulation) and peek / poke (no side effects, for the debugger). If we agree to more refactoring, we could do the same thing in Stella. In essence, this means adding two more virtual methods to Device and refactoring the cartridges to implement them. This would completely eliminate the need for locking

sa666666 commented 6 years ago

@DirtyHairy, I agree that in the long run, that is the better approach. We can't forget to lock/unlock something if happens 'automatically'.

DirtyHairy commented 6 years ago

Exactly, and we don't have to worry about the lock state before locking when we unlock --- this is where the current implementation is imho inherently fragile, we just lock and unlock unconditionally.

sa666666 commented 6 years ago

It's amazing how a simple bug report of a ROM not starting up leads to a 20+ discussion post and some fairly major refactoring of the codebase :smiley:

So, who wants to take the lead on this :smiling_imp:

thrust26 commented 6 years ago

Not me, because I think the solution is way over the top.

Debugger::initialize()
  ...
  lockBankswitchState();
  saveOldState();
  unlockBankswitchState();

would do it IMO.

DirtyHairy commented 6 years ago

I suggest we do the simpler refactoring of peek and poke now to fix this bug, and create a new ticket for refactoring the general case later. I can do this at some point, but probable not before early summer :smirk: --- my pipeline is currently full enough 👹

thrust26 commented 6 years ago

Fine by me.

And once you have looked up the necessary code changes in the debugger, you may close the ticket right away. :smile:

thrust26 commented 6 years ago

So is this a fix for 5.1.1 or 5.2? A game is broken, so 5.1.1 seems more appropriate, no?

sa666666 commented 6 years ago

@thrust26, I don't think calling lockBankswitchState() as you suggest is sufficient, since it only protects against inadvertent hotspot strobes, not against myDataBusState being changed.

But I also agree that the extra work may not necessarily be justified, since the Cart classes are working and don't have any problems.

The fix for the ROM can be 5.1.1, since reworking the System class to have two different peek methods is fairly straightforward. I will work on this.

thrust26 commented 6 years ago

since it only protects against inadvertent hotspot strobes, not against myDataBusState being changed.

It does because it calls lockDataBus() too. The name is wrong, that's all. We should rename it into lock or lockSystem.

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void Debugger::lockBankswitchState()
{
  mySystem.lockDataBus();
  myConsole.cartridge().lockBank();
}

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
void Debugger::unlockBankswitchState()
{
  mySystem.unlockDataBus();
  myConsole.cartridge().unlockBank();
}
sa666666 commented 6 years ago

Oops, so it does. I wrote the code, but am starting to forget about it. Are you saying that the changes to Debugger::initialize() as you described above fix the problem??

thrust26 commented 6 years ago

Yes, Stargunner works with that.

I think we should fix saveState() and loadState() (maybe more?) too. Any maybe rename the (un)lock methods to make more sense.

DirtyHairy commented 6 years ago

I certainly won't insist ;) However, I am not so sure that the current solution of locking the bankswitch scheme is sufficient. For example, there are reads with side effects in the RIOT (if I am not mistaken); those are not handled by the current code, and debugger peeks can inadvertently trigger them.

thrust26 commented 6 years ago

Bankswitching is locked too, so that should not happen. Or do I miss something?

sa666666 commented 6 years ago

OK, so if we are sure that @thrust26 changes fix this ROM, we should just commit that for now, for 5.1.1. And the rest can be put in a new issue, that we can revisit when we have time. I like @DirtyHairy idea, but it will be a lot of work to do it. Not the first time that the codebase was up-ended like that, though.

thrust26 commented 6 years ago

Will you do the fix @sa666666 ?