libretro / bsnes2014

Libretro fork of bsnes. As close to upstream as possible.
GNU General Public License v3.0
9 stars 17 forks source link

Fix lagfix regression which makes multiple frame events possible within a single VBLANK interval #19

Closed Brunnis closed 7 years ago

Brunnis commented 7 years ago

This pull request fixes an issue observed in Alien vs Predator, where the screen would stretch out horizontally seconds after you start moving in the first level. See this thread for details:

http://libretro.com/forums/showthread.php?t=6728

It was observed that the issue only occurred with the lagfix enabled. After analyzing the code it seems this bug was introduced while investigating the MSU-1 issue reported here: https://github.com/libretro/bsnes-libretro/issues/14

It was found that the lagfix didn't cause those MSU-1 issues, but the code changes that were made to the lagfix were kept, since they were considered an improvement. The issue with these code changes appears to be the following:

The frame_event_performed flag is cleared already at V=241. When this happens, there are still several lines left of the VBLANK interval. If a game enables NMI (or disables and re-enables NMI) after line 241 but before the end of VBLANK, another frame event will occur. This appears to be what Alien vs Predator does.

Therefore, it's crucial that the frame_event_performed flag isn't cleared until after the VBLANK interval ends and before the next one starts, i.e. somewhere between V=0 and V=224. The initial pull request for the lagfix did actually clear the flag on V=0. However, this was not considered safe, according to comment nr 5 in this pull request: https://github.com/libretro/bsnes-libretro/issues/14. The comment mentions that clearing the flag in the CPU's scanline() function is bad because the screen might be off which will cause the function to not be called when V=0 and therefore the frame_event_performed flag won't clear. Is this actually true? I'd appreciate a second look at this, but to me it looks like the CPU's scanline() function will in fact always be called when V=0.

If I'm wrong, I'd appreciate your opinions on alternate points in the code where it would be safe to clear the frame_event_performed flag. Otherwise, I propose that we change back to clearing the flag in the CPU's scanline() function as this pull request suggests.

Brunnis commented 7 years ago

I've tested the following games (menu + beginning of first level) with the new version (lagfix enabled + accuracy, balanced and performance profiles) without issues:

Alien vs Predator Chrono Trigger Donkey Kong Country Dragon Quest V F-Zero Megaman X RPM Racing Star Fox Super Castlevania Super Mario World Super Mario World 2: Yoshi's Island

Alcaro commented 7 years ago

You're right, I misremembered. I knew that HDMA init is not always performed, and disabling the PPU turns off a lot of stuff, so I drew an invalid conclusion; turns out the real answer is that HDMA init isn't performed if that channel isn't marked for HDMA at V=0, and some games (or at least homebrews) turn on HDMA mid-frame. Sorry about that. Still, I'm not sure how scanline() could be called multiple times with the same vcounter()... does it stop counting at 241, or fire multiple times if you fiddle too much with fblank?

This also sounds like it'd make #17 a lot safer (unless some processor is capable of wasting >20 scanlines in the same opcode, need to double check WAI and MVN).

Too bad byuu is so eager to dismiss people's ideas and pretend he's always right, rather than actually debating things on their merits; he's the only one who really knows the bsnes architecture, and what's safe and what isn't. He would've shot that one down quite fast.

Brunnis commented 7 years ago

Still, I'm not sure how scanline() could be called multiple times with the same vcounter()... does it stop counting at 241, or fire multiple times if you fiddle too much with fblank?

It's not scanline() that is called multiple times and it doesn't have to occur at V=241. The problem (at least as far as I believe I've understood) is: Imagine that a frame event occurs at V=241 and the frame_event_performed flag is cleared immediately after. Now, any NMI occurring after that point but before the VBLANK period ends will cause another frame event to occur, due to the lagfix code that makes a frame event occur in immediate response to an NMI. The scanline() function does not need to be called for this to happen; it happens right in the nmitimen_update() function.

Normally, the NMI would occur right at the beginning of VBLANK, but if NMI detection was disabled at that point and then enabled later during VBLANK, NMI would be delayed until that point. At least that's what I believe happens, after googling around on the interrupt handling.

This also sounds like it'd make #17 a lot safer (unless some processor is capable of wasting >20 scanlines in the same opcode, need to double check WAI and MVN).

That would be great. Let me know of your findings and I'll take a look as well. I think you know this a lot better than I do, but I'll help where I can.

Too bad byuu is so eager to dismiss people's ideas and pretend he's always right, rather than actually debating things on their merits; he's the only one who really knows the bsnes architecture, and what's safe and what isn't. He would've shot that one down quite fast.

Yep, drama always seems to be just around the corner and it's hardly productive...

Alcaro commented 7 years ago

Oh right, those other ways to trigger the frame event. It should still yield only one per frame, but if the game only does that every second frame, it'll reduce the effective framerate to 30; it could possibly also do some other stupid shit. Not sure how it'd stretch the screen, but I'll just trust you on that one.

But that means NMI can fire arbitrarily close to the V=0 reset. I don't know how long System::runtosave can run the relevant processors, but I'm not comfortable with less than one full scanline between the reset and the nearest theoretically possible check, and the more the better.

Solution: Why reset at V=0? Why not V=200?

I think you know this a lot better than I do, but I'll help where I can.

I may know the bsnes architecture better, but there's no doubt you're the more thorough tester. I can't think of any plausible faults with this code, but you proved me wrong the last four times I said exactly that.

So that's what I'll ask you to do. Apply this patch, then move savestates between lagfix-on, lagfix-off, and (if you can figure out how to get it to load a ROM) upstream higan v094.

diff --git a/sfc/alt/cpu/serialization.cpp b/sfc/alt/cpu/serialization.cpp
index 6bc8fc6..29267d9 100644
--- a/sfc/alt/cpu/serialization.cpp
+++ b/sfc/alt/cpu/serialization.cpp
@@ -42,7 +42,7 @@ void CPU::serialize(serializer& s) {
   s.integer(status.nmi_pending);

 #ifdef SFC_LAGFIX
-  s.integer(status.frame_event_performed);
+  status.frame_event_performed = true;
 #endif

   s.integer(status.irq_valid);
diff --git a/sfc/alt/cpu/timing.cpp b/sfc/alt/cpu/timing.cpp
index 137b370..c25426a 100644
--- a/sfc/alt/cpu/timing.cpp
+++ b/sfc/alt/cpu/timing.cpp
@@ -74,14 +74,7 @@ void CPU::scanline() {
   system.scanline();
 #endif

-  if(vcounter() == 0)
-  {
-#ifdef SFC_LAGFIX
-    status.frame_event_performed = false;
-#endif
-    
-    hdma_init();
-  }
+  if(vcounter() == 0) hdma_init();

   queue.enqueue(534, QueueEvent::DramRefresh);

diff --git a/sfc/cpu/serialization.cpp b/sfc/cpu/serialization.cpp
index 60ddc4e..87898ea 100644
--- a/sfc/cpu/serialization.cpp
+++ b/sfc/cpu/serialization.cpp
@@ -32,7 +32,7 @@ void CPU::serialize(serializer& s) {
   s.integer(status.nmi_hold);

 #ifdef SFC_LAGFIX
-  s.integer(status.frame_event_performed);
+  status.frame_event_performed = true;
 #endif

   s.integer(status.irq_valid);
diff --git a/sfc/cpu/timing/timing.cpp b/sfc/cpu/timing/timing.cpp
index 6cf8eb7..e99c6b9 100644
--- a/sfc/cpu/timing/timing.cpp
+++ b/sfc/cpu/timing/timing.cpp
@@ -51,10 +51,6 @@ void CPU::scanline() {
 #endif

   if(vcounter() == 0) {
-#ifdef SFC_LAGFIX
-    status.frame_event_performed = false;
-#endif
-    
     //HDMA init triggers once every frame
     status.hdma_init_position = (cpu_version == 1 ? 12 + 8 - dma_counter() : 12 + dma_counter());
     status.hdma_init_triggered = false;
diff --git a/sfc/system/system.cpp b/sfc/system/system.cpp
index 128027f..08766e0 100644
--- a/sfc/system/system.cpp
+++ b/sfc/system/system.cpp
@@ -247,6 +247,15 @@ void System::scanline() {

 void System::scanline(bool& frame_event_performed) {
   video.scanline();
+
+  if(cpu.vcounter() == 200) {
+    //why not 0? because it appears theoretically possible for one of these checks to come
+    // arbitrarily close to scanline 0, so let's have some margin.
+    //why specifically 200? it's a nice, round number, with a huge margin of error if the CPU is
+    // scheduled way too long during runtosave
+    frame_event_performed = false;
+  }
+
   if(cpu.vcounter() == 241) {
     if(!frame_event_performed) {
       scheduler.exit(Scheduler::ExitReason::FrameEvent);

I moved the reset to System::scanline because (1) less code duplication (2) less places changed by the lagfix. Consolidating the V=0 checks makes some sense, but that advantage disappears if we move off V=0.

...actually, I wonder if that's a way to detect bsnes vs real hardware... System::runtosave runs all processors for one opcode, WAI and STP can run very long, and SA-1 can break a WAI - but only if it's scheduled to execute, which it won't be. Instead, the WAI will run until the PPU interrupts it, which is much later and easily detected. (I checked the entire codepath, the PPU interrupts come from the CPU thread itself.)

(Fun fact: Of the major three SNES emulators, only ZSNES doesn't deadlock on STP. It implements STP as regs.pc-=1, which is easy to exit; the others implement it as while(true){}.)

Brunnis commented 7 years ago

Thanks for the input! I just wanted to say that I've started looking into this and will hopefully have something to report in a few days.

A quick note regarding resetting of the frame_event_performed flag: To me it appears that the cleanest way of doing it is exactly where nmi_line is set to false. nmi_line represents the physical NMI line to the CPU and is high during the whole VBLANK interval. When nmi_line is false, the CPU can no longer trigger on any NMI and therefore no frame event can occur. This is the earliest point at which the frame_event_performed flag should be cleared. The nmi_line variable is controlled entirely in the CPU code and is set to false on the first call to poll_interrupts() after vcounter flips around.

I think performing the flag clearing this way is logical and easy to understand.

Brunnis commented 7 years ago

I've been looking a little bit further into this serialization issue. Here's my (probably incomplete) understanding:

We know that serialization is done when within one of the scheduler.exit(Scheduler::ExitReason::FrameEvent) calls. Each of these events is followed by setting the frame_event_performed flag to true. The latest point at which this will occur in a frame is at V=241. If we implement my suggestion to clear the flag when the NMI line goes low, the frame_event_performed flag will get cleared very close after V=0 (I've verified this in the code).

With clearing the frame_event_performed flag this way, there will be lots of cycles between setting it and clearing it. The runtosave() call will run one opcode before exiting the CPU thread again. For all opcodes other than WAI, we can be sure that frame_event_performed will still be true, since we won't have executed enough cycles to clear the flag.

If the current/next instruction is WAI, the CPU will wait until the next time the NMI or IRQ lines go from low to high. For NMI, this happens when the next VBLANK interval starts. From what I can see in the code, the NMI line will first go high and a new scheduler.exit(Scheduler::ExitReason::FrameEvent) call will be made. Since we're in the runthreadtosave() loop, the emulator will again enter the CPU thread and immediately set the frame_event_performed flag to true. Since the NMI interrupt has now occured, WAI will end and we can once again assume that the frame_event_performed flag is true.

The problem is if WAI is ended by a regular IRQ instead of an NMI. From what I've seen, this can happen at an arbitrary point, so there's no way for us to know what frame_event_performed should be when serializing after such a WAI instruction.

Currently, my suggestion is to include the frame_event_performed flag and serialize it whether the lagfix is on or off. We'll just ifdef the parts of the lagfix code that trigger a frame event when an NMI occurs. This way, we'll always have an accurate value for frame_event_performed, whether the lagfix is on or off, and the saves will be compatible. However, the saves will not be compatible with bsnes standalone or other bsnes variants/forks, but being compatible within the LibRetro context might be good enough?

Alcaro commented 7 years ago

For all opcodes other than WAI, we can be sure that frame_event_performed will still be true, since we won't have executed enough cycles to clear the flag.

Well, STP also runs forever, but it also runs forever so it just deadlocks the process and doesn't read the flag. DMA may also run a while, I don't know how that interacts with the scheduler (and with interrupts).

From what I've seen, this can happen at an arbitrary point

Correct. It may be limited to V={0..223}, but that's arbitrary enough. And SA-1 can also initiate IRQ, and that one ain't gonna wait for the PPU.

but being compatible within the LibRetro context might be good enough?

I'd prefer seeing this fixed properly. Yes, it is theoretically possible for my solution to go wrong, but it would require a huge number of unlikely events, all at once: CPU going to sleep before vblank starts (CPU must be sleeping when the frame event fires, or runtosave returns immediately), using WAI rather than busy loop (some games don't), turning off NMI (it would break the WAI), and scheduling an IRQ after scanline 200 but before vblank (after V=200, or flag isn't cleared yet; before vblank, or flag gets set again). There's no way a game can sleep through vblank and still be a game.

...on the other hand, we're the only ones still using 094. Anyone not using libretro either stopped upgrading long ago, or still does it. And lagfix-serialized is the default configuration, removing the variable would mean breaking savestates twice for libretro users. The incompatibility is unfortunate, but it's the best thing we can do.

If we were following upstream, we'd care more, but upgrading would require figuring out his latest API breaks, and all we'd gain is even more slowdown and random regressions in Super Mario RPG that force us to upgrade again and find the new regression.

So yeah, I'll just flip some of those ifdefs.