libretro / gpsp

gpSP for libretro.
GNU General Public License v2.0

ARM dynarec slows right down when hitting the ball in Mario Golf: Advance Tour #151

Open. therealteamplayer opened this issue 3 years ago

therealteamplayer commented 3 years ago

In Mario Golf: Advance Tour, the game slows down when the player tees off if the dynarec is enabled. With the dynarec disabled, this particular slowdown does not occur. It may only be noticeable on lower-end systems, where the game otherwise runs at full speed with the dynarec enabled.

I tested this on the New 3DS build (where the slowdown was noticeably bad with the dynarec enabled), and also on my own build for the Anbernic RG351V (i.e. a 32-bit ARM build). The issue was present on both.

The quickest way to test this from a new save is to reset after the name entry screen, go to Quick Game, start any match and hit the ball.

davidgfnet commented 3 years ago

A save file and/or savestate is welcome! Also the ROM hash, since there are too many versions of some games :)

therealteamplayer commented 3 years ago

ROM CRC32: D56C2E54

The RG351V build is the latest, since I compiled it myself, but the 3DS build may be older: it had no sound after loading the savestate (though both builds show the issue). Either way, I've uploaded the save file and savestate for both: saves.zip. I'm also using the official BIOS (CRC32 = 81977335) on both, with the 'Boot to BIOS' option enabled.

andymcca commented 3 years ago

I also noticed this in Dynarec mode. I'm running the latest gpSP commit on a LeapsterGS Explorer (Cortex-A9, 128 MB RAM, VFPv3).

Interestingly, it doesn't happen in Interpreter mode, although Interpreter mode causes general performance problems elsewhere, which is to be expected, especially on the hardware I'm using!

andymcca commented 1 year ago

@davidgfnet I had a little look into this old issue this morning.

Running the game in mGBA and using its debugger, I found that during this part of the game there appear to be a lot of interrupts and a lot of switching between ARM and Thumb modes. Since the problem doesn't occur in the interpreter, I tried varying the MAX_BLOCK_SIZE / MAX_EXITS dynarec definitions in cpu_threaded.c to see whether this would affect the behaviour.

Sure enough, lowering MAX_EXITS to a very small value (say, 2 or 4) reduces and almost eliminates this slowdown, without any other noticeable side effect in-game.

I'm guessing this makes the recompiler produce smaller blocks and perhaps reduces translation cache flushing, but beyond that I'm still a little unsure why it works, ha ha! I'm sure you'll have a better explanation, so I'm just flagging this for when you have time to review it.