libretro / parallel-n64

Optimized/rewritten Nintendo 64 emulator made specifically for Libretro. Originally based on Mupen64Plus.

Recent, significant performance regression with Angrylion #631

Open rzumer opened 4 years ago

rzumer commented 4 years ago

I currently use a build of ParaLLel from this commit, because a later commit caused a large performance degradation on my system. Despite recent performance enhancements, I still cannot reach similar frame rates with the new dynamic recompiler.

The commit I suspect is this one, but I have not retested it recently. The first regressed build I have is from this commit.

Changing the RDP setting from its default of "auto", or changing the thread sync level, doesn't seem to make much difference (I made sure to close and restart the core between changes).

My CPU is a Ryzen 7 2700X and the platform is Windows x64. I used Doubutsu no Mori to test performance, since its title screen chugs on recent versions.

mudlord commented 4 years ago

Thanks for pointing out the exact commits and an exact test case.

Do you have exact benchmarks showing how large the performance difference really is?

One of these days I really need to run this through the MSVC 2019 profiler.

rzumer commented 4 years ago

A quick unlocked-framerate test shows a fairly stable 70 fps with the "good" commit, versus a more variable 43-67 fps with the recent commit (averaged over a few seconds, read from the FPS counter). With the recent commit it stays below 60 fps most of the time.
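To put those figures in perspective, the small C helper below converts frame rates to frame times and computes the relative drop. It only does arithmetic on the numbers reported above; the function names are illustrative. For the reported range, a drop from 70 fps to 67 fps is about 4.3%, and a drop to 43 fps is about 38.6%, which means frame time grows from roughly 14.3 ms to as much as 23.3 ms. A 60 fps target allows about 16.7 ms per frame.

```c
#include <assert.h>
#include <stdio.h>

/* Frame time in milliseconds for a given frame rate. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Percentage drop from a baseline frame rate to a measured one. */
double fps_drop_percent(double baseline_fps, double measured_fps)
{
    assert(baseline_fps > 0.0);
    return 100.0 * (baseline_fps - measured_fps) / baseline_fps;
}

/* Print the comparison for the figures from this thread:
   about 70 fps on the "good" commit, 43-67 fps on the recent one. */
void print_regression_summary(void)
{
    const double good = 70.0;
    const double worst = 43.0;
    const double best = 67.0;

    printf("good commit:   %.2f ms/frame\n", frame_time_ms(good));
    printf("recent, worst: %.2f ms/frame (%.1f%% lower fps)\n",
           frame_time_ms(worst), fps_drop_percent(good, worst));
    printf("recent, best:  %.2f ms/frame (%.1f%% lower fps)\n",
           frame_time_ms(best), fps_drop_percent(good, best));
}
```

Call `print_regression_summary()` from any `main` to print the table. Frame times are often easier to compare than fps, because a profiler reports where those extra milliseconds go.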

mudlord commented 4 years ago

Interesting. I'll also have a play with rigging up Themaister's new RSP dynarec on this old build, to see how big the performance difference really is.

mudlord commented 4 years ago

The problem is probably the thread syncing, which some titles need in order to get correct framebuffer contents. The old version didn't use any thread synchronization at all.
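Here is a minimal sketch of why syncing can cost frame rate. It assumes the emulator thread hands RDP commands to a single worker thread through a queue. Every name in it (`cmd_queue`, `queue_push`, `queue_sync`, `rdp_worker`, `queue_shutdown`) is hypothetical, and it is not taken from the linked n64video.c. It only shows the general pattern: `queue_push` returns immediately, but `queue_sync` makes the emulator thread wait until the worker has finished every queued command. A build with no synchronization never pays that wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAP 64

/* Hypothetical command queue shared by the emulator thread (producer)
   and one RDP worker thread (consumer). */
struct cmd_queue {
    pthread_mutex_t lock;
    pthread_cond_t changed;   /* broadcast whenever the queue state changes */
    int cmds[QUEUE_CAP];
    int head, count;
    bool busy;                /* worker is executing a popped command */
    bool quit;
    long processed;           /* commands the worker has finished */
};

void queue_init(struct cmd_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    q->head = q->count = 0;
    q->busy = q->quit = false;
    q->processed = 0;
}

/* Producer: enqueue a command; only blocks if the queue is full. */
void queue_push(struct cmd_queue *q, int cmd)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->changed, &q->lock);
    q->cmds[(q->head + q->count) % QUEUE_CAP] = cmd;
    q->count++;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

/* Producer: wait until every queued command has been executed. This is
   what makes framebuffer contents up to date, and what stalls the caller. */
void queue_sync(struct cmd_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count > 0 || q->busy)
        pthread_cond_wait(&q->changed, &q->lock);
    assert(q->count == 0 && !q->busy);
    pthread_mutex_unlock(&q->lock);
}

long queue_processed(struct cmd_queue *q)
{
    pthread_mutex_lock(&q->lock);
    long n = q->processed;
    pthread_mutex_unlock(&q->lock);
    return n;
}

/* Consumer: pop and execute commands until told to quit and drained. */
void *rdp_worker(void *arg)
{
    struct cmd_queue *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->count == 0 && !q->quit)
            pthread_cond_wait(&q->changed, &q->lock);
        if (q->count == 0)
            break;                       /* quit requested, queue drained */
        int cmd = q->cmds[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        q->busy = true;
        pthread_mutex_unlock(&q->lock);

        (void)cmd;                       /* a real renderer would rasterize here */

        pthread_mutex_lock(&q->lock);
        q->busy = false;
        q->processed++;
        pthread_cond_broadcast(&q->changed);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

void queue_shutdown(struct cmd_queue *q, pthread_t worker)
{
    pthread_mutex_lock(&q->lock);
    q->quit = true;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
    pthread_join(worker, NULL);
    pthread_cond_destroy(&q->changed);
    pthread_mutex_destroy(&q->lock);
}
```

Calling `queue_sync` after every command is the most accurate and slowest choice. Calling it only where a title actually reads the framebuffer back is cheaper, and that is presumably the kind of trade-off a thread sync level setting controls.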

mudlord commented 4 years ago

I guess a deeper look would involve reverting all the RDP state changes (the VI code is unchanged between the two versions).

I'm curious how dynamically allocating the RDP state would affect performance compared with statically initialized state (the current approach).
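The two layouts being compared can be sketched as follows. The `rdp_state` struct here is a hypothetical, heavily reduced stand-in, not Angrylion's actual state. Option A is a static array with one slot per worker, so its addresses are fixed when the program is linked. Option B is allocated at runtime to match the actual worker count, so every access goes through a pointer loaded at runtime. Code that takes a `struct rdp_state *` works with either layout. Which one is faster on a given compiler and CPU is exactly what such an experiment would need to measure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced RDP state; the real state is much larger. */
struct rdp_state {
    uint32_t fill_color;
    uint32_t scissor[4];
    uint8_t  tmem[4096];
};

#define MAX_WORKERS 8

/* Option A: statically allocated, one slot per worker thread. */
static struct rdp_state static_states[MAX_WORKERS];

struct rdp_state *state_static(int worker)
{
    assert(worker >= 0 && worker < MAX_WORKERS);
    return &static_states[worker];
}

/* Option B: allocated at runtime and zero-initialized by calloc,
   sized to the actual worker count. Release with free(). */
struct rdp_state *states_alloc(int num_workers)
{
    assert(num_workers > 0);
    return calloc((size_t)num_workers, sizeof(struct rdp_state));
}

/* The same operation works on either layout because it takes a pointer. */
void set_fill_color(struct rdp_state *s, uint32_t color)
{
    s->fill_color = color;
}
```

A fair benchmark would run the same command stream against both layouts and compare frame times. Everything else in the build would stay the same.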

All RDP commands are synced here: https://github.com/libretro/parallel-n64/blob/master/mupen64plus-video-angrylion/n64video.c#L244

I guess a good experiment would be to add the compat flags to the old build and then benchmark from there.