libretro / vice-libretro

Versatile Commodore 8-bit Emulator
GNU General Public License v2.0

Significant performance drop since move from VICE 3.3 to 3.5 #411

Closed modeler closed 3 years ago

modeler commented 3 years ago

The VICE x64 core appears to run slower since moving to VICE 3.5 as its base. Crucially, it will no longer run at 50 FPS on a Raspberry Pi Zero or an original Raspberry Pi clocked at 1 GHz.

Verified by building the 3.5 core from this specific commit: https://github.com/libretro/vice-libretro/commit/eb41072b5ad30b79d2d22b4548c25706bba13b8c

versus the 3.3 core from the previous one: https://github.com/libretro/vice-libretro/commit/d9c6b7538fb0be29730b873d24318840c6e7ae99

I understand that this is probably not a defect within the core itself. However, may I suggest maintaining an older release, so that a usable VICE core with the ReSID engine remains available for low-end devices?

sonninnos commented 3 years ago

That is unfortunate. I only have an RPi2 to test with, and it can do full framerate with default settings.

Edit: And of course by fast you mean the default "Fast" sampling, and not "Fast Resampling", which is way slower.

modeler commented 3 years ago

Yes, I meant "Fast", not "Fast Resampling", but that's a good thing to point out, thank you. I have edited the main text to make that clearer.

I have not compared stand-alone VICE 3.3 with 3.5 to see if the drop in performance is present upstream, but I would imagine it is. I just wanted to make a record of the issue and where it occurred, for future reference.

For the record, as a lifelong Commodore 64 fanatic, thank you for this core. It is amazing.

sonninnos commented 3 years ago

Stand-alone VICE has dropped the fast version (x64) altogether, so it is no surprise that they are going for accuracy on other fronts too. The actual difference between x64 and x64sc is the VIC-II chip emulation, but this core will also default ReSID to "Fast" to better match the "fastness".

Could you try to build that 3.3 version with HAVE_RESID33=1 to include the 3.3 version of ReSID, to see if it has any performance difference to the default 2.4 version? I'm assuming that they only differ in 8580 filter stuff, but no harm confirming.

Also, you could try editing retrodep/ui.c and removing the line that sets "SoundFragmentSize", to get the default size, or try other values (0-4).

#define SOUND_FRAGMENT_VERY_SMALL    0
#define SOUND_FRAGMENT_SMALL         1
#define SOUND_FRAGMENT_MEDIUM        2
#define SOUND_FRAGMENT_LARGE         3
#define SOUND_FRAGMENT_VERY_LARGE    4

That has been "very small" for ages, including in that 3.3 commit, although it was set elsewhere there. The sound/video sync code had a big makeover between the versions, so maybe it plays a role too. Another possible factor is that the core no longer outputs mono only, but always stereo, regardless of extra SID chips.
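For anyone trying the other values, here is a minimal C sketch that checks a candidate fragment size against the five constants quoted above and gives it a readable name. Only the SOUND_FRAGMENT_* constants come from the source; the helper names fragment_size_is_valid and fragment_size_name are hypothetical and are not part of the core or of VICE.

```c
#include <stddef.h>

/* Constants as quoted above. */
#define SOUND_FRAGMENT_VERY_SMALL    0
#define SOUND_FRAGMENT_SMALL         1
#define SOUND_FRAGMENT_MEDIUM        2
#define SOUND_FRAGMENT_LARGE         3
#define SOUND_FRAGMENT_VERY_LARGE    4

/* Hypothetical helper: returns 1 if value is one of the five sizes, else 0. */
int fragment_size_is_valid(int value)
{
    return value >= SOUND_FRAGMENT_VERY_SMALL
        && value <= SOUND_FRAGMENT_VERY_LARGE;
}

/* Hypothetical helper: readable name for a size, or NULL if out of range. */
const char *fragment_size_name(int value)
{
    static const char *const names[] = {
        "very small", "small", "medium", "large", "very large"
    };
    return fragment_size_is_valid(value) ? names[value] : NULL;
}
```

A check like this only guards against typos when cycling through test builds; it says nothing about which size performs best on a given device.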

That can be tested by editing retrodep/archdep.h and changing #define ARCHDEP_SOUND_OUTPUT_MODE SOUND_OUTPUT_STEREO to SOUND_OUTPUT_MONO.
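To put a rough number on the stereo versus mono idea, the sketch below computes how many bytes of audio have to be produced per video frame for a given channel count. It is only an illustration: the helper audio_bytes_per_frame is hypothetical, and the 48000 Hz sample rate and 16-bit sample size used in the example figures are assumptions, not values taken from the core.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes of 16-bit PCM needed for one video frame.
 * The 16-bit sample size is an assumption made for illustration. */
size_t audio_bytes_per_frame(unsigned sample_rate_hz,
                             unsigned frames_per_second,
                             unsigned channels)
{
    /* Samples per channel per frame (integer division, remainder dropped). */
    size_t samples = sample_rate_hz / frames_per_second;
    return samples * channels * sizeof(int16_t);
}
```

With the assumed 48000 Hz at 50 FPS, mono works out to 1920 bytes per frame and stereo to 3840, so stereo doubles the audio data handed to the frontend. Whether that doubling is measurable on a slow device is a separate question.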

And thanks-a-bunch, let's keep making it more amazing!

sonninnos commented 3 years ago

I managed to cripple my RPi2 enough to also be able to get full speed with 3.3, while not with 3.5.

It turned out those fragment and stereo/mono ideas did next to nothing, so I guess the only route is to try including the older version of ReSID.

Compiling sure takes forever in crippled one core mode, and constant cripple/uncripple rebooting also takes patience, heh.

VICE changelog caught my eye:

* Changes in Vice 3.5
=====================
** SID fixes
------------

- Fixed the filter saturation
- Noise writeback fixes
- Envelope regression fix
- Fix the coefficients for the resid external filter
- Rough implementation of the shift register and waveform zero bitfade
- Added 4 possible additional SID chips for a total of 8 (x64*/xscpu64/x128 only)

* Changes in Vice 3.4
=====================
** SID fixes
------------

- use model dependent floating output ttl values like in residfp

This core skipped 3.4 completely, but any of those changes could affect performance in who knows what way. My main suspect is the floating ttl change, though.

Which means comparing standalone 3.3 vs 3.4 vs 3.5 could also be worthwhile.

sonninnos commented 3 years ago

I'm afraid SID emulation alone is not to blame, since I tried copying all SID-related code from 3.3 to 3.5, and it did not speed things up enough.

Which means there needs to be a separate time capsule of 3.3 with an up-to-date libretro part then.

I had to drop the clock to 600 MHz and limit it to a single core in order to get it slow enough, and even 1 GHz on a single core is more than enough here.

BUT I also managed to "cure" the slowness with either of these frontend options:

Surely the result is more input lag, but better than nothing I suppose..

modeler commented 3 years ago

Thank you for your efforts. Since the RPi 1 and Zero are based on single-core SoCs, enabling threaded video is not an option for me. I am happy to continue using the VICE 3.3 core I built, which is attached if anyone else wishes to use it.

vice_x64_libretro.zip

Feel free to close this, as it's clearly caused by a shift in system requirements in upstream VICE.

sonninnos commented 3 years ago

Threaded video still made a difference here after I disabled all but one core though.