Closed DevilBlackDeath closed 3 years ago
There was no really relevant change on my part here since November, so I don't see what would have caused this regression.
@dankcushions can u comment?
Looks like another user has similar issues with other cores (namely PPSSPP and FlyCast) so the issue may very well be on Lakka's side.
Edit : Also, I unfortunately can't test anything before 01/01, and that build still contains the issue. If, and only if, it is a Mupen issue, then it happened between Lakka's 2.3.2 release and the latest November changes.
@m4xw After discussing the issue with another user, it would seem such performance issues have existed since mid 2019. https://github.com/libretro/Lakka-LibreELEC/issues/1232
He points to a commit that still got decent performance, so I'm linking it in case it can help.
ah probably just changed defaults then on other users request.
Well I did try to change a bunch of settings, but even perfectly duplicating Lakka 2.3.2's settings didn't give better performance. Are there settings that are not exposed to Retroarch maybe ? An ini or something similar that I have to change or copy from one to the other ?
If it turns out the defaults are the issue, I assume it would be Lakka's job to specify different settings or maybe apply a patch during build right ?
In all cases, when I have some time (and finally manage to build Lakka from a Docker cause I'm tired of day-long builds in my VM every time -_- ) I'll try to narrow down which commit is the turnaround point and tell you here !
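For anyone wanting to narrow down the turnaround commit the same way, the search can be sketched as a bisection. This is only an illustration under assumptions: the commit names and the `is_bad` check are hypothetical placeholders (in practice `is_bad` would mean "build this commit, run the game, see whether it stutters"), and it assumes that once a commit is slow, every later commit is slow too.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumptions: commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and badness never goes away again.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical history: ten commits, performance drops from "c6" onwards.
commits = [f"c{i}" for i in range(10)]
result = first_bad_commit(commits, lambda c: int(c[1:]) >= 6)
print(result)
```

With ten commits this needs three builds instead of up to eight, which matters when every build takes most of a day.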
I believe that whatever is slowing this core down on the Pi4 is happening on the GPU side. Overclocking the ARM CPU to 2 GHz had no noticeable effect on the core, but overclocking the GPU to 750 MHz led to obvious performance improvements.
Yes definitely ! Changing some settings did help a lot though so I assume there is a lot to what @m4xw said about defaults being changed.
I do recommend building Lakka with Parallel RDP though. Just tested that and while it COULD be unstable (didn't test it enough, maybe it is actually quite stable) it's an absolute beast. It actually makes Banjo-Tooie's Jiggywiggy's challenge playable at around 60-70% original speed, without artifacts, without the framebuffer being black and white and so on. Main issue I can see is the internal scaling and crop overscan not working. That could be due to the RPi4 not being Vulkan 1.1 compliant yet. I also couldn't get Parallel RSP to build for some reason. But once 1.1 compliance is there and Parallel RSP can be built, near fullspeed N64 emulation could actually be viable on the most demanding of games for the Pi 4. Gonna close this for the time being though, as Mupen64 is definitely not at fault there, and neither is Lakka.
Edit : Nevermind I thought I could close this by myself. Contributors you can close this issue it would seem. Afaik tests have shown no responsibility of performance loss from Mupen64Plus-Next !
parallel doesn't use the parallel rdp on the pi 4 - it doesn't support it. instead it uses the ancient fallback HLE plugins it ships with - old version of GLideN64 I believe (or possibly rice). you can prove this via a verbose log.
it likely gives good performance as they're so ancient and unsophisticated, compared to the version of GLideN64 used in -nx. compatibility will be low, though.
but HLE is a RSP plugin afaik. Parallel RDP on the other hand is a RSP one. Also that's weird, considering I'm getting much better performance with that "compile" Parallel RDP and most importantly much better accuracy than GlideN64 is able to (be it GlideN64 on Mupen or on ParaLLEl-n64). Shouldn't this fallback be shipped with it if it gives better results on platforms like the pi ?
Also how is the fallback handled ? Since with Lakka nightlies Vulkan 1.0 is included, could it be that Parallel RDP wouldn't fall back ?
Edit : Sorry didn't read the second part of your message. I will test compatibility then. Any idea of a game that would be incompatible ? Tooie and Conker are my usual go-to when it comes to plugins giving their all but since Tooie is now confirmed to be working perfectly I don't know what to test. Gonna check the verbose log in all cases ;)
but HLE is a RSP plugin afaik. Parallel RDP on the other hand is a RSP one.
you've said RSP twice. HLE can be RDP or RSP. the parallel RDP is for LLE graphics, and is equivalent to HLE RDP plugins GLideN64, rice, and so on. there is a parallel RSP, but that's irrelevant for pi (if it even uses it), as it's not using LLE graphics, and indeed the HLE RSP plugin performs fine.
Shouldn't this fallback be shipped with it if it gives better results on platforms like the pi ?
if you want to use abandoned HLE plugins you could use https://github.com/libretro/mupen64plus-libretro, or indeed parallel - but to be clear, these will never get updates whereas nx core is actively updated with current GLideN64, which is an increasingly accurate and compatible plugin. in fact, nx ships with parallel too, I believe!
Also how is the fallback handled ? Since with Lakka nightlies Vulkan 1.0 is included, could it be that Parallel RDP wouldn't fall back ?
if it worked, it would be a slideshow since the pi 4 is so weak, vulkan or not. but i am certain it's not, as it won't be building for vulkan for pi, and besides IIRC parallel RDP requires extensions beyond 1.0. but check verbose logs to be sure...
My bad, my second one was meant to be RDP. But aren't RSP and RDP supposed to work together ? Actually isn't RSP entirely responsible for audio and some low level graphics ?
No, neither Parallel RDP nor Parallel RSP ships with Lakka, I had to do a custom build to even get the option.
Unfortunately I wasn't able to get a verbose log the "normal" way, as my nightly won't run "retroarch -v" (but it will run fine as a service...) so I turned logs on in the settings, rebooted, ran the game. While AngryLion is clearly shown to fall back to GlideN64, no such mention is made of Parallel. Unfortunately that doesn't confirm much.
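Once a log file exists, picking out the lines that mention a plugin can be done with a small filter. A minimal sketch follows. The keyword list is my own guess at useful search terms and the sample log lines are invented for illustration; they are not actual RetroArch or Mupen64Plus-Next output.

```python
import re
from typing import Iterable, List, Sequence

# Assumed search terms; adjust to whatever the real log actually prints.
PLUGIN_KEYWORDS = ("angrylion", "parallel", "gliden64", "rice", "rdp", "rsp")


def plugin_lines(lines: Iterable[str],
                 keywords: Sequence[str] = PLUGIN_KEYWORDS) -> List[str]:
    """Return the lines containing any keyword as a whole word (case-insensitive)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


# Invented sample lines, only to show the filter working.
sample_log = [
    "[INFO] Example line unrelated to plugins",
    "[WARN] example: selected RDP needs LLE RSP, falling back to GLideN64",
    "[INFO] example: audio driver started",
]
matches = plugin_lines(sample_log)
for line in matches:
    print(line)
```

On a real log, pass an open file object instead of `sample_log`, since iterating a text file yields its lines.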
Yes as I said earlier, Parallel is using Vulkan 1.1 afaik.
Further tests also seem to indicate I'm getting better performance in particularly intensive parts than on my RX5700 with GlideN64. Just confirmed it. I cannot get anywhere near 30-40% speed on Jiggywiggy's challenges with GlideN64 up to date, whereas the "Parallel RDP", whatever it is or falls back to on my Pi, is at the very least at half speed (probably more, I'll try to record the FPS) with no major graphical issues in this part (most older HLEs I've used couldn't display that properly with decent performance). If it turns out to be the actual Parallel RDP I expect to run into crashes, considering Vulkan is not good enough.
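To turn a recorded FPS figure into the kind of "percent of original speed" numbers used above, the arithmetic is just measured rate over target rate. A tiny sketch; the 18 and 30 fps values are hypothetical and not measurements from this thread.

```python
def speed_percent(measured_fps: float, native_fps: float) -> float:
    """Emulation speed as a percentage of the game's native frame rate."""
    if native_fps <= 0:
        raise ValueError("native_fps must be positive")
    return 100.0 * measured_fps / native_fps


# Hypothetical numbers: 18 fps measured against a 30 fps target.
print(speed_percent(18.0, 30.0))  # 60.0
```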
Another thing I only just considered : The RetroPie Supreme Ultra was actually able to get better 64 performances with Parallel. Though there isn't much precision about which Parallel. It seems to be the core, but the only way to squeeze performance out of the parallel core with Vulkan would actually be with the parallel GFX plugin afaik. I was the first one to think Parallel would be a slideshow on Pi 4, but maybe 2.0Ghz quad-core would actually be enough (considering overclocking here). I will also test if changing the CPU clock gives major performance differences since a lot of Parallel RDP's performances are supposed to be CPU bound. Will report back on that !
Edit : Some more reading later and I realize I was wrong, Parallel RDP is actually GPU bound and is expected to scale pretty well to low performance GPUs, though there isn't much data, and I'd even dare say not many tests have been done on poor GPUs. We've all based Parallel RDP's performance expectations on assumptions but I'm wondering how much of it is right. If anyone has a good example of something Parallel RDP does accurately that no other HLE does, or at the very least that GlideN64 doesn't, please do tell, the more test cases, the better !
My bad, my second one was meant to be RDP. But aren't RSP and RDP supposed to work together ? Actually isn't RSP entirely responsible for audio and some low level graphics ?
they do work together, but given you can't/aren't running either, let's not get into it.
While AngryLion is clearly shown to fall back to GlideN64, no such mention is made of Parallel. Unfortunately that doesn't confirm much.
yes it does. parallel RDP IS angrylion; it's the angrylion 'pixel perfect' LLE RDP plugin running as compute shaders. GLideN64 is (an ancient version of) a HLE RDP plugin.
Another thing I only just considered : The RetroPie Supreme Ultra was actually able to get better 64 performances with Parallel. Though there isn't much precision about which Parallel. It seems to be the core, but the only way to squeeze performance out of the parallel core with Vulkan would actually be with the parallel GFX plugin afaik.
no, if it was using the Parallel RDP it would be a slideshow. Angrylion is a very computationally intensive LLE RDP plugin which is massively sped up when run as a compute shader on an ok GPU. The pi4 has a terrible GPU and there is no way it would handle it well, even if it did have the necessary vulkan compliance/extensions. the reason you're getting good performance is because parallel is falling back to an ancient version of the GLideN64 HLE plugin compared to the up-to-date, more accurate, more sophisticated one that is used here.
We've all based Parallel RDP's performance expectations on assumptions but I'm wondering how much of it is right.
i'm not. parallel's performance is well known - it runs on some mobile GPUs quite well, but remember, all of those are WAY better than the pi4, which cannot run it other than the irrelevant fallback HLE plugins that are nothing to do with parallel RDP.
yes it does. parallel RDP IS angrylion; it's the angrylion 'pixel perfect' LLE RDP plugin running as compute shaders. GLideN64 is (an ancient version of) a HLE RDP plugin.
I wouldn't say that it is, however I reuse the error message for the RSP fallback. It says AL for both parallel and AL when telling you that a HLE RSP was selected. Parallel RDP isn't really supported on rpi to begin with
It's an error message for the RSP, not the RDP, and shows up regardless of selecting Parallel as the RDP plugin (selecting GlideN64 as the RDP plugin still shows that AngryLion error), that has nothing to do with Parallel being "deselected" as far as RDP is concerned.
Also, do we know what the "lower limit" GPU is ? As a computer scientist I'm well aware of the differences in power of the Pi 4, I'd just like to have a basis of comparison. Do we know what is the worst mobile GPU capable of just barely running every game at original FPS ? (or at least, most games).
yes it does. parallel RDP IS angrylion; it's the angrylion 'pixel perfect' LLE RDP plugin running as compute shaders. GLideN64 is (an ancient version of) a HLE RDP plugin.
I wouldn't say that it is, however I reuse the error message for the RSP fallback. It says AL for both parallel and AL when telling you that a HLE RSP was selected. Parallel RDP isn't really supported on rpi to begin with
Isn't the main obstacle Vulkan 1.1 ? In which case supporting arm and aarch64 is important anyway because there are much more powerful arm-based SBCs than the Pi 4 and those could clearly use Parallel as long as they're Vulkan 1.1 compliant right (which not a lot are I believe but I could be wrong, I didn't take much of an interest in that yet I'll be honest).
What would be a good test case to determine which RDP is being run, and not only in that particular case ? Is there any point in the log where Mupen64Plus Next confirms which plugins it uses ?
My minspec is 1020MHz quad core arm (aarch64) and 460MHz GPU (Tegra T210 is used as reference, nintendo switch).
Isn't the main obstacle Vulkan 1.1 ? In which case supporting arm and aarch64 is important anyway because there are much more powerful arm-based SBCs than the Pi 4 and those could clearly use Parallel as long as they're Vulkan 1.1 compliant right (which not a lot are I believe but I could be wrong, I didn't take much of an interest in that yet I'll be honest).
What would be a good test case to determine which RDP is being run, and not only in that particular case ? Is there any point in the log where Mupen64Plus Next confirms which plugins it uses ?
Parallel RDP just about runs fullspeed on my S21 with Mali G78, don't even dream about it on RPI.
My minspec is 1020MHz quad core arm (aarch64) and 460MHz GPU (Tegra T210 is used as reference, nintendo switch).
Isn't the main obstacle Vulkan 1.1 ? In which case supporting arm and aarch64 is important anyway because there are much more powerful arm-based SBCs than the Pi 4 and those could clearly use Parallel as long as they're Vulkan 1.1 compliant right (which not a lot are I believe but I could be wrong, I didn't take much of an interest in that yet I'll be honest). What would be a good test case to determine which RDP is being run, and not only in that particular case ? Is there any point in the log where Mupen64Plus Next confirms which plugins it uses ?
Parallel RDP just about runs fullspeed on my S21 with Mali G78, don't even dream about it on RPI.
Well in terms of raw spec the Pi 4 is technically quad core up to 2Ghz with a GPU that goes up to 750 in overclock. Though iirc there's some dynamic scaling between those two but I could be wrong (that could be about the previous Pis that didn't have dedicated GPUs).
Out of curiosity and unrelated to the discussion, is that running at native res or upscaled ?
My main question would then be : what is it falling back to ? Whatever it falls back to gives me near fullspeed performance (at least at what seems to be native since I can't upscale it) while retaining visual fidelity where GLN64 doesn't and where the old mupen64plus core doesn't either, no matter what configuration is selected. I'm talking nearly fullspeed with framebuffer effects. Surely, even if it is an outdated HLE, it's clearly worth packaging it with Retroarch. Not as the default plugin obviously, GLN64 is the actively developed one and will only get better after all, but if it can get some of those more demanding games running, then I'd say maybe at least have a compilation option to make it appear ? While I generally agree "unstable" plugins give a bad perception to the end-users, most average users won't sit patiently for years waiting for the Pi to catch up and possibly even have a Pi 5 powerful enough to run P-RDP (wishful thinking, I know). I'll try some other demanding games like Mario Tennis, GoldenEye and Conker, see if that still holds up.
Well in terms of raw spec the Pi 4 is technically quad core up to 2Ghz with a GPU that goes up to 750 in overclock.
it's not at all comparable with a dedicated gaming GPU. the GPU on the pi4 is very weak (the cpu is ok).
My main question would then be : what is it falling back to ?
rice, i believe
Surely, even if it is an outdated HLE, it's clearly worth packaging it with Retroarch
what do you mean, 'packaged with retroarch'? parallel is a libretro core on the buildbot like every other. do you mean 'packaged with lakka'? if so, ask them why they don't include it...
Oh yeah I know it's not comparable :P iirc the GPU is mostly meant to be used for cameras and light video stuff so I'd assume there's a lot of the dedicated cores most GPUs, even mobile GPUs, have that the Pi's lack (especially when it comes to 3D/advanced shaders stuff). Vulkan is still exciting IMO as far as non-emulation stuff goes, though mostly from an experimental standpoint as I don't have much interest in playing Quake 2 on a Pi for example.
So I just retested and my head hurts ! It would seem it actually falls back to GLN64. Rice has a tendency of messing up framebuffer stuff (be it Banjo's jigsaw effects or Zelda's pause menu for what I tested). Though I can't figure out why I couldn't get Banjo-Tooie's shadow and the jigsaw effect to work yesterday... Maybe the fact I ran parallel before somehow brought over the "GLN64" bugs I had there (since these are the same symptoms that I'm still able to reproduce in Parallel core with GLN64 as a GFX plugin). Today GLN64 in Mupen works fine, and most effects work as well, without nearly as much of a performance hit as yesterday...
I meant with Lakka but since I now discovered the fallback seems to be GLN64 and my issues were weird and apparently only my own, my point is now moot !
By the way, is GLN64 still updated only in terms of accuracy or also in terms of performance/optimizations ?
As a closing word for this comment, apologies for the trouble, after that Parallel test I stayed up very late in the night to run all sorts of tests and well... yeah, that didn't go well for me ! Thanks for your messages, even if it didn't look like it, they did help clear up my mind and now I shall avoid very late night tests and compilations -_-
By the way, is GLN64 still updated only in terms of accuracy or also in terms of performance/optimizations ?
it does get optimizations but emulation in general tends to drift towards higher and higher system requirements until accuracy is achieved. besides, many devices can run gliden64 quite well, so optimizations aren't really on the radar.
low power targets like the pi are not really well represented by code contributions across emulation, which favour more powerful targets like PC, nintendo switch, etc. it's strange, given how popular the device is for emulation, but that's how it's been for a long time.
In my tests between GLide64 and GLideN64, GLideN64 performed better in most games, oddly enough. So it's also a factor of outdated technology taking shitty codepaths
@dankcushions Indeed that's weird, though I guess since all consoles up to that generation, including PSX and Dreamcast, are very well emulated even on the Pi, I don't see much point in optimizing Mupen specifically for the Pi, even more so considering the Pi 5 will likely have enough power to run almost all N64 games at original FPS on 320x240 (if they add a very small dedicated GPU it can even likely reach 640x480). I know some people would, but I'd never ask the devs to focus their efforts on that kind of pointless goal. Surprisingly though, the Pi still runs better than the Switch when it comes to that particular generation, though let's be real, it's not a GPU limitation, and more of a result of the much younger age of Switch-based emulation. Might personally make the jump once N64 gets there !
@m4xw Yeah I've been doing a LOT of tests these last 3 weeks and GLN64 definitely outranks GLide64 in every game once it's correctly configured. My only issue so far is with DK64, all cores and plugins are able to run it at native FPS or near-native FPS, but all have some form of major graphical glitches. ParaLLEl's HLE plugin causes animation to be distorted and Mupen's causes the camera to act weirdly like it was constantly resetting. Always been a tough game to emulate though and even when Tooie was nearly fully playable back in the day, DK64 still had some way to go iirc.
causes the camera to act weirdly like it was constantly resetting. Always been a tough game to emulate though and even when Tooie was nearly fully playable back in the day, DK64 still had some way to go iirc
DK64 Camera Issue workaround:
or just keep it at software, but perf probably wont make it
OS : Lakka nightlies dating from 30/03
Last stable performance confirmed : Lakka 2.3.2, commit 9ae6f16bb9c75f2d2f2fa2fc4fd001cf7dda6d95 of this repository
This is by no means a complaint. Seeing how nightlies get little to no testing in regards to Lakka, and there are no reports, I figured I might as well report it. Performance is extremely low with constant stutter and microfreezes. Tested both from a MicroSD boot and USB SSD boot. Tested with both arm and aarch64 versions of the Lakka image.
Tested on 2 different Pi 4, one of which has no issues running Banjo-Kazooie and to some extent Banjo-Tooie smoothly with the older version.
I'll try doing some more tests, see if one of the available nightlies doesn't have such a performance impact.
If this is believed to be Lakka's fault, please close the issue and tell me so I can open an issue on Lakka's GitHub.
Thanks in advance for reading :)