libretro / melonDS

DS emulator, sorta
GNU General Public License v3.0

[Bounty] [$50] Update melonDS core to latest version + extra features #68

Closed Ryunam closed 3 years ago

Ryunam commented 3 years ago

I'm opening this issue to serve as a general request for updating this core to the latest version (upstream code reached version 0.9 recently and the Windows x64 libretro core is particularly outdated at this point), while also adding a bounty to provide further incentive and take into account the following features in the process:

Hopefully all of the above is realistically feasible! This would definitely make it the ultimate go-to libretro core for DS / DSi titles.

Bounty link: https://www.bountysource.com/issues/92807839-bounty-update-melonds-core-to-latest-version-extra-features

Myaats commented 3 years ago

I tried rebasing a couple of weeks back but got some weird errors. The switch branch already has a fully rewritten copy of the core with savestate support, which I planned to port over once the JIT reached melonDS master, which it now has.

I think the hybrid view will have a performance impact, as rescaling the framebuffer in software is not that cheap. Also, I have no idea how well runahead will map onto the melonDS API; it might come with a huge performance penalty (but that is just my guess).

But I should look into rebasing again and moving the core rewrite out of the switch branch.

Ryunam commented 3 years ago

Thank you! Any contribution and update to this will be of course much appreciated.

With regards to Runahead, based on my understanding of this functionality so far, there are two things to consider:

Myaats commented 3 years ago

Thank you! Any contribution and update to this will be of course much appreciated.

With regards to Runahead, based on my understanding of this functionality so far, there are two things to consider:

* the faster and smaller savestates are, the better it will work, since it relies on a constant state save and load sequence to accomplish what it sets out to do;

* I’ve seen other cores where indeed single instance produced a very noticeable (unplayably slow) performance degradation, however having a second instance can help in such cases. Maybe melonDS could benefit from the second instance option the same way.

I have ported my core rewrite from the switch branch and so far runahead is unusable with hardware acceleration and has major graphical issues on same instance runahead. But it seems to run mostly fine with some minor random flickering on second instance software rendered at half performance. It might get better when I enable the JIT but there are still quite a number of issues to figure out.

Ryunam commented 3 years ago

Well, that’s some cool and reassuring progress already. :) I really appreciate your continued effort and dedication to this and I’m available in case you require some testing on Windows 10 x64.

I did expect Runahead to be unusable with HW acceleration, unfortunately. It seems that, in its current form in RA, it is incompatible by design with any 3D or hardware-accelerated core. For example, it works with neither the Vulkan nor the OpenGL renderer in Beetle HW, but it does work very well with software (albeit with a performance penalty that is directly proportional to the number of latency frames that you are shaving off).
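To make the save/load sequence and the cost discussed above concrete, here is a minimal sketch of single-instance runahead for one displayed frame. It is a conceptual illustration only, not RetroArch's code: `ToyCore`, `run_with_runahead` and the counter fields are hypothetical names, and the "state" is a single integer standing in for a real savestate.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an emulator core: its whole state is one frame counter.
struct ToyCore {
    int frame = 0;            // emulated state (what a savestate captures)
    int frames_emulated = 0;  // bookkeeping only: total work done, not part of the state

    void run_frame() { ++frame; ++frames_emulated; }
    std::vector<int> save_state() const { return {frame}; }
    void load_state(const std::vector<int>& s) { frame = s.at(0); }
};

// Runs one displayed frame with `runahead_frames` of lookahead.
// Returns the frame number whose picture would be shown to the player.
int run_with_runahead(ToyCore& core, int runahead_frames)
{
    assert(runahead_frames >= 0);

    core.run_frame();                       // the real frame, advancing the timeline
    if (runahead_frames == 0)
        return core.frame;

    const std::vector<int> snapshot = core.save_state();
    for (int i = 0; i < runahead_frames; ++i)
        core.run_frame();                   // speculative frames; only the last is displayed
    const int shown = core.frame;

    core.load_state(snapshot);              // rewind so the next call continues from the real frame
    return shown;
}
```

With a lookahead of N, each displayed frame costs N + 1 emulated frames plus one save and one load, which is consistent with the penalty growing with the number of frames removed and with faster, smaller savestates helping.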

Myaats commented 3 years ago

I have done an initial implementation of the hybrid screen layout with support for both the software and hardware renderer (It supports both 3:1 and 2:1).

If you want to test it, the code is at: https://github.com/libretro/melonDS/tree/rebase

Currently the only issue I have found is that the threaded software renderer is broken. And I'll look into it tomorrow.

DukeSkinny commented 3 years ago

Forgive me if this isn't the place, but since we're talking features: is an adjustable screen gap like the one featured in the DeSmume core possible to add?

bslenul commented 3 years ago

If you want to test it, the code is at: https://github.com/libretro/melonDS/tree/rebase

I got that on Windows (MSYS2):

src/libretro/libretro.cpp: In function 'void retro_init()':
src/libretro/libretro.cpp:67:10: error: 'time' was not declared in this scope; did you mean 'ftime'?
   67 |    srand(time(NULL));
      |          ^~~~
      |          ftime

I'm not a dev so not sure if it's a proper fix but I found online someone with a similar issue, I was able to compile by adding #include <ctime> to "libretro.cpp".
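For reference, the workaround described above can be sketched as follows. `std::time` is declared in `<ctime>`; whether some other header happens to pull it in varies by toolchain, which may be why the error appeared under MSYS2 (that explanation is an assumption, not something confirmed in this thread). The helper name `seed_rng` is hypothetical and only stands in for the seeding done in `retro_init()`.

```cpp
#include <cassert>
#include <cstdlib>  // std::srand
#include <ctime>    // std::time; without this header the call below may not be declared

// Hypothetical helper standing in for the seeding done in retro_init().
unsigned seed_rng()
{
    const std::time_t now = std::time(nullptr);
    assert(now != static_cast<std::time_t>(-1));  // std::time returns -1 on failure

    const unsigned seed = static_cast<unsigned>(now);
    std::srand(seed);
    return seed;
}
```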

I compared the core options with my Linux VM, looks like there are a lot missing on the Windows version of the core, the GL and JIT options for example. I'm not really familiar with that core tbh, so no idea if that's normal or not.

Other than that I noticed a few issues with layouts: Top/Bottom and Bottom/Top result in this:

image

Left/Right and Right/Left:

image

Top and Bottom Only seem to work fine, but the smaller screen in Hybrid mode overlaps the bigger screen a bit:

With Hybrid Ratio at "2": image

At "3": image

All the layouts work fine on my VM with the GL renderer core option enabled, screens in Hybrid mode don't overlap, etc. With the option disabled however, exact same results as above.

There are issues with inputs also, in Yoshi's Island for example inputs ingame work in menus but I can't move Yoshi. In Mario Kart when you scroll up/down in menus sometimes it's like it's doing multiple inputs at once, and ingame L button (to use items) doesn't work. In Castlevania Dawn of Sorrow X and Y don't respond. Haven't tried other games yet (it's late here :p).

Yoshi completely crashes RA on my VM:

ASAN:DEADLYSIGNAL
=================================================================
==1992==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f262800b080 bp 0x0000000007fd sp 0x7ffd26d53f60 T0)
==1992==The signal is caused by a READ memory access.
==1992==Hint: address points to the zero page.
    #0 0x7f262800b07f in GPU2D::GetOBJExtPal() (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0x7f07f)
    #1 0x7f262800b79b in GPU2D::InterleaveSprites(unsigned int) (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0x7f79b)
    #2 0x7f262800feba in void GPU2D::DrawScanlineBGMode<0u>(unsigned int) (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0x83eba)
    #3 0x7f262800cbcb in GPU2D::DrawScanline_BGOBJ(unsigned int) (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0x80bcb)
    #4 0x7f262800cee3 in GPU2D::DrawScanline(unsigned int) (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0x80ee3)
    #5 0x7f2628006300 in GPU::StartHBlank(unsigned int) (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0x7a300)
    #6 0x7f2627fd10cc in NDS::RunSystem(unsigned long long) (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0x450cc)
    #7 0x7f2627fdadb7 in unsigned int NDS::RunFrame<true>() (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0x4edb7)
    #8 0x7f262803555f in retro_run (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0xa955f)
    #9 0x55ab2d1a062d in core_run /home/bobby/Documents/RetroArch/retroarch.c:40461
    #10 0x55ab2d19b8f1 in runloop_iterate /home/bobby/Documents/RetroArch/retroarch.c:39826
    #11 0x55ab2d0dba63 in rarch_main /home/bobby/Documents/RetroArch/retroarch.c:17563
    #12 0x55ab2d0dbb2a in main /home/bobby/Documents/RetroArch/retroarch.c:17640
    #13 0x7f2647e99b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #14 0x55ab2d066989 in _start (/home/bobby/Documents/RetroArch/retroarch+0x322f989)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/bobby/.config/retroarch/cores/melonds_libretro.so+0x7f07f) in GPU2D::GetOBJExtPal()
==1992==ABORTING

Sorry if this is too much or already known... 😅

Myaats commented 3 years ago

I have just pushed a fix for the various screen layout problems and I am going to look into getting the rest to work on Windows (I installed msys2 on my laptop earlier today).

bslenul commented 3 years ago

Nice! I just tested on Windows and can confirm that all the layout issues I mentioned above are gone 👍

edit: Tested your latest commit with the core options on Windows, seems to work fine as well, here's a quick screenshot at 4x:

image

edit: Damn, resolution at 1x, GL renderer OFF with fast forward: ~80fps, GL renderer ON: ~210fps 😮

Awakened0 commented 3 years ago

Checking out a rebase build, in Joystick touch mode, pressing R3 causes the screen to go white and freeze instead of acting as a screen tap. Also, the deadzone seems a little too tight, since I get some drifting of the cursor with my Xbox One controller's right stick. I'd still like to have the cursor autohide after a few seconds of not moving, but I can always open a separate feature request for that if it's too much more work to add to this bounty.

I noticed the quick switch button now acts as a hold instead of toggle. It was a toggle in the master branch build I was using before. I kind of like it better as a hold, honestly. But it'd be nice to have an option to make it a toggle or hold some day.

Myaats commented 3 years ago

Checking out a rebase build, in Joystick touch mode, pressing R3 causes the screen to go white and freeze instead of acting as a screen tap.

Check the controls, R3 is lid close by default.

Also, the deadzone seems a little too tight, since I get some drifting of the cursor with my Xbox One controller's right stick.

The deadzone can be tweaked in the RA settings.

Otherwise, I think I am done adding new features for now. I have probably already spent 6+ hours on this.

Myaats commented 3 years ago

@Ryunam @bslenul

The first automated builds have been done on the new infrastructure: https://git.libretro.com/libretro/melonDS/-/jobs/385/artifacts/browse

Feel free to report issues, but I don't plan to add any new features before it gets moved to the master branch.

Awakened0 commented 3 years ago

Check the controls, R3 is lid close by default.

Whoops :X L3 taps the screen as expected.

The deadzone can be tweaked in the RA settings.

Nice, I forgot that was there. A deadzone of 0.1 is enough to fix the cursor drift for me.
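As an illustration of why a small deadzone removes stick drift, here is a minimal sketch of a radial deadzone. It assumes stick axes normalized to [-1, 1]; the `Stick` type and `apply_deadzone` function are hypothetical, and RetroArch's own deadzone handling may differ in detail.

```cpp
#include <algorithm>  // std::min
#include <cassert>
#include <cmath>      // std::sqrt

struct Stick { float x; float y; };

// Zeroes any deflection inside the deadzone radius and rescales the rest so
// the output still ramps smoothly from 0 (at the deadzone edge) to 1.
Stick apply_deadzone(Stick in, float deadzone)
{
    assert(deadzone >= 0.0f && deadzone < 1.0f);

    const float magnitude = std::sqrt(in.x * in.x + in.y * in.y);
    if (magnitude <= deadzone)
        return {0.0f, 0.0f};  // small resting offsets (drift) are discarded

    const float clamped = std::min(magnitude, 1.0f);
    const float scale = (clamped - deadzone) / (1.0f - deadzone) / magnitude;
    return {in.x * scale, in.y * scale};
}
```

With a deadzone of 0.1, a resting offset such as (0.05, 0) maps to no movement, while full deflection still reaches the edge of the range.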

Otherwise, I think I am done adding new features for now. I have probably already spent 6+ hours on this.

No problem, this is an awesome update! Thanks for all the work!

bslenul commented 3 years ago

The first automated builds have been done on the new infrastructure: https://git.libretro.com/libretro/melonDS/-/jobs/385/artifacts/browse

Seems really good, inputs are fixed in the few games I tested! 👍

However, Yoshi's Island now crashes on Windows too: crash-200908-232337.log :(

edit: Also noticed that it's still missing a core option compared to Linux: "Threaded software renderer".

bslenul commented 3 years ago

Hm weird, it doesn't crash if I build it myself on Windows (still crashes on my Linux VM) 🤔

edit: Chrono Trigger crashes too with the .dll from the link. edit2: New Super Mario Bros. and Sonic Rush too.

Ryunam commented 3 years ago

I finally had the time to test the newly-updated core on Windows 10 x64. I have to say I'm amazed at the speed this is being finetuned and reworked by @m4tsa. It's definitely an awesome development from all standpoints.

I used the attached .dll from https://git.libretro.com/libretro/melonDS/-/jobs/385/artifacts/browse and so far I noticed the following:

Awakened0 commented 3 years ago

If you set the video driver to "vulkan" and run content with the software renderer, it seems to boot fine but RA will render a transparent / black window and not display any picture (audio seems okay). I assumed that the GL renderer wouldn't work at all with the Vulkan video driver in RA, however other software cores perform well with it and in fact gain a little bit of a performance increase compared to the "gl" or "glcore" drivers, so it would be cool to have it working. No big deal if it's unfeasible.

Strange, I don't have that issue and I've only been using Vulkan. Using the same Gitlab build too. Win10, GTX 970 with 452.06 drivers:

Untitled

Ryunam commented 3 years ago

Hmm, I was using “glcore” as the default driver in RA and then set “vulkan” as the video driver for melonDS through a core override.

Perhaps the driver switching is what caused the problem I was referring to? I’ll have to experiment more with it tomorrow.

Awakened0 commented 3 years ago

Yeah, I'm using vulkan as my global driver in retroarch.cfg, so it could be a driver switching/override thing.

Tatsuya79 commented 3 years ago

We don't have HAVE_THREADS in the makefile for Windows, is that on purpose? The software renderer should be twice as fast with it.

(I cannot test, as it's not compiling on gcc 9.2, which I don't want to update yet)

Awakened0 commented 3 years ago

Found a few more games that crash before getting to gameplay: Jump Ultimate Stars, Mega Man ZX, Sonic Colors, Sonic Rush Adventure and Tetris DS.

Myaats commented 3 years ago

@Ryunam

The most critical thing I think is that a few games seem to crash after loading, as reported by @bslenul. A few I have tested personally were "Yoshi Touch and Go" and "New Super Mario Bros". Both titles crash right after displaying the Nintendo logo at startup.

I'll look into it but based on @bslenul's backtrace earlier his crash was from upstream code and not the core itself.

If you set the video driver to "vulkan" and run content with the software renderer, it seems to boot fine but RA will render a transparent / black window and not display any picture (audio seems okay). I assumed that the GL renderer wouldn't work at all with the Vulkan video driver in RA, however other software cores perform well with it and in fact gain a little bit of a performance increase compared to the "gl" or "glcore" drivers, so it would be cool to have it working. No big deal if it's unfeasible.

This is more of a frontend thing, all the core does is request a GL context from the frontend.

I noticed a pretty substantial FPS decrease that occurs specifically whenever you're saving your progress, presumably when the SRAM is being saved to disk. You can experience this drop quite clearly when you use Save Rooms in the Castlevania games (I tested both Dawn of Sorrow and Order of Ecclesia and they both exhibit this issue).

This has always been a bug in the core, it saves the entire SRAM to the sav file for every write done to the SRAM. If you can reproduce it with the standalone melonDS emulator it's an upstream issue I guess.
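To illustrate the behaviour described above and one common way to avoid it, here is a minimal sketch that marks save memory as dirty on each write and flushes once after a few idle frames. This is not the core's code and not a fix that was applied in this thread; `DeferredSram` and all of its members are hypothetical, and the actual file write is replaced by a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical save-memory wrapper: writes only set a dirty flag, and the
// (expensive) flush happens once the game has stopped writing for a while.
class DeferredSram {
public:
    DeferredSram(std::size_t size, int idle_frames_before_flush)
        : data_(size, 0xFF), idle_limit_(idle_frames_before_flush)
    {
        assert(idle_frames_before_flush > 0);
    }

    void write(std::size_t addr, std::uint8_t value)
    {
        assert(addr < data_.size());
        data_[addr] = value;
        dirty_ = true;
        idle_frames_ = 0;  // restart the idle countdown on every write
    }

    // Call once per emulated frame. Returns true if a flush happened.
    bool end_of_frame()
    {
        if (!dirty_)
            return false;
        if (++idle_frames_ < idle_limit_)
            return false;
        flush();
        return true;
    }

    int flush_count() const { return flush_count_; }

private:
    void flush()
    {
        // A real implementation would write data_ to the save file here.
        ++flush_count_;
        dirty_ = false;
        idle_frames_ = 0;
    }

    std::vector<std::uint8_t> data_;
    int idle_limit_;
    int idle_frames_ = 0;
    int flush_count_ = 0;
    bool dirty_ = false;
};
```

In this sketch a burst of writes produces a single flush instead of one full-file write per byte, at the cost of losing the last few frames of save data if the process dies before the flush.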

Finally, Runahead with Second Instance is unfortunately very flickery right now. I don't know if it's purely a performance thing and it could very well be that, but with my i7 4790K anytime I activate Second Instance the gameplay starts to stutter and there's a lot of black flickering going on. It's less than Single Instance, but it still flickers noticeably. Single Instance is very stuttery and flickers way more with lots of graphical glitches.

There is nothing that can be done on the core side to fix this atm.

I'd prefer the "Swap Screen" feature to be a toggle, rather than requiring the corresponding button to be held at all times. For example in a few games you might want to switch the visible screen for a little while and then go back to the other one only after completing a certain section, so a toggle is better-suited for this kind of situation.

I'll look into adding an option later and make it default to toggle.
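The difference between the two behaviours can be sketched as follows. Hold mode mirrors the button state, while toggle mode flips only on the frame where the button goes from released to pressed. `SwapMode` and `ScreenSwap` are hypothetical names used for illustration and are not taken from the core.

```cpp
#include <cassert>

enum class SwapMode { Hold, Toggle };

class ScreenSwap {
public:
    explicit ScreenSwap(SwapMode mode) : mode_(mode) {}

    // Call once per frame with the current button state.
    // Returns whether the screens are currently swapped.
    bool update(bool pressed)
    {
        if (mode_ == SwapMode::Hold) {
            swapped_ = pressed;              // swapped only while the button is held
        } else if (pressed && !was_pressed_) {
            swapped_ = !swapped_;            // flip once per press (rising edge)
        }
        was_pressed_ = pressed;
        return swapped_;
    }

private:
    SwapMode mode_;
    bool was_pressed_ = false;
    bool swapped_ = false;
};
```

Tracking the previous button state is what prevents a toggle from flipping every frame while the button stays down.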

@Tatsuya79

We don't have HAVE_THREADS in the makefile for Windows, is that on purpose? The software renderer should be twice as fast with it.

Good catch, it's just an oversight. I just added it and it works fine, will commit it later.

@Awakened0

Found a few more games that crash before getting to gameplay: Jump Ultimate Stars, Mega Man ZX, Sonic Colors, Sonic Rush Adventure and Tetris DS.

I'll look into it.

Tatsuya79 commented 3 years ago

Also, standalone melonDS defaults to a JIT block size of 32, and both JIT optimizations are enabled.

bslenul commented 3 years ago

I'll look into it but based on @bslenul's backtrace earlier his crash was from upstream code and not the core itself.

But like I said on Windows it doesn't crash if I build it myself (if the other testers want to try: melonds_libretro_9a462bc.zip), which seems super weird. On my VM however no matter if I build it myself or use the .so from gitlab, it crashes :(

Anything I could do to try to understand why it works when I built it myself? Version of the compiler or dependencies or something else?

Myaats commented 3 years ago

I think most things should be fixed with the newest build.

bslenul commented 3 years ago

Yeah, no crash anymore 👍 On Windows at least, the same games still crash on my Linux VM.

edit: ah damn, I can't test standalone melonDS on my VM to see if it crashes too, "libslirp-dev" package isn't available for me, I'll update to Linux Mint 20 and try again I guess :p

bslenul commented 3 years ago

Ah, it works with your very last commit (59cfbc2)! 👍

bslenul commented 3 years ago

Threaded + GL renderer options enabled = crashes RA on close content/exit.

Not sure why but retroarch_debug.exe doesn't generate a crash log on Windows for this crash :/ I get that on Linux:

ASAN:DEADLYSIGNAL
=================================================================
[INFO] [Core Options]: Saved core options file to "/home/bobby/.config/retroarch/config/melonDS/melonDS.opt"
==1671==ERROR: AddressSanitizer: SEGV on unknown address 0x7f6100a0f3c7 (pc 0x7f6100a0f3c7 bp 0x000000000000 sp 0x7f61086b0d38 T4)
==1671==The signal is caused by a READ memory access.
ASAN:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

No idea if this is more helpful or not, but with gdb:

(gdb) bt full
#0  0x00007fffd12ac3cf in GPU3D::SoftRenderer::ClearBuffers() () from /home/bobby/.config/retroarch/cores/melonds_libretro.so
No symbol table info available.
#1  0x00007fffd12ac507 in GPU3D::SoftRenderer::RenderThreadFunc() () from /home/bobby/.config/retroarch/cores/melonds_libretro.so
No symbol table info available.
#2  0x00007fffd1255660 in thread_wrap () from /home/bobby/.config/retroarch/cores/melonds_libretro.so
No symbol table info available.
#3  0x00007ffff66ab6db in start_thread (arg=0x7fffd8f4e700) at pthread_create.c:463
        pd = 0x7fffd8f4e700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140736833316608, -6176136144449129640, 140736833314432, 0, 140737488296064, 140737488295888, 6176199872029963096, 6176150547866628952}, mask_was_saved = 0}}, 
          priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#4  0x00007ffff1221a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Tatsuya79 commented 3 years ago

The threaded soft renderer is there now but it doesn't make a difference; it's as slow as with it disabled.

Ryunam commented 3 years ago

I'm testing the latest code (59cfbc297518a34cfae11015a8f32a348043ca96) with the build from the new Gitlab infrastructure. The games that used to crash at startup (at least, the ones I had tested) seem to work now. That's great!

I can also confirm that vulkan is actually working with the software renderer. My bad, it was probably some config options in my overrides that were causing that issue.

A few things I've noticed, in addition to all that's been discussed already:

Myaats commented 3 years ago

I didn't expect the Hybrid layout options to have such a visible performance penalty. I noticed also that the chosen screen ratio affects the intensity of the performance decrease, so a ratio of 3 seems to be quite more taxing than 2 (causing an average reduction of 30fps, although of course this depends on each game and individual scene). Can something be done at a core level to alleviate this drop?

I said in my first reply that the software hybrid mode would have a noticeable performance penalty. This is due to having to go over every pixel that has to be resized and duplicate it into a buffer that is even bigger than the original. The impact should be much less severe on the OpenGL renderer, where it is done on the GPU.
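The per-pixel cost described above can be sketched with a nearest-neighbour integer upscale. This is an illustration under assumptions, not the core's actual scaling code: the function name `upscale` and the use of 32-bit pixels are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-neighbour integer upscale: every source pixel is duplicated into a
// ratio x ratio square of the (larger) destination buffer.
std::vector<std::uint32_t> upscale(const std::vector<std::uint32_t>& src,
                                   std::size_t width, std::size_t height,
                                   std::size_t ratio)
{
    assert(width > 0 && height > 0 && ratio >= 1);
    assert(src.size() == width * height);

    const std::size_t out_w = width * ratio;
    const std::size_t out_h = height * ratio;
    std::vector<std::uint32_t> dst(out_w * out_h);

    for (std::size_t y = 0; y < out_h; ++y) {
        const std::uint32_t* src_row = src.data() + (y / ratio) * width;
        std::uint32_t* dst_row = dst.data() + y * out_w;
        for (std::size_t x = 0; x < out_w; ++x)
            dst_row[x] = src_row[x / ratio];  // one write per output pixel
    }
    return dst;
}
```

For a 256x192 DS screen, a ratio of 2 writes 512x384 = 196,608 pixels per frame and a ratio of 3 writes 768x576 = 442,368, against 49,152 in the source. The work grows with the square of the ratio, which fits the report that a ratio of 3 is noticeably more taxing than 2.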

Right now I'm not noticing any performance difference between the different JIT block options, even after restarting. The default of 32 seems absolutely identical to 1, 10 or even 100. Whether the JIT is enabled or not does make a difference, but the max block size does not really seem to have a noticeable effect. I wonder if that's normal and intended.

Depends on how many jit blocks the active game code needs, I have not looked much into it so I do not have any good answers.

As for the FPS drop when saving the SRAM, I found that the same thing has been reported upstream and seems to be due to melonDS not supporting asynchronous saves for now (see Arisotura#477). I'm wondering though if perhaps a fix can be applied specifically to the libretro core implementation by using .srm savefiles as other libretro cores do. Perhaps it's an entirely unrelated thing, but it doesn't hurt to ask.

I really don't want to change emulation code downstream, and I really don't have the time to debug this.

Also I have moved the code to master and backed the old one up in the old branch. So it should hit most platforms during the next buildbot cycle, hopefully.

Ryunam commented 3 years ago

I said in my first reply that the software hybrid mode would have a noticeable performance penalty. This is due to having to go over every pixel that has to be resized and duplicate it into a buffer that is even bigger than the original. The impact should be much less severe on the OpenGL renderer, where it is done on the GPU.

I see, indeed the OpenGL renderer is much faster and the performance decrease with the Hybrid layout is not as strong compared to software.

As for the FPS drop when saving the SRAM, I found that the same thing has been reported upstream and seems to be due to melonDS not supporting asynchronous saves for now (see Arisotura#477). I'm wondering though if perhaps a fix can be applied specifically to the libretro core implementation by using .srm savefiles as other libretro cores do. Perhaps it's an entirely unrelated thing, but it doesn't hurt to ask.

I really don't want to change emulation code downstream, and I really don't have the time to debug this.

Yes, I understand not wanting to alter the emulation code, especially for the sake of maintainability and future updates. It's just an annoying issue unfortunately, that makes itself quite evident especially with games where either you are supposed to save often or the savefile is being updated constantly in the background. Your work is very much appreciated anyway, I'm grateful for the time you spent so far on this.

Also I have moved the code to master and backed the old one up in the old branch. So it should hit most platforms during the next buildbot cycle, hopefully.

That's great, nice to hear that it's going on the buildbot soon! I believe there are still a few issues with the Threaded Soft Renderer? I haven't tested it personally.

Myaats commented 3 years ago

I believe there are still a few issues with the Threaded Soft Renderer? I haven't tested it personally.

Worked fine when I tested it? It just seems to lack a noticeable performance increase, which I guess can be due to synchronization overhead, which is more noticeable the faster the machine is.

Ryunam commented 3 years ago

I am experiencing another issue which I didn't notice before: compared to what happens on Desmume there seems to be some touchscreen-related oddity (at least when controlling the touchscreen with the mouse) where clicking on the very farthest column of pixels to the right seems to be detected as if a horizontal line had been drawn from left to right.

I noticed this with Yoshi Touch & Go. See here:

Catch! Touch! Yoshi! (J)-200909-200141

The brown dot shows where the mouse was clicking / dragging (right-most pixel column). The output is an entire row of clouds from left to right. On Desmume when using the stylus / mouse on that area of the screen it just produces a cloud over that point (or a vertical line of clouds if you drag the mouse from top to bottom).

I believe there are still a few issues with the Threaded Soft Renderer? I haven't tested it personally.

Worked fine when I tested it? It just seems to lack a noticeable performance increase, which I guess can be due to synchronization overhead, which is more noticeable the faster the machine is.

I think @bslenul was still getting a crash with the Threaded Renderer + GL, hence why I brought it up.

Myaats commented 3 years ago

I have already fixed the threaded renderer crashing when closing the emulator when the OpenGL renderer is enabled. I have also fixed the touch overflow issue in the latest commit.
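The thread does not say what the touch fix actually was. As a sketch of one way to guard against this class of bug, assuming the problem was a coordinate landing one past the last pixel column, touch positions can be clamped to the 256x192 touchscreen before being passed to the emulator. `TouchPoint`, `clamp_axis` and `clamp_touch` are hypothetical names.

```cpp
#include <algorithm>  // std::min, std::max
#include <cassert>

struct TouchPoint { int x; int y; };

constexpr int kScreenWidth = 256;   // DS touchscreen width in pixels
constexpr int kScreenHeight = 192;  // DS touchscreen height in pixels

// Clamps a coordinate to the valid pixel range [0, size - 1].
int clamp_axis(int value, int size)
{
    assert(size > 0);
    return std::max(0, std::min(value, size - 1));
}

// Keeps a touch position inside the screen, so that x == 256 or y == 192
// (one past the edge) can never reach the emulated touchscreen.
TouchPoint clamp_touch(int x, int y)
{
    return {clamp_axis(x, kScreenWidth), clamp_axis(y, kScreenHeight)};
}
```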

Ryunam commented 3 years ago

Thank you once again @m4tsa for working so patiently on this. It's great to see that the touchscreen overflow problem was fixed too.

I just want to bring up 3 minor things, then I believe we can consider this request fully addressed and I will be glad to confirm its completion and the corresponding bounty I had devoted to it.

  1. In standalone melonDS on Windows, under Video Settings -> OpenGL Renderer, there is an option called "Improved polygon splitting" that does not seem to be exposed currently in the libretro core.

Annotation 2020-09-10 121946

  2. Another option in standalone melonDS that does not show up at present among the libretro core options is "Fast Memory", under Emu Settings -> CPU emulation.

Annotation 2020-09-10 122017

  3. I have checked the "Maximum JIT block size" option again in standalone melonDS and 32 is actually both the default and the maximum value selectable. I think it would be best to limit that option to the same range, from 1 to 32.

Myaats commented 3 years ago

I have added the missing core options, but I did change the maximum JIT block size to 100 on purpose since it can be useful to increase even higher on some platforms.

ghost commented 3 years ago

Does the core support libslirp? I couldn't get (online) wifi to work with one of my games.

Myaats commented 3 years ago

The libslirp backend is specific to the Qt frontend, feel free to open a dedicated issue.

Since the rebase has happened and is on the master branch, and the original issue is solved, I am closing this.

Feel free to create new issues for suggestions or issues found.