libretro / parallel-n64

Optimized/rewritten Nintendo 64 emulator made specifically for Libretro. Originally based on Mupen64 Plus.

[BOUNTY] Write an ARM64 DYNAREC #538

Closed natinusala closed 5 years ago

natinusala commented 6 years ago

Writing an ARM64 DYNAREC (JIT compiler) would allow games to run at full speed on the growing number of ARM64 platforms.

Link to the associated bounty: https://www.bountysource.com/issues/63766562-write-an-arm64-dynarec

Gillou68310 commented 5 years ago

Sure but is it worth a 2115$ bounty? ;-)

fzurita commented 5 years ago

@twinaphex can be the judge of that :). Even though it's a lazy port for you, you are one of the few people that can do it with little effort.

Gillou68310 commented 5 years ago

Not my money, not my decision :-) I think every backer should give his opinion!

warlockv2 commented 5 years ago

A few weeks to Xmas. Someone's gotta want that cash lol

Gillou68310 commented 5 years ago

BTW this doesn't mean we cannot have the lazy port at first. I just think that the bounty should only be awarded for an ARM64 dynarec optimized for the ARM64 platform.

Ploggy commented 5 years ago

Well if you're willing to do the lazy port first and take a crack at the new Dynarec too, that's great!

m4xw commented 5 years ago

@Gillou68310 I agree with you, lazy port is not enough.

inactive123 commented 5 years ago

@Gillou68310 Well, if you want to do it in a more optimized way, do you think it would be realistic to expect to be done with it before Christmas?

I don't think we need to reinvent the wheel here, but I am in agreement that just a plain lazy 32bit ARM port to 64bit can probably be done far more efficiently. Plus I think users are getting a bit restless (especially the ones on Switch) and they want results sooner rather than later.

So just let us know I guess. We are fairly flexible about what we expect for this bounty to be considered finished.

inactive123 commented 5 years ago

On second thought, I will leave this in @natinusala and @m4xw's hands, and will take a hands-off approach on this. So whatever they decide is what will go.

I still think it would be fine to backport the current 'lazy' ARM64 dynarec that works to parallel n64 and mupen64plus libretro, then we just leave this bounty up for whenever the new, improved and faster ARM64 dynarec is finished. But I will leave @m4xw, @natinusala and @Gillou68310 to decide this.

Affinator commented 5 years ago

Clearly in favor of getting something working by Christmas.

As far as I understand the FAQ of bountysource, the bounty can be paid to several people, if both worked on the solution. Has this to be a 50:50 split, or can this be chosen?

For example 10% for the developer of the 'lazy' solution and 90% for the 'fast' solution?

m4xw commented 5 years ago

After speaking with the Team, I will implement your lazy approach, this should buy you loads of time to work more thoroughly @Gillou68310, but a lazy port is not the goal of this bounty. If you happen to not want to work on that, I think this bounty can be partially awarded

Gillou68310 commented 5 years ago

This looks like a good plan! We can set a first goal for Christmas for the lazy implementation, I can definitely assist you to reach this goal. The next goal will be the optimized implementation. Just to be 100% transparent it's hard to estimate what will be the performance gain from the optimized version until it's finished. I just want to warn everybody that the optimized version might not give a huge performance boost. But it's definitely worth the try!

m4xw commented 5 years ago

@Gillou68310 You know how it is, that's what I wanna hear!

m4xw commented 5 years ago

@Gillou68310 I know gcc likes to do 2x 32bit operations instead of 1x 64bit too, so I expect the gains to be somewhat negligible (of course we will only know after we try), but having it "properly" done, even if it doesn't give any benefit, is definitely the scope of this bounty. In the end, if we get a working solution with an acceptable implementation (even if it turns out the "lazy" approach was better), this would get awarded IMO. But we are absolutely gonna try this^^
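
For context on the "2x 32bit vs 1x 64bit" point: the N64's CPU has 64-bit registers, so code targeting a 32-bit host has to handle each guest value as two halves. A minimal sketch in C, assuming that this is the kind of operation being discussed; the function names are made up for illustration and are not taken from the mupen64plus source:

```c
#include <stdint.h>

/* Hypothetical helper: a 64-bit add done the way a 32-bit host has to do it,
 * as two 32-bit adds with a manually propagated carry. */
uint64_t add64_split(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo    = a_lo + b_lo;      /* wraps modulo 2^32 */
    uint32_t carry = lo < a_lo;        /* 1 if the low add overflowed */
    uint32_t hi    = a_hi + b_hi + carry;

    return ((uint64_t)hi << 32) | lo;
}

/* Same result with a single 64-bit add, which an ARM64 host does natively. */
uint64_t add64_native(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Both functions return the same value for every input; the difference is only in how many host operations are needed, which is where an optimized ARM64 dynarec could in principle save work.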

Gillou68310 commented 5 years ago

Totally agree!

elmagio commented 5 years ago

Out of curiosity, would the "lazy" implementation be enough for full speed N64 emulation within the Switch OS?

Gillou68310 commented 5 years ago

As a Switch owner I hope so ;-) Mario 64 should be, games like GoldenEye, Banjo-Tooie and Conker's Bad Fur Day definitely won't :-(

m4xw commented 5 years ago

@Gillou68310 we got many games running at fullspeed with 1.7GHz overclock. So I actually think with dynarec and OC, we can maaaaaaaybe even run these. But with the dynarec, OC shouldn't be needed for many titles.

elmagio commented 5 years ago

@m4xw Is 1.7GHz OC on the Switch safe/stable, or does that make it reach ungodly temps? (Also, do we even have temp readings available on Switch?) I think the main CPU cluster usually runs at about 1GHz so that seems rather high.

dmiller423 commented 5 years ago

Should be perfectly safe, see X1 SoC clocks, it's underclocked by default on switch to conserve power.

m4xw commented 5 years ago

@elmagio This is offtopic. ccplex is designed for 2GHz anyway, it doesn't even run warm.

dmiller423 commented 5 years ago

Note, it isn't going to last long on battery, obv.

m4xw commented 5 years ago

@dmiller423 The increased battery consumption is actually quite low. There is more to gain from lower clocks than there is to lose from higher clocks, from our tests.

dmiller423 commented 5 years ago

Strange they would have kept it clocked so low if it didn't affect the power consumption radically, good to know though.

tabnk commented 5 years ago

Is @firerooks still working on it as well?

warlockv2 commented 5 years ago

Issue is it's been a while now. And we haven't seen anything other than some comments. There's no reason someone else can't step up.

m4xw commented 5 years ago

@Gillou68310 Is there a way to contact you? I'd prefer if you could join my Discord so we discuss some things in detail, but IRC would be fine too, so is email. https://discord.gg/GfEqpMa

Gillou68310 commented 5 years ago

Just joined your discord ;-)

m4xw commented 5 years ago

Quick update, we got it to run on rockpro and lakka-switch. Horizon will need more work, as we need to work between 2 address spaces (RX, RW) or kernel patches. Performs way better than expected, but there is def. room for improvements! I am excited!

dmiller423 commented 5 years ago

You can get away with one address space on the switch

m4xw commented 5 years ago

@dmiller423 RWX permissions are not allowed by the kernel

dmiller423 commented 5 years ago

Yes, it's W^X exclusive, you have to change back and forth as with mprotect, and unfortunately it's a bit more than that. Note: I have not checked how it affects performance, but it does work.

m4xw commented 5 years ago

@dmiller423 I considered going back and forth, but I think just using libnx's JIT function will be the cleaner way. Also our "mprotect" is currently only a stub, but I know what u mean.

dmiller423 commented 5 years ago

@m4xw yes the jit has to be switched between anyhow iirc, and if you look at JIT code it's rather simple. Anyhow just thought i'd point it out.

m4xw commented 5 years ago

Also part of "not my beer to maintain for future firmware versions"

dmiller423 commented 5 years ago

lol, I can't blame you there, but it only uses service calls so I don't think it's likely to change too much.

Here was some quick test code if anyone is interested in how it's done. Disclaimer: It wasn't meant to be pretty/public, only to see if vm would change the pages prot.

https://gist.github.com/dmiller423/1bac2ba090402417cfedda9122da25a9

warlockv2 commented 5 years ago

Great job guys. It's nice to see some real progress here. Place has been dead for a long time. Now I feel like we may actually get something. It's good stuff.

m4xw commented 5 years ago

@dmiller423 it was actually changed in the past. It currently uses a fallback way for >4x IIRC. When I tried it, I didn't map the code memory tho, only svcSetProcessMemoryPermission (would always fail). I might try that anyway, ty

warlockv2 commented 5 years ago

Any news on this?

dmiller423 commented 5 years ago

I don't want to speak for someone else, but will say: you can't expect miracles in a few short weeks when people work on these things in their spare time. Swapping to a64 on the surface might sound easy, but with arm dropping conditionals for the 64b ISA it changes the whole codebase. Optimizing the generated instruction cache when you have prob 10x as many branches will need more serious regalloc and block linking.
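
To illustrate the point about conditionals: on 32-bit ARM almost any instruction can carry a condition suffix, while the 64-bit ISA keeps only conditional branches and a small set of conditional-select style instructions. A rough sketch in C, holding hand-written listings for the guest operation "if (r0 != 0) r1 = r2 + r3"; the register choices, variable names and `count_insns` helper are all made up for illustration and say nothing about what the actual dynarec emits:

```c
#include <stddef.h>
#include <string.h>

/* 32-bit ARM (A32): the add itself is predicated, no branch needed. */
const char *cond_add_a32 =
    "cmp   r0, #0\n"
    "addne r1, r2, r3\n";

/* ARM64 (A64), branch form: jump over the add when the condition fails. */
const char *cond_add_a64_branch =
    "cmp   w0, #0\n"
    "b.eq  1f\n"
    "add   w1, w2, w3\n"
    "1:\n";

/* ARM64 (A64), branchless form: compute into a scratch register,
 * then keep either the new or the old value with CSEL. */
const char *cond_add_a64_csel =
    "cmp   w0, #0\n"
    "add   w9, w2, w3\n"
    "csel  w1, w9, w1, ne\n";

/* Count instructions in a listing: every non-empty line that is not a label. */
size_t count_insns(const char *listing)
{
    size_t n = 0;
    while (*listing) {
        const char *eol = strchr(listing, '\n');
        size_t len = eol ? (size_t)(eol - listing) : strlen(listing);
        if (len > 0 && listing[len - 1] != ':')
            n++;
        listing += len + (eol ? 1 : 0);
    }
    return n;
}
```

Either A64 form costs an extra instruction here, and the branchless form also needs a scratch register, which is one reason a straight port puts more pressure on register allocation and block linking.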

Gillou68310 commented 5 years ago

Yeah I'm currently spending all my free time on this. I will push my work as soon as it's in a usable state. I'll keep in touch!

m4xw commented 5 years ago

This has been merged for now https://github.com/libretro/mupen64plus-libretro/pull/83 Currently works for Android and *nix.

Update on Horizon support: Got the perm issue solved (in 2 ways actually), but seems there are some other Issues that keep it from working under Horizon right now. Will need to set up a debugger for that or try some stuff with a custom ldr sysmodule to pinpoint the Issue.

tabnk commented 5 years ago

Too bad, no Xmas for NSwitch users, but Linux and Android users will be delighted.

xTMODx commented 5 years ago

I did a quick test with Super Mario 64 on my phone and it runs ~10fps faster than with 32bit builds... great to see it before Christmas :) many thanks to all involved

misson20000 commented 5 years ago

@m4xw let me know if you need any debugging help. I wrote some weird tools that might be helpful.

AndreoBotelho commented 5 years ago

I've made a quick test with the aarch64 dynarec from the mupen64plus source and it got better speed on a 1.8GHz Cortex-A53 device, but the compatibility is unknown. I'll try to build for my Android phone. mupen64 runs really fast on my phone but it has some frameskip that parallel doesn't.

AndreoBotelho commented 5 years ago

Built for Android, it is really fast, at least 1.5x faster than the cached interpreter (Super Mario 64). Maybe it can be faster with optimization. As for compatibility, someone has to test.

xTMODx commented 5 years ago

@m4xw are you working on that for the parallel-n64 core too or was it planned only for mupen64 plus?

m4xw commented 5 years ago

@xTMODx My current focus is on mupen, it's an easy backport tho.

IntelMiner commented 5 years ago

Working on getting a build of this going on a Raspberry Pi 3 (model B+) in ARM64 mode

I'll throw up a quick-and-dirty SD card image for those who want to test it themselves if desired