natinusala closed this 5 years ago
Sure, but is it worth a $2115 bounty? ;-)
@twinaphex can be the judge of that :). Even though it's a lazy port for you, you are one of the few people who can do it with little effort.
Not my money, not my decision :-) I think every backer should give their opinion!
A few weeks to Xmas. Someone's gotta want that cash lol
BTW, this doesn't mean we cannot have the lazy port first. I just think that the bounty should only be awarded for an arm64 dynarec optimized for the arm64 platform.
Well, if you're willing to do the lazy port first and take a crack at the new dynarec too, that's great!
@Gillou68310 I agree with you, a lazy port is not enough.
@Gillou68310 Well, if you want to do it in a more optimized way, do you think it would be realistic to expect to be done with it before Christmas?
I don't think we need to reinvent the wheel here, but I am in agreement that just a plain lazy 32bit ARM port to 64bit can probably be done far more efficiently. Plus I think users are getting a bit restless (especially the ones on Switch) and they want results sooner rather than later.
So just let us know, I guess. We are fairly flexible about what we expect before this bounty is considered finished.
On second thought, I will leave this in @natinusala and @m4xw's hands and take a hands-off approach. So whatever they decide is what will go.
I still think it would be fine to backport the current working 'lazy' ARM64 dynarec to parallel-n64 and mupen64plus-libretro, and then just leave this bounty up until the new, improved and faster ARM64 dynarec is finished. But I will leave it to @m4xw, @natinusala and @Gillou68310 to decide this.
Clearly in favor of getting something working by Christmas.
As far as I understand the Bountysource FAQ, the bounty can be paid to several people if both worked on the solution. Does this have to be a 50:50 split, or can it be chosen?
For example, 10% for the developer of the 'lazy' solution and 90% for the 'fast' solution?
After speaking with the team, I will implement your lazy approach. This should buy you loads of time to work more thoroughly, @Gillou68310, but a lazy port is not the goal of this bounty. If you end up not wanting to work on that, I think this bounty can be partially awarded.
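Purely as an illustration of the arithmetic in this question (not a statement of how Bountysource actually computes payouts), a 10:90 split of the $2115 bounty can be done in integer cents so no fraction of a cent is lost to floating-point rounding. The function name `split_bounty` and the rule of giving any leftover cent to the first recipient are assumptions made for this sketch.

```python
def split_bounty(total_cents: int, shares: dict[str, int]) -> dict[str, int]:
    """Split a bounty, given in integer cents, by percentage shares summing to 100.

    Hypothetical helper: any remainder from integer division goes to the
    first listed recipient so the payouts always add up to the total.
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    payout = {name: total_cents * pct // 100 for name, pct in shares.items()}
    first = next(iter(payout))
    payout[first] += total_cents - sum(payout.values())
    return payout


# Hypothetical 10:90 split of a $2115 bounty (211,500 cents).
print(split_bounty(211_500, {"lazy": 10, "fast": 90}))
# → {'lazy': 21150, 'fast': 190350}
```

That is $211.50 for the 'lazy' port and $1903.50 for the 'fast' one.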
This looks like a good plan! We can set a first goal of Christmas for the lazy implementation, and I can definitely help you reach it. The next goal will be the optimized implementation. Just to be 100% transparent, it's hard to estimate what the performance gain from the optimized version will be until it's finished. I just want to warn everybody that the optimized version might not give a huge performance boost. But it's definitely worth a try!
@Gillou68310 You know how it is, that's what I wanna hear!
@Gillou68310 I know gcc likes to do 2x 32-bit operations instead of 1x 64-bit too, so I expect the gains to be somewhat negligible (of course we will only know after we try), but having it done "properly", even if it doesn't give any benefit, is definitely the scope of this bounty. In the end, if we get a working solution that is acceptably implemented (even if it turns out the "lazy" approach was better), it would get awarded IMO. But we're absolutely gonna try this ^^
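To make "2x 32-bit operations instead of 1x 64-bit" concrete: the N64's VR4300 is a 64-bit MIPS CPU, so a doubleword add such as DADDU has to be split into a low-word add that produces a carry plus a high-word add-with-carry on a 32-bit ARM backend (an ADDS/ADC pair), whereas AArch64 can do it with one 64-bit ADD. The Python below only simulates both forms to show they agree; it is a sketch, not code from either dynarec, and the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def daddu_native64(a: int, b: int) -> int:
    """One 64-bit add, as an AArch64 backend could emit (a single ADD Xd, Xn, Xm)."""
    return (a + b) & MASK64


def daddu_split32(a: int, b: int) -> int:
    """The same add as two 32-bit operations, like an ARM32 ADDS/ADC pair."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                  # ADDS leaves this in the carry flag
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32   # ADC consumes the carry
    return (hi << 32) | lo


x, y = 0x0000_0001_FFFF_FFFF, 0x0000_0000_0000_0001
print(hex(daddu_native64(x, y)), hex(daddu_split32(x, y)))
# → 0x200000000 0x200000000
```

The results are identical; the difference is only in instruction count and register pressure, which is why the real-world gain is hard to predict before trying.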
Totally agree!
Out of curiosity, would the "lazy" implementation be enough for full speed N64 emulation within the Switch OS?
As a Switch owner I hope so ;-) Mario 64 should be; games like GoldenEye, Banjo-Tooie and Conker's Bad Fur Day definitely won't :-(
@Gillou68310 we got many games running at full speed with a 1.7GHz overclock. So I actually think that with the dynarec and OC, we can maaaaaaaybe even run these. But with the dynarec, OC shouldn't be needed for many titles.
@m4xw Is 1.7GHz OC on the Switch safe/stable, or does that make it reach ungodly temps? (Also, do we even have temp readings available on Switch?) I think the main CPU cluster usually runs at about 1GHz so that seems rather high.
Should be perfectly safe, see X1 SoC clocks, it's underclocked by default on switch to conserve power.
@elmagio This is off-topic. The ccplex is designed for 2GHz anyway; it doesn't even run warm.
Note: on battery it isn't going to last long, obviously.
@dmiller423 The increased battery consumption is actually quite low. There is more to gain from lower clocks than there is to lose from higher clocks, from our tests.
Strange that they would have kept it clocked so low if it didn't affect power consumption radically; good to know, though.
Is @firerooks still working on it as well?
The issue is it's been a while now, and we haven't seen anything other than some comments. There's no reason someone else can't step up.
@Gillou68310 Is there a way to contact you? I'd prefer if you could join my Discord so we can discuss some things in detail, but IRC would be fine too, and so would email. https://discord.gg/GfEqpMa
Just joined your discord ;-)
Quick update: we got it to run on rockpro and lakka-switch. Horizon will need more work, as we need to either work with 2 address spaces (RX, RW) or use kernel patches. It performs way better than expected, but there is definitely room for improvement! I am excited!
You can get away with one address space on the switch
@dmiller423 RWX permissions are not allowed by the kernel
Yes, it's W^X: you have to switch back and forth, as with mprotect. Unfortunately it's a bit more involved than that, and I have not checked how it affects performance, but it does work.
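The alternative described here is a single mapping whose protection is flipped: RW while emitting, then RX before running. The sketch below shows that pattern on Linux/glibc by calling `mmap`, `mprotect` and `munmap` through `ctypes`; it is not the Horizon/libnx mechanism, and the class name `WXBuffer` is hypothetical.

```python
import ctypes
import ctypes.util
import mmap
import os

# Linux sketch: raw libc calls so the page protection can be changed.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

PAGE = mmap.PAGESIZE
MAP_FAILED = ctypes.c_void_p(-1).value


def _check(ret: int) -> None:
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


class WXBuffer:
    """A code buffer that is either writable or executable, never both at once."""

    def __init__(self, size: int = PAGE):
        self.size = size
        addr = libc.mmap(None, size, mmap.PROT_READ | mmap.PROT_WRITE,
                         mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
        if addr in (None, MAP_FAILED):
            raise OSError(ctypes.get_errno(), "mmap failed")
        self.addr = addr

    def make_writable(self) -> None:
        _check(libc.mprotect(self.addr, self.size, mmap.PROT_READ | mmap.PROT_WRITE))

    def make_executable(self) -> None:
        # On ARM a real JIT must also flush the instruction cache before jumping here.
        _check(libc.mprotect(self.addr, self.size, mmap.PROT_READ | mmap.PROT_EXEC))

    def emit(self, code: bytes) -> None:
        self.make_writable()                      # flip to RW
        ctypes.memmove(self.addr, code, len(code))
        self.make_executable()                    # flip back to RX

    def read(self, n: int) -> bytes:
        return ctypes.string_at(self.addr, n)

    def close(self) -> None:
        libc.munmap(self.addr, self.size)


buf = WXBuffer()
buf.emit(bytes([0xC0, 0x03, 0x5F, 0xD6]))  # AArch64 RET
print(buf.read(4).hex())
# → c0035fd6
buf.close()
```

Each `emit` costs two protection changes (system calls), which is the performance question raised above; the dual-mapping layout avoids them at the cost of managing two addresses.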
@dmiller423 I considered going back and forth, but I think just using libnx's JIT function will be the cleaner way. Also our "mprotect" is currently only a stub, but I know what u mean.
@m4xw yes the jit has to be switched between anyhow iirc, and if you look at JIT code it's rather simple. Anyhow just thought i'd point it out.
Also, it falls under "not my beer" (not my problem) to maintain for future firmware versions.
lol, I can't blame you there, but it only uses service calls so I don't think it's likely to change too much.
Here was some quick test code if anyone is interested in how it's done. Disclaimer: It wasn't meant to be pretty/public, only to see if vm would change the pages prot.
https://gist.github.com/dmiller423/1bac2ba090402417cfedda9122da25a9
Great job, guys. It's nice to see some real progress here. This place has been dead for a long time. Now I feel like we may actually get something. It's good stuff.
@dmiller423 it was actually changed in the past. It currently uses a fallback method for >4x IIRC. When I tried it, I didn't map the code memory though, I only used svcSetProcessMemoryPermission (which would always fail). I might try that anyway, ty.
Any news on this?
I don't want to speak for someone else, but I will say: you can't expect miracles in a few short weeks when people work on these things in their spare time. Switching to A64 might sound easy on the surface, but with ARM dropping conditional execution from the 64-bit ISA, it changes the whole codebase. Optimizing the generated code cache when you probably have 10x as many branches will need more serious register allocation and block linking.
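To illustrate "dropping conditionals": ARM32 lets almost any instruction be predicated (e.g. `ADDEQ r0, r1, r2`), while A64 keeps conditions mainly on branches (`B.cond`) and a few select/compare instructions such as `CSEL`. A translator therefore has to lower each predicated instruction either into a branch around it or into a compute-then-select pair. The helpers below are hypothetical and only emit assembly text; they are not taken from the mupen64plus dynarec.

```python
# Inverse of each A64 condition code (used to skip an instruction when its condition fails).
INVERT = {
    "eq": "ne", "ne": "eq", "hs": "lo", "lo": "hs",
    "mi": "pl", "pl": "mi", "vs": "vc", "vc": "vs",
    "hi": "ls", "ls": "hi", "ge": "lt", "lt": "ge",
    "gt": "le", "le": "gt",
}


def lower_with_branch(cond: str, rd: str, rn: str, rm: str, label: str) -> list[str]:
    """ARM32 'ADD<cond> rd, rn, rm' as A64: branch over the add if cond is false."""
    return [
        f"b.{INVERT[cond]} {label}",
        f"add {rd}, {rn}, {rm}",
        f"{label}:",
    ]


def lower_with_csel(cond: str, rd: str, rn: str, rm: str, tmp: str) -> list[str]:
    """Branch-free alternative: add into a scratch register, then conditionally select it."""
    return [
        f"add {tmp}, {rn}, {rm}",          # plain ADD does not touch the flags
        f"csel {rd}, {tmp}, {rd}, {cond}",  # rd = cond ? tmp : rd
    ]


print("\n".join(lower_with_branch("eq", "w0", "w1", "w2", "skip_add")))
print("\n".join(lower_with_csel("eq", "w0", "w1", "w2", "w16")))
```

The branch form adds control flow to every former predicated instruction (hence "10x as many branches"), while the CSEL form needs a free scratch register, which is where better register allocation comes in.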
Yeah I'm currently spending all my free time on this. I will push my work as soon as it's in a usable state. I'll keep in touch!
This has been merged for now https://github.com/libretro/mupen64plus-libretro/pull/83 Currently works for Android and *nix.
Update on Horizon support: got the perm issue solved (in 2 ways, actually), but it seems there are some other issues that keep it from working under Horizon right now. I will need to set up a debugger for that or try some stuff with a custom ldr sysmodule to pinpoint the issue.
Too bad, no Xmas for Switch users, but Linux and Android users will be delighted.
I did a quick test with Super Mario 64 on my phone and it runs ~10fps faster than with 32-bit builds... great to see it before Christmas :) Many thanks to all involved.
I've made a quick test with the aarch64 dynarec from the mupen64plus source and it got better speed on a 1.8GHz Cortex-A53 device, but the compatibility is unknown. I'll try to build it for my Android phone; mupen64 runs really fast on my phone, but it has some frameskip that parallel doesn't.
Built for Android: it is really fast, at least 1.5x faster than the cached interpreter (Super Mario 64). Maybe it can be faster with optimization; as for compatibility, someone has to test.
@m4xw are you working on that for the parallel-n64 core too, or was it planned only for mupen64plus?
@xTMODx My current focus is on mupen, but it's an easy backport.
Working on getting a build of this going on a Raspberry Pi 3 (model B+) in ARM64 mode
I'll throw up a quick-and-dirty SD card image for those who want to test it themselves if desired
Writing an ARM64 DYNAREC (JIT compiler) would allow games to run at full speed on the growing number of ARM64 platforms:
Link to the associated bounty: https://www.bountysource.com/issues/63766562-write-an-arm64-dynarec
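For readers unfamiliar with the term, a dynarec (dynamic recompiler, i.e. a JIT compiler) translates each guest basic block once, caches the result keyed by guest PC, and afterwards runs the cached translation instead of decoding guest instructions one by one. The toy below shows only that control structure: the three-instruction guest ISA is hypothetical, and "host code" here is a Python closure, whereas a real ARM64 dynarec emits native AArch64 machine code.

```python
from typing import Callable

# Hypothetical toy guest ISA, one tuple per instruction: (op, a, b, c)
#   ("addi", rd, rs, imm)  rd = rs + imm
#   ("bne",  rs, rt, tgt)  jump to tgt if rs != rt   (ends a block)
#   ("halt", 0, 0, 0)      stop                      (ends a block)
Program = list[tuple[str, int, int, int]]
Block = Callable[[list[int]], int]  # runs on guest registers, returns next PC (-1 = halt)


def translate(prog: Program, pc: int) -> Block:
    """Translate the basic block starting at pc into a single host callable."""
    body = []
    while prog[pc][0] == "addi":
        body.append(prog[pc])
        pc += 1
    op, a, b, c = prog[pc]
    if op not in ("bne", "halt"):
        raise ValueError(f"unsupported op {op!r} at {pc}")
    fallthrough = pc + 1

    def block(regs: list[int]) -> int:
        for _, rd, rs, imm in body:  # straight-line part, already decoded
            regs[rd] = (regs[rs] + imm) & 0xFFFF_FFFF
        if op == "halt":
            return -1
        return c if regs[a] != regs[b] else fallthrough

    return block


def run(prog: Program, regs: list[int]) -> tuple[list[int], int]:
    """Dispatcher loop: look up each block in the translation cache, translate on a miss."""
    cache: dict[int, Block] = {}
    pc, translations = 0, 0
    while pc != -1:
        blk = cache.get(pc)
        if blk is None:
            blk = cache[pc] = translate(prog, pc)
            translations += 1
        pc = blk(regs)
    return regs, translations


prog = [
    ("addi", 1, 1, 1),  # 0: r1 += 1
    ("bne", 1, 2, 0),   # 1: loop while r1 != r2
    ("halt", 0, 0, 0),  # 2
]
regs, n = run(prog, [0, 0, 5, 0])
print(regs[1], n)
# → 5 2
```

The loop body runs five times but is translated only once; making that reuse cheap (block linking, register allocation) is what separates a fast dynarec from a lazy one.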