Closed irixxxx closed 3 months ago
Just wanted to let you know that your version runs somewhat decently on the RG-350 (and GCW0) : 30~35 FPS for Afterburner, 40-45 FPS for Virtua Racing. Perhaps you could consider making use of the extra instructions of mips32r2 ? Seeing how Afterburner performs on the stronger CPU (JZ4770), i don't think performance will improve much on the JZ4760B.
It's also working pretty well on the Retrostone (fullspeed in fact), but that's no surprise. I've released a build for the GCW0/RG-350 based on your changes here https://gameblabla.nl/files/ipk/gcw0/picodrive_gcw0.opk
EDIT: Very strange. I switched to triple buffering and now I'm getting 55 FPS minimum. The earlier slowdown was probably a side effect of double buffering when there are small frame drops. Either way, it's very smooth now on the RG-350.
@gameblabla thanks, that sounds good, though I too have no idea why triple buffering is helping.
@notaz I had a look at PIC code for accessing data. I now have some macros based on what gcc does for PIC and non-PIC code, but the only solution to access the data segment without causing textrel is apparently through the GOT. That's really a lot of overhead - instead of 1 insn and 1 pool slot it's using an additional register, 4 insns and 2 pool slots in the PIC case :-( If you are interested I can mail you the macros.
Moreover, I had a look at libretro/picodrive, but I found nothing to address this in drz80 and ym2612_arm. Do you have an idea how that was ultimately solved?
I also had a closer look at 78d817c. Most of the m68k accesses in the arm case go through the cyclone interface (which strangely is pretty much the same code anyway?), so the generic code is unused for this. The generic m68k/s68k interface is only used in the 32x bank mapping case through bank_map_handler. Do you know how much this is really used? If it's mostly unused, it's ineffective and unneeded. I reckon it's a relic from older days...
@gameblabla I have pushed basic support for mips32/64r2. It mainly makes use of EXT/INS, ROTR, and SEB/SEH - not sure if more could be sensibly used, maybe MADDU. The size reduction isn't really impressive, though, I have measured around 1%.
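For readers unfamiliar with the instructions named above, here is a minimal C sketch of what EXT, INS, ROTR, SEB and SEH compute. The function names are hypothetical and this is not the DRC emitter code; it only models the semantics of each single r2 instruction, which on MIPS32R1 typically needs a shift pair or a shift-and-mask sequence instead.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `size` bits set (size in 1..32). */
uint32_t low_mask(unsigned size)
{
    return (size == 32) ? 0xFFFFFFFFu : ((1u << size) - 1u);
}

/* EXT: extract `size` bits of rs starting at bit `pos`, zero-extended. */
uint32_t ext_bits(uint32_t rs, unsigned pos, unsigned size)
{
    assert(size >= 1 && pos + size <= 32);
    return (rs >> pos) & low_mask(size);
}

/* INS: replace `size` bits of rt at bit `pos` with the low bits of rs. */
uint32_t ins_bits(uint32_t rt, uint32_t rs, unsigned pos, unsigned size)
{
    assert(size >= 1 && pos + size <= 32);
    uint32_t mask = low_mask(size);
    return (rt & ~(mask << pos)) | ((rs & mask) << pos);
}

/* ROTR: rotate a 32-bit value right by sa bits (taken modulo 32). */
uint32_t rotr32(uint32_t rt, unsigned sa)
{
    sa &= 31;
    return sa ? (rt >> sa) | (rt << (32 - sa)) : rt;
}

/* SEB: sign-extend the low byte to 32 bits. */
int32_t seb(uint32_t rt)
{
    return (int32_t)((rt & 0xFFu) ^ 0x80u) - 0x80;
}

/* SEH: sign-extend the low halfword to 32 bits. */
int32_t seh(uint32_t rt)
{
    return (int32_t)((rt & 0xFFFFu) ^ 0x8000u) - 0x8000;
}
```

Each helper corresponds to one r2 instruction, which fits the modest size reduction reported: only the places where such bitfield, rotate or sign-extension patterns occur get shorter.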
Well besides PIC/GOT there is also iOS which does things yet another way. It would be best to pass things from C code somehow.
> Moreover, I had a look at libretro/picodrive, but I found nothing to address this in drz80 and ym2612_arm. Do you have an idea how that was ultimately solved?
I believe they build using jni/Android.mk which has use_drz80 and asm_ym2612 set to 0. asm_memory is also 0 so your change probably doesn't hurt after all, but I'd still like to have this checked. I don't want to screw up libretro because it's probably the biggest PicoDrive user (by amount of people using it).
> Do you know how much this is really used? If it's mostly unused, it's ineffective and unneeded. I reckon it's a relic from older days...
After that much time I doubt I know more than you, sorry...
@notaz I have committed a patch to resolve the textrel issue. I can build a generic arm executable with both -fPIC and -fPIE, and both results return nothing for "readelf -a PicoDrive | grep -i textrel", so it should be OK. The -fPIE build works with qemu-arm, so I assume it's basically ok.
A libretro raspberry pi build (which has the asm parts enabled) also shows no textrels with "scanelf -Tq picodrive_libretro.so" , but I haven't checked if it runs, though (no accessible hardware).
@irixxxx sorry to ask this here. Could this dynarec be ported to fbneo for cps3? Would you be interested?
Well, basically it should be possible. The historical license is the mame license, please check if that is ok with your project. The only other author I know of is notaz, and I wouldn't think any further about this without his consent. AFAICS the current code would need some cleanup to untangle picodrive references, and some code would be needed for acquiring and managing the code buffer, for low-level memory access, for the cpu interface, and initialisation stuff. I'll have a look at the drc code and at fbneo (after xmas :-]) to make an estimate of the effort and let you know if I would be able to handle this. Meanwhile, please contact me at my mail address (see readme.md).
I took a look at this today, applied your cyclone_gp2x.patch to Cyclone. Could you perhaps do a git rebase without cyclone_gp2x.patch and README.md, which doesn't quite suit the upstream?
Also I couldn't test drive it, getting a segfault trying to run standalone x86_64 build:
00003:134: drc_cmn_init: 0x555555a39000, 4194304 bytes: 0
DRC registers created, 15 host regs (3 REG, 2 STATIC, 1 CTX)
Thread 1 "PicoDrive" received signal SIGSEGV, Segmentation fault.
sh2_reset (sh2=0x559fc540, sh2@entry=0x5555559fc540 <sh2s>) at cpu/sh2/sh2.c:45
45 sh2->pc = p32x_sh2_read32(0, sh2);
(gdb) bt
#0 sh2_reset (sh2=0x559fc540, sh2@entry=0x5555559fc540 <sh2s>) at cpu/sh2/sh2.c:45
#1 0x00005555555a6620 in p32x_reset_sh2s () at pico/32x/32x.c:131
#2 0x00005555555ac497 in p32x_reg_write8 (d=3, a=1) at pico/32x/memory.c:413
#3 PicoWrite8_32x_on (a=10572033, d=3) at pico/32x/memory.c:1065
#4 0x000055555560b0f6 in fm68k_emulate ()
#5 0x00005555555a70b7 in SekSyncM68k () at pico/32x/../pico_cmn.c:41
#6 SekRunM68k (cyc=<optimized out>) at pico/32x/../pico_cmn.c:59
#7 PicoFrameHints () at pico/32x/../pico_cmn.c:173
#8 PicoFrame32x () at pico/32x/32x.c:582
#9 0x000055555557b988 in PicoFrame () at pico/pico.c:307
#10 0x00005555555697e5 in emu_loop () at platform/common/emu.c:1492
#11 0x00005555555629c6 in main (argc=<optimized out>, argv=0x7fffffffdfa8) at platform/common/main.c:139
(gdb) p/x sh2
$1 = 0x559fc540
(gdb) p/x *sh2
Cannot access memory at address 0x559fc540
I've just cross checked... a clean checkout from my repository and a "./configure; make clean all; ./PicoDrive" produces a working binary for me. What exactly was your procedure to produce this error?
edit: IMO the damaged pointer can only come from a 32bit store/save of the register it lives in... DRC_SAVE/RESTORE_SR does such things, but there are prerequisites: in sh2_reset the sh2 pointer is in the same register as the DRC places sh2 SR in (ebx on x86/64), and (sh2->state&SH2_STATE_RUN) must be set. I took some care the latter can't happen, but in case I overlooked something there, would you please have a look at sh2->state? It should be 0...
Re rebasing... since I know that history rewriting can wreak havoc with remote repos from my own experience (and not having done this via github up to now :-/)... what exactly do you want to be rebased how (if possible without losing the boilerplate for my own repo)?
> I've just cross checked... a clean checkout from my repository and a "./configure; make clean all; ./PicoDrive" produces a working binary for me. What exactly was your procedure to produce this error?
git clone https://github.com/irixxxx/picodrive.git
cd picodrive/
git submodule update --init
./configure && make clean all
./PicoDrive Knuckles\'\ Chaotix\ \(32X\)\ \(E\)\ \[\!\].32x
...
Segmentation fault
I've also tried it on another machine (both are Ubuntu 18.04.3 LTS) with the same result. The sh2_reset() disasm shows that gcc has the sh2 pointer placed in rbx and stepping through shows that after p32x_sh2_read32() call rbx gets truncated. sh2->state is 0 before the read32() call.
> Re rebasing... since I know that history rewriting can wreak havoc with remote repos from my own experience (and not having done this via github up to now :-/)... what exactly do you want to be rebased how (if possible without losing the boilerplate for my own repo)?
You can do this in another branch. Something like:
git remote add notaz https://github.com/notaz/picodrive.git # if you don't have that already
git fetch notaz
git checkout -b for_upstream
git rebase -i notaz/master
git push origin for_upstream
then you can make a new PR. "git rebase -i" will allow you to edit or drop the commits with the readme and cyclone_gp2x.patch . Later you can merge back from my repo with hopefully not too many conflicts.
Thanks for your evaluation work... I still can't reproduce it here (ubuntu 16 LTS in a VM - IIRC the last LTS one with arm-linux-gnueabi-gcc 4.7 :-|), but I disassembled the relevant code for this and I think I have discovered the root cause for this. It's the IMO somewhat dubious "-ftree-loop-if-convert" enabled automatically with "-O3". It changes "if (c) s=t" to "s = c ? t:s", to enable the use of conditional insns, and AFAIK you can't even suppress this by adding volatile. For DRC_RESTORE_SR this means there's always an assignment to sh2_sr, and the type is badly chosen. It's int... 32 bit on x86-64.
If leaving this in at all (pity if not, it's a measurable speed improvement), it should at least be changed to intptr_t in cpu/sh2/compiler.h:56 (and possibly "rbx" in cpu/sh2/compiler.c:49). If you could possibly spare the time please try this to verify the theory, while I'll go pondering some how the (slightly awkward) design could be improved to reliably avoid this.
edit: no wonder I can't see the problem - all my pointers fit in 32 bits.
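To make the failure mode concrete, here is a small C model of it. It is a sketch under stated assumptions, not the DRC code: `host_reg` and the `restore_*` functions are hypothetical names, and the 64-bit host register is simulated with a plain integer. It relies on the x86-64 rule that a 32-bit register write clears the upper 32 bits, and uses the pointer value from the backtrace above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit host register (e.g. rbx). */
typedef struct { uint64_t bits; } host_reg;

/* On x86-64, a 32-bit write to a 64-bit register clears the upper half. */
void write32(host_reg *r, uint32_t v) { r->bits = (uint64_t)v; }
void write64(host_reg *r, uint64_t v) { r->bits = v; }

/* Source form "if (c) s = t": no store happens when c is false. */
void restore_branchy(host_reg *r, int c, uint32_t t)
{
    assert(r != 0);
    if (c)
        write32(r, t);
}

/* If-converted form "s = c ? t : s" with s typed as a 32-bit int:
   the register is rewritten unconditionally through a 32-bit view. */
void restore_ifconv_int(host_reg *r, int c, uint32_t t)
{
    assert(r != 0);
    uint32_t s = (uint32_t)r->bits;  /* 32-bit read of the register */
    write32(r, c ? t : s);           /* 32-bit write, even if c == 0 */
}

/* Same if-converted form with s typed pointer-wide (intptr_t-like):
   the unconditional write-back preserves all 64 bits. */
void restore_ifconv_wide(host_reg *r, int c, uint64_t t)
{
    assert(r != 0);
    uint64_t s = r->bits;
    write64(r, c ? t : s);
}
```

With the register holding 0x5555559fc540 and the condition false, the branchy and the wide variants leave it intact, while the 32-bit if-converted variant leaves 0x559fc540, which is the truncated pointer seen in the gdb session. A pointer that already fits in 32 bits survives all three, which is consistent with the problem not showing up on every machine.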
Btw, I wanted to report a bug in Knuckles' Chaotix with your changes: when you start a new game and move all the way left, PicoDrive instantly crashes. I'm not sure whether this also happens on the stable PicoDrive release, but it would be nice if you could try it yourself.
@gameblabla I forgot to add mips isa r2 support to the delay slot handling. It's fixed, the last commit should fix your problem. @notaz this commit also includes a fix for your crash. I'm still not really satisfied as this feature is only working with gcc.
It would be great if you would not mash different topic changes into single commits like you do. I would still like to go through your changes someday, and mix-mash changes all over the place make it a miserable affair.
You can use "git add -p" to stage chunks and commit them separately after doing large sprints of work.
> It would be great if you would not mash different topic changes into single commits like you do. I would still like to go through your changes someday, and mix-mash changes all over the place make it a miserable affair.
Sry... I'm cherrypicking the stuff from my private repo. Since I'm working on more than one task at a time (mostly stuff I'm stumbling upon while working on another task), stuff there is rather mixed. I'm usually just committing everything when closing shop rather than taking the time to git add -p. I'm taking stuff back to the github repo when I'm reaching a point where the merge work seems to be low. I promise to try to clean stuff up when pulling it from my repo.
It appears that as of latest commit of your PR that a lot of stuff and games are broken. (Several sound issues with the YM chip in several Sonic games and Wani Wani World has some graphical glitches) Still WIP i guess ?
Yes, it is. Moreover, I'm currently not completely convinced I can really pull this through. The FIFO stuff causes all kinds of funny timing issues showing up, mostly due to the fact that a lot of the timing stuff in picodrive is only a cleverly crafted approximation, which would require a lot of adjusting. And I'm hitting hard limits of time I can spend on testing this :-|. I've pushed another fifo revision which solves some of the issues I ran into. If you want to support me some, please try it and tell me where exactly you are hitting problems.
Edit: Before I forget this, the sprite rendering currently has some problems... I'm on it.
@gameblabla I pushed a fix for the sprite rendering problem. It should fix Wani Wani. Are you still experiencing sound problems in the Sonic games? If so, where exactly? Edit: I just discovered it also fixes the rain scene in the overdrive 2 demo ;-)
"Sonic 1's title music is missing a note, Sonic 3's title is chugging noticeably with the music, and Sorcery Saga I (Mega Drive)'s music is playing at half the speed despite the game itself (seemingly) performing as well as it would on Genesis Plus GX on the same system." I updated to your newer commits and, as I suspected, they still do not fix these issues. Testing was done on the RG-350. I had a guinea pig try it for me; I have yet to try it myself.
Thank you very much... Preliminary investigation suggests commit 43e1401 is the culprit. Moreover, I observed in sonic2 the title music also has some "arrhythmia".
@gameblabla I think the last commits should fix most of the problems you and others have observed. It now passes about 90% of Nemesis VDP test rom, as well as his sprite masking test (but for one single red line). I have also tested against a number of roms (title, demo and some 1 or 2 level play) and can't see any more problems ATM - which, considering the number of available titles, absolutely doesn't mean there aren't any :-|. Feedback is appreciated, especially for stuff working in 1.93 but not working with HEAD...
@gameblabla after all that bugfixing the last version appears to be rather stable but for some minor leftover problems, though I'm not completely satisfied with the performance. You might give this another try.
Btw forgot to report back but your fixes did fix the issues i had so thanks. Unfortunately it seems that this PR https://github.com/notaz/picodrive/pull/106 conflicts with your PR.
> Unfortunately it seems that this PR #106 conflicts with your PR.
I know... I already addressed this same bug (and BTW several more I found in the rendering code) some time ago, albeit in a different way (also fixed it in the ARM asm stuff, which PR #106 doesn't). It can run most of the overdrive demos with minor glitches and largely passes some of the test roms I've found around. DMA timing is still somehow off, though - I think it's not getting better without improving the accuracy of the internal timings of picodrive, which is ATM too much work (besides I'm not sure if it would really improve things - it may be an exercise in futility).
Not sure how to deal with it though. Is this ever going to be merged somehow? If so, who should do something about it? ATM I have the feeling it has diverged so much that merging may be very difficult. I should probably declare it as a standalone project and close this PR...
In my opinion this should be merged, but notaz isn't very active these days. It would be a shame though if this becomes standalone, because nobody would want to have dozens of forks. @notaz Wouldn't it be possible for you to revert PR #106 so this can be merged?
Well this PR still has README.md that is not suitable here, and I don't like some of those huge commits changing things all over the place seemingly at random. Ideally this would be split in smaller cleaned up PRs with separate topics like the new recompiler, sound changes, VDP, timing, etc. I doubt anyone will want to do such work so I could try doing it myself, but it will likely take forever.
Perhaps a more feasible option would be to deprecate this repo and declare @irixxxx's one as the "real" mainline PicoDrive. @irixxxx do you want to maintain PicoDrive indefinitely?
Sorry for the late reply. I needed some time to think about this. I think we should discuss this in a more private channel. Would you please contact me at my email address derkub@gmail.com ?
@gameblabla the last commits fix some performance problems I have observed on my ARM9 devices. This may or may not improve performance on MIPS devices as well. I understand there are still minor performance problems with some 32x games; I would appreciate a test run.
By and large I'm now through with my list. The last major thing is the somewhat less than perfect performance of interrupt handling, which is especially slowing down 32x games with PWM sound.
@irixxxx I have tested your branch (this patch does not apply on this repo :( ) using the libretro core on an Amlogic S905X3 on aarch64, and while it does boot, half of the graphics are missing. In After Burner, for example, I can only see the GUI and not the actual game. Is this a known issue? Or is libretro not included in this PR? Maybe I am compiling it wrong: I am cross-compiling with configure generic (as noted in the readme), as I didn't find a way to configure for aarch64 directly.
@shantigilbert I have crosschecked with After Burner on my current development branch using qemu-aarch64 and a debian userland. That seems to work. I'm going to crosscheck this on an older S905 running ubuntu ASAP. It may be a side effect of the newer cores in the S905X3, or possibly a caching effect. Unfortunately I don't yet have any hardware with A55 cores to crosscheck that. Meanwhile, I've rebased those parts from my private development branch which IMO have influence on the generated DRC code. You might want to check if this works better for you.
@irixxxx thank you, unfortunately it has the same issue, here is a small video that shows the issue.
I am cross-compiling like this
./configure --platform=generic
make -f Makefile.libretro
Which exact version of AB are you using? How about other games, do they show issues as well? And lastly, would you please compile picodrive on x86 (Linux please) and test your AB version with that?
I am not sure what exact version I am using, but it's a zip file with the following info
md5: f663edc7d3933877bbef92a374e73a1b After Burner (Japan, USA).zip
None of the games I've tried work; they all show almost the same issue. Not even the SEGA logo appears, only sound, though some items do display.
I do not have a linux machine to test ATM, but if I compile the standalone version (not libretro) for aarch64 I can see that it kinda works, but it does not detect my resolution correctly so it displays weirdly (this is probably an SDL1 issue other SDL1 apps behave the same) which is why I want to use the Libretro core.
@shantigilbert I think I found the reason for your problem. Would you please pull and try again?
@irixxxx it works perfectly now for sega32x! :) On my device I used to get 50-53 fps, now it's 60fps solid :+1: Thanks!
@irixxxx I'm trying to port this sh2 drc to old fbalpha / fbneo emulators in order to have full speed emulation in CPS3. Problem is that when building the libretro core for windows, it uses readelf when the objects are in pe format. any suggestion in order to change readelf to objdump or similar?
@frangarcj I'll have a look, but you might want to wait maybe 2 weeks with this. I'm currently working on extracting all drc-related stuff from picodrive, and remove picodrive-specific stuff from it. I'm planning to evaluate other 32 bit guest processors (no floating point and MMU though - MIPS R30x1 would probably be interesting since it's quite simple and could be tested in a playstation emu).
Hi there @irixxxx , could you perhaps send a PR to the libretro repo for this? If there are no perceivable regressions it could serve as a good testing ground for the dynarec until it is finally merged into upstream.
@twinaphex I presume you are referring to libretro/picodrive. Before I create the PR, is there any preparation I should do beforehand from your point of view? Otherwise I could do so on short notice.
Yeah, I'm referring to the picodrive libretro repo.
I don't think there will be any major deviations. If there is though let us know about it in case it causes issues with your integration efforts.
@twinaphex OK, created the PR. Note there seem to be non-trivial conflicts, however I cannot review them with the web editor :-/
@all I did change the repo structure a tiny bit to better accommodate PRing to notaz and libretro at the same time. I hope I didn't break anything, but to whom it may concern, you'd better check.
@irixxxx It looks like there are some severe speed issues ever since the May build of this year. https://boards.dingoonity.org/retro-game-350rg-350/rg-350-emulatorsgame-ports/msg195773/#msg195773
I'm not sure when it got worse but seems to have happened after your ARM specific optimizations.
I'll have a look at the caanoo version and see if I can reproduce it there. Do you possibly have a last-known-good hash? If I can reproduce this I could try bisecting to find the offending commit.
Just a guess: Is "video overlay 2x" set in display options? That creates more yuv data to increase color resolution, but it's slower. Is "YM2612 SSG-EG" enabled in the advanced options? That has better audio for some games, but is also slower.
Set video output mode to video overlay, and set the disable ssg-eg option. Are you still seeing slowdowns afterwards?
@gameblabla I've tried the latest version on my caanoo. I can't really notice any heavy slowdowns, neither in genesis nor in 32x games, beyond what was introduced by the latest bugfixing rounds. Those introduced some slight overhead, though far from as heavy as reported in your dingoonity thread. I haven't encountered any frame drops in genesis or cd games, and about 1 or 2 frames in a few 32x games (notably kolibri), and the caanoo should be much older and slower than those jz47xx based devices. It has those ASM optimizations, though. I'll turn those off and see if that changes the picture. Otherwise I'm a bit at a loss. My best bet is the 2x display mode. Please look if that's somehow enabled (AFAIK it shouldn't, it should be off by default).
@gameblabla I can't build a speedy opk from the 1.95 tag with my gcw0 toolchain. I remember you said something about triple buffering in SDL to obtain good fps results. I take it that is so with the latest opk you made?
Latest OPK was just a straight compile unlike my last other OPK. And yeah, i did notice a noticeable speed increase when turning on SDL_TRIPLEBUF now that i think about it... https://github.com/retrofw/picodrive/commit/f876f2997f1a1f0cd4df06c0ea2d13423420b3f1
libpicofe also needs to be patched to add SDL_TRIPLEBUF and overlay mode should not be used, as it caused a massive performance drop when used : plat_sdl_patch.txt
I totally forgot about these until i was being reminded of it all... lol
Btw, while you're here, it would be nice if PicoDrive could make use of IPU scaling. If you are not familiar with it, it works by passing the native resolution of the game to SDL_SetVideoMode (so a 256x224 game gets a 256x224 window instead of a 320x240 one with padding or software scaling). The OD driver will then detect the framebuffer size and use the IPU chip block to scale from 256x224 to the device's native resolution.
There was an attempt here although in my experience, it does not work properly or for everything https://github.com/retrofw/picodrive/commit/a6cded6ec6411baae9ca4dd61b02a0461c73aaa1
I had made a more successful attempt but it was still a hack https://github.com/arcade-mini/picodrive/commit/059a56450c173c49ef29e1b38990f94a3f125ee8
That way, we could also avoid copying buffers around and the performance boost would be greater than "overlay" mode, though i'm not exactly sure how picodrive does things.
Anyway, just saying. Maybe I should try redoing it that way again. Perhaps I should have told you about it, but it's been a while. I'm pretty sure the performance drop is caused by SDL_DOUBLEBUF, because I have noticed it before on my LDK.
@gameblabla Re IPU scaling:
The only rendering modes supporting no-copy are the 2 8-bit CLUT modes. No-copy data always has 8 scratch bytes at the start of each line, which are used to avoid implementing sprite clipping - a speed hack. You can get a "seamless" buffer only by copying the image data. 8 bit accurate does this automatically if the line length set can't accommodate the scratch area. For fast mode there's no such provision.
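As an illustration of the copy step this implies, here is a minimal C sketch. It assumes only the layout described above (8 scratch bytes leading each line of 8-bit pixels); the function name, the parameters and the pitch handling are hypothetical and not taken from the PicoDrive sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_BYTES 8  /* leading scratch area per line, as described */

/* Copy `lines` rows of `width` visible 8-bit pixels from a source buffer
   whose rows each start with SCRATCH_BYTES of scratch data into a tightly
   packed ("seamless") destination. src_pitch is the full source row length
   in bytes, scratch area included. */
void copy_without_scratch(uint8_t *dst, const uint8_t *src,
                          size_t width, size_t lines, size_t src_pitch)
{
    assert(src_pitch >= SCRATCH_BYTES + width);
    for (size_t y = 0; y < lines; y++)
        memcpy(dst + y * width, src + y * src_pitch + SCRATCH_BYTES, width);
}
```

The per-line memcpy is exactly the extra work that a no-copy path avoids, which is why a seamless 256x224 buffer for the IPU costs something in the fast mode.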
Is the gcw0/rg350 able to use OpenGL in SDL? In that case, you could use the texture blitting of OpenGL to clip away the leading 8 bytes.
Originally I only wanted to learn something about recompilation and looked for a simple target... I think this got way out of hand. I ended up changing things in picodrive all over the place.
However, mainly I made large changes to the DRC for optimisation. It now creates much better code (something slightly above 3 arm32-insns per SH2-insn). And lastly, I added 2 new backends for mipsel (MIPS32R1) and aarch64 (A64) to it. Have a look and take what you think is suitable.