notaz / picodrive

Fast MegaDrive/MegaCD/32X emulator

large overhaul of SH2 DRC, plus new backends for MIPS32 and A64 #100

Closed irixxxx closed 1 month ago

irixxxx commented 5 years ago

Originally I only wanted to learn something about recompilation and looked for a simple target... I think this got way out of hand. I ended up changing things in picodrive all over the place.

However, mainly I made large changes to the DRC for optimisation. It now creates much better code (something slightly above 3 arm32-insns per SH2-insn). And lastly, I added 2 new backends for mipsel (MIPS32R1) and aarch64 (A64) to it. Have a look and take what you think is suitable.

notaz commented 5 years ago

Nice, wasn't expecting anything like this.

I'll want to look at this though, and make sure the targets I care about still work, so with my time it may take quite a while before this is merged.

irixxxx commented 5 years ago

to be honest, me neither... like I said, it got out of hand.

I forgot to mention that I extended x86-64 support to use the full register set. I'm no x86 aficionado, but it helped me a lot with debugging the DRC changes.

Also notable are the changes to polling detection. I added basic loop detection to the DRC, which enabled me to detect polling on memory addresses, which would otherwise be far too expensive to do. That gave a nice speed boost to some games. And I tried to mend some of the synchronisation problems that were still present by adding a "synchronisation FIFO". This stores values written to known addresses used for synchronisation together with the cycle time of the write to avoid missing a value when several values are written in short time. I say that's a bit experimental, but it may have the potential to make some of the other sync stuff superfluous.
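As a rough illustration of the synchronisation FIFO idea described above, here is a minimal C sketch. It is not the code from this pull request: the names `sync_fifo`, `sync_fifo_push` and `sync_fifo_read` and the depth of 8 are assumptions made for the example. Each write is stored together with its cycle time, and the reader consumes at most one pending write per read, so two values written in quick succession are both seen.

```c
#include <stdint.h>

#define SYNC_FIFO_SIZE 8            /* hypothetical depth, power of two */

struct sync_entry {
    uint32_t cycle;                 /* cycle time of the write */
    uint16_t value;                 /* value written to the sync address */
};

struct sync_fifo {
    struct sync_entry e[SYNC_FIFO_SIZE];
    unsigned head, tail;            /* head: next to read, tail: next to write */
    uint16_t last;                  /* value handed out by the latest read */
};

/* Writer side: record the value together with the cycle of the write. */
void sync_fifo_push(struct sync_fifo *f, uint32_t cycle, uint16_t value)
{
    if (f->tail - f->head == SYNC_FIFO_SIZE)
        f->head++;                  /* full: drop the oldest entry */
    f->e[f->tail % SYNC_FIFO_SIZE].cycle = cycle;
    f->e[f->tail % SYNC_FIFO_SIZE].value = value;
    f->tail++;
}

/* Reader side: hand out at most one pending write per read, and only
 * writes that happened at or before the reader's current cycle. */
uint16_t sync_fifo_read(struct sync_fifo *f, uint32_t now)
{
    if (f->head != f->tail) {
        struct sync_entry *e = &f->e[f->head % SYNC_FIFO_SIZE];
        if ((int32_t)(now - e->cycle) >= 0) {
            f->last = e->value;
            f->head++;
        }
    }
    return f->last;
}
```

With a plain "last value wins" register, a reader that is polled only after both writes would miss the first value; here it sees both, in order.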

I introduced a "branch cache" in the DRC. This caches lookup results for entry points to speed up branching across tcache buffers. That also brought a noticeable improvement. I also worked on the memory access functions and made some changes there, since I identified these as a bottleneck for the overall speed. And I added some more arm32 asm stuff to speed things up on the armv[4-7] platforms.
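To make the branch cache idea more concrete, here is a small C sketch of a direct-mapped cache that maps an SH2 PC to a host code pointer. It only illustrates the general technique under assumptions: `bcache_lookup`, `bcache_flush`, `slow_lookup`, the 256-slot size and the fake code buffer are hypothetical and do not reflect the actual DRC data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define BCACHE_BITS 8                       /* hypothetical size: 256 slots */
#define BCACHE_SIZE (1u << BCACHE_BITS)

struct bcache_entry {
    uint32_t pc;                            /* SH2 address of the entry point */
    void *host_code;                        /* translated code, NULL if empty */
};

static struct bcache_entry bcache[BCACHE_SIZE];
unsigned slow_lookups;                      /* counts fallbacks, for illustration */

/* Stand-in for the expensive entry point lookup of a real recompiler. */
static void *slow_lookup(uint32_t pc)
{
    static unsigned char fake_tcache[1 << 16];
    slow_lookups++;
    return &fake_tcache[pc & 0xfffe];
}

/* SH2 instructions are 16 bits wide, so bit 0 of the PC carries no information
 * and is dropped before indexing. */
void *bcache_lookup(uint32_t pc)
{
    struct bcache_entry *e = &bcache[(pc >> 1) & (BCACHE_SIZE - 1)];
    if (e->host_code == NULL || e->pc != pc) {
        e->pc = pc;                         /* miss: refill the slot */
        e->host_code = slow_lookup(pc);
    }
    return e->host_code;
}

/* Has to be called whenever translated code is discarded. */
void bcache_flush(void)
{
    for (size_t i = 0; i < BCACHE_SIZE; i++)
        bcache[i].host_code = NULL;
}
```

A direct-mapped layout keeps the hit path to one index computation and one compare, at the cost of evictions when two hot entry points share a slot.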

Together with all the other changes I made, this code achieves 30-60 fps on most 32x games on a caanoo @800MHz, firmware V3. The aarch64 implementation runs anything I threw at it at 100-200 fps on an Odroid-C2 under Ubuntu, which isn't really the fastest horse in the stable. Unfortunately I can't say how fast the mipsel stuff is since I have no real hardware and can only test this in a qemu environment (using https://boards.dingoonity.org/gcw-development/gcw-zero-emulation-in-qemu/). It generates about 10-15% more code than the arm32 version (due to the flags emulation stuff), so I expect it to perform similarly to the caanoo or slightly better on a 1 GHz JZ4760.

I think further speedup might be achieved if the polling logic is changed in a way that interrupts don't disrupt the polling state. A high-profile interrupt like pwm can occur about every 1000 cycles. Redetecting the polling state can take more than 200 sh2 cycles, so that can take up to 20% of the emulated cpu. I can't spend much more time on this, but I reckon that would be an interesting route.

Lastly, I should mention that I didn't work on the configure stuff. I used static config. files instead (included in the repo). That should probably be changed...

gameblabla commented 5 years ago

It doesn't compile with -flto enabled:

pico/32x/memory.c:44:1: error: global register variable follows a function definition
   44 | DRC_DECLARE_SR;

Also compilation breaks when attempting to run make in tools. The following patch works for me

--- a/Makefile
+++ b/Makefile
@@ -202,10 +202,10 @@
 endif

-target_: pico/pico_int_offs.h $(TARGET)
+target_: $(TARGET)

 clean:
-   $(RM) $(TARGET) $(OBJS) pico/pico_int_offs.h
+   $(RM) $(TARGET) $(OBJS)
    $(RM) -r .opk_data

 $(TARGET): $(OBJS)
@@ -218,8 +218,8 @@
 pprof: platform/linux/pprof.c
    $(CC) $(CFLAGS) -O2 -ggdb -DPPROF -DPPROF_TOOL -I../../ -I. $^ -o $@ $(LDFLAGS) $(LDLIBS)

-pico/pico_int_offs.h:: tools/mkoffsets.sh
-   make -C tools/ XCC="$(CC)" XCFLAGS="$(CFLAGS)"
+tools/textfilter: tools/textfilter.c
+   make -C tools/ textfilter

 .s.o:
    $(CC) $(CFLAGS) -c $< -o $@

The 32X MIPS DRC code doesn't work at all on an Ingenic JZ4760B : it crashes upon booting up any game. The ingenic Jz4760 is MIPS32r1 (non MSA) unlike the JZ4770 which is MIPS32r2 (it does have an FPU though). Disabling the SH2 DRC MIPS code makes it work again but it runs at like 7-8 FPS. Guess it's better than 3-4 FPS... I'm using an LDK/RS-97 for testing.

irixxxx commented 5 years ago

Hi,

On Sun, 11 Aug 2019, 06:52 gameblabla, notifications@github.com wrote:

it doesn't compile with flto enabled :

pico/32x/memory.c:44:1: error: global register variable follows a function definition 44 | DRC_DECLARE_SR;

What platform did you compile for? Libretro or standalone?

Also compilation breaks when attempting to run make in tools.

The following patch works for me


I built an automatic mechanism to calculate the offsets in pico_int_offs.h. it's absolutely needed at least once since some offsets have changed. That may also explain why the mips drc doesn't work.
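For readers unfamiliar with generated offset headers, the following C sketch shows the basic idea of emitting `#define OFS_...` lines with `offsetof`. It is an assumption-laden illustration, not the mkoffsets.sh mechanism: `struct demo_sh2`, `format_offset` and `print_offsets` are hypothetical, and the program reports offsets for whatever machine runs it. A cross build cannot run target code on the host, which is presumably why the real script inspects a compiled object file instead (the objcopy error later in this thread points that way).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an emulator state struct. */
struct demo_sh2 {
    unsigned int regs[16];
    unsigned int pc;
    void *p_bios;
    int is_slave;
};

/* Format one "#define OFS_<name> 0x...." line; returns snprintf's result. */
int format_offset(char *buf, size_t len, const char *name, size_t offset)
{
    return snprintf(buf, len, "#define OFS_%-20s 0x%04zx", name, offset);
}

/* Emit the offset header for the demo struct. */
void print_offsets(FILE *out)
{
    char line[64];

    fprintf(out, "/* autogenerated, do not edit */\n");
    format_offset(line, sizeof line, "SH2_pc", offsetof(struct demo_sh2, pc));
    fprintf(out, "%s\n", line);
    format_offset(line, sizeof line, "SH2_p_bios", offsetof(struct demo_sh2, p_bios));
    fprintf(out, "%s\n", line);
    format_offset(line, sizeof line, "SH2_is_slave", offsetof(struct demo_sh2, is_slave));
    fprintf(out, "%s\n", line);
}
```

Calling `print_offsets(stdout)` prints the header. Pointer-sized members make the result depend on the ABI, so a header produced by a host compiler is only valid for a target with the same struct layout.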

What exactly did fail?

The 32X MIPS DRC code doesn't work at all on an Ingenic JZ4760B : it crashes upon booting up any game. The ingenic Jz4760 is MIPS32r1 (non MSA) unlike the JZ4770 which is MIPS32r2 (it does have an FPU though). Disabling the SH2 DRC MIPS code makes it work again but it runs at like 7-8 FPS. Guess it's better than 3-4 FPS... I'm using an LDK/RS-97 for testing.

Unfortunately I can test mips only in qemu. I don't have real hardware for this. But in qemu it works fine (fun fact, the basic drc also works fine on a 25 years old sgi indigo 2 :-)).

I guess if it's not the offsets from above I have to organise a real hw...

Regards, --kub

You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/notaz/picodrive/pull/100?email_source=notifications&email_token=AHR2L4XIWFTPFXX63THYL5LQD6LKFA5CNFSM4IH5H3E2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4A2AIA#issuecomment-520200224, or mute the thread https://github.com/notifications/unsubscribe-auth/AHR2L4WD5JI33LINXZM722DQD6LKFANCNFSM4IH5H3EQ .

irixxxx commented 5 years ago

On Sun, 11 Aug 2019, 09:00 Kai-Uwe Bloem, derkub@gmail.com wrote:

Hi,

On Sun, 11 Aug 2019, 06:52 gameblabla, notifications@github.com wrote:

it doesn't compile with flto enabled :

pico/32x/memory.c:44:1: error: global register variable follows a function definition 44 | DRC_DECLARE_SR;

What platform did you compile for? Libretro or standalone?

After some more thinking I suspect it might be a compiler version issue. Which compiler version did you use? Which additional cflags?

gameblabla commented 5 years ago

I built an automatic mechanism to calculate the offsets in pico_int_offs.h. it's absolutely needed at least once since some offsets have changed. That may also explain why the mips drc doesn't work. What exactly did fail?

I get this error when running make: objcopy: Unable to recognise the format of the input file `/tmp/getoffs.o'

It does that even when i try to do it manually with CC set to mipsel-linux-gcc.

After some more thinking I suspect it might be a compiler version issue. Which compiler version did you use? Which additional cflags?

I'm using my own toolchain which uses GCC 9.1 (with some patches applied to it), no additional CFLAGS. I also tried the gcw0 toolchain (which is GCC 4.8, got it here : http://www.gcw-zero.com/develop) and i still get the same error.

EDIT Turns out that it doesn't work because it uses the host objcopy, not the target's. I did a quick hack fix by using mipsel-linux-objcopy instead and it compiled just fine. Obviously this will need a proper fix for cross compiling like i do.

But guess what ? 32X games still don't work properly. Same way as before. (Picodrive crashes before even booting up the game)

irixxxx commented 5 years ago

On Sun, 11 Aug 2019, 16:03 gameblabla, notifications@github.com wrote:

I get this error when trying when doing make : objcopy: Unable to recognise the format of the input file `/tmp/getoffs.o'

It does that even when i try to do it manually with CC set to mipsel-linux-gcc.

Hmm. Sounds like the binutils don't support foreign ELF formats. In case you haven't done so already, could you please try to install multiarch binutils?

In any case, I'll think up another solution for this.

I'm using my own toolchain which uses GCC 9.1 (with some patches applied to it), no additional CFLAGS. I also tried the gcw0 toolchain (which is GCC 4.8, got it here : http://www.gcw-zero.com/develop) and i still get the same error.

I reckon it can't be gcc. It works fine with the Ubuntu supplied gcc for x86 and aarch64. I strongly suspect binutils.

gameblabla commented 5 years ago

Hmm. Sounds like the binutils don't support foreign ELF formats. In case you haven't done so already, could you please try to install multiarch binutils?

See my edit to my post. I managed to make it work because it wasn't using the target's objcopy but the host one. So if you try to cross compile it, it won't work. I had to edit the script so that it uses mipsel-linux-objcopy instead of objcopy but there should be a better fix than that obv.

irixxxx commented 5 years ago

I've made some changes to fix this -flto problem. Please pull and check if this solves your problem. The binutils problem isn't solved yet. I'm still searching for a general solution.

irixxxx commented 5 years ago

There is a tentative fix for missing multiarch binutils.

gameblabla commented 5 years ago

So mkoffsets.sh works good now, except that it now seemingly crashes on my host's GCC 9.1 compiler. But i heard that version had multiple issues with compiling stuff like PCSX2 anyway so switching to clang fixed it.

However, it is still not working. So i grabbed my Qemu GCW0 image and recompiled it with the GCW0 toolchain and it crashes in the exact same way in the QEMU vm as it does on my LDK. I wanted to debug it, however...

opendingux:/media/QEMU VVFAT # gdb PicoDrive
GNU gdb (GDB) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "mipsel-gcw0-linux-uclibc".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from PicoDrive...done.
(gdb) run "rom.32x"
Starting program: /media/QEMU VVFAT/PicoDrive "rom.32x"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
plat_sdl: using 320x240 as fullscreen resolution
plat_sdl: overlay: fmt 59565955, planes: 1, pitch: 640, hw: 0
warning: video overlay is not hardware accelerated, not going to use it.
input: new device #0 "sdl:keys"
input: async-only devices detected..
# drv probed binds name
0   0      y     y sdl:keys
using sdl audio output driver
platform/libpicofe/readpng.c: failed to open: /media/QEMU VVFAT/skin/font.png
platform/libpicofe/readpng.c: failed to open: /media/QEMU VVFAT/skin/selector.png
emu_ReloadRom(rom.32x)
00000:000: couldn't open carthw.cfg!
00000:000: sram: 200000 - 203fff; eeprom: 0
starting audio: 44100 len: 735 stereo: 1, pal: 0
00003:134: 32X startup
00003:134: drc_cmn_init: 0x676000, 4194304 bytes: 0
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

So unfortunately that's kind of a bummer. It does the same thing with my buildroot too so we can rule out a regression too.

If you want to give it a try yourself, grab the qemu image here : http://www.gcw-zero.com/files/gcw0-qemu.zip

For convenience, i also add the -hdc fat:rw:./myfolderwithstuff switch to run-gcw0.sh. Put your build of Picodrive with a 32x game inside of that folder, run run-gcw0.sh in a terminal. Select the terminal app in the GUI. (Controls are : LCTRL for A, TAB/BACKSPACE for L/R, and so on) Go to your terminal where you ran ./run-gcw0.sh and type in

cd "/media/QEMU VVFAT"

Then you can use GDB and run it on your Picodrive build. It will most likely crash like my builds...

I must mention that Genesis games still work fine.

irixxxx commented 5 years ago

That's really strange. I used that qemu image for testing when writing the mips backend.

I've just built a fresh github checkout with "ln -sf config.gcw0 config.mak && make clean opk". I copied the opk into the gcw0_data image, and I can start it and load 32x roms:


< Welcome to OpenDingux ! >

    \   ^__^
     \  (oo)\_______
        (__)\       )\/\
            ||----w |
            ||     ||

opendingux:/media/data/local/home # mount /media/data/apps/PicoDrive.opk /mnt/
opendingux:/media/data/local/home # cd /mnt
opendingux:/mnt # ./PicoDrive
plat_sdl: using 320x240 as fullscreen resolution
plat_sdl: overlay: fmt 59565955, planes: 1, pitch: 640, hw: 0
warning: video overlay is not hardware accelerated, not going to use it.
input: new device #0 "sdl:keys"
input: async-only devices detected..
# drv probed binds name
0   0      y     y sdl:keys
config_readsect: unhandled val for "Video output mode": "SDL Window"
config_readsect: loaded from /usr/local/home/.picodrive/config2.cfg
using sdl audio output driver
platform/libpicofe/readpng.c: unexpected font image size 256x320, needed 128x160
platform/libpicofe/readpng.c: failed to open: /mnt/skin/selector.png
found skin.txt
selected file: /media/data/roms/rom1.32x
emu_ReloadRom(/media/data/roms/rom1.32x)
config_readsect: loaded from /usr/local/home/.picodrive/config2.cfg
config_readsect: loaded from /usr/local/home/.picodrive/config2.cfg
00000:000: couldn't open carthw.cfg!
00000:000: sram: 200000 - 203fff; eeprom: 0
starting audio: 44100 len: 735 stereo: 1, pal: 0
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
00003:134: 32X startup
00003:134: drc_cmn_init: 0x636000, 4194304 bytes: 0
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
(...more of the same, while the rom is running...)

I'm at a loss here. There must be a difference in our setups, but where is it?

Could you possibly get the output of "info reg" and see if the link register contains something useful?

Is it maybe a path problem? The paths in the config.* are set so that the toolchains live in $HOME/opt. (And yes, that must be changed somehow. I just don't have an idea how to do this independently of platform/compiler.)

What is the output if you compile it with drc_debug set to 15? (You may need a patch to libpicofe for this:)

diff --git a/linux/host_dasm.c b/linux/host_dasm.c
index 66a83ea..eba39ac 100644
--- a/linux/host_dasm.c
+++ b/linux/host_dasm.c
@@ -22,11 +22,21 @@ extern char **g_argv;
 
 static struct disassemble_info di;
 
-#ifdef __arm__
+#if defined __arm__
 #define print_insn_func print_insn_little_arm
 #define BFD_ARCH bfd_arch_arm
 #define BFD_MACH bfd_mach_arm_unknown
 #define DASM_OPTS "reg-names-std"
+#elif defined __aarch64__
+#define print_insn_func print_insn_aarch64
+#define BFD_ARCH bfd_arch_aarch64
+#define BFD_MACH bfd_mach_aarch64
+#define DASM_OPTS NULL
+#elif defined __mips__
+#define print_insn_func print_insn_little_mips
+#define BFD_ARCH bfd_arch_mips
+#define BFD_MACH bfd_mach_mipsisa32
+#define DASM_OPTS NULL
 #elif defined(__x86_64__) || defined(__i386__)
 #define print_insn_func print_insn_i386_intel
 #define BFD_ARCH bfd_arch_i386


gameblabla commented 5 years ago

Well i managed to make the GCW0 version work on QEMU as you described but still couldn't make it work on my LDK in a similar way (except by extracting the executable out of the OPK). I'm really not sure what's going on, either a toolchain issue or something else.

I'll try some other workaround before i give up on that because i have no idea.

irixxxx commented 5 years ago

How exactly are you building your version? Can I replicate that to check if I can reproduce the problem?


irixxxx commented 5 years ago

Another idea: could you please look into pico/pico_int_offs.h and check if the computed offsets look ok?


gameblabla commented 5 years ago
/* autogenerated by mkoffset.sh, do not edit */
/* target endianess: le, compiled with: /opt/rs97-toolchain-PIE/usr/bin/mipsel-linux-gcc -Wall -ggdb -ffunction-sections -fdata-sections -I. -O2 -finline-functions -DNDEBUG -falign-functions=2 -I/opt/rs97-toolchain-PIE/usr/mipsel-rs97-linux-uclibc/sysroot/usr/include/ -I/opt/rs97-toolchain-PIE/usr/mipsel-rs97-linux-uclibc/sysroot/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT -Wno-unused-result -fno-stack-protector -march=mips32 -mtune=mips32 -mhard-float -DEMU_F68K -D_USE_CZ80 -DDRC_SH2 */
#define OFS_Pico_video_reg   0x0000
#define OFS_Pico_m_rotate    0x0040
#define OFS_Pico_m_z80Run    0x0041
#define OFS_Pico_m_dirtyPal  0x0046
#define OFS_Pico_m_hardware  0x0047
#define OFS_Pico_m_z80_reset 0x004f
#define OFS_Pico_m_sram_reg  0x0049
#define OFS_Pico_sv          0x0090
#define OFS_Pico_sv_data     0x0090
#define OFS_Pico_sv_start    0x0098
#define OFS_Pico_sv_end      0x009c
#define OFS_Pico_sv_flags    0x00a0
#define OFS_Pico_rom         0x0570
#define OFS_Pico_romsize     0x0578
#define OFS_Pico_est         0x00c0
#define OFS_EST_DrawScanline 0x0000
#define OFS_EST_rendstatus   0x0004
#define OFS_EST_DrawLineDest 0x0008
#define OFS_EST_HighCol      0x0010
#define OFS_EST_HighPreSpr   0x0018
#define OFS_EST_Pico         0x0020
#define OFS_EST_PicoMem_vram 0x0028
#define OFS_EST_PicoMem_cram 0x0030
#define OFS_EST_PicoOpt      0x0038
#define OFS_EST_Draw2FB      0x0040
#define OFS_EST_HighPal      0x0048
#define OFS_PMEM_vram        0x10000
#define OFS_PMEM_vsram       0x22100
#define OFS_PMEM32x_pal_native 0x90e00
#define OFS_SH2_is_slave     0x0a18
#define OFS_SH2_p_bios       0x0098
#define OFS_SH2_p_da         0x00a0
#define OFS_SH2_p_sdram      0x00a8
#define OFS_SH2_p_rom        0x00b0
#define OFS_SH2_p_dram       0x00b8
#define OFS_SH2_p_drcblk_da  0x00c0
#define OFS_SH2_p_drcblk_ram 0x00c8

The offsets look like these... No idea if this is correct or not.

I'm using my own toolchain here : https://github.com/rs-97-cfw/buildroot

It's fully static with forced no pic and mno-abicalls. Reverted those changes and rebuilt Picodrive (i know some stuff that wouldn't like those) but still crashes on 32X games. I'll try to recompile Picodrive using the toolchain that is used for the rootfs and see if that fixes the issue (given that some issue could arise from either)

Also, i tried using the config.gcw0 file and modifying it for my toolchain as well as using CROSS_COMPILE=mipsel-linux- ./configure --platform=opendingux. Compiles but still crashes on 32X games.

Perhaps it works with the GCW0 toolchain due to the older GCC ? No idea.

irixxxx commented 5 years ago

The offsets look OK... though as an afterthought I think they are only used in asm parts... and those mainly (only?) exist for arm.

Anyway... what's your make command? Just to save me some hours: Is there a binary release of your toolchain/sysroot I can readily install in ubuntu 18?


gameblabla commented 5 years ago

Well nvm, looks like to be an issue with my toolchain, it works with another one. (which i'll upload tomorrow, it uses GCC 7.3. I wonder if GCC 9.1 was the issue ?) Runs at about 18/19 FPS on After Burner 32X though versus 4/5 FPS but i'll take it.

I guess this can be merged now.

irixxxx commented 5 years ago

OK... though I would really be interested in the root cause of this. Thank you for your patience.

After Burner is a tough customer. I haven't had time to delve deeply into this, but it apparently has a lot of sync switching between the SH2 CPUs which is rather expensive. It might also have some high profile irq. Both break poll detection (that's what apparently slows down some other games; optimization idea: push/pop poll detection state with irq/rte). It's one of the slower games, at about 25-35 fps on a caanoo with a frameskip of 2. Try something like Tempo, that should work much better.
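The "push/pop poll detection state with irq/rte" idea mentioned above could look roughly like the following C sketch. This is only a guess at one possible shape, not an implementation from the pull request: `poll_state`, `poll_ctx`, `poll_on_irq`, `poll_on_rte` and the nesting limit of 4 are all hypothetical names and values.

```c
#include <stdint.h>

#define POLL_STACK_DEPTH 4          /* hypothetical interrupt nesting limit */

struct poll_state {
    uint32_t addr;                  /* address the CPU is polling */
    uint32_t count;                 /* consecutive identical reads seen so far */
    int active;                     /* nonzero once polling has been detected */
};

struct poll_ctx {
    struct poll_state cur;
    struct poll_state saved[POLL_STACK_DEPTH];
    int depth;                      /* current interrupt nesting level */
};

/* Interrupt entry: remember the interrupted state, start the handler clean. */
void poll_on_irq(struct poll_ctx *c)
{
    if (c->depth < POLL_STACK_DEPTH)
        c->saved[c->depth] = c->cur;
    c->depth++;
    c->cur = (struct poll_state){0};
}

/* RTE: resume the interrupted state if it was saved; if the nesting was too
 * deep to save it, fall back to redetecting from scratch. */
void poll_on_rte(struct poll_ctx *c)
{
    if (c->depth == 0)
        return;
    c->depth--;
    if (c->depth < POLL_STACK_DEPTH)
        c->cur = c->saved[c->depth];
    else
        c->cur = (struct poll_state){0};
}
```

The point of the design is that a polling loop interrupted by a frequent irq would continue where it left off after the handler returns, instead of paying the redetection cost each time.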

I think there are some opportunities to optimize the mips backend:


gameblabla commented 5 years ago

OK... though I would really be interested in the root cause of this. Thank you for your patience.

I downgraded to GCC 7.4 and Binutils 2.31.1 and it fixed the crashing issue i had with Picodrive. (still fully statically linked, as i suspected that was not the issue) Building with GCC 9.1 would repeatedly make Picodrive crash. I will try to debug it with Valgrind to see if it could be an issue with your code but at least i have a (better) workaround for now.

I think there are some opportunities to optimize the mips backend:

  • mips32r2 stuff may be used if appropriate (I've already documented where I think this to be viable).

May I suggest the MXU instruction set as well? It's an instruction set for the VPU coprocessor found on Ingenic SoCs like the JZ4760B (which the LDK/RS-97/RG-300/PAP K3/Gameta have) as well as its successors. The Dingoo A320 also supports the MXU but only revision 1. JZ4755 and above support MXU revision 2.

Senquack made a header that allows you to use it since linkers do not support it

(Header to include in your project for MXU set) https://github.com/senquack/mxu1_as_macros

(A test example below for checking the MXU) https://github.com/senquack/mxu_pcercuei_test

He had implemented in MXU/MIPS32 assembly a GTE implementation for PCSX4ALL (which will be out by the end of this year hopefully) and he told me he had a performance improvement of around 10~20%. (he said that it could be above that)

But given that it's seldom documented, well no urge i suppose lol...

As for MIPS32r2 stuff, well outside of the GCW0 and the upcoming RG-350 it is fairly uncommon...

Not sure about your other suggestions though but they sound good.

Also Tempo without frame-skipping is like 32-60 FPS.

irixxxx commented 5 years ago

Hmm, I can see the potential for a GPU, but it's probably not worth using a vector extension in the DRC, since it strictly operates on scalar values. Regarding the speed difference between caanoo and LDK, I think this is due to the heavy ARM assembler optimisation, which isn't available for any other target. I might say something well known, but using google perftools on the target helped me a lot. I can explain how I did this on a caanoo if necessary.

irixxxx commented 5 years ago

I may accidentally have found one culprit for the low speed you observed. It's the rgb565_to_uyvy function in platform/common/plat_sdl.c. It has a pixel loop with about 50 mips insns which is executed for every pixel displayed - at 60Hz frame rate that makes 320*240*60 pixels per second... and a whopping 240000000 insns. With jz4760b@740MHz that's about a third of the available performance.

I have a patch reducing this to just above 20 insns, at the expense of having a huge precalculated array with the yuv values. However, I'm not so sure this will really be faster since it trashes the data cache for sure. Would you be willing to check it out for me on real hardware before I commit it?
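
To make the trade-off concrete, here is a minimal sketch of the table-driven idea, not the actual patch. The names yuv_tab, yuv_tab_init and rgb565_to_uyvy_sketch are hypothetical, the BT.601 integer coefficients are an assumption, and chroma is simply taken from the first pixel of each pair. The table holds 65536 32-bit entries (256 KiB), which is where the data cache concern comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One packed entry per RGB565 value: Y in bits 0-7, U in 8-15, V in 16-23.
 * Hypothetical layout, chosen for this sketch only. */
static uint32_t yuv_tab[1 << 16];

static void yuv_tab_init(void)
{
    for (uint32_t p = 0; p < (1u << 16); p++) {
        int r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;

        /* expand 5/6/5 bit channels to 8 bits */
        r = (r << 3) | (r >> 2);
        g = (g << 2) | (g >> 4);
        b = (b << 3) | (b >> 2);

        /* BT.601 limited range (assumed); the offsets are added before
         * the shift so no negative value is ever shifted */
        uint32_t y = (uint32_t)( 66 * r + 129 * g +  25 * b + 128 + (16 << 8)) >> 8;
        uint32_t u = (uint32_t)(-38 * r -  74 * g + 112 * b + 128 + (128 << 8)) >> 8;
        uint32_t v = (uint32_t)(112 * r -  94 * g -  18 * b + 128 + (128 << 8)) >> 8;

        yuv_tab[p] = y | (u << 8) | (v << 16);
    }
}

/* Convert an even number of RGB565 pixels to UYVY (4 bytes per pixel pair). */
static void rgb565_to_uyvy_sketch(uint8_t *dst, const uint16_t *src, size_t pixels)
{
    assert((pixels & 1) == 0);

    for (size_t i = 0; i < pixels; i += 2, dst += 4) {
        uint32_t a = yuv_tab[src[i]];
        uint32_t b = yuv_tab[src[i + 1]];

        dst[0] = (uint8_t)(a >> 8);   /* U, taken from the first pixel */
        dst[1] = (uint8_t)a;          /* Y0 */
        dst[2] = (uint8_t)(a >> 16);  /* V, taken from the first pixel */
        dst[3] = (uint8_t)b;          /* Y1 */
    }
}
```

The per-pixel work shrinks to two table loads and a few byte stores, but every load can hit a different cache line, so whether it wins on a small-cache MIPS core can only be settled by measuring.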

gameblabla commented 5 years ago

Yes, the YUV related code is very slow, which is why our RS-97 fork of it avoids it and just directly draws to the screen in RGB565 mode. I will be able to test either way.

fjtrujy commented 5 years ago

Very interesting thing guys!!! I'm using Picodrive as a core in the RetroArch for the Playstation 2. @irixxxx the PlayStation 2 has a MIPS64 processor, you mentioned something about the improvements in MIPS, will it affect PlayStation 2 as well?. Finally, you were saying that you don't have a way to test the MIPS improvements, if the Playstation 2 is valid, I can help you with the process.

Additionally, I have a fork of Picodrive which add PS2 platform (is outdated I would need to rebase).

I would like to check how fast 32x is now in the PS2 xDD

irixxxx commented 5 years ago

As long as the PS2 processor supports the Mips32r1 ISA (which I think it should), it should work fine. Just don't forget to enable the DRC in the makefile.

Don't expect too much, though. I reckon it might be on par with a jz4740. @gameblabla has tested it on a jz4760 and got a so-so result, with below 20fps in afterburner, a rather high profile game wrt cpu usage. Other games should work noticeably better.

The 8 bit output mode is working at least on gp2x and generic. However, iirc 32x always has rgb565 output since that is its native format, so no cigar here.

irixxxx commented 5 years ago

Yes, the YUV related code is very slow, which is why our RS-97 fork of it avoids it and just directly draws to the screen in RGB565 mode. I will be able to test either way.

I've committed some changes. They involve 3 functions central to frame creation: FinalizeLine555 (manual loop unrolling), rgb565_to_uyvy (more aggressive precalculation in a large array), and the do_line_pp, do_line_dc macros (splitting loops to simplify tests). On x86 I can see an effect of some 5-10% higher frame rates depending on the rom (after turning off the frame limiter). The first 2 functions account for the larger part of this; the 3rd change is less effective in comparison. I also see this effect in the gcw0 "simulator" (after all, it runs on the same x86 cpu). I would much appreciate it if you could try this out on mips hw, comparing the results with and without my last commit.
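
For readers unfamiliar with the technique, manual unrolling of a palette lookup loop can be sketched as below. This is an illustration only, not the FinalizeLine555 code from the commit; the function name, signature and the unroll factor of 4 are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a FinalizeLine555-style routine: translate
 * 8-bit palette indices into RGB565 pixels, four per loop iteration. */
static void finalize_line_sketch(uint16_t *dst, const uint8_t *src,
                                 const uint16_t *pal, size_t len)
{
    size_t i = 0;

    /* unrolled body: one loop test and one index update per 4 pixels */
    for (; i + 4 <= len; i += 4) {
        dst[i + 0] = pal[src[i + 0]];
        dst[i + 1] = pal[src[i + 1]];
        dst[i + 2] = pal[src[i + 2]];
        dst[i + 3] = pal[src[i + 3]];
    }

    /* remainder when len is not a multiple of 4 */
    for (; i < len; i++)
        dst[i] = pal[src[i]];
}
```

The gain comes from amortising loop overhead and giving the compiler independent loads to schedule; on cores where the compiler already unrolls at the chosen optimisation level the difference may vanish.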

gameblabla commented 5 years ago

Just saying but it was never using the YUV related code in the first place... lol So it really wasn't related. Should i still give it a try ?

irixxxx commented 5 years ago

Right... it does so only on platforms using YUV in the first place. However, FWIW, you might try it anyway - it optimizes some drawing functions, and the last commit brings some low hanging fruits for a small code size reduction. It might still help a little bit. It's an extra frame on a caanoo, give or take.

I also played around a bit with -flto on arm with gcc 4.7 (can't use anything newer), but I cannot find any constellation where it produces faster code on the caanoo - it's more like some % slower. Does it produce faster code for you on the mips platforms?

irixxxx commented 4 years ago

If you could spare the time, I would appreciate a run of the latest version on real mips hardware. A performance report would also be very nice, just to see if the changes aren't only good for ARMv4 in a caanoo.

gameblabla commented 4 years ago

I'll give it a try on my Retrostone with my CFW (ARMv7+NEON of course). I did find an issue though : trying to compile with -fprofile-generate will fail when trying to run mkoffsets.sh and it will output text like "undefined reference to free" and etc... Removing -fprofile-generate allows it to compile.

Would it be possible to have some way to compile Picodrive with PGO without it affecting mkoffsets ? PGO can result in a speedup of 10% on a low end device like the RS-97/LDK.

EDIT: This is strange. Sometimes it will freeze picodrive for no reason at random when playing After Burner. Again, this is on my Retrostone with the ARM32 backend. This did not happen on the older version (without your commits)

irixxxx commented 4 years ago

Hmm... I guess the simplest way would be to change the top level Makefile like this:

pico/pico_int_offs.h:: tools/mkoffsets.sh
	make -C tools/ XCC="$(CC)" XCFLAGS="$(CFLAGS) -fno-profile-generate"

gameblabla commented 4 years ago

Yeah that seems to work with -j1, though not when multiple cores are involved it seems. Either way, the freeze issue still remains. Could be a compiler issue like last time, no idea...

irixxxx commented 4 years ago

I think I found the bug. Shifting a u32 left by 32 bits produces strange results with gcc 9 on mips. Despite my off-by-one error, I'm not really sure if the compiler does this right. It's not doing this on x86. It would be interesting to know how the C standard defines the behaviour for this.

Anyway... please pull and try again, I have good hope that this crash is gone.

irixxxx commented 4 years ago

BTW I added a better fix for -fprofile-generate/-flto. I hope this solves it for you.

gameblabla commented 4 years ago

Yes, the profile fix works now.

But sadly it doesn't fix the freeze issue. (also, Retrostone is ARMv7 so it's not affected by the MIPS or ARM64 backend) I could seem to make it trigger in a somewhat reliable way :

EDIT: Yup, using save states makes it crash rather quickly or in some cases instantly. I'm using GCC 8.3, does this happen to you ?

I tried it again with the stock version and this does not happen, it loads the save state properly.

irixxxx commented 4 years ago

I can only reproduce a lockup when loading a saved state while the game demo runs. Is this last commit fixing it for you?

irixxxx commented 4 years ago

On second thoughts the previous fix may itself present problems. Please use this new one.

gameblabla commented 4 years ago

Ok so it fixed the freeze issue for me, great ! No issues so far now. (other than the fact that it seems not to work with -j4+ without multiple passes)

irixxxx commented 4 years ago

Hmm, "make clean; make -j4 all" completes without errors, but "make -j4 clean all" is apparently running the make targets in parallel, leading to strange results. I'm rather surprised about this behaviour. Since I'm not much of a makefile buff, I have no present idea what to do about this. Anyway, I haven't changed much wrt the original v1.93 make structure, so this might happen with the v1.93 tag as well. Is it still at 19 fps in Afterburner? On that sluggish caanoo I see an improvement of about 20%.

gameblabla commented 4 years ago

Well if you don't plan any more changes then i can try it on my LDK/RS-97 now in addition to possibly the PocketGo/Bittboy.

irixxxx commented 4 years ago

I think this last commit will be it for some time, apart from bugfixing. I should deal with my other projects. Moreover, atm I don't have any more promising new ideas for the drc... let me know if you have any. So, please, go ahead and try it out and report some results to satisfy my curiosity ;-)

gameblabla commented 4 years ago

Btw, i forgot to report back but After Burner on an Ingenic JZ4760B runs at about 26 FPS without frame-skipping. A small improvement but overall not that great. I'm sure it will make a greater difference on the GCW0/RG-350 though

irixxxx commented 4 years ago

Thank you for testing. For the result, YMMV but I'm not so unhappy with it. Compared with your initial report (IIRC 18 fps) it's a gain of more than 40%.

BTW Re: Shifting an u32 left 32 bits produces strange results... I consulted the standard which says this is undefined. And after some thinking I say rightly so, since this depends on the actual implementation in the CPU.
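
For reference, C11 6.5.7 makes a shift undefined when the count is negative or greater than or equal to the width of the promoted left operand, so a 32-bit value may not be shifted by 32. A minimal sketch of guarded helpers follows; the names shl32 and low_mask32 are hypothetical and not taken from the picodrive sources.

```c
#include <stdint.h>

/* Left shift that is well-defined for any count: counts of 32 or more
 * yield 0 instead of invoking undefined behaviour. */
static inline uint32_t shl32(uint32_t v, unsigned n)
{
    return n < 32 ? v << n : 0;
}

/* Mask with the low n bits set, valid for n in 0..32. The naive
 * (1u << n) - 1 is undefined for n == 32, a typical off-by-one trap. */
static inline uint32_t low_mask32(unsigned n)
{
    return n < 32 ? (UINT32_C(1) << n) - 1 : UINT32_MAX;
}
```

The explicit test costs one compare, but it removes the dependence on how a given CPU treats oversized counts (x86, for instance, masks the count to 5 bits for 32-bit operands, so a shift by 32 behaves like a shift by 0).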

irixxxx commented 4 years ago

@notaz, atm I've run out of items on my list, hence I'd say I'm done for now (except bugfixing and/or future new ideas). Can I do anything to advance integration? @gameblabla, I optimized the mips code emitter a bit, reducing the code size by about 5-10%. You might want to have a look.

notaz commented 4 years ago

I want to at least test drive this on the platforms I have, but still failing to do so, sorry. And still won't be able to do it for the next 2 weeks at least.

Also I've spotted some issues before like introducing .text relocations in asm that are forbidden on Android/iOS (program gets killed if their loader notices those relocations on load), but had no time to properly raise them. I haven't checked if your newer work addresses this.

irixxxx commented 4 years ago

FYI "scanelf -qt PicoDrive" on a generic arm build doesn't report textrel errors, so I suppose at least standalone armv[4-7] binaries might be ok.

irixxxx commented 4 years ago

Now I see... a libretro .so build really produces textrel warnings. A quick check shows most of it is in places I haven't really touched, though, like drz80 and ym2612 stuff. The simple solution would be to disable asm stuff for android builds.

Anyway, some of the remains are code address tables which can be easily converted into jump tables (at the expense of the extra jump). The leftovers load an address from a literal pool via "ldr rn, =symbol". That might be replaced by something like "ldr rn, =symbol-.-12 ; add rn, pc", but unfortunately gas throws a syntax error for this.

notaz commented 4 years ago

It's strange with drz80/ym2612, these were supposed to be dealt with. Maybe only in the libretro repo.

Do we need https://github.com/notaz/picodrive/pull/100/commits/78d817c37006a557174594071d6390987ea8f09c at all, does it actually help? Compilers shouldn't be too terrible generating small functions like that, and generate PIC or whatever the platform wants only when needed.

I remember dealing with this in pcsx, maybe this will give you some inspiration: https://github.com/notaz/pcsx_rearmed/commit/8184d7c5f6db0b05fafeaaed69ef18d22a60c451

irixxxx commented 4 years ago

I have looked a bit further into this:

gas throws a syntax error for "ldr r0, =symbol-.-8" - not surprising since this is hardcoded in gas/config/tc-arm.c:parse_big_immediate() (BTW the gas doc says it's an expression...). Also, "l0: .equ xx,symbol-l0" produces an error. And I haven't found a way to convince gas to generate PIC code for "LDR =" by itself.

This is one way I've found as a replacement for "ldr r0, =symbol":

        ldr r0, [pc, #l1-.-8]
    l0: add r0, pc
        ...
    l1: .word symbol-l0-8

Local number labels also can't be used for this. You could write 2x2 macros for this, evaluating to a pool load for non-PIC and the above code for PIC. Something like PICLDA(l0, l1, symbol) and PICDEFA(l0, l1, symbol).

irixxxx commented 4 years ago

Re 78d817c: IIRC it's about half the insn count, but I can't remember how often this was called.

m68k takes 4 cycles per memory access, SH2 takes 2(?), so m68k can only do about 1/12th of the accesses of both SH2s in the worst case. Most probably the effect is rather small, but I'll cross check that to be sure. Another solution would be to hand in the table pointer as a parameter like for the SH2 memory access - moreover in that case these functions would only be needed once for both m68k cpus.
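
The "table pointer as a parameter" idea could look roughly like the sketch below. It is a simplified illustration under stated assumptions: the map_entry layout, the 64 KiB region size, the 24-bit address space and all names are hypothetical and do not mirror picodrive's real memory map encoding.

```c
#include <stdint.h>

#define MAP_SHIFT 16                      /* assumed 64 KiB regions */
#define MAP_MASK  ((1u << MAP_SHIFT) - 1)

/* Hypothetical map entry: either directly readable memory or a handler. */
typedef struct {
    const uint8_t *base;                  /* direct memory, or NULL */
    uint8_t (*handler)(uint32_t a);       /* used when base is NULL */
} map_entry;

/* Example handler for unmapped regions (open bus value is an assumption). */
static uint8_t open_bus(uint32_t a)
{
    (void)a;
    return 0xff;
}

/* One read function for every CPU: the caller passes its own map, so no
 * per-CPU copy of the function and no global table address is needed. */
static uint8_t read8(const map_entry *map, uint32_t a)
{
    const map_entry *e = &map[(a >> MAP_SHIFT) & 0xff];

    if (e->base)
        return e->base[a & MAP_MASK];
    return e->handler(a);
}
```

Because the table address arrives in a register, the asm version would also need no literal pool load for it, which sidesteps the text relocation question for these functions.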