libretro / FBNeo

FBNeo - We are Team FBNeo.
https://neo-source.com
Other
224 stars 134 forks source link

Likely endian issue on Sega Y Board games, e.g. Rail Chase (rchase) #820

Closed vaguerant closed 2 years ago

vaguerant commented 3 years ago

Firing up Rail Chase on the big-endian Wii U greets the player with the following:

rchase-210704-163200 rchase-210704-163300

Tested on 0d55b9e, a commit from 15 hours ago as of this filing.

Highlight for endian czar @crystalct.

crystalct commented 3 years ago

Yes... i confirm.... it has some little 'big-endian' issues..... i will fix it

crystalct commented 3 years ago

@barbudreadmon How endianize in an elegant way a situation like this:

uint16_t *x = (uint16_t *)app;
*x = 0xFF;
uint16_t y = ++*x;

a working ways is:

uint16_t *x = (uint16_t *)app;
*x = BURN_ENDIAN_SWAP_INT16(0xFF);
*x = BURN_ENDIAN_SWAP_INT16(BURN_ENDIAN_SWAP_INT16(*x) + 1);
uint16_t y = BURN_ENDIAN_SWAP_INT16(*x);

Other better ways?

barbudreadmon commented 3 years ago

no idea, maybe @dinkc64 will have an idea ?

edit: what's the original type for app ?

dinkc64 commented 3 years ago

where is this code located?

crystalct commented 3 years ago

sys16_gfx.cpp There are many ++data[7]

Il mer 7 lug 2021, 21:24 dinkc64 @.***> ha scritto:

where is this code located?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/libretro/FBNeo/issues/820#issuecomment-875870249, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMY7LOLP6G3NO2KPCW67L3TWSSYVANCNFSM47Y5J6GA .

dinkc64 commented 3 years ago

Hi crystalct, please see attached sys16_gfx.cpp: System16ARenderSpriteLayer() I think this would be the best way to do it, by using a cache variable.

sys16_gfx.zip

best regards,

crystalct commented 3 years ago

Fixed. image image

vaguerant commented 3 years ago

It's slightly less garbled than it was, but this definitely still isn't running correctly on my end.

Tested on the latest nightly, 45345d2, from 24 hours ago.

rchase-210711-025500 rchase-210711-025518

barbudreadmon commented 3 years ago

reopening then

crystalct commented 3 years ago

At this point it is no longer a big endian issue. Or somewhere there is a difference between WIIU code and other systems. If the cause were BURN_ENDIAN_SWAP_INT macros then all games would be affected.... maybe BURN_ENDIAN_SWAP_INT64 that is rarely used and it's present in this driver.

barbudreadmon commented 3 years ago

Both systems are using the same macro for BURN_ENDIAN_SWAP_INT64, even the 3 other macros are basically the same asm functions, so i have no idea what could cause the difference between the 2 platforms here, and since there is no way the wiiu is ever gonna play this at playable speed anyway (it's like half the speed of cps3, which already run like crap on wiiu iirc), i'm thinking of not caring and closing this as "wontfix" tbh.

vaguerant commented 3 years ago

I think "never" might be slightly strong, but until such time as the Wii U RetroArch port gets rewritten to move rendering off the CPU, use modern homebrew libraries and multi-threading, you're probably right. CPS-3 actually does run full-speed on the Wii U, but Rail Chase fluctuates between 30 and 55. If RetroArch ever gets some kind of full rewrite that brings performance closer to full speed, I can always try to convince you to reopen the issue then.

barbudreadmon commented 3 years ago

move rendering off the CPU, ... and multi-threading

That's not gonna help FBNeo's performance, which is software-rendered and mono-thread, or maybe marginally at best, like the way you'll get a 5% performance gain (and tons of bugs/regressions) when enabling "Threaded Video" on other devices.

Pretty sure i saw people complaining about FBNeo's cps3's performance on wiiu, are you maybe using frameskip to get around this ?

vaguerant commented 3 years ago

Nope, no frameskip here. Street Fighter III (sfiii2): IMG_20210711_191909_hdr

In fairness, it's technically not a rock-solid 59.94 (Wii U native refresh rate); during transition screens such as loading stages it can dip down to 58 or so. I don't know anything about how the game operates, but I assume it's doing something CPU-intensive like decompressing graphics at this point. So it's definitely right on the borderline of what the Wii U can do currently, but gameplay is perfectly smooth. I don't know if that holds for all CPS-3 games though, this is the only one I play.

You'd know better than me re: multi-threading and/or moving rendering of the RetroArch UI, overlays etc. onto the GPU; it probably wouldn't help as much as I thought. I do think there's still quite a gulf between the performance RetroArch can achieve currently and what the console is theoretically capable of, however. I've also seen talk about failing to take advantage of the Wii U's CPU paired singles which massively accelerate math performance but which aren't currently used in Wii U homebrew at all, partly due to a lack of library support (this isn't a RetroArch issue, just Wii U homebrew in general). Also, while this only benefits a handful of cores and drivers, I don't believe RetroArch on Wii U has any support for dynamic recompilers whatsoever (this is also most of why there's no usable cores for Sega 32X, PlayStation or N64 on the Wii U).

Really I'm just repeating (and probably mangling) things I've heard from Wii U homebrew devs, e.g. Gary who authored the Wii U port of Grand Theft Auto 3. That port of a ~20-year-old game doesn't quite maintain a steady 30 FPS@720p, but it's fairly self-evident that that's not because Grand Theft Auto 3 is too complex for the Wii U to handle. Wii U homebrew just isn't really making optimal use of the CPU/GPU "yet". Hard to say if it will ever get better given the small community, but I definitely think a lot of things that are unplayably slow on Wii U right now aren't that way purely because of the limitations of the console.

barbudreadmon commented 3 years ago

Wii U has any support for dynamic recompilers whatsoever

There probably aren't many dynarecs written for ppc cpu because the architecture was abandonned 10 years ago, so i doubt it's gonna change. Dynarecs are architecture-specific, you can't even use the same dynarec for 32/64 bits variants of the same architecture. Dynarecs translate instructions from 1 type of cpu towards instructions for 1 kind of other cpu, with FBNeo emulating dozens of cpus and being supported by dozens of platforms, what you are hoping for (dynarecs for ppc) is probably never gonna happen, we'll stay with the cross-platform approach of classic interpreters.

vaguerant commented 3 years ago

True enough, PPC dynarecs are not a priority, but the examples like the 32X's SH2s and the PSX MIPS CPU have existing PPC32 dynarecs in various cores that just aren't functional on Wii U because RetroArch has no handling for the memory management necessary. On Wii U, memory can be r-x or rw- but never rwx, so running a dynarec requires flipping the access modes for the memory area the dynarec is using. The PicoDrive and Beetle PSX libretro cores are both good examples of cores that already have functioning PPC dynarecs, but compiling them into Wii U builds just causes crashes due to invalid memory accesses, as there's nothing in the Wii U libretro port to manage the permissions.

I don't really expect FBNeo to bundle a bunch of in many cases non-existent PPC32 dynarecs just for the Wii and Wii U's sake. The only other platform that would particularly benefit, the PS3, already has broadly decent performance just using interpreted CPU cores, but all three platforms are kind of an afterthought as far as compatibility, they're certainly far from the lead focus. Either way, it's a moot point right now since RetroArch/libretro simply doesn't support dynarecs on Wii U. Of course, as a user it would be nice to have PPC dynarecs (and support for dynarecs in libretro) for a handful of popular CPUs, but they'd have to a) be written by somebody or b) already exist and be compatible and compatibly licensed, and c) be desired by Team FBNeo, and it seems like a high bar to clear those for a small niche audience like Wii/Wii U/PS3.

Ultimately I'm just a regular (and probably too optimistic) user, so I can't really claim to understand how much more performance can realistically be squeezed out of the Wii U, even before going to the dynarec well. Maybe the state of homebrew on Wii U is already as good as it gets and can possibly get, and no amount of hypothetical fast math libraries using paired singles would help the situation. All I can tell as an end-user is that the consensus among people better-informed that I am seems to be that Wii U homebrew could perform better than it does today.

crystalct commented 3 years ago

@barbudreadmon @dinkc64 I'm "endianizing" Raiden Fighters (d_seibuspi) and i'm near to finish but there is a little part that is driving me crazy: image

It should be: image

It seems a wrong address of memory. Any clue where to look at?

dinkc64 commented 3 years ago

That text is coming from the first (0) tilemap, I found this by entering the shots factory on the standalone fbneo and disabling layers until I found the layer. Then I search for nBurnLayer or nSpriteEnable depending which layer its from in the shots factory window. The thing is, if text layer was broken, the smaller font wouldn't be readable (Attack the enemy...) - it's the same layer. Something is getting half-broken, which is hard to track. Your best bet is to search "text" and make sure everything looks fine.

best regards,

crystalct commented 3 years ago

Hello, i moved to ejsakura cause the issue is greater (black screen). I debugged and logged all READ/WRITE and the first diiference between x86 and PS3 is in a eprom write. Before start a test i remove .nv files. There is program_write_word_32le that call sys386f_write_word with address 0x404 That program_write_word_32le is called from a WRITE16 from I386OP(mov_m16_ax) but in that moment AX registry is different. On x86 a series of 0x80 0xa0 0xe0 0xa0 0xe0 0x20 ..... will be written on EPROM but on PS3 are all 0x0 .....

This is the most complex file to endianize :(

crystalct commented 3 years ago

I'm sure at 99.9% to have finished to endianize it. Tons of read & write are correct before that WRITE. Those bytes written on EPROM where are taken?

crystalct commented 3 years ago

I found the culprit.... image

REG8(EAX) is conceptually wrong, randomly right in a little endian machine. Surely it was an oversight or a wrong copy and paste. REG8(AL) is correct.

barbudreadmon commented 3 years ago

Afaik, the cpu core was ported from MAME and that's the way they are currently implemented : https://github.com/mamedev/mame/blob/master/src/devices/cpu/i386/i386ops.hxx#L942 https://github.com/mamedev/mame/blob/master/src/devices/cpu/i386/i386ops.hxx#L1351 https://github.com/mamedev/mame/blob/master/src/devices/cpu/i386/i386ops.hxx#L1726 https://github.com/mamedev/mame/blob/master/src/devices/cpu/i386/i386ops.hxx#L1728

crystalct commented 3 years ago

They don't have a crazy Big Endian solver ^_^

dinkc64 commented 3 years ago

crystalct nice work :)

barbudreadmon commented 2 years ago

I was looking at the rchase issue again, and while ps3 and wiiu are using the same BURN_ENDIAN_SWAP_INT64 macro, i think their respective compilers might be interpreting it differently : one of the 2 compilers might be evaluating x twice (see macro content) and not the other, giving the different results.

I pushed a fix which should prevent x from ever being evaluated twice, could you test both ps3 and wiiu ?

crystalct commented 2 years ago

I had a fault on make: cc1.exe: error: unrecognized command line option "-Wno-unused-but-set-variable" Caused by:

# gcc specific flags
ifeq ($(shell $(CC) -v 2>&1 | grep -c "gcc version"), 1)
    CFLAGS += -ffloat-store -Wno-unused-but-set-variable
    CXXFLAGS += -ffloat-store -Wno-unused-but-set-variable
    ifeq ($(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 5), 1)
        CFLAGS += -finline-limit=1200 -fcheck-new
        CXXFLAGS += -finline-limit=1200 -fcheck-new
    endif
endif

$(shell $(CC) -v 2>&1 output is:

Using built-in specs.
Reading specs from c:/psdk3v2/mingw/msys/1.0/local/cell/host-win32/ppu/bin/../../../target/ppu/lib/prxspec.4.1.1
Target: ppu-lv2
Configured with: /ps3/svn/current/toolchain/trunk/gcc/configure --target=ppu-lv2 --host=i386-pc-mingw32msvc --build=i686-pc-linux-gnu --prefix=/usr/local/cell/host-win32/ppu --with-sysroot=/usr/local/cell/target/ppu --with-headers=yes --disable-shared --disable-hosted-libstdcxx
Thread model: lv2
gcc version 4.1.1 (SDK400, $Rev: 3538 $)

In the old Makefile -Wno-unused-but-set-variable was a CXXFLAGS for ps3 environment edit: nope.... not working even as CXXFLAGS

barbudreadmon commented 2 years ago

i removed that flag, it's only to hide a few warnings so it's pretty useless

vaguerant commented 2 years ago

Just tested 4b5c6dc61 on Wii U. Sad to report there's no change here. It looks the same as it did in this comment.

crystalct commented 2 years ago

PS3 ok: image

crystalct commented 2 years ago

@vaguerant , did you have test BURN_ENDIAN_SWAP_INT64 macro using a very simple C code to see if it works correctly?

vaguerant commented 2 years ago

That's well beyond me, I mostly just know enough to run makes via Docker. If you're willing, you could write a short code snippet I could just paste into the top of sys16_gfx.cpp or something that prints the output of BURN_ENDIAN_SWAP_INT64 to the RetroArch log or something along those lines? Or if you opened a branch just for doing endian tests I could build that? Sorry I'm not more useful in that regard.

barbudreadmon commented 2 years ago

@vaguerant let's try something : could you delete those 4 lines and test the resulting core ? Maybe there is something wrong with the wiiu macros, and the default gcc macros would work better

vaguerant commented 2 years ago

@vaguerant let's try something : could you delete those 4 lines and test the resulting core ? Maybe there is something wrong with the wiiu macros, and the default gcc macros would work better

rchase-220104-003834 !!!!!

It works!

EDIT: As long as there's things going on with Y-Board, I just noticed this typo in the DIPs: https://github.com/libretro/FBNeo/blob/becc436c585760bfae2c4ac08b0f86c7111fabfe/src/burn/drv/sega/d_ybrd.cpp#L435 https://github.com/libretro/FBNeo/blob/becc436c585760bfae2c4ac08b0f86c7111fabfe/src/burn/drv/sega/d_ybrd.cpp#L439 mul-it-plier instead of mul-ti-plier

crystalct commented 2 years ago

@vaguerant let's try something : could you delete those 4 lines and test the resulting core ? Maybe there is something wrong with the wiiu macros, and the default gcc macros would work better

That is really weird..... BURN_ENDIAN_SWAP_INT32 and BURN_ENDIAN_SWAP_INT16 (WIIU) have always worked on other driver/sources

barbudreadmon commented 2 years ago

So those wiiu macros are somehow broken, the fact it seemingly didn't affect other games remain a mystery though... Supposedly the non-asm macros should be slower, could you check some demanding driver (cps3 maybe ?) to see if it's noticeable ?

vaguerant commented 2 years ago

I see no appreciable difference in Street Fighter III: Giant Attack (sfiii2). In gameplay it's a solid 59.94 (Wii U native FPS) with some brief and minor slowdown when loading into a new fight which isn't even obvious without the FPS display enabled. That's how it always is; everything I said about it in my comment six months ago holds.

crystalct commented 2 years ago

we should see ppu_intrinsics.h inside his SDK.... for PS3:

#define __stwbrx(base, value) do {      \
    typedef  struct {char a[4];} wordsize;  \
    wordsize *ptrp = (wordsize*)(void*)(base);      \
    __asm__ ("stwbrx %1,%y0"            \
       : "=Z" (*ptrp)           \
       : "r" (value));          \
   } while (0)
vaguerant commented 2 years ago

Supposedly the non-asm macros should be slower, could you check some demanding driver (cps3 maybe ?) to see if it's noticeable ?

Uh, is it possible the opposite is true? Another slow game I keep around for some reason is Night Striker (nightstr, Taito Z System) which I just re-tested on the build from right before I switched to the GNUC endian-flips. Framerate fluctuates, but I was getting 45 FPS in the mostly empty starting tunnel, then 42-43 FPS out in the open air with all the scaling enemies and buildings, etc. Then, switching back to the GNUC build, I was getting 47 in the tunnel and 45 out in the open air.

barbudreadmon commented 2 years ago

Uh, is it possible the opposite is true?

Who knows, i would expect the ppc optimized asm to be faster than the generic gcc builtin functions, but maybe i'm wrong ? I suppose it's yet another reason to use those builtin functions.

barbudreadmon commented 2 years ago

i disabled those macros, let's hope it won't cause more issues than it solves

barbudreadmon commented 2 years ago

Closing