cucholix commented 6 years ago

Description

Black screen or DSI dump error when trying to load a second consecutive content from the playlist. It loads the core/content fine the first time in a session, but it fails to load a second content.

Expected behavior

Being able to load 2 or more contents in a single session.

Actual behavior

It loads the core/content fine the first time, but it fails to load another content.

Steps to reproduce the bug

Open RA Wii U either from forwarder channel or HBL retroarch.rpx
Load a content from playlist
(optional) Close that content
Load a content from playlist from other core
Black screen or,
DSI error:

Bisect Results

Last working Git build: 29-Dec-2017 d49b7b2

Version/Commit

All versions since 31-Dec-2017 have the same problem; the 30-Dec-2017 version freezes when launching content.

System Info

Wii U 5.5.2, Haxchi CFW
SDXC 64GB class 10
Wii U-formatted HDD on the rear USB ports
GCN adapter in the first front port (from left to right)

gblues commented 6 years ago

Sigh. Looks like this is my fault.

On the latest master build, I don't even get the first ROM loaded; it goes to "System memory error" right away.

If I disable the hidpad driver by ifdef'ing out the init method, I can load multiple ROMs in succession using the recent menu.

Investigating. Sorry for the inconvenience.

ashquarky commented 6 years ago

@gblues I wouldn't worry too much - this looks like it's more a result of using our own toolchain than anything explicit in your code. Check your thread allocation - note the sizeof(OSThread). I was fiddling with wut a while ago (the toolchain ours is vaguely based on) and had some weirdness, and it turns out that they had their OSThread structure wrong! Here's the fix they did - note they went from 0x69c-0x5f4 (=0xA8 bytes, 0x2A dwords) to 0x6a0-0x5f4 (=0xAC bytes, 0x2B dwords) in their unknown section. Looks like the same fix didn't make it into RetroArch - we're still on 0x2A dwords. Fix should be to change that to 0x2B.

I hate to say it, but this was probably the issue with controller_patcher as well.
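
The size arithmetic above can be checked mechanically. The sketch below is illustrative only: `MockThreadOld`, `MockThreadFixed`, and the helper functions are hypothetical stand-ins, not the real `OSThread` definition from wut or RetroArch. Only the offsets (0x5f4, 0x69c, 0x6a0) come from the comment above, and the assumption that the unknown section is the tail of the structure is inferred from the reported total sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset at which the "unknown" section starts, as quoted in the thread. */
#define UNKNOWN_START 0x5f4u

/* Hypothetical mock of the old layout: 0x2A dwords of unknown data. */
typedef struct {
    uint8_t  known[UNKNOWN_START];
    uint32_t unknown[0x2A];
} MockThreadOld;

/* Hypothetical mock of the fixed layout: 0x2B dwords of unknown data. */
typedef struct {
    uint8_t  known[UNKNOWN_START];
    uint32_t unknown[0x2B];
} MockThreadFixed;

/* Compile-time checks: the build fails if the sizes drift. */
static_assert(sizeof(MockThreadOld) == 0x69c, "old layout should be 0x69c bytes");
static_assert(sizeof(MockThreadFixed) == 0x6a0, "fixed layout should be 0x6a0 bytes");

/* Bytes in the unknown section for a structure ending at end_offset. */
size_t unknown_bytes(size_t end_offset)
{
    return end_offset - UNKNOWN_START;
}

/* Same quantity expressed in 32-bit dwords. */
size_t unknown_dwords(size_t end_offset)
{
    return unknown_bytes(end_offset) / sizeof(uint32_t);
}
```

A compile-time assertion of this kind on a toolchain structure would flag a one-dword mismatch at build time, before it shows up as heap corruption at runtime.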

gblues commented 6 years ago

Enh.. after disabling the hidpad driver, I narrowed the scope to just ifdef out the body of the attach handler, and it remained stable. This included the thread setup/teardown. But, maybe something in my code was trying to use that missing dword.

I've removed the ifdef and adjusted the size of OSThread. Let's see what happens.

gblues commented 6 years ago

Aaand crash.

ashquarky commented 6 years ago

Hm. Can you quickly printf("%08X", (unsigned)sizeof(OSThread)); or something? Might also be worth avoiding connecting a HID device, which would theoretically stop the attach handler from running.

ashquarky commented 6 years ago

I'd suspect the reason the attach handler would have an effect is because it makes allocations/frees stuff, which can make this kind of bug surface or become hidden pretty much at random. This sort of thing is really dependent on heap layout, but until we can get some kind of ASAN (oh man) there's no obvious way to see the true cause.

That said, if fixing OSThread doesn't change it, I'm kinda stumped. I'd suggest avoiding trusting sizeof() when giving it structs from the toolchain (maybe add 0x100 to each allocation as a debugging step?)
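
As a rough sketch of that debugging step, the helper below over-allocates by 0x100 bytes and fills the padding with a known pattern so an overrun can be detected later. The names `padded_alloc` and `padding_bytes_touched` and the fill byte are assumptions made for illustration; the pattern check goes slightly beyond the plain "add 0x100" suggestion, and none of this is RetroArch code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEBUG_PAD  0x100u  /* extra bytes appended to every allocation */
#define DEBUG_FILL 0xA5    /* pattern written into the padding */

/* Allocate size bytes plus DEBUG_PAD bytes of pattern-filled padding. */
void *padded_alloc(size_t size)
{
    uint8_t *block;

    if (size > SIZE_MAX - DEBUG_PAD)
        return NULL; /* size + DEBUG_PAD would overflow */

    block = malloc(size + DEBUG_PAD);
    if (block == NULL)
        return NULL;

    memset(block + size, DEBUG_FILL, DEBUG_PAD);
    return block;
}

/* Count padding bytes that no longer hold the fill pattern.
 * A non-zero result means something wrote past the requested size. */
size_t padding_bytes_touched(const void *block, size_t size)
{
    const uint8_t *pad;
    size_t touched = 0;

    assert(block != NULL);
    pad = (const uint8_t *)block + size;

    for (size_t i = 0; i < DEBUG_PAD; i++)
        if (pad[i] != DEBUG_FILL)
            touched++;

    return touched;
}
```

If a structure is declared one dword too short, code that writes the full 0x6a0 bytes into a 0x69c-byte request would leave four padding bytes changed, which this check reports.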

gblues commented 6 years ago

This is after having updated wiiu/include/wiiu/os/thread.h: [INFO] Size of OSThread: 000006a0

I'll play with it tomorrow. Over-allocating isn't a bad idea, in moderation.

Traace commented 6 years ago

Build: Retroarch 1.7.0 Stable RPX
Device: Wii U 8GB with 32GB SD Card @ haxchi 5.5.2

Start core.rpx via Homebrew Launcher: First Rom load crash at Mame 2003 due to "bad execute" https://img1.picload.org/image/ddorpilr/img_20180104_142647.jpg

First Rom load heavy artifacts in Mame 2009. Memory mapping issues? https://img3.picload.org/image/ddorpiwi/img_20180104_150203.jpg

Both started with proper romset + rdb files

Start retroarch.rpx via Homebrew Launcher: Native Nintendo message appears it says memory error https://img2.picload.org/image/ddorppcr/img_20180104_145118.jpg

First Rom load Mame 2003 rpx via Retroarch Forwarder Channel (to eliminate HBL issues): Again just Native Nintendo message appears, memory error. ID: 160-2203

Cores like mgba & snes 2010 are working fine with both, HBL and forwarder channel

More Ref. : https://github.com/libretro/RetroArch/issues/5123

(Edit: I don't use playlists for the mgba or snes cores, only for MAME.)

gblues commented 6 years ago

OK, well the crash bugs I was responsible for, are fixed in that commit. But there are still others that are directly related to loading via playlist.

I built the mednafen NGP core and did some tests:

Loading multiple ROMs consecutively via "Load Content" works fine
Loading from either the history screen or the Favorites playlist crashes

I expect you can work around this by unzipping your ROMs. Using the NEStopia core and a directory of *.nes files, I can load games all day long from the recent playlist.

cucholix commented 6 years ago

@gblues loading games consecutively via playlist only works when loading games from the same core. Whenever I switch cores (via playlist), it goes to a black screen.

Example:

Open RA
Load a NES game from playlist (works)
Load another NES game from playlist (works)
Load Nth NES game from playlist (works)
Load Atari2600 game from playlist (black screen)

Ploggy commented 6 years ago

For me it happens without using Playlists too and using the same Core... like.. Load a GB game, Close content then load another GB game and repeat.. Eventually you will get a Blackscreen lock.

It is random though so reproducing it reliably is going to be difficult :(

gblues commented 6 years ago

Well, crap. Well, I guess I found my own bug: if you add a zip file to your favorites and then try to load the zip, it uses whatever your first core is (in my case, nestopia) and promptly crashes.

In this log, I go through Load Content > Start Directory > (drill down) > Load Archive several times, and then go to Favorites > zip_I_favorited, and it loads the nestopia core and I get a black screen.

retroarch.log

ashquarky commented 6 years ago

Black screens are a pain to deal with - our exception handler is sticky, so this is either an infinite loop or a crash just after a new RPL is loaded (before the exception handler gets reset). If it's the latter, something should show up in the console's syslogs (/vol/system_slc/logs, I can take you through dumping them if you need) but the former...

(looking at the log, this might also be an RPX loading issue, though that's hard to say for sure)

gblues commented 6 years ago

OK, I've got something.

When you use "Add to favorites" command, it associates the ROM you selected with the currently loaded core, not the core recorded in the history.

Reproduction steps:

Load a NES game with the Nestopia core
Load a SNES game with the Snes9x core
While still in the SNES9x core, go to the History list
Highlight the NES game and press A
Choose "Add to favorites"
Quit Retroarch and grab sd:/retroarch/content_favorites.lpl

Expected result: the content_favorites.lpl will have the NES game associated with the nestopia core
Actual result: it will be associated with the snes9x core

And, from that point on, if you try to load it from the playlist, RA will crash. How exactly it crashes largely depends on how well the core handles errors.

@cucholix I recommend that you inspect your *.lpl files in a text editor and look for the above.
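
The manual inspection suggested above could be automated along these lines. This sketch assumes the plain-text playlist layout of six lines per entry (content path, label, core path, core name, CRC, database name); that layout, the function names, and the idea of matching on a file extension and a core-name substring are all assumptions for illustration, not RetroArch's own playlist code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LPL_LINES_PER_ENTRY 6 /* assumed plain-text playlist layout */

/* Return non-zero if needle occurs within the first len bytes of line. */
static int line_contains(const char *line, size_t len, const char *needle)
{
    size_t n = strlen(needle);

    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(line + i, needle, n) == 0)
            return 1;
    return 0;
}

/* Count entries whose content path contains content_ext but whose
 * core path does not contain expected_core. */
size_t count_suspect_entries(const char *text, const char *content_ext,
                             const char *expected_core)
{
    size_t suspects = 0;
    size_t line_no = 0;
    int content_matches = 0;

    assert(text != NULL && content_ext != NULL && expected_core != NULL);

    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        size_t field = line_no % LPL_LINES_PER_ENTRY;

        if (field == 0)          /* line 1 of an entry: content path */
            content_matches = line_contains(text, len, content_ext);
        else if (field == 2 &&   /* line 3 of an entry: core path */
                 content_matches &&
                 !line_contains(text, len, expected_core))
            suspects++;

        line_no++;
        text += len;
        if (*text == '\n')
            text++;
    }
    return suspects;
}
```

For example, calling `count_suspect_entries(playlist_text, ".nes", "nestopia")` on the contents of content_favorites.lpl would count NES entries that point at some other core, such as the snes9x association described in the reproduction steps.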

gblues commented 6 years ago

Well, I think the DSI error may have been due to my not-freeing-memory bug that has since been fixed.

But the black screen problem is still happening when switching cores. This is after controlling for the wrong-core-in-playlist bug I managed to trip over above.

@QuarkTheAwesome can you give me the deets for dumping the syslog? My google fu is failing me.

cucholix commented 6 years ago

@gblues I don't use favorites, I load everything from playlist, so my playlists are already generated and don't suffer any change when loading a game, exiting, or changing games.

BTW I keep getting that DSI error, it's either DSI error or black screen.

After getting DSI upon changing to a NES game and looking at my retroarch.cfg the core changed correctly: libretro_path = "sd:/retroarch/cores/nestopia_libretro.rpx"

gblues commented 6 years ago

I have yet to reproduce the DSI error, although I've gotten the black screen error multiple times. @cucholix would you be able to attach your playlists? (just the *.lpl files). Might help me reproduce.

I went and looked at the methods referenced in the DSI error and.. yikes. None of the iosuhax code does any input validation, so you have potentially null pointers being passed into strlen() blindly and all sorts of other shenanigans. Gonna do some smoke testing to make sure the validation doesn't cause any other problems and then open a PR.
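
The kind of validation described above might look like the following. This is a minimal sketch, not the actual iosuhax change: `checked_path_length`, its error codes, and `EXAMPLE_MAX_PATH` are hypothetical names chosen for illustration. The point is only that the pointer is checked before any string function touches it.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_PATH 256 /* hypothetical upper bound on path length */

static_assert(EXAMPLE_MAX_PATH > 0, "path limit must be positive");

/* Validate path and store its length in *out_len.
 * Returns  0 on success,
 *         -1 if either pointer is NULL,
 *         -2 if no terminator is found within EXAMPLE_MAX_PATH bytes. */
int checked_path_length(const char *path, size_t *out_len)
{
    if (path == NULL || out_len == NULL)
        return -1;

    /* Bounded scan, so an unterminated buffer cannot run away. */
    for (size_t i = 0; i < EXAMPLE_MAX_PATH; i++) {
        if (path[i] == '\0') {
            *out_len = i;
            return 0;
        }
    }
    return -2;
}
```

A caller would test the return value and bail out with an error instead of passing a possibly NULL pointer straight into strlen().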

ashquarky commented 6 years ago

Based on the crash log, it's either a bad free or a corrupt heap. Might want to look at the "freeing a bad pointer" angle - I assume iosuhax uses allocations.

gblues commented 6 years ago

Well, this is interesting. Adding the input validation to the libiosuhax seems to have solved the crash.

Instead of crashing, log output simply stops.

RA remains usable, and ROMs load successfully. But no log messages.

cucholix commented 6 years ago

@gblues playlists.zip

ashquarky commented 6 years ago

@aliaspider did recently change up the network logger... Could it be related?

To be honest, this is screaming "heap corruption" to me.

gblues commented 6 years ago

This crash is still occurring. Looking at the new DSI error snapshot, looks like a bogus pointer is being fed into iosuhax.

I'm still not reproducing the DSI error though. Instead I get our old friend "system memory problem".

On a successful core launch, you'll see:

param->argc         : 0x00000002
param->argv         : 0x00802008
[INFO] [Video]: Does not have enough samples for monitor refresh rate ...
ARGV_PTR            : 0x00802000
argc                : 0x00000002

When it crashes, you only get the first three lines (the INFO line is the last thing that gets written). I even tried putting a log statement in immediately after opening the network connection to no avail--it's simply not getting to that point.

The crash only happens when switching cores. When re-opening the same core repeatedly, there's no crash.

ashquarky commented 6 years ago

System memory errors (160-2201?) are supposed to indicate a corrupt binary iirc. Whether that's a result of something in the toolchain or something in HBL's loading backend, I don't know. That ARGV_PTR is supposed to be the first thing that gets logged.

When you say "successful core launch", you mean that it works and the network logging stops?

gblues commented 6 years ago

By “successful core launch” I mean the game loads and runs. I was testing a fix for the stopped logging (the logger uses a poor-man’s spinlock and I think it got stuck). The stopped logging didn’t happen, at least. ;)

ashquarky commented 6 years ago

I got a log! This showed up in my syslogs after loading a snes9x game from my content history; coming from WonderSwan with no content loaded. These are both the 2018-01-07 nightlies.

Core1: Invalid instruction fetch from 0x10CABAC0 (from SRR0)
04;04;10;752: 
--Proc15-Core1--------- OSContext 0x10A9B3C0 --------------------

04;04;10;752: tag1  = 0x4F53436F (expecting 0x4F53436F)
04;04;10;752: tag2  = 0x6E747874 (expecting 0x6E747874)
04;04;10;752: TBR   = 0x000000D4_004ABB69
04;04;10;752: CR    = 0x42000000
04;04;10;752: CTR   = 0x10CABAC0
(quark: this offset [+0x39...] is likely wrong, compare it with the ones in the stack trace)
04;04;10;752: LR    = 0x0D38FE48 nsyshid|HIDGetDescriptor+0x391BFD8
04;04;10;752: SRR0  = 0x10CABAC0 <unknown>+0x0
04;04;10;753: SRR1  = 0x1000D072

04;04;10;753: state = 0x0006

04;04;10;753: r0   = 0x10cabac0 (     281721536)  r16  = 0x00000000 (             0)
04;04;10;753: r1   = 0x10a9ca18 (     279562776)  r17  = 0x00000000 (             0)
04;04;10;753: r2   = 0x1050b000 (     273723392)  r18  = 0x00000000 (             0)
04;04;10;753: r3   = 0x10ad8840 (     279808064)  r19  = 0x00000000 (             0)
04;04;10;753: r4   = 0x10a9de6c (     279567980)  r20  = 0x00000000 (             0)
04;04;10;753: r5   = 0x00000000 (             0)  r21  = 0x00000000 (             0)
04;04;10;753: r6   = 0x10a9b354 (     279556948)  r22  = 0x00000000 (             0)
04;04;10;753: r7   = 0x00000000 (             0)  r23  = 0x00000000 (             0)
04;04;10;753: r8   = 0x10a9cabd (     279562941)  r24  = 0x00000000 (             0)
04;04;10;753: r9   = 0x00000000 (             0)  r25  = 0x00000000 (             0)
04;04;10;753: r10  = 0x10a9cabd (     279562941)  r26  = 0x00000000 (             0)
04;04;10;753: r11  = 0x00000001 (             1)  r27  = 0x00000000 (             0)
04;04;10;753: r12  = 0x00000000 (             0)  r28  = 0x00000000 (             0)
04;04;10;753: r13  = 0x1050b000 (     273723392)  r29  = 0x00000000 (             0)
04;04;10;753: r14  = 0x00000000 (             0)  r30  = 0x10a9de60 (     279567968)
04;04;10;753: r15  = 0x00000000 (             0)  r31  = 0x10aa0000 (     279576576)
04;04;10;753: 
--Stack Trace--------------------------
04;04;10;753: 
Address:      Back Chain    LR Save
04;04;10;753: 0x10a9ca18:   0x10a9ca30    0x0d38fe28 nsyshid|HIDGetDescriptor+0x340
04;04;10;753: 0x10a9ca30:   0x10a9ca40    0x0d390004 nsyshid|HIDGetDescriptor+0x51c
04;04;10;753: 0x10a9ca40:   0x10a9ca58    0x0103c494 coreinit.rpl|__OSTestAssistReadPhysical32+0x6c
04;04;10;753: 0x10a9ca58:   0x00000000    0x01041d6c coreinit.rpl|OSCheckThreadStackUsage+0xb0

I dug around a bit in nsyshid, and it looks like it starts a thread as soon as it's loaded to poll for new devices (the thread is called {SYS HID Attach}). It calls a subroutine which has an indirect jump; which ends up jumping to the invalid address you see in SRR0. Based on the syslog, it seems like this is happening while the system is cleaning up the old core. I've attached the log below - look out for things like which process has the foreground (we're 15), calls to cleanup functions like OSDrivers_Done, calls to OSRestartGame, and things like VPADInit (which should give an idea of what the foreground app is doing). It's worth noting that we can put our own things in this log with OSReport, so it could be used as a debugging tool.

I'd guess one of two things:

nsyshid appears to be loaded as an application library rather than a system library (you can tell based on the memory locations). If some part of our code is overwriting parts of its data segments (which would be allocated on the heap) while it restarts, this could be the source of the problem. Its status as an application library means it could be placed after our own data/bss, which would make it vulnerable to overflows and bits of loader code that seemed safe at the time.
There's a flag continually checked by the thread which it uses to determine whether it should quit. It's possible this flag is meant to be set before loading the new core, but for whatever reason it's not. During the transition, all memory is supposed to be freed, zeroed and all the binaries are supposed to be reloaded, which would drastically change the address space. It's possible, then, that this is some kind of use-after-free. I'm looking into when and how the thread stops now.

sysconf_crashlog_0_20180108_101232.txt

ashquarky commented 6 years ago

So, uh, @gblues. There's talk of maybe moving your HID stuff behind a runtime toggle or an #ifdef until we can get it sorted. It's important that we have working nightlies for the users to use, after all. Any thoughts?

gblues commented 6 years ago

I did do a dev build with the HID initialization removed via ifdef, and still got the "system memory error" crash. So, at least in my case, it didn't help.

I am going to try a more complete removal (including skipping of the imports of nsyshid) and see if it helps. If it does, I'll open a PR (and I think that would validate your first guess and give us a route to explore for fixing it).

aliaspider commented 6 years ago

https://github.com/libretro/RetroArch/blob/master/frontend/frontend.c#L72-L82 looks like the rpx is loaded into memory (the exitspawn call) before the input driver is free'd. rarch_ctl(RARCH_CTL_DESTROY, NULL); is where the drivers are free'd so maybe move the exitspawn call after that and see if it helps. either that or directly call driver_uninit(DRIVERS_CMD_ALL);

ashquarky commented 6 years ago

Update: In regards to my second guess, it looks like nsyshid expects its entrypoint (start, not HIDSetup) to be called a second time by the system, and only then will the thread be stopped. Therefore the thread continuing to run is intended behaviour, so there's something else causing it to mess up. I'd expect that the entrypoint would be called at a later point during the core-to-core transition.

gblues commented 6 years ago

I still get the crash after completely extracting all HID-related code from the build.

I'll try what @aliaspider suggested.

ashquarky commented 6 years ago

Is this the iosuhax DSI, the system memory error, or a black screen? Might be worth pushing the code to your fork, I haven't been having any system memory errors so maybe there's something in our toolchains...

gblues commented 6 years ago

The system memory error. Putting the sanity checks into iosuhax seems to have put a stop to the black screen crashes, at least.

I've pushed up the ifdef'd code: https://github.com/gblues/RetroArch/commit/5894d0ef867d171df083830d2abc3da68ee5a607

See if that helps. Didn't help me, though. (neither did moving the driver_ctl call)

gblues commented 6 years ago

I have a thought. It's something of a guess.

Consider this bit of code:

https://github.com/libretro/RetroArch/blob/master/wiiu/hbl.c#L147-L181

This is where the RPX data loaded into memory is copied to kernel space.

If you load the same core over and over again, everything works fine because you're overwriting the code with itself.

But if you load a different core? They are different sizes:

-rw-r--r-- 1 gblues gblues 4333220 Jan 7 21:23 mednafen_ngp_libretro.rpx
-rw-r--r-- 1 gblues gblues 5711280 Jan 7 21:23 nestopia_libretro.rpx
-rw-r--r-- 1 gblues gblues 6486744 Jan 7 21:24 snes9x_libretro.rpx

So, suppose you load snes9x first: 6486744 bytes get loaded in. You play the game. Then you load mednafen_ngp. The first 4333220 bytes of snes9x gets overwritten by mednafen_ngp, but the other 2153524 bytes of snes9x are still there.

Now suppose the kernel is holding references to memory in that region. If you've just loaded snes9x over itself, no problem. The bits are still there, exactly as they were before. But if it changes? Maybe that's what's triggering my system memory errors. And it seems to jive with what @QuarkTheAwesome suggested above.

I'm also curious if it ever crashes when going from smaller -> larger core, or if the crashes only occur when loading a smaller core after loading a large one.

As far as why I'm getting the system memory error and nobody else seems to--is anyone else on 5.5.2U firmware?

Anyway, this is all speculation.
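
To make the hypothesis concrete, here is a small model of the overwrite. It illustrates the speculation only and is not the code in wiiu/hbl.c: `stale_tail_bytes` and `copy_image` are hypothetical helpers, and the optional clearing of the remainder is an assumption about what a mitigation could look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes of the previous image left behind after a smaller image
 * is copied over its start. Zero if the new image is not smaller. */
size_t stale_tail_bytes(size_t previous_size, size_t new_size)
{
    return previous_size > new_size ? previous_size - new_size : 0;
}

/* Copy image into region. If clear_rest is non-zero, zero the bytes
 * after the image so nothing from an earlier image survives.
 * Returns 0 on success, -1 if the image does not fit. */
int copy_image(uint8_t *region, size_t region_size,
               const uint8_t *image, size_t image_size, int clear_rest)
{
    assert(region != NULL && image != NULL);

    if (image_size > region_size)
        return -1;

    memcpy(region, image, image_size);
    if (clear_rest)
        memset(region + image_size, 0, region_size - image_size);
    return 0;
}
```

With the sizes from the listing above, `stale_tail_bytes(6486744, 4333220)` gives the 2153524 leftover snes9x bytes mentioned in the comment.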

ashquarky commented 6 years ago

#6065 fixed the issue I described. I haven't been able to make it crash yet with those changes added.

@gblues I don't entirely get how the current RPX loaders work, though I do know that at some point it gets passed into the loader which puts it in the executable region, all nicely relocated. Given that, I don't see how the kernel would be referencing that copy of the file - it would make much more sense to deal with the relocated version, which is managed exclusively by the loader.

gblues commented 6 years ago

WTF.

Welp, I’m kind of stuck until I can figure out why I (and apparently only I) am getting system memory errors when switching cores.

@quarktheawesome I sent you the details of my build process; can you look that over, and also give me details of your build environment? I’m using an Ubuntu 17.10 VM with the devkitpro blob in libretro/libretro-toolchains.

Think I’m gonna try a new SD card too.

ashquarky commented 6 years ago

@gblues Saw that, I don't see anything explicitly wrong with it. It is very different to what I do though. I couldn't get it to work - it would always fail with something like <core name> not fetched, skipping no matter where I put the core. In any case, it seems more or less fine.

My setup is a bit more ad hoc - I use Arch Linux as my main OS, so I don't need to worry about a VM. I have a devkitPro/devkitPPC install (powerpc-eabi-gcc (devkitPPC release 29-1) 6.3.0) which I installed with the archives from devkitPro's Sourceforge, rather than using RA's blob. In my ~/Code folder, I've got RetroArch cloned along with a few cores; all in their own directory. Then I just change into whatever I want to build and run the appropriate make (generally make -f Makefile.wiiu -j4 PC_DEVEL... for RA and make -j4 platform=wiiu for cores). Then I copy the core's .a file to ~/Code/RetroArch/libretro_wiiu.a and run make on RA, which links in the new core and makes an RPX. If I want to make several cores, I do it by hand in this fashion. Depending on what I want to do, I'll then either wiiload the .rpx over to HBL or I'll pop in my SD card and copy stuff into sd:/retroarch/cores as needed.

I've uploaded some .a files along with the generated RPXes for your testing pleasure - they don't cause any system memory errors for me; so it'll be interesting to see what they do on your system. These are all builds as of 581683d; just after #6065 was merged.

Here's some cores I compiled.

SHA1 checksums:
4f33da7635e4ed657155380e7eeba54d6461f5aa  mdfn_neopop.a
2b01c065c04682ae859ff505ed6adc12ddf3e816  mdfn_neopop.rpx
9ef1de554fb62cd50d88c5a89269ac594a3c761f  snes9x.a
7504e274129e4806811903be0bfc18ad14521c9e  snes9x.rpx

gblues commented 6 years ago

Thanks for those!

I don't get any crashes with those cores. So something is jacked up in my build environment. Annoying, but solvable.

ashquarky commented 6 years ago

I'd like it if you could send me some rpx/.a pairs that cause the error - I'd like to do some investigation into why this happens and this looks like an ideal way to diff the bad vs. good files. Y'see, the buildbot occasionally produces erroring files too, so it'd be good to get to the bottom of this.

gblues commented 6 years ago

Here's the bad build. I think it's at the same commit but I'm not sure. Should show on the load screen though.

bad-rpx.zip

ghost commented 6 years ago

Both of those cores are 5894d0ef8:

$ strings bad_mednafen_ngp.rpx | egrep '^[0-9a-f]{9}$'
5894d0ef8

gblues commented 6 years ago

That's the best I can do--I've already started nuking things to try to fix my build environment. @QuarkTheAwesome, you should do a build of 5894d0e and do your comparisons. Meanwhile, I'm working on new build scripts :)

ashquarky commented 6 years ago

Weird - I wasn't able to get your builds to crash at all. Looks like the 160-2203s may remain a mystery...

ashquarky commented 6 years ago

For sake of completeness, here's the checksums of gblues' builds which I couldn't get to crash (making sure there's been no bitrot in transit or anything)

sha1sums:
6cb5bbccfb8ba4edfe63577d768a7cba10a9b8e8  mednafen_ngp_libretro.rpx
f2550d81963c7e671d08ee46de5fd5c565b53bde  snes9x_libretro.rpx

gblues commented 6 years ago

How many load cycles did you go through? In my case, I have a matrix of 10 SNES and 5 NGP games and I basically just alternate down (repeating the NGP games). The time-to-crash for me varies, but I've generally been able to make it crash within 12-15 core switches.

When I tried your builds, I noticed that it took a long time for the cores to load; I'm assuming that's due to trying (and failing) to connect to your PC's LAN IP address. Did you have the same experience with my builds?

I've tuned up my build process and ensured I am using the latest devkitPPC, and will see if my builds still error out, and see if disabling the network logging makes a difference. If you want to test that theory on your own, put your Wii U on the 192.168.29.* subnet and put your PC on 192.168.29.137 and see if you get the same results with it successfully connecting to the net_listen.sh script.

I'll post my results tomorrow evening. Out of time for tonight.

gblues commented 6 years ago

Does buildbot make any use at all of dist-scripts/wiiu_core.sh?

ashquarky commented 6 years ago

Hrm, I didn't switch that many times, no. I didn't even do that with my cores, so they may be back on the chopping block if you can get them to crash. You're likely correct with the network logging - I noticed the same thing with your cores (I thought they black screened!)

Also: since it turns out I didn't actually follow the steps to reproduce the crash, that means it might not actually be a problem with your devkitppc install. Still can't hurt to fiddle though.

gblues commented 6 years ago

Well, I think I may have found my problem!

FTPii.

So, my first build with the revised build system gave me the system memory error immediately.

I pulled the SD card and put it in my PC, and compared the MD5 of the core on my VM vs. the core on the SD card and.. they didn't match!

I deleted the bad copies and re-copied them via my PC, and lo and behold, no system memory error right off the bat with the same core.

Gonna do my stress test now.
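
The check that caught the problem was an MD5 comparison. As a stand-in that needs no hashing library, the sketch below compares two files byte for byte, which detects the same kind of transfer corruption when both copies are reachable from one machine. The function name `files_identical` and its return codes are assumptions for illustration, and the chunk comparison assumes regular files, where a short read only happens at end of file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compare two regular files byte for byte.
 * Returns 1 if identical, 0 if they differ, -1 on open or read error. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a;
    FILE *b;
    int result = 1;

    assert(path_a != NULL && path_b != NULL);

    a = fopen(path_a, "rb");
    b = fopen(path_b, "rb");

    if (a == NULL || b == NULL) {
        result = -1;
    } else {
        unsigned char buf_a[4096];
        unsigned char buf_b[4096];

        for (;;) {
            size_t na = fread(buf_a, 1, sizeof buf_a, a);
            size_t nb = fread(buf_b, 1, sizeof buf_b, b);

            /* Different chunk lengths or contents: files differ. */
            if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
                result = 0;
                break;
            }
            if (na == 0) /* both files hit end of file together */
                break;
        }
        if (ferror(a) || ferror(b))
            result = -1;
    }

    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

Running a comparison like this (or a checksum tool) after every transfer to the SD card would catch a corrupted core before it is ever launched.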

gblues commented 6 years ago

Stress-test passed.

OK, I think we can close this issue out for real now, and I'll spend some time figuring out how to successfully remote-copy builds to my WiiU (toting the SD card just isn't an option).

ashquarky commented 6 years ago

Woah, that's crazy. If you don't need core switching, it's worth looking into wiiload to send individual RPXes over the network to HBL. I might let the dev of FTPiiU know what happened, see what he thinks too.

gblues commented 6 years ago

I was using Dimok's FTPiiU, gonna try @FIX94's FTPiiU Everywhere to see if it works better.

libretro / RetroArch

[Wii U] Black screen or DSI upon loading a second content #6025
