libretro / LRPS2

GNU General Public License v2.0
159 stars 47 forks source link

Quick fix to kill lingering processes after exit #244

Closed Toastarrr closed 1 year ago

Toastarrr commented 1 year ago

Adds a 5 second timeout on an attempt to call vu1Thread.Cancel() inside pcsx2's destructor.

This should allow the lingering process to die after waiting 5 seconds for a mutex that will never unlock, but it does not fix the actual problem of this scenario occurring.

LibretroAdmin commented 1 year ago

Yeah, this seems to work. It takes a while for the zombie process to disappear from the task manager but it does eventually happen. Thanks for this and also for your other PRs.

@sonninnos was trying to research this issue too the past few days ago and he had this diff patch, but he still has issues with loading content a second time after closing the core/content with 'Close Content'.

https://gist.github.com/LibretroAdmin/ed7b65f8283483a3b42fba17ac37ab8b

Maybe you can further figure things out based on this. I'll merge this PR for now since it's already objectively an improvement.

BTW, since you seem to be a good contributor, there are a couple of things that we hope to eventually achieve for this core:

Any good contributors that could lend us a hand in doing all that would be much appreciated.

LibretroAdmin commented 1 year ago

@Toastarrr Hmm, sometimes the zombie process can still happen. Not all the time, but sometimes. Have you been able to reproduce this?

LibretroAdmin commented 1 year ago

@Toastarrr I've spent the past few days backporting a ton of stuff over from upstream, however the lingering process issue sometimes continues to persist at least on Windows. I can trigger it easiest with a game like Tekken Tag Tournament, selecting Close Content and then exiting RetroArch. Sometimes retroarch.exe leaves correctly from Process Manager and other times it just remains stuck.

Here is what I get when I dump the process into VS2022 and press 'Break All'.

image

LibretroAdmin commented 1 year ago

@Toastarrr After running it through VS debugger I came up with this brute force approach -

https://github.com/libretro/LRPS2/commit/715824deaf5fb149df783b8020f2841f4223929a

This seems to fix the lingering process so far (at least in the sense that the process gets despawned after the timeout elapses). Let me know if it works out on your end still as well and doesn't cause any extra additional issues.

LibretroAdmin commented 1 year ago

Nvm, it can still happen in some games. Sucks that this threading code is so problematic in this core...

LibretroAdmin commented 1 year ago

After plugging this -

https://github.com/libretro/LRPS2/commit/94c285ab2da67dfc2ed50e3f199713e8205173ff

I don't get a lingering process anymore after exiting and closing Soul Calibur II.

LibretroAdmin commented 1 year ago

Yeah, what a mess this is - it only happened once but I got another instance in Tekken Tag Tournament now where it gets stuck somewhere else during semaphore destruction - I think it's the m_sem_OpenDone semaphore in MTGS.cpp which gets used to see if the GS 'plugin' has finished initializing and reported any throwable errors or not. From there it seems to get stuck on a pthread mutex lock call - always the one mutex lock that is the issue where it just stays indefinitely waiting forever.

image

The weird thing is that it doesn't happen all the time, sometimes the process gets properly killed.

Hopefully we can get closer to untangling this threading mess and perhaps replacing it with something better (libretro-common's rthreads perhaps).

LibretroAdmin commented 1 year ago

OK, I think this might actually duct-tape fix it robustly enough for now - https://github.com/libretro/LRPS2/commit/071c01d5b35e23097e148ce00ae82ad5519512b5

Still doesn't fix the underlying issue but hopefully we can get to that as we remove most of the wxWidgets code and other dodgy threading code and replace it with libretro-common equivalents.

Toastarrr commented 1 year ago

@LibretroAdmin Yeah, I've been investigating this, but it seems like there's too many spots for threads to get locked up. I don't know if attempting to actually fix the issue before replacing the thread implementation is worth it.

It might be better to place some timeouts wherever needed like you've done and then hopefully discover the source of the issue while creating a new implementation.

I suspect the core crashing when loading a game for the 2nd time is probably also caused by this threading issue, but I'm not sure.