shpaass / yafc-ce

Powerful Factorio calculator/analyser that works with mods
GNU General Public License v3.0

Mod loading error dialog is not showing under Linux #348

Open veger opened 1 week ago

veger commented 1 week ago

When a mod fails to load, an error dialog is supposed to show:

image

As reported in #346, this does not work on Linux.

Note that the error is logged to stdout/stderr, but this is less convenient than the dialog (and lacks the option to disable the mod and retry).

veger commented 1 week ago

I found that when I throw new Exception("Boom") before this line (on Linux), the exception screen shows: https://github.com/shpaass/yafc-ce/blob/f59daaead8926b9ce1c8e96306523009ec95362f/Yafc.Parser/LuaContext.cs#L494

But when I throw the same exception after this line, it is no longer caught, and the unhandled exception is shown on the console.

I am not a C# developer/expert. @SWeini @Dorus, do you have any experience with exceptions no longer being caught? Maybe because C# 'switched' to Lua to execute some code, the exception handler is 'disabled/paused' and needs to be reinstated (on Linux)? Or have you had any similar experiences with exception handling?

SWeini commented 1 week ago

I have seen exceptions not being caught anymore in a ruby integration. the ruby runtime handles exceptions by somehow jumping out of the call stack. doing some protected call instead (similar to lua_pcallk here) did the trick to catch those

in general the pcallk call looks a bit weird. I'm by no means a lua expert, but that -2 - argcount argument looks suspicious - it is supposed to be an index to a function on the lua stack

but then, is this an issue with lua exceptions or .net exceptions? I haven't tried to reproduce the issue yet

oh, and I also remember that the dotnet runtime was able to continue on windows, but it performed a hard crash on macos

I'd start by investigating how exactly the lua_pcallk is supposed to work
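For reference, a minimal sketch of how a negative index such as -2 - argcount resolves on a Lua-style stack. According to the Lua 5.2 reference manual, the fourth argument of lua_pcallk (msgh) is the stack index of the message handler, not of the function being called. The sketch assumes the handler was pushed first, then the function, then the arguments; that layout is an assumption and has not been checked against LuaContext.cs. The stack is simulated with a plain list, and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class StackIndexSketch {
    // Resolves a Lua-style negative index: -1 is the top of the stack,
    // -2 the element below it, and so on.
    public static string Resolve(IReadOnlyList<string> stack, int index) {
        if (index >= 0 || -index > stack.Count) {
            throw new ArgumentOutOfRangeException(nameof(index));
        }
        return stack[stack.Count + index];
    }

    // Builds the ASSUMED layout right before the protected call:
    // handler at the bottom, then the function, then its arguments.
    public static List<string> BuildStack(int argcount) {
        var stack = new List<string> { "handler", "function" };
        for (int i = 1; i <= argcount; i++) {
            stack.Add("arg" + i);
        }
        return stack;
    }

    public static void Main() {
        for (int argcount = 0; argcount <= 2; argcount++) {
            List<string> stack = BuildStack(argcount);
            int index = -2 - argcount;
            Console.WriteLine($"argcount={argcount}: index {index} -> {Resolve(stack, index)}");
        }
    }
}
```

Under that assumed layout, the index resolves to the handler entry for every argument count, which would make the expression consistent with the msgh parameter rather than suspicious. If the real code pushes things in a different order, the conclusion does not carry over.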

veger commented 1 week ago

It is about .net exceptions not being caught (Lua errors are checked when pcall returns and thrown as a LuaException in the .net world). As mentioned, just throwing a (random) exception after the pcall fails as well (so it has nothing to do with failing Lua code throwing it).

It feels like the pcall disabled the exception handler(s) (on Linux), causing the unhandled exception to be shown in the terminal... But I do not know enough about .net and its exception handling to do a deep dive into it (a (not-so) quick Google search didn't turn up anything usable).

Dorus commented 1 week ago

C# exceptions are usually caught in a try {} catch {} block. I don't see one of those around the code you linked; all I see is if (result != Result.LUA_OK), but that line would obviously not be reached when C# throws. Anyway, the try/catch block might very well be a number of methods higher up the call stack, and I wouldn't be able to tell what makes Linux special here such that the exceptions are not caught there.
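A minimal sketch of the pattern under discussion: the status check after the protected call throws, and the catch block sits several methods higher up the call stack. FakePcall stands in for the native lua_pcallk call, and all type and method names are hypothetical, not the actual Yafc code. The sketch only shows the expected behaviour; it does not reproduce the Linux problem.

```csharp
using System;

// Status codes as defined by Lua 5.2 (only the two used here).
public enum Result { LUA_OK = 0, LUA_ERRRUN = 2 }

public sealed class LuaException : Exception {
    public LuaException(string message) : base(message) { }
}

public static class CatchSketch {
    // Hypothetical stand-in for the native call; a real binding would P/Invoke into Lua.
    private static Result FakePcall(bool fail, out string error) {
        error = fail ? "attempt to call a nil value" : "";
        return fail ? Result.LUA_ERRRUN : Result.LUA_OK;
    }

    // Innermost frame: converts a Lua error status into a .NET exception.
    private static void Exec(bool fail) {
        Result result = FakePcall(fail, out string error);
        if (result != Result.LUA_OK) {
            throw new LuaException(error);
        }
    }

    // Intermediate frame with no try/catch; the exception passes straight through.
    private static void LoadMods(bool fail) => Exec(fail);

    // Outermost frame: the only place where the exception is handled.
    public static string LoadProject(bool fail) {
        try {
            LoadMods(fail);
            return "loaded";
        }
        catch (LuaException ex) {
            return "error dialog: " + ex.Message;
        }
    }

    public static void Main() {
        Console.WriteLine(LoadProject(fail: false));
        Console.WriteLine(LoadProject(fail: true));
    }
}
```

In this purely managed sketch the outer catch always runs, which is the behaviour reported on Windows. The open question in this thread is why the same unwinding stops working on Linux once a real native call has been made.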

veger commented 1 week ago

The catch block is here: https://github.com/shpaass/yafc-ce/blob/6a8cd05717f9828bc6525dfbb45e2c1206f3768c/Yafc/Windows/WelcomeScreen.cs#L402

This is where it starts loading the project mods. On Windows the catch block works, and on Linux it doesn't. My research found that calling the lua_pcall function (to invoke a Lua function) from C# messes up the (Linux) C# exception handler (before calling it, my test exception was properly caught). Thus the catch block above no longer works, resulting in Yafc crashing with an unhandled exception.

Dorus commented 6 days ago

Just to be clear: Any breakpoint on line 403 is not hit? (First line of the exception handler).

That does sound like a .net bug, might be worth raising it on one of the Microsoft pages.

veger commented 6 days ago

> Just to be clear: Any breakpoint on line 403 is not hit? (First line of the exception handler).

breakpoint -> exception

Yes, any exception that is thrown after line 403 is not caught anymore... I agree it feels like a .net bug, but I have no idea how to reproduce it in a simple example (I doubt that the .net devs are willing to run YAFC to reproduce the actual issue...)

So I was hoping that someone would know about this issue and be able to provide a workaround.

Dorus commented 6 days ago

Yeah, true, a minimal test case would be beneficial for reporting the bug, but even without one, easy steps to reproduce on the YaFC-CE project might be enough. Are there many external files needed? Since I'm not running Linux, I'm limited in how far I can dive into this myself.