shpaass / yafc-ce

Powerful Factorio calculator/analyser that works with mods
GNU General Public License v3.0

Yafc freezes and crashes a LOT #178

Open sunrosa opened 3 weeks ago

sunrosa commented 3 weeks ago

Yafc freezes and then crashes with no stack trace in maybe 1 out of 30 operations/clicks. It triggers the "window stopped responding" prompt and never unfreezes. I've tried v0.7.0, v0.7.1, and v0.7.2, and they all have the same issue. I'm using the full pyanodons AE suite. I've tried many reinstalls. I've tried running it on both an HDD and an NVMe; it seems to crash less on the NVMe. It also seems to crash MUCH more if I'm clicking too quickly on things. If I make fewer than 10 clicks in 5 seconds, it rarely crashes. If I'm really active with the program, especially with middle-clicking things, it is bound to crash almost immediately. It also sometimes crashes on alt-tab. I don't see a log anywhere. This has been consistent for the past 5 or so days throughout hours of use.

sunrosa commented 3 weeks ago

The full mod list (note there are a few disabled mods)

[four screenshots of the mod list]

DaleStan commented 3 weeks ago

This happens on and off for me, and it's currently not happening. Would you replace the three SDL dlls that came with YAFC with the ones I added here, and let me know if that fixes things, please?

sunrosa commented 3 weeks ago

Where do these DLLs come from? How do they differ from what Yafc uses?

DaleStan commented 3 weeks ago

Those are newer versions of the same DLLs, downloaded from https://github.com/libsdl-org/SDL/releases/tag/release-2.30.4, https://github.com/libsdl-org/SDL_image/releases/tag/release-2.8.2, and https://github.com/libsdl-org/SDL_ttf/releases/tag/release-2.22.0. Whenever I've checked, it's hanging in the SDL dlls, so I'm hoping that just updating SDL to a version from 2024 instead of 2020 will fix whatever is broken.

sunrosa commented 3 weeks ago

I will try the new DLLs. It seems to launch okay. How do you know it hangs in SDL?

Okay, like 5 seconds after opening the program, messing with the module templates, it crashed. I don't know if this makes a difference.

It crashed again after just another few minutes of use. The second crash had to do with me right-clicking things to delete them from the production chain.

It definitely seems like it's crashing more with these new DLLs. It's crashed a third time from an alt-tab despite me doing basically nothing in the program the minutes beforehand.

I removed the DLLs and replaced them with stock. It crashed a 4th time when trying to open usages of an item.

shpaass commented 3 weeks ago

If you can build from source and launch in debug mode, I think there is a bigger chance that the IDE catches the exception that crashes the app. Just as an angle to investigate the problem.

sunrosa commented 3 weeks ago

Is there no error logging? It closes without a message, and I couldn't find any file. If not, what IDE is required? I vastly prefer VSCode if that's possible. Where are the build instructions? I've worked with .NET before so I don't need a rundown of the install of all that.

veger commented 3 weeks ago

On Linux there are a lot of log messages in the console, but for Windows there is still this open ticket #54...

sunrosa commented 3 weeks ago

If you open it in MINGW64 it has log output to stdout. I am going to see what I can do.
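For anyone who wants that console output kept in a file, here is a minimal sketch of a wrapper that launches a program, echoes its combined stdout/stderr, and writes the same lines to a log. It assumes the output that shows up in MINGW64 is also delivered to a pipe opened by a script; the function name `run_and_log` and the log file name are made up for illustration.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run `command`, copy its combined stdout/stderr to `log_path`, return the exit code."""
    with open(log_path, "w", encoding="utf-8", errors="replace") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
            errors="replace",
        )
        for line in process.stdout:
            sys.stdout.write(line)  # still show the output live
            log.write(line)
            log.flush()  # keep the log tail on disk if the process dies abruptly
        return process.wait()
```

Something like `run_and_log(["Yafc.exe"], "yafc-console.log")` would then leave the last lines before a crash or hang in the log file; the executable path is a placeholder.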

sunrosa commented 3 weeks ago

Welp, it's a segfault. How do you debug these? Is there a core dump somewhere? Do I have to build with debug flags? [screenshot]

I think it segfaulted when I was spamming minimize and unminimize. But the crashes happen often at other times too. I think it's some SDL call causing this issue though. Are you working with raw memory here at all?

Judging by the lack of major crash issues here, this appears to be happening on only a select few people's systems. Maybe it's because you're linking to a shared dynamic library on my own system. I am on Windows, if that matters.

sunrosa commented 3 weeks ago

There's another segfault, in normal use. [screenshot]

sunrosa commented 3 weeks ago

It hung again. This time, no segfault, just a freeze. I was middle-clicking the void energy source next to the player icon in the production chain. I tried middle-clicking void on a fresher instance of the program, and it opened just fine. The log tail when it hung was just SDL window events and no errors.

shpaass commented 3 weeks ago

@DaleStan do I understand it correctly that you're also using YAFC on Linux? Just wanted to check.

DaleStan commented 3 weeks ago

@have-fun-was-taken No, I'm on Windows 10.

@sunrosa I know it's in SDL because when I pause the hung executable, the debugger tells me the program is in the SDL_RenderPresent call at DrawingSurface.cs:63, or occasionally one of the other SDL_* calls in that file.

I don't often see crashes, though; usually it just hangs. It hangs both with and without the debugger. It might be related to doing other GPU-intensive tasks at the same time, but I don't have any data to support that feeling.

sunrosa commented 3 weeks ago

Do you know which implementation of Present() it hangs in? Do you have a call stack?

Also I'm pretty sure it crashes most often when trying to mess with the modules and beacons menu. I'm pretty sure 0.7.0 crashes less than the more recent two versions. I had to downgrade when I tried 0.7.1. But 0.7.2 certainly crashes the most.

Edit: I actually think 0.7.2 was the main (but not only) cause of the problem. It's quite stable by comparison on 0.7.1.

veger commented 3 weeks ago

@have-fun-was-taken I am on Linux (never had a random crash or hang)

DaleStan commented 3 weeks ago

This is the stack trace I got when it hung, using the new SDL dlls. I can step out of the first three frames, but any attempt to step out of amdxn64.dll!...0283() just resumes the hang. I tried changing a driver setting that seemed suspicious, but that didn't help.

[screenshot of the stack trace]

If you can compile Yafc, would you try my 178-yafc-freezes-and-crashes-a-lot branch, please? You'll lose the Enter and Escape keyboard shortcuts, but if that makes it behave better, I'll have to revisit what I did in #154. (EDIT: I came up with another way to implement #154, but this new branch doesn't completely fix things either.)

chrirupp commented 2 weeks ago

I am also having a lot of crashes on Windows. 0.7.2 crashes so much that it is unusable. 0.7.1 crashes much less, but still quite often. I could not identify a pattern for the crashes.

shpaass commented 2 weeks ago

@chrirupp Is it true for the earlier versions too, or has it started only in 0.7.1?

chrirupp commented 2 weeks ago

I downgraded to 0.7.0 now and did not get any crashes so far.

Fractional606 commented 1 week ago

I was also getting a lot of crashes on 0.7.2, so I downgraded to 0.7.0, and I don't get any more.

shpaass commented 1 week ago

I've marked 0.7.1 and 0.7.2 as unstable for now. The current suggested release to use is 0.7.0 until we fix the instability.

lexiconvict commented 1 week ago

I am also running 0.7.2.0 on Windows 10 and am experiencing very frequent crashes; most of the time abruptly, but sometimes it just freezes (not responding) and I must close it out from Task Manager. This is just on a completely vanilla Factorio run (lol), and for some reason, it wasn't doing this at all just last week with a full Seablock modpack. Nothing changed except now it's a new run with all mods disabled. I'd be happy to provide more info and technical specs if that would help you guys.

Cheers!

(I just downgraded to 0.7.0 for now. Will update this comment if that also gives me issues)

UPDATE: No issues at all since downgrading to 0.7.0. It seems like y'all have already pinpointed the issue as happening after that release, but I thought I'd corroborate that as well.

veger commented 6 days ago

I think we need to find someone (technical) that has (or can reproduce) the issue, and have them do a bisect between 0.7.0 and 0.7.2 to find the commit that causes the issue, as we are stumped (and cannot reproduce).

Hopefully finding the offending commit can shine a bit of light on the cause so we can attempt to fix it.
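For anyone unfamiliar with bisecting, the idea can be sketched in a few lines of Python. This is only an illustration of the search that `git bisect` performs, not a replacement for it; the function name, the commit labels, and the `is_bad` callback are hypothetical.

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is known good,
    the last one is known bad, and every commit after the first bad one is bad.
    """
    low, high = 0, len(commits) - 1  # commits[low] is good, commits[high] is bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid  # the culprit is at mid or earlier
        else:
            low = mid  # the culprit is after mid
    return commits[high]


# Ten made-up commits where the problem first appears in "c6".
commits = [f"c{i}" for i in range(10)]
print(first_bad_commit(commits, lambda c: int(c[1:]) >= 6))  # c6
```

Each step halves the range, so even a long list of commits needs only a handful of builds. Because the crash here is intermittent, a commit should only be judged good after enough clicking around without a crash or hang; otherwise the search can settle on the wrong commit.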

DaleStan commented 6 days ago

Does anyone have experience with valgrind? If this is a double-free or use-after-free bug, valgrind may be able to find it even on OSX or Linux.

In the meantime, I'm working on reducing our use of SDL, but Avalonia is fighting with me. (Also, there's a lot of SDL in there.)

sfoster1 commented 5 days ago

I bisected yafc from 0.7.3 to 0.7.0. My test workflow was

At some point in here, yafc would crash or freeze. I bisected twice because I didn't really trust the first one; the one that seems more relevant is e451cf51402622f6538f5f82a3b8d90378dd8885 . On that commit I pretty reliably get a hang. I'm running in debug mode and do the good old run - pause - run - pause to see where I'm typically hanging and it is usually a callstack like this:

    Yafc.UI.dll!Yafc.UI.DrawingSurface.BeginRenderToTexture(out SDL2.SDL.SDL_Rect textureSize) Line 49  C#
    Yafc.dll!Yafc.MainScreen.FadeDrawer.CreateDownscaledImage() Line 657    C#
>   Yafc.dll!Yafc.MainScreen.ShowPseudoScreen.AnonymousMethod__64_0(object x) Line 510  C#
    Yafc.UI.dll!Yafc.UI.Ui.ProcessAsyncCallbackQueue() Line 212 C#
    Yafc.UI.dll!Yafc.UI.Ui.ProcessEvents() Line 158 C#
    Yafc.UI.dll!Yafc.UI.Ui.MainLoop() Line 50   C#
    Yafc.dll!Yafc.Program.Main(string[] args) Line 46   C#

Seems like the SDL_CreateTexture call was hanging, so I enabled mixed mode debugging which presumably downloaded the SDL debug solib, and then the commit stopped failing altogether, regardless of debug mode. Cool.

sunrosa commented 3 days ago

It should be noted that even moving the mouse can cause the crash under rare circumstances I believe. I'll be many seconds after performing an action, with yafc (0.7.1) still responsive, and then move my mouse a bit and it crashes (not freezes; closes instantly without an error). I've never seen moving the mouse cause a freeze.

sfoster1 commented 3 days ago

For a bit now I've been running the 0.7.3 release under the vscode debugger with appverifier running (this is in the drivers docs but it also works for applications) with the basic tests enabled. The hook here is that if an appverifier test fails it should inject a debughalt, and also if a crash happens then running the binary under the debugger should catch the halt (just attaching once the process is spawned won't do it, the process gets terminated). This is necessary because generally actually building and debugging 0.7.3 won't crash or freeze.

Unfortunately so far the crash I've got was in a thunk to gpu drivers. The main thread was in the render path. Since I'm running the release build, lots of data isn't accessible, but I could spotcheck things like the native pointers in the application-object renderer handle and they seemed fine. I'm going to keep running it and post dumps or the windows equivalent when I see crashes.

So far AppVerifier hasn't complained about anything.

craig-johnston commented 3 days ago

Despite being attached in a debugger, I've had it crash on me and the debugger just detach. With this error message: The program '[27664] Yafc.dll' has exited with code -1073741819 (0xc0000005). 0xc0000005 is segfault.

It also doesn't appear as a crash to Windows (not in event log as a crash). No popup to report the issue, etc. Everything was responsive, moving mouse around then crash.
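The two numbers in that message are the same 32-bit value: the debugger prints the exit code as a signed integer, and 0xc0000005 is its unsigned form (the Windows status code STATUS_ACCESS_VIOLATION). A small sketch of the conversion, with a made-up helper name:

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit process exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


assert to_ntstatus(-1073741819) == 0xC0000005
print(hex(to_ntstatus(-1073741819)))  # 0xc0000005
```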

sfoster1 commented 2 days ago

Yeah, this happens if the debugger is attached afterwards (like, you start the program from explorer and then attach in visual studio). But if you start the program from visual studio's debugger, which you can do by going to debug - yafc debug properties - new launch profile - select the release exe, then it will actually catch the segfault or whatever. This view: [screenshot]

sunrosa commented 1 day ago

I've managed to get yafc to freeze even on 0.7.0 just now. It freezes much much less than the other versions though. There was no error message. I was just changing milestones and then I had exited the milestones menu, saved, and then it froze. It may have frozen right after I had pressed ctrl+n