turanszkij / WickedEngine

3D engine with modern graphics
https://wickedengine.net

Editor Hangs when changing "content" script (on Linux). #855

Open ricejasonf opened 1 month ago

ricejasonf commented 1 month ago

Hi, I am not certain that this is specific to Linux, but when I load different "content" scripts in the editor, the application sometimes hangs and sometimes won't even respond to signals (i.e., I have to kill -9 the process). I ran it in debug mode and found the problem point.

7238     // Initiate stalling CPU when GPU is not yet finished with next frame:
7239     if (FRAMECOUNT >= BUFFERCOUNT)
7240     {
7241       const uint32_t bufferindex = GetBufferIndex();
7242       for (int queue = 0; queue < QUEUE_COUNT; ++queue)
7243       {
7244         if (frame_fence[bufferindex][queue] == VK_NULL_HANDLE)
7245           continue;
7246 
7247         res = vkWaitForFences(device, 1, &frame_fence[bufferindex][queue], VK_TRUE, 0xFFFFFFFFFFFFFFFF);
7248         assert(res == VK_SUCCESS);
7249 
7250         res = vkResetFences(device, 1, &frame_fence[bufferindex][queue]);
7251         assert(res == VK_SUCCESS);
7252       }
7253     }

The call to vkWaitForFences hangs. I am new to this API (and modern graphics in general), but I see that the timeout is very large. Is this the right way to handle "CPU stalling"? From what I have been googling, I think this could at least loop on VK_TIMEOUT and use a reasonably small timeout. Also, here is the call stack from when I was able to stop the process:

* thread #1, name = 'WickedEngineEdi', stop reason = signal SIGSTOP
  * frame #0: 0x00007ffff791d9ed libc.so.6`__poll + 77
    frame #1: 0x00007fffda007cc3 libnvidia-glcore.so.550.78`___lldb_unnamed_symbol36082 + 147
    frame #2: 0x00007fffda422f59 libnvidia-glcore.so.550.78`___lldb_unnamed_symbol44349 + 73
    frame #3: 0x00007fffda407950 libnvidia-glcore.so.550.78`___lldb_unnamed_symbol44160 + 672
    frame #4: 0x00007fffda3239ae libnvidia-glcore.so.550.78`___lldb_unnamed_symbol42754 + 30
    frame #5: 0x0000555555c8668b WickedEngineEditor`wi::graphics::GraphicsDevice_Vulkan::SubmitCommandLists(this=0x000055555705a380) at wiGraphicsDevice_Vulkan.cpp:7247:26
    frame #6: 0x0000555555babf01 WickedEngineEditor`wi::Application::Run(this=0x00007fffff8d4990) at wiApplication.cpp:252:37
    frame #7: 0x00005555555b4661 WickedEngineEditor`sdl_loop(editor=0x00007fffff8d4990) at main_SDL2.cpp:16:19
    frame #8: 0x00005555555b4ce0 WickedEngineEditor`main(argc=1, argv=0x00007fffffffe818) at main_SDL2.cpp:162:23
    frame #9: 0x00007ffff7841d4a libc.so.6`___lldb_unnamed_symbol3264 + 122
    frame #10: 0x00007ffff7841e0c libc.so.6`__libc_start_main + 140
    frame #11: 0x00005555555b4285 WickedEngineEditor`_start + 37

I will play with this more next week, but I thought I would wait for some feedback on the intent with the large timeout.

Thanks.

EDIT: It occurred to me that maybe it is stuck in some loop and it just happens to always break while the process is waiting on that line (7247).

turanszkij commented 1 month ago

Hi, the "infinite" timeout is there on purpose: it would be invalid to go further while the GPU is not finished with the frame we are waiting on. Could you make sure that you have updated graphics drivers?
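
To illustrate why the wait cannot simply be skipped, here is a minimal sketch of how per-frame slots get reused. It assumes that `GetBufferIndex()` maps the frame counter onto `BUFFERCOUNT` slots with a modulo, which the snippet above does not show; `kBufferCount`, `buffer_index`, and `frame_to_wait_for` are hypothetical names chosen for this example.

```cpp
#include <cstdint>

// Hypothetical value; the engine defines the real BUFFERCOUNT.
constexpr uint32_t kBufferCount = 2;

// Assumed mapping from frame number to per-frame resource slot.
constexpr uint32_t buffer_index(uint64_t frame_count) {
    return static_cast<uint32_t>(frame_count % kBufferCount);
}

// The earlier frame that last used the slot frame_count is about to reuse.
// Only meaningful once frame_count >= kBufferCount, mirroring the guard
// `if (FRAMECOUNT >= BUFFERCOUNT)` in the snippet.
constexpr uint64_t frame_to_wait_for(uint64_t frame_count) {
    return frame_count - kBufferCount;
}

// Frame 5 reuses the slot of frame 3, so frame 3's fence must be signaled first.
static_assert(buffer_index(5) == buffer_index(frame_to_wait_for(5)),
              "the same slot is reused");
```

Under that assumption, recording frame N before frame N - BUFFERCOUNT has finished on the GPU would overwrite resources the GPU may still be reading, which is why a timeout alone cannot be treated as permission to continue.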

ricejasonf commented 1 month ago

I did a full update and verified that I have the latest driver, and I was able to get it to freeze again immediately (loading scripts under "Content").

local/nvidia 550.78-7
    NVIDIA drivers for linux

https://archlinux.org/packages/extra/x86_64/nvidia/

brakhane commented 4 weeks ago

@ricejasonf Wicked recently updated the dxcompiler to the May version, which seems to be broken on Linux (#856) and caused all kinds of weird issues on various graphics drivers. It has been reverted to the previous version. Can you update to master and give it another try?

ricejasonf commented 4 weeks ago

Sorry, but the problem still persists. It does not happen every time, but it still definitely freezes when loading a script.

brakhane commented 4 weeks ago

Did you delete the shaders/spirv directory just to make sure no compiled shaders from the dxcompiler remain?

ricejasonf commented 4 weeks ago

I deleted the entire build directory. If that is where they are located, then yes. (I am on the Discord if that is easier for back and forth stuff.)

ricejasonf commented 4 weeks ago

I can confirm that it is in fact getting stuck in that vkWaitForFences call. Consider the following small alteration to the point of interest:


7247         while (true) {
7248           res = vkWaitForFences(device, 1, &frame_fence[bufferindex][queue],
7249                                 VK_TRUE, uint64_t{10000000000});
7250           if (res == VK_SUCCESS) break;
7251           assert(res == VK_SUCCESS);
7252         }

Attempting to reproduce the error results in hitting the assert after 10 seconds of blank screen.

WickedEngineEditor: /home/jason/Projects/WickedEngine/WickedEngine/wiGraphicsDevice_Vulkan.cpp:7251: virtual void wi::graphics::GraphicsDevice_Vulkan::SubmitCommandLists(): Assertion `res == VK_SUCCESS' failed.
Aborted (core dumped)

It would be nice to find the bug, but I think there is also an opportunity for graceful error handling here.
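
One way the graceful handling suggested here could look is sketched below. It is only a sketch under stated assumptions: `WaitResult`, `FenceOutcome`, and `wait_for_fence_bounded` are hypothetical names, not part of Wicked Engine or Vulkan, and `wait_once` stands in for a single `vkWaitForFences` call with a finite timeout so that the example compiles without the Vulkan headers. It never proceeds past an unsignaled fence; it only turns an endless stall into an outcome the caller can report.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Stand-ins for the VkResult values of interest (VK_SUCCESS, VK_TIMEOUT,
// VK_ERROR_DEVICE_LOST). Hypothetical names, used to avoid the Vulkan headers.
enum class WaitResult { Success, Timeout, DeviceLost };

// What the caller learns after the bounded wait.
enum class FenceOutcome { Signaled, GaveUp, DeviceLost };

// wait_once(timeout_ns) represents one call such as
//   vkWaitForFences(device, 1, &fence, VK_TRUE, timeout_ns)
// The fence is polled in slices so the thread wakes up regularly and the
// total wait is bounded by slice_ns * max_slices.
inline FenceOutcome wait_for_fence_bounded(
    const std::function<WaitResult(uint64_t)>& wait_once,
    uint64_t slice_ns,
    uint32_t max_slices)
{
    assert(slice_ns > 0 && max_slices > 0);
    for (uint32_t i = 0; i < max_slices; ++i) {
        switch (wait_once(slice_ns)) {
        case WaitResult::Success:
            return FenceOutcome::Signaled;   // safe to reuse the frame slot
        case WaitResult::DeviceLost:
            return FenceOutcome::DeviceLost; // unrecoverable, report at once
        case WaitResult::Timeout:
            break;                           // not signaled yet, wait again
        }
    }
    // Budget exhausted: the caller must not continue rendering, but it can
    // log which queue stalled and shut down cleanly instead of hanging.
    return FenceOutcome::GaveUp;
}
```

With a 100 ms slice and 100 slices, the caller would learn about a stuck queue after roughly 10 seconds, the same budget as the experiment above, and could log the queue index before exiting.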

ricejasonf commented 4 weeks ago

I realized that this is a duplicate of #804.

brakhane commented 4 weeks ago

Can you confirm that the hang always happens when queue is 3 (QUEUE_VIDEO_DECODE)? And never with any other value?

ricejasonf commented 4 weeks ago

I tried it several times and the value for queue was consistently 3. So, yes, that looks like the enum value for QUEUE_VIDEO_DECODE as you stated.

ricejasonf commented 4 weeks ago

When resizing the widget window for the entity component system, I can reproduce this very quickly just by wagging it back and forth. It is still always queue == 3.