nxrighthere / UnrealCLR

Unreal Engine .NET 6 integration
MIT License
3.13k stars 231 forks source link

Segfault on Linux when unloading module during cook phase #232

Closed captaincrutches closed 2 years ago

captaincrutches commented 2 years ago

I'm consistently getting a segfault ("Signal 11" in UE4 logs) when trying to package a project for Linux.

My environment is:

I can reproduce by:

The packaging process fails during the cook phase. The editor's output log just mentions "Signal 11" and redirects me to a Log.txt file, which in turn redirects me to a Cook-<timestamp>.txt file, which says:


LogModuleManager: Shutting down and abandoning module UnrealCLR (274)
LogCore: === Critical error: ===
Unhandled Exception: SIGSEGV: invalid attempt to read memory at address 0x0000000000000008

LogCore: Fatal error!

0x00007f5a242d07a0 libc.so.6!UnknownFunction(0x3d7a0)
0x00007f5951cc4100 libUE4Editor-UnrealCLR.so!UnrealCLR::Module::ShutdownModule() [/home/captaincrutches/Projects/UnrealTest/Plugins/UnrealCLR/Source/UnrealCLR/Private/UnrealCLR.cpp:1372]
0x00007f5a2d577258 libUE4Editor-Core.so!FModuleManager::UnloadModule(FName, bool) [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Core/Private/Modules/ModuleManager.cpp:601]
0x00007f5a2d577eae libUE4Editor-Core.so!FModuleManager::UnloadModulesAtShutdown() [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Core/Private/Modules/ModuleManager.cpp:731]
0x0000000000246948 UE4Editor!FEngineLoop::Exit() [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Launch/Private/LaunchEngineLoop.cpp:4294]
0x0000000000246f70 UE4Editor!GuardedMain(char16_t const*) [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Launch/Private/Launch.cpp:193]
0x00007f5a2456f056 libUE4Editor-UnixCommonStartup.so!CommonUnixMain(int, char**, int (*)(char16_t const*), void (*)()) [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Unix/UnixCommonStartup/Private/UnixCommonStartup.cpp:264]
0x00007f5a242bc2fa libc.so.6!UnknownFunction(0x292f9)
0x00007f5a242bc3a8 libc.so.6!__libc_start_main(+0x77)
0x00000000002304d9 UE4Editor!_start()

Here's the entire files, in case there's something else significant in there: Cook-2022.01.23-01.27.18.txt Log.txt

I'm brand new to this plugin and even UE4 itself, so it's entirely possible this is environmental or just me doing something wrong with installation, in which case please do point out my mistake. And of course, let me know if there's any other logs/info that would help, or if I should make any local code changes to see if they help.

nxrighthere commented 2 years ago

Weird. On Windows, this stage passes just fine. Unfortunately, I don't have a Linux machine next to me to debug this at the moment.

nxrighthere commented 2 years ago

Try to change UnrealCLR::Module::ShutdownModule() in UnrealTest/Plugins/UnrealCLR/Source/UnrealCLR/Private/UnrealCLR.cpp to:

void UnrealCLR::Module::ShutdownModule() {
    if (UnrealCLR::Engine::Manager && !UnrealCLR::Engine::Manager->IsPendingKillOrUnreachable())
        UnrealCLR::Engine::Manager->RemoveFromRoot();

    UnrealCLR::Engine::Manager = nullptr;

    if (OnWorldPostInitializationHandle.IsValid())
        FWorldDelegates::OnPostWorldInitialization.Remove(OnWorldPostInitializationHandle);

    if (OnWorldCleanupHandle.IsValid())
        FWorldDelegates::OnWorldCleanup.Remove(OnWorldCleanupHandle);

    if (HostfxrLibrary)
        FPlatformProcess::FreeDllHandle(HostfxrLibrary);
}
captaincrutches commented 2 years ago

I've made those changes and added some log lines on top, and I'm still getting a similar segfault:

LogModuleManager: Shutting down and abandoning module UnrealCLR (274)
LogUnrealCLR: Warning: UnrealCLR module shutting down
LogUnrealCLR: Warning: Cleaning up engine manager
LogCore: === Critical error: ===
Unhandled Exception: SIGSEGV: invalid attempt to read memory at address 0x0000000000000008

LogCore: Fatal error!

0x00007f0a8b9727a0 libc.so.6!UnknownFunction(0x3d7a0)
0x00007f09d020f4a0 libUE4Editor-UnrealCLR.so!UnrealCLR::Module::ShutdownModule() [/home/captaincrutches/Projects/UnrealTest/Plugins/UnrealCLR/Source/UnrealCLR/Private/UnrealCLR.cpp:1375]
0x00007f0a94c19258 libUE4Editor-Core.so!FModuleManager::UnloadModule(FName, bool) [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Core/Private/Modules/ModuleManager.cpp:601]
0x00007f0a94c19eae libUE4Editor-Core.so!FModuleManager::UnloadModulesAtShutdown() [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Core/Private/Modules/ModuleManager.cpp:731]
0x0000000000246948 UE4Editor!FEngineLoop::Exit() [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Launch/Private/LaunchEngineLoop.cpp:4294]
0x0000000000246f70 UE4Editor!GuardedMain(char16_t const*) [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Launch/Private/Launch.cpp:193]
0x00007f0a8bc11056 libUE4Editor-UnixCommonStartup.so!CommonUnixMain(int, char**, int (*)(char16_t const*), void (*)()) [/home/captaincrutches/Downloads/UnrealEngine/Engine/Source/Runtime/Unix/UnixCommonStartup/Private/UnixCommonStartup.cpp:264]
0x00007f0a8b95e2fa libc.so.6!UnknownFunction(0x292f9)
0x00007f0a8b95e3a8 libc.so.6!__libc_start_main(+0x77)
0x00000000002304d9 UE4Editor!_start()

Best I can tell, the the shutdown method is only happening once, but is choking on the call to RemoveFromRoot.

nxrighthere commented 2 years ago

It seems like a bug in the engine on Linux. UnrealCLR::Engine::Manager instantiated and added to the root at initialization of the plugin, so it should remain valid until plugin's shutdown.

Please, send a bug report from the crash reporter.

captaincrutches commented 2 years ago

Digging into this more - it looks like the unique id of UnrealCLR::Engine::Manager is changing between StartupModule() and ShutdownModule(). The segfault is happening because, when we call RemoveFromRoot(), the (changed) id is not in the global UObject registry.

LogUnrealCLR: Display: StartupModule: Host library loaded successfuly!
LogUnrealCLR: Display: StartupModule: Host functions loaded successfuly!
LogUnrealCLR: Display: StartupModule: Host runtime assembly loaded succesfuly!
LogUnrealCLR: Display: StartupModule: Host runtime assembly initialized succesfuly!
LogUnrealCLR: Warning: StartupModule: Manager id is 16840
LogUnrealCLR: Display: StartupModule: Host loaded succesfuly!
LogInit: Display: LogUnrealCLR: Warning: StartupModule: Manager id is 16840
LogUnrealCLR: Warning: UnrealCLR module shutting down
LogUnrealCLR: Warning: ShutdownModule: Manager id is 32512

From what I can tell, this may be due (or related) to the fact that shutting down the modules after cooking is done after the engine itself exits and clears out the GUObjectArray (though the original id of the manager is never removed at that point, at least not via RemoveFromRoot which is where I added logging). Based on the additional logging I added, no other modules are trying to do something like this during their shutdown phases.

What baffles me more is, with the changes you suggested before, it looks like the call to IsPendingKillOrUnreachable() should also segfault, since it also does something with the underlying GUObjectArray - but it doesn't; we get past that call fine.

It looks like the actual object pointed to by UnrealCLR::Engine::Manager is being changed - possibly even destroyed - before we reach ShutdownModule(). Even its name is different:

LogUnrealCLR: Warning: StartupModule: Manager address is 518875744 and id is 16840 and name is UnrealCLRManager_0
LogUnrealCLR: Warning: ShutdownModule: Manager address is 518875744 and id is 32584 and name is None

You can also check this on windows by adding the following line at the end of StartupModule (right before "Host loaded successfully!") and the beginning of ShutdownModule:

UE_LOG(LogUnrealCLR, Warning, TEXT("%s: Manager address is %d and id is %d and name is %s"), ANSI_TO_TCHAR(__FUNCTION__), UnrealCLR::Engine::Manager, UnrealCLR::Engine::Manager->GetUniqueID(), *UnrealCLR::Engine::Manager->GetFName().ToString());

Based on all that, I don't think this is an engine-level issue... this looks more like an expired pointer doing scary expired-pointer things (i.e. undefined behavior).

Unfortunately, I'm having trouble building the engine or my test project with ubsan/asan to check further due to some apparently fairly-long-standing issues: here and here

nxrighthere commented 2 years ago

It looks like the actual object pointed to by UnrealCLR::Engine::Manager is being changed - possibly even destroyed - before we reach ShutdownModule(). Even its name is different:

LogUnrealCLR: Warning: StartupModule: Manager address is 518875744 and id is 16840 and name is UnrealCLRManager_0 LogUnrealCLR: Warning: ShutdownModule: Manager address is 518875744 and id is 32584 and name is None

You can also check this on windows by adding the following line at the end of StartupModule (right before "Host loaded successfully!") and the beginning of ShutdownModule:

Based on all that, I don't think this is an engine-level issue... this looks more like an expired pointer doing scary expired-pointer things (i.e. undefined behavior).

I'm unable to reproduce this on Windows. The address and ID remain the same, which is what I expect since the object is explicitly added to the root to prevent it from being garbage collected by the engine's GC.

nxrighthere commented 2 years ago

So I checked your Log.txt file, and I wonder why a lot of logs printed twice:

ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host path set to "/home/captaincrutches/Projects/UnrealTest/Plugins/UnrealCLR/Runtime/Linux/host/fxr/6.0.1/libhostfxr.so"
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host path set to "/home/captaincrutches/Projects/UnrealTest/Plugins/UnrealCLR/Runtime/Linux/host/fxr/6.0.1/libhostfxr.so"
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host library loaded successfuly!
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host library loaded successfuly!
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host functions loaded successfuly!
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host functions loaded successfuly!
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host runtime assembly loaded succesfuly!
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host runtime assembly loaded succesfuly!
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host runtime assembly initialized succesfuly!
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host runtime assembly initialized succesfuly!
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host loaded succesfuly!
ProcessResult.StdOut:   LogUnrealCLR: Display: StartupModule: Host loaded succesfuly!

Also, it seems that the packaging process is somehow different on Windows and Linux.

Try to replace code at line 1355 in UnrealCLR/Private/UnrealCLR.cpp with:

if (!UnrealCLR::Engine::Manager) {
    UnrealCLR::Engine::Manager = NewObject<UUnrealCLRManager>();
    UnrealCLR::Engine::Manager->AddToRoot();
}
captaincrutches commented 2 years ago

I added a logging destructor to UUnrealCLRManager and can confirm that it is in fact dying (in fact, two are - where's the other one coming from?) before we reach ShutdownModule:

LogUnrealCLR: Warning: StartupModule: Creating new manager
LogUnrealCLR: Warning: StartupModule: Manager address is 1876846960 and id is 16840 and name is UnrealCLRManager_0
LogUnrealCLR: Display: StartupModule: Host loaded succesfuly!
LogUnrealCLR: Warning: Manager with id 16839 is dying
LogUnrealCLR: Warning: Manager with id 16840 is dying
LogUnrealCLR: Warning: UnrealCLR module shutting down
LogUnrealCLR: Warning: ShutdownModule: Manager address is 1876846960 and id is 32619 and name is None

The change you suggested has no effect, unfortunately - we still have 2 managers, so I think you may be onto something there, but the other one wasn't pointed to by UnrealCLR::Engine::Manager, so we created a second one anyway. In any case, both died here.

This happens as the engine is shutting down the object subsystem, which tracks with my earlier observation of that happening prior to module unloading:

LogWorld: UWorld::CleanupWorld for None, bSessionEnded=true, bCleanupResources=true
LogExit: Editor shut down
LogCore: Warning: Removing object 22196 from root
LogUnrealCLR: Warning: Manager with id 16839 is dying
LogUnrealCLR: Warning: Manager with id 16840 is dying
LogExit: Object subsystem successfully closed.
LogShaderCompilers: Display: === FShaderJobCache stats ===
LogShaderCompilers: Display: Total job queries 0, among them cache hits 0 (0.00%)
LogShaderCompilers: Display: Tracking 0 distinct input hashes that result in 0 distinct outputs (0.00%)
LogShaderCompilers: Display: RAM used: 0.00 MB (0.00 GB) of 3225.60 MB (3.15 GB) budget. Usage: 0.00%
LogShaderCompilers: Display: ================================================
LogShaderCompilers: Display: Shaders left to compile 0
LogModuleManager: Shutting down and abandoning module BlueprintCompilerCppBackend (622)

I've managed to build up a test project on a Windows machine, with the same log lines added to the plugin, and I see the same thing happening - the manager dies before ShutdownModule starts (and there are 2 of them). The address and id of UnrealCLR::Engine::Manager are the same in this case, but its name is different (None by the time we reach shutdown).

It looks like, on Windows, the pointed-to memory is clobbered less (or at least differently) after the object is dead, so the id is kept intact and it happens to work - but this isn't safe and shouldn't be counted on, as we can see on Linux.

nxrighthere commented 2 years ago

We can try to move instantiation and destruction of the UnrealCLR::Engine::Manager from UnrealCLR::Module::StartupModule() and UnrealCLR::Module::ShutdownModule(). This object used only in the play mode anyway.

nxrighthere commented 2 years ago

Try to apply this changes:

@@ -1352,8 +1352,6 @@ void UnrealCLR::Module::StartupModule() {
                                return;
                        }

-                       UnrealCLR::Engine::Manager = NewObject<UUnrealCLRManager>();
-                       UnrealCLR::Engine::Manager->AddToRoot();
                        UnrealCLR::Status = UnrealCLR::StatusType::Idle;

                        UE_LOG(LogUnrealCLR, Display, TEXT("%s: Host loaded succesfuly!"), ANSI_TO_TCHAR(__FUNCTION__));
@@ -1368,11 +1366,6 @@ void UnrealCLR::Module::StartupModule() {
 }

 void UnrealCLR::Module::ShutdownModule() {
-       if (UnrealCLR::Engine::Manager) {
-               UnrealCLR::Engine::Manager->RemoveFromRoot();
-               UnrealCLR::Engine::Manager = nullptr;
-       }
-
        FWorldDelegates::OnPostWorldInitialization.Remove(OnWorldPostInitializationHandle);
        FWorldDelegates::OnWorldCleanup.Remove(OnWorldCleanupHandle);

@@ -1382,6 +1375,8 @@ void UnrealCLR::Module::ShutdownModule() {
 void UnrealCLR::Module::OnWorldPostInitialization(UWorld* World, const UWorld::InitializationValues InitializationValues) {
        if (World->IsGameWorld()) {
                if (UnrealCLR::WorldTickState == TickState::Stopped) {
+                       UnrealCLR::Engine::Manager = NewObject<UUnrealCLRManager>();
+                       UnrealCLR::Engine::Manager->AddToRoot();
                        UnrealCLR::Engine::World = World;

                        if (UnrealCLR::Status != UnrealCLR::StatusType::Stopped) {
@@ -1430,6 +1425,8 @@ void UnrealCLR::Module::OnWorldCleanup(UWorld* World, bool SessionEnded, bool Cl
                }

                UnrealCLR::Engine::World = nullptr;
+               UnrealCLR::Engine::Manager->RemoveFromRoot();
+               UnrealCLR::Engine::Manager = nullptr;
                UnrealCLR::WorldTickState = UnrealCLR::TickState::Stopped;

                FMemory::Memset(UnrealCLR::Shared::Events, 0, sizeof(UnrealCLR::Shared::Events));
captaincrutches commented 2 years ago

Excellent - that change seems to fix it. My project builds successfully on Linux and I can run the in-engine tests. Thank you!

(There is still a manager that dies in the same place as before - but we're no longer making another one that's causing that UB - so I'm gonna call that OK)

nxrighthere commented 2 years ago

Great, thanks! Fixed in https://github.com/nxrighthere/UnrealCLR/commit/1f29fa2543f831021618a1efcd7e3792c855a0e0.

(There is still a manager that dies in the same place as before - but we're no longer making another one that's causing that UB - so I'm gonna call that OK)

Yes, I expect that.