hbeni / fgcom-mumble

A (flightsim) radio communication simulation based on mumble
GNU General Public License v3.0
17 stars 5 forks source link

Bug: Mumble hangs on shutdown with fgcom-mumble enabled #78

Closed mill-j closed 3 years ago

mill-j commented 3 years ago

Describe the bug Latest mumble hangs (until manually ended with Ctrl+C) on shutdown with fgcom-mumble enabled. On my computer(Linux Mint 19.3) it's pretty well everytime. This is not a problem when the plugin is not enabled. Although disabling the plugin freezes the mumble settings dialog so I have to edit mumble's config files.

It also does not seem to be a problem with an older build that I think is 80de426

Expected behavior Quit mumble completely without hanging.

Logfiles Hangs at:

FGCom [2021-02-14 19:30:25.212]: Shutdown plugin
FGCom [2021-02-14 19:30:25.212]: [DBG] stopping threads
FGCom [2021-02-14 19:30:25.213]: [DBG] sending UDP shutdown request to port 16662
FGCom [2021-02-14 19:30:25.213]: [DBG] waiting for threads to finish

Additional context Did some debugging and found if I modify the while loop in fgcom-mumble.cpp/mumble_shutdown() to:

// wait for all threads to have terminated
    pluginDbg("waiting for threads to finish");
    int shutdownTimeout = 0;
    while (udpServerRunning || udpClientRunning || fgcom_gcThreadRunning || fgcom_debugthread_running) {
        // just wait for the servers to come down. This should not take long.
        // TODO: this may block forever. We probably should have some kind of timeout here.
        std::this_thread::sleep_for(std::chrono::milliseconds(100));

        if(udpServerRunning){pluginLog("Waiting on udpServerRunning");}
        if(udpClientRunning){pluginLog("Waiting on udpClientRunning");}
        if(fgcom_gcThreadRunning){pluginLog("Waiting on fgcom_gcThreadRunning");}
        if(fgcom_debugthread_running){pluginLog("Waiting on fgcom_debugthread_running");} 

        if(shutdownTimeout > 20)
            break;
        shutdownTimeout++;
    }

This allows us to see which bool is keeping it from exiting:

FGCom [2021-02-14 19:30:25.212]: Shutdown plugin
FGCom [2021-02-14 19:30:25.212]: [DBG] stopping threads
FGCom [2021-02-14 19:30:25.213]: [DBG] sending UDP shutdown request to port 16662
FGCom [2021-02-14 19:30:25.213]: [DBG] waiting for threads to finish
FGCom [2021-02-14 19:30:25.313]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:25.313]: Waiting on fgcom_gcThreadRunning
FGCom [2021-02-14 19:30:25.313]: Waiting on fgcom_debugthread_running
FGCom [2021-02-14 19:30:25.413]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:25.414]: Waiting on fgcom_gcThreadRunning
FGCom [2021-02-14 19:30:25.414]: Waiting on fgcom_debugthread_running
FGCom [2021-02-14 19:30:25.514]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:25.514]: Waiting on fgcom_gcThreadRunning
FGCom [2021-02-14 19:30:25.514]: Waiting on fgcom_debugthread_running
FGCom [2021-02-14 19:30:25.614]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:25.614]: Waiting on fgcom_gcThreadRunning
FGCom [2021-02-14 19:30:25.615]: Waiting on fgcom_debugthread_running
FGCom [2021-02-14 19:30:25.624]: [DBG] [GC] thread finished
FGCom [2021-02-14 19:30:25.706]: [DBG] ---------DEBUG THREAD FINISHED---------
FGCom [2021-02-14 19:30:25.715]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:25.815]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:25.916]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.016]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.116]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.217]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.317]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.418]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.518]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.618]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.719]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.819]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:26.920]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:27.020]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:27.120]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:27.220]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:27.321]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:27.421]: Waiting on udpServerRunning
FGCom [2021-02-14 19:30:27.421]: [DBG] mumble_shutdown() complete.
<W>2021-02-14 19:30:27.427 PulseAudio: Forcibly disconnected from PulseAudio
<W>2021-02-14 19:30:27.511 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.512 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.513 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.513 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.513 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.513 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.514 Clearing leaked memory from a plugin
<W>2021-02-14 19:30:27.514 Clearing leaked memory from a plugin

A quick fix would obviously be implementing the timeout as I did above, but the indefinite loop is the likely the result of some other problem. I'm guessing another locking issue, which would make sense if something changed in mumble.

hbeni commented 3 years ago

A locking issue could be, however i more suspect that the UDP shutdown message is not relayed properly to the udp server, keeping him alive (waiting for data / blocked socket read). The UDP server needs a message (any packet) in order to free the socket, because the socket read is currently implemented blocking.

hbeni commented 3 years ago

Hello @krzmbrzl, sorry for bothering again.... We have found another mumble freeze issue with the mumble_shutdown() plugin hook function. We call the API function mumAPI.log inside it.

The freeze occurs directly after the pluginLog message was printed to the terminal. The expected message to mumble's chat window (mumAPI.log) does not occur anymore:

pluginLog("[UDP-server] server connection established from "+clientHost_str+":"+std::to_string(clientPort));
mumAPI.log(ownPluginID, std::string("UDP server connection established from "+clientHost_str+":"+std::to_string(clientPort)).c_str());

If i comment all the mumAPI calls, it works again.

:question: Is it unsafe to call API functions inside shutdown? I would expect that it's not, because i want to print stuff to the chat window to inform the user that the plugin has cleaned up after itself...

Krzmbrzl commented 3 years ago

Is it unsafe to call API functions inside shutdown?

No it should be safe. In fact the testPlugin does this and I never had issues with that: https://github.com/Krzmbrzl/mumble/blob/deb74a6634898f68d37fd8ba58253acf08c7b43b/plugins/testPlugin/testPlugin.cpp#L73-L77

Looking at your log I see this message being logged a couple of times:

Clearing leaked memory from a plugin

This means that you have multiple memory leaks in your code. This message gets logged if there are resources allocated for the plugin that were not freed afterwards. Idk know though whether the cleanup code simply did not run because of the freeze :shrug:

Could you link me to the actual code part here on GitHub so I can see the entire context of that API function call?

hbeni commented 3 years ago

@krzmbrzl Sure, the function is here: https://github.com/hbeni/fgcom-mumble/blob/b28a40e529fe00b234d9c72d6abe98eab99f887b/client/mumble-plugin/fgcom-mumble.cpp#L555-L587

Basicly i just stop my threads there.

Regarding the leaked memory: I'm not sure where that comes from. I do not always see them, especially at the cases when i disable the mumAPI.log calls they seem to be gone. However i did not extensively tested that, and it may just be a coincidence.

Krzmbrzl commented 3 years ago

I am a bit confused now though. The original issue seems to have reported that the while loop waiting for the different threads to finish was blocking the shutdown. But if that was the case, the API call wouldn't even be reached.

Or is the API call blocking if the workaround suggested here is in place (breaking out of the loop if it iterates too often)?

hbeni commented 3 years ago

@krzmbrzl The proposed workaround here is not the solution to the actual problem - rather would be to remove the calls to mumAPI.log if those are considered invalid in the context of the mumble_shutdown hook.

The issue is that we get into an endless loop, because mumAPI.log is blocking the shutdown of a thread. That thread (the udp server) uses recv on an blocking socket and thus needs a UDP packet to continue, which i try to send. It is sent and then received by the server, but then mumAPI.log blocks further shutdown of the thread. That is exaclty caused here:

Krzmbrzl commented 3 years ago

Ah so the code part you have originally linked me is not he one where the API call is blocking. Okay.

Then I think I know where the issue lays:

I guess I'll have to implement async API calls sooner rather than later :sweat_smile:

hbeni commented 3 years ago

I hope i understand correctly:

Did i summarize correctly?

Is there anything else i currently could do to have the messages in the client, while also waiting for the threads to stop?

Krzmbrzl commented 3 years ago

short term workaround could be to not call the API functions inside the blocking mumble_shutdown (ie. commenting the mumAPI.log calls)

No. The problem is that API functions are being called from a different thread while the main thread is blocked with waiting for these extra threads to finish.

a quick solution would also be to be able to use mumAPI.log from blocking function calbacks clean solution is to enhance mumble for async calls, so the described locking situation does not occur

These are the same. As I explained before I can't make the call non-blocking (due to the function's return value).

Is there anything else i currently could do to have the messages in the client, while also waiting for the threads to stop?

Implement a custom "event loop" for these cases where the separate thread messages the main thread to do the logging. That could then be done every time the sleep is over in your loop. But I don't think this is worth the hassle just to get some log messages...

hbeni commented 3 years ago

OK - i think i got it fixed. I disabled a not-important gui log message at shutdown time and moved the "first contact" message to the "not in shutdown mode" branch. That is enough to have a non-blocking code path for shutdown.

Thank you again, very much, @krzmbrzl