mumble-voip / mumble

Mumble is an open-source, low-latency, high quality voice chat software.
https://www.mumble.info

Mumble snapshots locking up with Guild Wars 2 #1205

Closed Tarun80 closed 7 years ago

Tarun80 commented 10 years ago

For a while now I've been noticing that Mumble will lock up (Not Responding) when Guild Wars 2 is open. This issue seems to be easy to reproduce when you first open Guild Wars 2.

This can also occur when alt-tabbed from the game. I have previously reported the issue on the sourceforge forums.

An earlier reported issue sounded very similar to this, also. However it has been occurring for quite some time prior to this report.

I've also noticed it's reproducible at a higher rate by running two fullscreen games at once, usually Guild Wars 2 and then something like Team Fortress 2.

This problem originally appeared around the time of Aug 24, 2013. So between the 1.2.4 release and the snapshot of 1.2.4-139-g7c2d1a3 is when this issue may have truly started. I recall that in Guild Wars 2 the overlay didn't work in 1.2.4 and I've been using snapshots since.

Hope this helps.

Kissaki commented 10 years ago

"Guild Wars 2 will lock up (Not Responding) when Guild Wars 2 is open"

Did you mean the Mumble client?

Tarun80 commented 10 years ago

I did indeed. I'll edit my post and correct that now.

Also, I just tested the latest snapshot as of this post (268) and just had Mumble go into a state of Not Responding while sitting at the character select screen.

I have noticed this happens much more when loading Guild Wars 2 through Steam to have the Steam overlay.

Hope this information helps.

Kissaki commented 10 years ago

What time-span do I have to look for if I want to experience it?

Kissaki commented 10 years ago

Are we talking about a fullscreen-only issue? Did you ever try running borderless window or windowed mode?

Tarun80 commented 10 years ago

I've been messing with it some more while talking with some friends on Mumble. I've noticed that it will happen within thirty seconds to a minute and a half. In my last test it happened after about 45 seconds and I was using the Steam overlay.

It definitely happens a lot more when launching the game through Steam to make use of the Steam overlay.

I've done about ten tests with and without the Steam overlay so far.

This is using fullscreen only, yes. I don't know if it happens with borderless window. The few times I've run Guild Wars 2 windowed it has also happened.

Kissaki commented 10 years ago

Could you provide some more information? What exactly is the issue with Mumble? Does it recover? How can I notice the issue occurs?

I started TF2 (borderless) and GW2 (fullscreen) from steam with my self-compiled Mumble (includes PR #1220) running. I played for almost an hour, tabbed out and in of fullscreen, used the steam overlay for a bit. No issue.

I then started GW2 with Mumble snapshot 1.2.5-268 running. Opened steam overlay and waited a bit. No issue.

"This issue seems to be easy to reproduce when you first open Guild Wars 2." Do you mean starting GW2 before Mumble? I do not get an overlay at all then, and no issue either.

It may be that I missed the issue as I was alone in Mumble - depending on what you are reporting it to be.

Tarun80 commented 10 years ago

More Information Step by step:

Issue and Recovery: If I Alt-Tab, I will see that it says Mumble (Not Responding). It does recover after a brief amount of time.

What You'll Notice: Two things I notice when it locks up:

So essentially the entire overlay along with the program itself stops responding.

The only other issue I've had occur is when I've had to Alt-Tab outside of GW2 but I continue talking with people on Mumble. While Alt-Tabbed it will randomly lock up.

Mumble Audio Input: http://i.imgur.com/1TBseon.png
Mumble Audio Output: http://i.imgur.com/8TDLvsH.png
Mumble Overlay options: http://i.imgur.com/ST4yMk6.png

I also am not using Positional Audio.

I'll see if I can capture this while streaming GW2. Once I do I'll make a YouTube clip to show you what I see when it happens.

Tarun80 commented 10 years ago

@Kissaki - I've got the videos uploaded to YouTube (please excuse the language one guildie used in clip 1 of 2)

The first clip, I played as normal, loading into a map in GW2. The second clip I waited on the character select screen for the client to lock up. In clip 2, when Mumble froze my Push-to-Talk key was pressed and it got stuck on during that time.

Ironically, while Alt-Tabbed out of GW2 to upload and post this the Mumble client locked up. Either allowing it time to recover or going back into GW2 will allow it to resume working normally.

If you need hardware specs or any other information please let me know.

Kissaki commented 10 years ago

Alright, that helped. Seems to happen pretty guaranteed. I was able to reproduce it. Thanks.

Tarun80 commented 10 years ago

Awesome! Thanks Kissaki.

I admit, I'm curious what the problem is and things of that nature. One of my friends and I tried to figure out what all was the cause for a while.

Kissaki commented 10 years ago

tl;dr: Potential issue: Freeing dead OverlayClient.

When the issue occurs the client (debug) logs:

Overlay: Dead client detected

The client then proceeds to “schedule” a delete of that client.

The client considers an OverlayClient dead when > 1024 bytes are in queue to be sent to it, yet it does not pull the data from the socket/pipe for five seconds.
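
A minimal sketch of that heuristic, written as plain C++ rather than Mumble's actual Qt code. The class name `PendingWriteMonitor`, its methods, and the way the stall timer is reset are assumptions made for illustration; only the 1024-byte and five-second thresholds come from the description above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Thresholds taken from the description above.
constexpr std::size_t kMaxQueuedBytes = 1024;
constexpr std::chrono::seconds kStallLimit{5};

// Hypothetical model of the dead-client check; not taken from Mumble's source.
class PendingWriteMonitor {
public:
    using Clock = std::chrono::steady_clock;

    // The client queued `bytes` for the overlay at time `now`.
    void onQueued(std::size_t bytes, Clock::time_point now) {
        if (queued_ == 0)
            lastProgress_ = now;  // a fresh backlog starts the stall timer
        queued_ += bytes;
    }

    // The overlay pulled `bytes` from the pipe at time `now`.
    void onDrained(std::size_t bytes, Clock::time_point now) {
        assert(bytes <= queued_);  // cannot read more than was queued
        queued_ -= bytes;
        lastProgress_ = now;       // any read counts as a sign of life
    }

    // Dead: more than 1024 bytes pending and no read for five seconds.
    bool isDead(Clock::time_point now) const {
        return queued_ > kMaxQueuedBytes && now - lastProgress_ >= kStallLimit;
    }

private:
    std::size_t queued_ = 0;
    Clock::time_point lastProgress_{};
};
```

Passing the time in explicitly keeps the sketch deterministic; a real implementation would more likely be driven by a timer.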

The overlay library on the other hand checks for data in the socket on each draw call.

Guild Wars 2 creates one D3D device for the launcher, and a second one when the game launches. The overlay injection for the first device likely never draws again, but is not destroyed. Hence it will not pull data from the socket and be considered dead by the Mumble client. Or it is the second device that is not drawn - either way one is being drawn, the other one not.
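
To illustrate why an injected but never drawn device ends up looking dead, here is a small sketch. `HookedDevice`, `onPresent`, and `broadcast` are hypothetical names, and a byte counter stands in for the named pipe; it only models the behaviour described above and is not the overlay's real code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one overlay injection and its pipe to the client.
struct HookedDevice {
    std::string name;
    std::size_t unreadBytes = 0;  // data the client has queued for this device

    // Called from the hooked draw call: the only place where data is read.
    void onPresent() { unreadBytes = 0; }
};

// The client sends the same update to every connected overlay.
inline void broadcast(std::vector<HookedDevice> &devices, std::size_t bytes) {
    assert(bytes > 0);  // an update always carries some data
    for (auto &device : devices)
        device.unreadBytes += bytes;
}
```

If only the game device keeps calling `onPresent()`, the launcher device's backlog grows with every broadcast and never shrinks.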

It seems to take 45 seconds until the issue occurs when no additional overlay messages are sent from the client to the overlay library. I guess that with your shorter time to issue, a user joined or left the channel within that time span, for example, resulting in the active overlay area changing or similar.

The culprit is the qlsSocket->abort(); call in ~OverlayClient(): https://github.com/mumble-voip/mumble/blob/640b532fb2f34c6d74d716323e627d2d8698d41c/src/mumble/OverlayClient.cpp#L103

(I also noticed inadequate sendMessage handling for/since the FPS counter logic.)

Kissaki commented 10 years ago

I guess a Qt-4.8 issue:

QLocalSocket::abort() is supposed to “immediately close” http://qt-project.org/doc/qt-4.8/qlocalsocket.html#abort

QLocalSocket::abort() calls close() https://qt.gitorious.org/qt/qt/source/68a911862e05400ced87971c43fb27fb5d5d8ebd:src/network/socket/qlocalsocket_win.cpp#L361

close() calls disconnectFromServer https://qt.gitorious.org/qt/qt/source/68a911862e05400ced87971c43fb27fb5d5d8ebd:src/network/socket/qlocalsocket_win.cpp#L429-431

I have a server with a connected client. If the server has queued data for the client and the client has not read it for x time, the client is considered dead. Hence, the server wants to abort the connection. My issue now is that abort() actually seems to wait for data delivery, for 30 s.

Apparently, not calling abort() but deleteLater() right away results in the same 30 seconds lock-up. So maybe it’s not the disconnectFromServer being called that I suspected?

Kissaki commented 10 years ago

The issue does not occur when launching GW2 directly. It does occur when starting it through Steam. When launching directly the OverlayClient is destructed right when the launcher quits and the GW2 game starts. When launching through Steam the OverlayClient is not destructed at that point.

Kissaki commented 10 years ago

ChildEBP RetAddr
0030d96c 74e3149d ntdll!ZwWaitForSingleObject+0x15
0030d9d8 76511194 KERNELBASE!WaitForSingleObjectEx+0x98
0030d9f0 76511148 kernel32!WaitForSingleObjectExImplementation+0x75
0030da04 56b62b9d kernel32!WaitForSingleObject+0x12
0030da34 56c0ec00 QtCored4!QThread::wait+0xad [d:\dev\mumble\mumble\qtmumble\src\corelib\thread\qthread_win.cpp @ 561]
0030da5c 69cb6e69 QtCored4!QWindowsPipeWriter::~QWindowsPipeWriter+0x60 [d:\dev\mumble\mumble\qtmumble\src\corelib\io\qwindowspipewriter.cpp @ 70]
0030da64 69cb6e9e QtNetworkd4!QWindowsPipeWriter::`scalar deleting destructor'+0x9
0030da74 00a52338 QtNetworkd4!QLocalSocket::abort+0x1e [d:\dev\mumble\mumble\qtmumble\src\network\socket\qlocalsocket_win.cpp @ 359]
0030daa8 00a52928 mumble!OverlayClient::~OverlayClient+0x298 [c:\dev\mumble\mumble\src-mine\src\mumble\overlayclient.cpp @ 111]

QWindowsPipeWriter is a QThread and waits on destruction. So next: Check how we got a running thread there.

https://qt.gitorious.org/qt/qt/source/01fd1edbb074b26a054bb545ffed979100f6be12:src/corelib/io/qwindowspipewriter.cpp#L70

Kissaki commented 10 years ago

QLocalServer is what we use for communication between overlay library and Mumble. Connections spawn QLocalSockets. On Windows, this uses named pipes which are encapsulated in QWindowsPipeWriter. The overlay code manually connects to the named pipe and does not use Qt code for this.

As said, the QWindowsPipeWriter behind the spawned QLocalSocket calls QThread::wait on destruction. QWindowsPipeWriter has a QMutex lock and a QWaitCondition waitCondition.

(One potential issue I found is that QWindowsPipeWriter::~QWindowsPipeWriter() calls waitCondition.wakeOne(); rather than waitCondition.wakeAll();, which could trigger a QWindowsPipeWriter::waitForWrite rather than run (only the latter handles the quitNow property). However, none of our code seems to use waitForWrite.) => Does not change anything.

Having done a small test, if the overlay library does not connect to the named pipe, the issue does not seem to occur. (Probably/Maybe: No fps update messages, thus no OverlayClient->update calls, thus less data in pipe?)

Quitting GW2 while Mumble hangs will instantly release the freeze. Which could indicate that somehow the locking end is in the overlay library after all - although I do not understand how that reflects onto the Client - which is merely connected via a named pipe.

Tarun80 commented 10 years ago

I can't help but wonder if this is the same issue for when you alt-tab out of Guild Wars 2 (when run through Steam) and Mumble stops responding when you try talking to people? I've unfortunately had it happen at random intervals. Alt-tabbing back in fixes it immediately, though.

Tarun80 commented 10 years ago

I've discovered another issue that I had assumed was due to a graphics card change months ago but may be related to this issue as well.

I've got an Nvidia GTX 760 (switched from an ATI Radeon HD 5970) and have noticed that since February when I'd close Guild Wars 2 either by Exit to Desktop or Alt+F4 I'd get a brief crash dialog appearing.

I'm able to reproduce the issue with Mumble running (No Steam, no overlays except Mumble) and just Guild Wars 2 and get the crash every time. You don't even have to be in the game for a minute for the crash to occur when you close the game through either of the aforementioned methods.

Honestly, I'm not sure if the fixes you've looked into @Kissaki would also resolve this crash or not. Initially I assumed it was their game client (and that may be the case) but when further diagnosing I found that having GW2 + Mumble (with overlay on) would trigger the crash every time.

Tarun80 commented 10 years ago

@Kissaki, Guild Wars 2 just had a major update yesterday. September 2014 Feature Pack.

Their launcher has changed and I'm hearing it's now HTML5 like their new Trade Post. A few times when I've started the game I've noticed that the lockup has not occurred anymore. I've been able to launch the game through Steam without any lockups (that I've noticed) so far. Additionally, the crash when closing the game has not been happening either.

I will continue to monitor this, however, as there are still some issues with the launcher they are now using (not remembering login credentials such as the password, for example). Just wanted to post an update on this.

Kissaki commented 10 years ago

Ah, nice. Thank you for reporting.

Tarun80 commented 9 years ago

I believe this issue has been resolved; however, one that has been ongoing and is somewhat related still exists. If you alt-tab out of GW2 (or any full-screen game for that matter) but keep talking on Mumble, over time Mumble will go into a Not Responding state and lock up briefly.

To unlock it you either wait or go back into the full screen game.

mkrautz commented 9 years ago

I think that is the same issue that @Kissaki found. It happens in other scenarios as well, not just GW.

It might not be an upstream Qt issue, BTW.

We patch Qt's LocalSocket implementation to add a buffer: https://qt.gitorious.org/qt/mumble-developers-qt/commit/50c7e47c2ac423b012627c60cad2f486c64a232a

But it doesn't seem to necessarily be what's going wrong in the stack trace by @Kissaki

mkrautz commented 9 years ago

Oh, and that's the server...

mkrautz commented 9 years ago

The wait was increased from 100 ms to 30 seconds in Qt 4.8: https://bugreports.qt.io/browse/QTBUG-4425
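
The effect of that limit on the calling thread can be sketched as simple arithmetic. This is a simplified model, not Qt's code: `teardown`, `TeardownResult`, `kNever`, and the parameters are hypothetical, and it assumes the destructor blocks until either the pending write finishes or the wait limit expires.

```cpp
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// Sentinel for "the peer never reads, so the pending write never completes".
constexpr Millis kNever = Millis::max();

struct TeardownResult {
    Millis blockedFor;  // how long the calling thread was stalled
    bool forcedStop;    // true if the wait limit expired before the write finished
};

// Models a destructor that waits for its writer thread, but only up to waitLimit.
inline TeardownResult teardown(Millis writeFinishesIn, Millis waitLimit) {
    assert(waitLimit >= Millis::zero());
    if (writeFinishesIn <= waitLimit)
        return {writeFinishesIn, false};  // the write completed in time
    return {waitLimit, true};             // gave up after the full limit
}
```

With a peer that never reads, `teardown(kNever, Millis(30000))` reports a 30-second stall, while the same call with `Millis(100)` reports 100 ms.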

mkrautz commented 9 years ago

I wonder if we can create a D3D app that can consistently reproduce this easily.

Tarun80 commented 9 years ago

I can reproduce it with any fullscreen game. More so when I run two or more games fullscreen at once.

Tarun80 commented 9 years ago

Any progress on this issue?

hacst commented 9 years ago

ad1ed221148cf6e721529f8ff36af45ce4ec5fed should at least give us back the old behavior (with 1s instead of a 30s timeout).

Tarun80 commented 9 years ago

Looking forward to testing this. I wonder if some of the hang-ups are related to #1637 and #1641 as you referenced. Hopefully the Qt 5.4.1 issue with Drag and Drop problems, which caused #1610 to crop up, will get fixed soon too. 5.4.2 and 5.5 are slated to have it fixed too.

Kissaki commented 7 years ago

So, this was an issue with Guild Wars 2's launcher, which was fixed/changed on their side.

According to the logging the Overlay still injects into D3D9 in the launcher (createdevice functions), but no Overlay is visible. When launching the game, we inject into DXGI+D3D11 (Present, ResizeBuffers, Device add/remove). The Overlay displays fine, no Mumble client lockup, no "Dead Overlay client detected" message in the debug logging.

Supposedly, the QWindowsPipeWriter is no longer using a thread, and should thus not cause hang-ups for us after Qt version 5.6.0. (change, qt ticket).

As the GW2 issue no longer exists, I suggest we close this ticket. For the other issue, we should create a separate ticket, though I'm not sure what to make of it right now. What does still occur, and when does it occur? @Tarun80 @mkrautz

Kissaki commented 7 years ago

We already have that as #1641. Closing this then.

Kissaki commented 7 years ago

I also tried fullscreening GW2 and tabbing out. Mumble 1.3.0~1937 does not lock up.

mkrautz commented 7 years ago

I tried GW2 a while back (after Qt 5.6) and I was still able to lock it up :-(

mkrautz commented 7 years ago

I don't recall how to do it, exactly... Maybe it was when I closed the game?

Kissaki commented 7 years ago

Do you want to reopen and assign to yourself to test again?

@Tarun80 said it was fixed for him above, so we don't have anybody else to reproduce it besides you?

Or maybe he or someone else can run some tests and try to reproduce.

Tarun80 commented 7 years ago

It's been a while since I used the Mumble overlay since I have Mumble on a second monitor now. However I'll try to test this out over the next few weeks. Alt-tabbing was an issue that would write to the console with errors, though I forget what they were.

Regardless, I'll enable the overlay and keep an eye on things. I say over the next few weeks as life has been rather busy lately.

Tarun80 commented 7 years ago

While it's not Guild Wars 2, this is an issue in World of Warcraft that happened on the loading screens. Turning off the overlay stopped the issue. https://i.imgur.com/KTkLmYp.jpg

mkrautz commented 7 years ago

@Tarun80: Yes, that's a WoW-specific bug. Only affects loading screens AFAIK