Closed an0nfunc closed 4 years ago
Confirmed this exact issue happens to me and another user using the rc1 client.
Same thing on FreeBSD 11.3-RELEASE with murmur-1.3.0.r1_4
via pkg
.
Clients connect and get disconnected instantly with this log entry:
Connection closed: Error while reading: erro r:140E0197:SSL routines:SSL_shutdown:shutdown while in init
Same thing with murmur-1.3.0.r1_5
via ports.
I failed to reproduce the issue on FreeBSD 12.0 (Qt 5.12.2) and Fedora 30 (Qt 5.12.4).
Just confirmed it's happening with Qt 5.13.0 and murmur 1.3.0-rc2 as well.
Steps to reproduce:
Connection closed: Error while reading: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init [20]
After some testing it seems like the database used is somehow related. I got no problems after switching the same config from the QMYSQL adapter to sqlite. Same settings otherwise.
Just to confirm, @metsuke0 & @reelsense, what database backend are you using?
EDIT: I'm gonna try and get a SQL log from mariadb for this later today.
I'll try to look into this again some time in the next week. I have a feeling it has to do with my large(900MB) sqlite database and murmur 1.3. I moved my murmur server to Ubuntu 18.04 and it runs 1.2.19 PPA.
I didn't have time to figure out what dependency was having a problem. So I moved all my mumble servers to Ubuntu 18.04. And since I upgraded them to the new murmur 1.3 release I haven't had a problem.
This is a critical issue, we received a private report this week explaining exactly how to trigger it.
We will probably disclose it through a CVE as soon as it's fixed and 1.3.1 is released.
We've also noticed this trying to build with a new OpenSSL (using the Dockerfile, with the unreleased Ubuntu focal instead of disco).
Interestingly, only some users randomly get kicked out every few minutes.
Example log:
mumble_1 | <W>2020-03-22 21:43:05.034 1 => <34:(-1)> Connection closed: Error during SSL handshake: error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol [13]
mumble_1 | <W>2020-03-22 21:43:05.050 1 => <33:user1234 (12)> Connection closed: Error while reading: error:140E0197:SSL routines:SSL_shutdown:shutdown while in init [20]
@davidebeatrici Seeing this issue frequently as well, also with an arch server. Is there any mitigation for this aside from an update, or if not, an ETA on 1.3.1? I haven't seen it on plumble (third-party android app), but have had pymumble bots with the same issue. It also seems to occur with more users more frequently. The server sees clients as ghosts and disconnects them, but client reports the remote host closed the connection. Also mentions being unable to send UDP packets and switching to TCP mode, and as @AnonFuncsAreAwesome mentioned, disconnects usually every five minutes or so (though has disconnected twice in a row within ~15 seconds before).
Update: tested on an alpine server and had the same result.
@bin as mentioned above, downgrading Qt on the server should help. Alternatively, you can use a static build, such as the one used in this docker image: https://github.com/PHLAK/docker-mumble
@cfstras Do this for client, server, or both?
@bin on the server. I wouldn't expect any user of my mumble server to use specific versions of anything...
Thanks, so static linking against QT 15.2.1 should be a safe work-around?
@bin you will have to try that for yourself ;)
I stumbled upon an astonishingly easy way to trigger this reliably as a denial of service. @davidebeatrici, do you need help tracking down exactly where this issue is? Reason I ask is because I don't know if you are working on this privately and I don't want to waste time debugging this if you already know how to fix it.
Thank you for asking!
We're currently finishing up the new build environment and system, so that all developers can easily build Mumble.
Help is of course appreciated to find the root cause of this issue.
The following function is called right after the warning by OpenSSL: https://github.com/mumble-voip/mumble/blob/b54cf63549ed93ca2ce386ebecee6cb34c064cec/src/murmur/Server.cpp#L1424-L1475
Returning early from the function prevents the client from being disconnected.
It's definitely an issue related to Qt, but I'm not sure whether it's due to a bug or wrong API usage and we're only now seeing the effects due to internal Qt changes.
Found it. In https://github.com/qt/qtbase/commit/93a803a6de27d9eb57931c431b5f3d074914f693 Qt incorrectly calls SSL_shutdown() in OpenSSL mid-handshake. I compiled qt5-base on Arch Linux with that commit reverted and Murmur no longer disconnects clients or shows the error message.
@TredwellGit What condition has caused this to trigger for you on the murmur server?
I don't feel that comfortable sharing how to trigger it because it is a very effective denial of service attack and since @davidebeatrici already stated in https://github.com/mumble-voip/mumble/issues/3679#issuecomment-601450953 that he wants to wait for it to be fixed.
Found it. In qt/qtbase@93a803a Qt incorrectly calls SSL_shutdown() in OpenSSL mid-handshake. I compiled qt5-base on Arch Linux with that commit reverted and Murmur no longer disconnects clients or shows the error message.
@TredwellGit Good job!
I actually had found that commit too, but didn't investigate further because I mistakenly thought QSslSocketBackendPrivate::destroySslContext()
was only called after the session is closed.
From https://www.openssl.org/docs/manmaster/man3/SSL_shutdown.html:
SSL_shutdown() only closes the write direction. It is not possible to call SSL_write() after calling SSL_shutdown(). The read direction is closed by the peer.
The behaviour of SSL_shutdown() additionally depends on the underlying BIO. If the underlying BIO is blocking, SSL_shutdown() will only return once the handshake step has been finished or an error occurred.
If the underlying BIO is non-blocking, SSL_shutdown() will also return when the underlying BIO could not satisfy the needs of SSL_shutdown() to continue the handshake. In this case a call to SSL_get_error() with the return value of SSL_shutdown() will yield SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE. The calling process then must repeat the call after taking appropriate action to satisfy the needs of SSL_shutdown(). The action depends on the underlying BIO. When using a non-blocking socket, nothing is to be done, but select() can be used to check for the required condition. When using a buffering BIO, like a BIO pair, data must be written into or retrieved out of the BIO before being able to continue.
After SSL_shutdown() returned 0, it is possible to call SSL_shutdown() again to wait for the peer's close_notify alert. SSL_shutdown() will return 1 in that case. However, it is recommended to wait for it using SSL_read() instead.
SSL_shutdown() can be modified to only set the connection to "shutdown" state but not actually send the close_notify alert messages, see SSL_CTX_set_quiet_shutdown(3). When "quiet shutdown" is enabled, SSL_shutdown() will always succeed and return 1.
@bin Please send me an email, I will reply with details.
Qt incorrectly calls SSL_shutdown() in OpenSSL mid-handshake.
Does this mean this is ultimately a Qt bug? If so has it been reported already? If it has not, we should probably report it in order to get it fixed...
More interestingly is why only mumble seems to be affected. Is mumble doing something unique with Qt's SSL implementation?
I actually had found that commit too, but didn't investigate further because I mistakenly thought
QSslSocketBackendPrivate::destroySslContext()
was only called after the session is closed.
Even then, I wouldn't expect other SSL sessions to be affected.
Possible bug in OpenSSL?
OpenSSL's behavior seems to be fine: when returning early from Server::connectionClosed()
, preventing the function from effectively disconnecting the user, its session is not affected at all.
Server::connectionClosed()
is triggered by Connection::connectionClosed()
:
Connection::connectionClosed()
is emitted when QSslSocket::error()
is:
It seems like Qt is emitting error()
in the wrong QSslSocket
.
Does this mean this is ultimately a Qt bug? If so has it been reported already? If it has not, we should probably report it in order to get it fixed...
It appears to be a Qt bug. Calling SSL_shutdown() mid-handshake is incorrect and not handling the resulting error is really incorrect. See how NGINX handles this: https://github.com/nginx/nginx/blob/65ae8b315211988a821bdc32050768f41571ddae/src/event/ngx_event_openssl.c#L2732-L2824
More interestingly is why only mumble seems to be affected. Is mumble doing something unique with Qt's SSL implementation?
What other server programs are you aware of that use Qt's TLS implementation?
Even then, I wouldn't expect other SSL sessions to be affected.
This is what is confusing me.
Workaround in #4032.
@TredwellGit Could you open an issue in Qt Bug Tracker, please?
I can do it too, but you deserve the credits since you're the one who pinpointed the exact line causing the bug.
Do you know why Qt emits error() from unrelated QSslSocket(s)? Is this an additional bug?
I will open an issue report, but I am really more concerned about this getting fixed than I am getting credit.
I believe it's an additional bug that was already present and didn't cause any issues until qt/qtbase@93a803a6de27d9eb57931c431b5f3d074914f693.
@TredwellGit Did you open an issue in Qt's tracker yet?
Excellent, the description clearly explains the issue in detail.
Fixed: qt/qtbase@36a8bdbc8417506513207daf4f36533a3d6632f3
Any indication in which version this will make it?
No, but the fix has been merged in the 5.15
, 5.14
and 5.12
branches.
So I guess we can surround our workaround by an #if QT_VERSION < QT_VERSION_CHECK(5, 15, 0)
, right?
Yes, we can.
However, I would also like to disable the workaround for the next 5.12 and 5.14 versions.
However, I would also like to disable the workaround for the next 5.12 and 5.14 versions.
I guess as long as we're checking for the minor version number as well, this should be fine :)
Qt 5.12.9: qt/qtbase@36a8bdbc8417506513207daf4f36533a3d6632f3 Qt 5.14.3: qt/qtbase@8ddffc6ba4f38bb8dbeb0cf61b6b10ee73505bbb
Glad that this was resolved :1st_place_medal:
We requested a CVE ID for Mumble, but MITRE created it for Qt because it's effectively an upstream issue: CVE-2020-13962
After upgrading to Qt 5.12.2 all clients got a disconnect after about 5 minutes of connection time. Reproducible with Qt 5.12.2 and Qt 5.12.3, downgrading to 5.12.1 solves this issue.
Not sure if this is a Qt bug or not, so I'm reporting it here first, however it looks to me like a bug in Qt.
murmur.log
Mumble/Murmur 1.3.0_rc1 Archlinux
Edit: Only the Qt version of murmur is relevant, client runs latest Qt just fine with murmur on Qt 5.12.1.