mumble-voip / mumble

Mumble is an open-source, low-latency, high quality voice chat software.
https://www.mumble.info
Other
6.41k stars 1.12k forks source link

users randomly unable to send messages or switch channels #3985

Closed quickdude111 closed 4 years ago

quickdude111 commented 4 years ago

After so many hours of being connected to the server, a random user will eventually become afflicted with a bug where they cannot send any chat messages (when sending a chat message that user sees it but it does not show for anyone else) or switch channels (attempting to switch channels does nothing) unless another admin drags them. What causes this is unknown and seems to be after a random period of time, usually over 6 hours, and the issue persists even after apt purge of the mumble-server package and reinstall. The current workaround is to disconnect and reconnect the afflicted client.

client: windows 10/windows 7/debian sid x64, x86_64 server: debian sid x86_64 Version 1.3

after an apt-purge of the server and reinstall, the issue was fixed for several days but has now re-surfaced.

after reviewing some of the other older issues, it should be noted that on my server there is a channel where users are supressed (the afk channel), and it is unknown whether entering and leaving the channel has anything to do with this issue.

Krzmbrzl commented 4 years ago

Is voice communication still possible for these clients?

What version of murmur (the server) is this happening on? Does it only happen on that specific server or did you also encounter the bug in another server?

quickdude111 commented 4 years ago

Sorry, yes voice communication still works for them.

server is "1.3.0+dfsg-1+b1 amd64" It seems to only happen with this server and I have not seen this issue reported anywhere else and believe me I have searched.

Krzmbrzl commented 4 years ago

Is there any kind of server-configuration that is "out of the ordinary"? Could you get that configuration and post it here?

quickdude111 commented 4 years ago

as for the database it is 7 channels each with a small jpg image in the description and one being the afk channel with speak, message, and channel creation denied in acl. here is the .ini config, it is the only one I know of and I have only changed a few lines, the serverpassword, registername, and welcome text. mumble-server.ini.txt

Krzmbrzl commented 4 years ago

Thanks for the config. From looking at it, I don't see anything standing out though. I also took the liberty to cut out all the commented stuff:

; Murmur configuration file.

; Path to database. If blank, will search for
; murmur.sqlite in default locations or create it if not found.
database=/var/lib/mumble-server/mumble-server.sqlite

; If you want to use ZeroC Ice to communicate with Murmur, you need
; to specify the endpoint to use. Since there is no authentication
; with ICE, you should only use it if you trust all the users who have
; shell access to your machine.
; Please see the ICE documentation on how to specify endpoints.
#ice="tcp -h 127.0.0.1 -p 6502"

; Ice primarily uses local sockets. This means anyone who has a
; user account on your machine can connect to the Ice services.
; You can set a plaintext "secret" on the Ice connection, and
; any script attempting to access must then have this secret
; (as context with name "secret").
; Access is split in read (look only) and write (modify)
; operations. Write access always includes read access,
; unless read is explicitly denied (see note below).
;
; Note that if this is uncommented and with empty content,
; access will be denied.

;icesecretread=
icesecretwrite=

; Specifies the file Murmur should log to. By default, Murmur
; logs to the file 'murmur.log'. If you leave this field blank
; on Unix-like systems, Murmur will force itself into foreground
; mode which logs to the console.
logfile=/var/log/mumble-server/mumble-server.log

; If set, Murmur will write its process ID to this file
; when running in daemon mode (when the -fg flag is not
; specified on the command line). Only available on
; Unix-like systems.
pidfile=/run/mumble-server/mumble-server.pid

; The below will be used as defaults for new configured servers.
; If you're just running one server (the default), it's easier to
; configure it here than through D-Bus or Ice.
;
; Welcome message sent to clients when they connect.
; If the welcome message is set to an empty string,
; no welcome message will be sent to clients.
welcometext="welcome to fluors mumble"

; Port to bind TCP and UDP sockets to.
port=64738

; Password to join server.
serverpassword=deerpiss

; Maximum bandwidth (in bits per second) clients are allowed
; to send speech at.
bandwidth=72000

; Maximum number of concurrent clients allowed.
users=100

; Per-user rate limiting
;
; These two settings allow to configure the per-user rate limiter for some
; command messages sent from the client to the server. The messageburst setting
; specifies an amount of messages which are allowed in short bursts. The
; messagelimit setting specifies the number of messages per second allowed over
; a longer period. If a user hits the rate limit, his packages are then ignored
; for some time. Both of these settings have a minimum of 1 as setting either to
; 0 could render the server unusable.
messageburst=5
messagelimit=1

; Respond to UDP ping packets.
;
; Setting to true exposes the current user count, the maximum user count, and
; the server's maximum bandwidth per client to unauthenticated users. In the
; Mumble client, this information is shown in the Connect dialog.
allowping=true

; If Murmur is started as root, which user should it switch to?
; This option is ignored if Murmur isn't started with root privileges.
uname=mumble-server

; You can configure any of the configuration options for Ice here. We recommend
; leave the defaults as they are.
; Please note that this section has to be last in the configuration file.
;
[Ice]
Ice.Warn.UnknownProperties=1
Ice.MessageSizeMax=65536

Would it be viable for you to remove that AFK channel from the server and check whether the problem is still occurring?

quickdude111 commented 4 years ago

Yes I will remove it and see. it may take up to a few days for the issue to resurface, so sit tight.

Krzmbrzl commented 4 years ago

Thanks

No worries - it's not like we run out of issues any time soon xD

quickdude111 commented 4 years ago

update: Removing the afk channel and restarting the mumble-server service had no effect, the issue persists.

Krzmbrzl commented 4 years ago

Alright thanks :)

Would it be possible to switch the server version to a new one with some Debug logs compiled into it? I'd like to check whether the messages even arrive at the server or if it seems to be a problem on the client side (the message not being sent to the server) :point_up:

quickdude111 commented 4 years ago

Yes I will switch to the dbg version right now. just need to compile it first.

quickdude111 commented 4 years ago

ok i may have done something wrong in compiling it, it says no debug symbols found in ~/mumble/release/murmurd, which is the binary i compiled with "qmake CONFIG+=symbols -recursive". I'm a bit lost at this point as I ran it with gdb and there is no verbose info in the debug log despite me connecting and moving channels around and stuff. edit: ive mis-spelled "config" in the qmake command. gonna try this again

edit: alright that didn't work either, gonna try 'make debug' i guess edit: ok that doesn't work either, I'm not sure how im supposed to do this. we'll figure it out im on irc now

final edit: got it. give us a few days to see if the issue resurfaces again and we'll have a log for you. note that i am now using 1.4 murmur instead of 1.3 however.

TredwellGit commented 4 years ago

This issue is the same as #3831, #3662, and #3724.

To give some background, a rate limiter was introduced in Murmur that originally used both ctime and std::chrono. Mumble at the time did not require support for C++11 and the ctime implementation was broken so it was replaced in #3695 by a QTime implementation that was also broken, so it was replaced again in #3868 with a QDateTime implementation. However, this QDateTime version is currently in master and not 1.3.0 so the issue exists that if someone is connected and the clock moves past midnight, then the rate limiter bans the user.

You can mitigate this by setting messageburst to a large value which allows for more messages to be sent after the users have been banned. For example: messageburst=65535

Since Mumble now targets C++11, I would recommend that there be a new release that utilises std::chrono since a monotonic clock is the proper way to measure elapsed time and would avoid issues like this in the future. Just reuse the original rate limiter implementation minus the broken ctime support.

Krzmbrzl commented 4 years ago

@TredwellGit thank you very much for the insights! :)

If this is indeed the root cause of this bug, @quickdude111 shouldn't be experiencing it anymore with the 1.4 snapshot. Let's hope the best :)

Since Mumble now targets C++11, I would recommend that there be a new release that utilises std::chrono since a monotonic clock is the proper way to measure elapsed time and would avoid issues like this in the future. Just reuse the original rate limiter implementation minus the broken ctime support.

I assume you mean a new 1.3.x release, right? Aka 1.3.1

quickdude111 commented 4 years ago

update: its been a few days just to confirm, and the issue is completely gone on murmur 1.4 I believe TredwellGit is correct.

Krzmbrzl commented 4 years ago

Thank you very much for the update! Great to hear that it appears to be fixed :ok_hand:

@TredwellGit apparently the current code is working fine. Is there any particular reason you suggested switching to an implementation using std::chrono instead?

davidebeatrici commented 4 years ago

@TredwellGit Thank you very much for the detailed history and suggestion.

Unfortunately we have so much work going on for 1.4.x that we forgot this (serious) bug still exists in 1.3.x and thus didn't give 1.3.1 proper priority.

Now that we have confirmation the latest rate limiter implementation is working, I believe we can safely backport #3868 and release 1.3.1.

TredwellGit commented 4 years ago

@Krzmbrzl

Is there any particular reason you suggested switching to an implementation using std::chrono instead?

Yes, because QDateTime is a wall clock and std::chrono::steady_clock is a monotonic clock. Wall clocks are only intended to tell you the time at the current time and nothing else. A monotonic clock never decreases and has a constant amount of time between increments. A completely valid wall clock can stand still, go backwards, have same increments of time that are different (like one second that takes longer or shorter than another), have smaller or larger increments of time be larger or smaller than the other (like one second being longer than an hour), can skip time (like a leap year or second), and can be redefined (like when a time zone is modified) among many other things that disqualify it for being used as a means of tracking elapsed time. You can not measure elapsed time by subtracting two wall clock timestamps. Using std::chrono::steady_clock eliminates the possibility of issues like this because you can measure elapsed time by subtracting two monotonic clock timestamps. Since it is included in the C++11 standard library, just about only requires changing types, and there is already an implementation written (#3695) that just requires removing the completely separate ctime implementation, it would be better to use std::chrono.

@davidebeatrici

Now that we have confirmation the latest rate limiter implementation is working, I believe we can safely backport #3868 and release 1.3.1.

Thank you very much for the detailed history and suggestion.

Thank both of you for the work that is put in Mumble. A 1.3.1 release would be appreciated.

Krzmbrzl commented 4 years ago

Thanks for the explanation. Sounds very reasonable to me. It also reminded of the QElapsedTimer class which seems to fill the gap in the same way as std::chrono::steady_clock but has the advantage of fitting nicely in the rest of Qt's API.

I think I'll use that one instead.

@quickdude111 would you perhaps be willing to test and verify that the issue won't reappear with those changes (after I made them of course)? That'd be greatly appreciated :)

quickdude111 commented 4 years ago

@quickdude111 would you perhaps be willing to test and verify that the issue won't reappear with those changes (after I made them of course)? That'd be greatly appreciated :)

yes certainly

Krzmbrzl commented 4 years ago

yes certainly

Cool, thank you! :) The changes are in #4004. Do you know how to pull and build from my branch this PR is based on?

quickdude111 commented 4 years ago

Do you know how to pull and build from my branch this PR is based on?

I am compiling your branch now. ill get it up and running asap and i'll report back in a couple of days to make sure it's working.

quickdude111 commented 4 years ago

been running the Krzmbrzl branch of murmurd for one midnight cycle now. One user has reported experiencing something like issue #3701, disconnecting and reconnecting fixes it. unsure if it is a server issue but possibly a client UI issue. The main issue of this thread is gone and I'm pretty sure they are not related.

Krzmbrzl commented 4 years ago

Thanks for testing this out :+1:

The linked issue is unrelated (99% certain of that) and seems to be a problem with sizing the respective UI component properly... It is quite curious though, but not a matter for this thread here :)