mumble-voip / mumble

Mumble is an open-source, low-latency, high quality voice chat software.
https://www.mumble.info

Weird packet loss since upgrading server to 1.3.0 #3831

Closed by bkacjios 4 years ago

bkacjios commented 4 years ago

Ever since I upgraded my server from 1.2.19 some of my friends, and myself, have been experiencing intermittent issues where some of the TCP command packets just fail to go through. For example, once this issue arises, people can't see your chat messages, can't see you locally muting/deafening yourself, etc, until you reconnect. It's weird though because you can still move channels and talk.

My bot also gets affected by this. For example, the bot uses the idle timer in UserStats and warns people of being AFK after almost 2 hours, then moves them a little after the warning. It can still move people, but the warning message stops getting seen. The bot can also move people to our designated AFK channel, but it can no longer move itself to other channels, locally mute/deafen itself, etc, just like what other users are experiencing.

I checked the log files and there was nothing strange or out of the ordinary, and I am also having trouble replicating this on command, since it usually seems to happen randomly, usually after a good 4+ hours of being connected to the server. If there's any more info I can provide please let me know. I can send raw packets through my bot to test things if need be.

Again, this issue only became a thing after the upgrade to 1.3.0. Never experienced anything like this before.

davidebeatrici commented 4 years ago

Probably related to #3510.

Could you try to tune the messageburst and messagelimit settings in Murmur's configuration, please?

bkacjios commented 4 years ago

Oh okay, I'll give that a shot. Is there any way to disable the message burst limits altogether? I don't really need this functionality.

davidebeatrici commented 4 years ago

We didn't add a setting to disable it because it would leave the server vulnerable to DDoS attacks.

The system could definitely be improved though.

bkacjios commented 4 years ago

I see. My server is private and is only used by 5 or so close friends, so I'm not too worried about that. It's just weird that people seem to be getting hit by this limit, including the bot. The most messages I could possibly imagine being sent when we're all talking is about 5 a minute. It's not like we're spamming actions or anything. It's also weird because once you get hit by the rate limit, the "ban" seems permanent.
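To put rough numbers on that, here is a minimal leaky-bucket sketch. It is not Murmur's actual code; it assumes the usual reading of the two settings, with `messageburst` as the bucket capacity and `messagelimit` as the number of messages drained per second. The class and function names are hypothetical.

```python
class LeakyBucket:
    """Toy leaky bucket for illustration; not Murmur's implementation."""

    def __init__(self, burst, limit_per_sec):
        self.burst = burst          # capacity (assumed meaning of messageburst)
        self.limit = limit_per_sec  # drain rate per second (assumed meaning of messagelimit)
        self.level = 0.0            # how full the bucket currently is
        self.last = 0.0             # arrival time of the previous message, in seconds

    def allow(self, now):
        """Return True if a message arriving at time `now` (seconds) is accepted."""
        elapsed = now - self.last
        self.last = now
        # Drain the bucket for the time that has passed, never below empty.
        self.level = max(0.0, self.level - elapsed * self.limit)
        if self.level + 1 > self.burst:
            return False  # bucket would overflow: message is dropped
        self.level += 1
        return True


def count_dropped(bucket, arrival_times):
    """Feed a sequence of arrival times to the bucket and count rejections."""
    return sum(0 if bucket.allow(t) else 1 for t in arrival_times)


# 5 messages a minute (one every 12 s) for 4 hours, with burst=5 and limit=1.
slow = [i * 12.0 for i in range(5 * 60 * 4)]
dropped_slow = count_dropped(LeakyBucket(5, 1), slow)

# 20 messages inside a single second, same settings.
fast = [i * 0.05 for i in range(20)]
dropped_fast = count_dropped(LeakyBucket(5, 1), fast)

print("dropped at 5 msg/min:", dropped_slow)
print("dropped in a 20-message burst:", dropped_fast)
```

Under these assumptions, the slow traffic pattern never fills the bucket, so a correctly draining limiter should not drop anything at 5 messages a minute, and a client that does get limited should recover within a few seconds of going quiet. A limit that never lifts therefore points at something other than the configured values being too low.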

davidebeatrici commented 4 years ago

That definitely sounds like a different issue, because:

bkacjios commented 4 years ago

Yeah, I've had people AFK for like hours on end, come back, unmute/undeafen themselves, move out of the AFK channel, and when they try to post a link or just a message in general they will be unable to, and remain that way, until they reconnect.

Still, I upped the message burst to like 100/500, and I guess I'll report back if the issue still happens.

davidebeatrici commented 4 years ago

Sure, no problem.

bkacjios commented 4 years ago

Well, it's been two days of uptime without any issue. Raising those values seemed to do the trick!

davidebeatrici commented 4 years ago

We probably have to tune the default values.

vith commented 4 years ago

> people can't see your chat messages, can't see you locally muting/deafening yourself, etc

I've had the same issues a few times since upgrading my server to 1.3.0. At one point other people couldn't see my chat messages. In another case, when I unmuted myself my client showed I was unmuted in the toolbar but not in the user list, and it would show on my end that I was broadcasting but I don't think I could be heard. I eventually went and did a local loopback test which worked, but a server loopback test was silent.

I just looked in my server's log and I don't see anything about rate limits being tripped. Should there be log messages about it if it was that new feature being triggered?

I was also not doing multiple actions in quick succession, so it seems really odd that I would have been caught even by the default settings. I am raising them really high now, from (5, 1) to (5000, 1000). I'll report back if it happens again.

davidebeatrici commented 4 years ago

Unfortunately, the client doesn't give the user any feedback when the limiter is triggered. We definitely have to add that.

bkacjios commented 4 years ago

So, figured I'd report back. With the settings below, and after a month straight of uptime on my server and bot, the issue came back. Obviously something is wrong with the message burst limits. Whatever it counts must be building up very slowly over time, until the server eventually ignores every packet a client sends.

messageburst=100
messagelimit=50

Setting higher values is obviously just masking the issue.

Image of uptime

Krzmbrzl commented 4 years ago

It has been suggested that this issue is the same as #3985, which in turn seems to have been fixed in the current master branch. Has anyone run a Murmur 1.4 build and still experienced this problem?

The fix will be part of the 1.3.1 release, so even if you couldn't try it with a 1.4 snapshot, you'll be able to test the fix in that release.