Murmur can't handle more than 230 connections

AmrAshraf commented 3 years ago

Description

I ran a load test to find out how many clients Mumble can handle when no voice is being sent. After 230 clients are connected, the server stops seeing the messages/pings coming from the clients. It then times them out and closes their connections, as in the following log:

<W>2021-04-28 19:50:03.789 1 => <1360:t-174(254)> Timeout
<W>2021-04-28 19:50:03.792 1 => <1360:t-174(254)> Connection closed:  [-1]
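
When a load test drops clients like this, it can help to count how many bots Murmur timed out or disconnected. The sketch below is a minimal log scanner that assumes every relevant line has the shape of the two samples above. The field names (level, timestamp, server id, session, user name, user id) and the helper `count_drops` are illustrative assumptions read off those samples, not a documented Murmur log format.

```python
import re
from collections import Counter

# Matches Murmur log lines shaped like the two samples in this issue, e.g.
#   <W>2021-04-28 19:50:03.789 1 => <1360:t-174(254)> Timeout
# The field names are an assumption read off those samples, not a documented format.
LOG_LINE = re.compile(
    r"^<(?P<level>\w)>"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<server>\d+) => "
    r"<(?P<session>\d+):(?P<name>[^(]*)\((?P<userid>-?\d+)\)> "
    r"(?P<message>.*)$"
)


def count_drops(lines):
    """Count 'Timeout' and 'Connection closed' events per user name.

    Lines that do not match LOG_LINE are skipped. A user name containing '('
    would not be parsed correctly by this simple pattern.
    """
    timeouts, closed = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        message = match.group("message")
        if message.startswith("Timeout"):
            timeouts[match.group("name")] += 1
        elif message.startswith("Connection closed"):
            closed[match.group("name")] += 1
    return timeouts, closed


if __name__ == "__main__":
    sample = [
        "<W>2021-04-28 19:50:03.789 1 => <1360:t-174(254)> Timeout",
        "<W>2021-04-28 19:50:03.792 1 => <1360:t-174(254)> Connection closed:  [-1]",
    ]
    timeouts, closed = count_drops(sample)
    print(dict(timeouts), dict(closed))  # {'t-174': 1} {'t-174': 1}
```

Feeding it the lines of a full server log gives a quick count of how many bots were dropped in a run.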

Steps to Reproduce

I used Node.js and Python 3 clients for this test. I tried running each client in a separate process, and also all of them in a single thread. I also tried spreading the clients over several servers. I raised my ulimits and tuned everything relevant in sysctl, with no difference. I also disabled the user-certificate requirement, since my test authenticates with a username and password.

The easiest way to generate clients is with pymumble: $ pip install pymumble

import pymumble_py3
import time

pwd = "userPassword"  # password
server = "serverIP"

nick = "t-"
# Start all clients back to back, with no delay between connection attempts.
for i in range(1, 300):
    mumble = pymumble_py3.Mumble(server, nick + str(i), password=pwd)
    mumble.start()

# Keep the main script alive so the clients stay connected.
while True:
    time.sleep(1)

Expected behavior

Accept connections until the server's TCP limit is reached, or at least until the CPU is at 100%.

Desktop:

OS: Ubuntu 20.04
Version: Murmur 1.3 (mumble-server)

davidebeatrici commented 3 years ago

Thanks a lot for the report!

A limit of 230 connections is indeed awful.

I'm not sure what could be the bottleneck, as you mentioned that the CPU is not completely used. However, we wanted to rewrite the server from scratch anyway and this is an additional incentive to do it.

Krzmbrzl commented 3 years ago

This does not sound right to me. I know for a fact that the Mumble server is actively being used with >1000 active clients without issues.

I assume you hosted the server on the same machine that you had your bots running on? In that case the first thing that would pop into my mind is that the network socket throughput on your machine might be overloaded. Did you check that?

davidebeatrici commented 3 years ago

> I know for a fact that the Mumble server is actively being used with >1000 active clients without issues.

Which one?

Krzmbrzl commented 3 years ago

EVE Online servers, and also another one that I can't give details about.

@dessix told me about the EVE Online one, and now that I've looked that conversation up again, it seems to be more like 10k users rather than >1000.

Dessix commented 3 years ago

Yep, as @Krzmbrzl mentioned, we've had several thousand in a single channel, before.

AmrAshraf commented 3 years ago

I tested on a separate server, once with 1 GB of RAM and 1 core and once with 8 GB of RAM and 4 CPUs, and the result was the same.

Krzmbrzl commented 3 years ago

> I tested on a separate server, once with 1 GB of RAM and 1 core and once with 8 GB of RAM and 4 CPUs, and the result was the same.

Hm okay.

Did you check whether the 230 concurrent client limit may be inside PyMumble? I could imagine that PyMumble creates a new thread for every connection and 230 active threads (keep in mind that connected clients will always exchange ping messages with the server) sounds like it might hit some limit on the computer running the clients.

In order to check that, I would suggest hosting the server remotely and then firing clients at it from multiple computers, each running 100-150 clients.
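
One way to probe the thread hypothesis on the client machine: on Linux, threads count toward the per-user `RLIMIT_NPROC` limit, so a script that spawns one or more threads per bot can run into it. The sketch below is a rough, Unix-only check using Python's standard `resource` module. The helper name `thread_headroom` and the `threads_per_client` parameter are hypothetical; how many threads PyMumble actually uses per connection is not confirmed in this thread.

```python
import resource
import threading


def thread_headroom(planned_clients, threads_per_client=1):
    """Rough check of whether starting `planned_clients` bots could hit RLIMIT_NPROC.

    `threads_per_client` is an assumption, not a measured PyMumble value.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    needed = planned_clients * threads_per_client
    now = threading.active_count()  # Python threads in this process only
    if soft == resource.RLIM_INFINITY:
        fits = True
    else:
        # Optimistic: RLIMIT_NPROC counts every process and thread owned by the
        # user, not only the ones started by this script.
        fits = now + needed <= soft
    return {"soft": soft, "hard": hard, "now": now, "needed": needed, "fits": fits}


if __name__ == "__main__":
    print(thread_headroom(300))
```

If `fits` comes back `False`, the client machine, not Murmur, is the likely bottleneck.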

AmrAshraf commented 3 years ago

I did that. First I ran all clients from one server; the second time I distributed the load across 3 servers. I didn't only test with the Python client: I tested with both Node.js and Python clients. In one run each client had its own process. In another, I put half of the clients in one channel and the other half in a second channel. I got the same result every time.
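
When bots are spread over several machines like this, each machine needs its own disjoint set of nicknames so the runs do not collide on names. A minimal sketch, assuming bots are numbered `t-1` to `t-N` as in the scripts in this issue; `nicks_for_machine` is a hypothetical helper that deals the numbers out in contiguous blocks.

```python
def nicks_for_machine(machine_index, machines, total_clients, prefix="t-"):
    """Return the bot nicknames one load-generating machine should use.

    Clients are numbered 1..total_clients and split into contiguous blocks,
    so no two machines reuse a nickname. `machine_index` is 0-based.
    """
    if not 0 <= machine_index < machines:
        raise ValueError("machine_index out of range")
    base, extra = divmod(total_clients, machines)
    # The first `extra` machines take one additional client each.
    start = machine_index * base + min(machine_index, extra)
    count = base + (1 if machine_index < extra else 0)
    return [f"{prefix}{i}" for i in range(start + 1, start + count + 1)]


if __name__ == "__main__":
    # 300 bots over 3 machines: machine 0 gets t-1..t-100, machine 2 gets t-201..t-300.
    print(nicks_for_machine(0, 3, 300)[:2], nicks_for_machine(2, 3, 300)[-1])  # ['t-1', 't-2'] t-300
```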

Krzmbrzl commented 3 years ago

Okay, I think I know what the issue is: you are effectively benchmarking the number of concurrent authentications on a Mumble server. That means that if >230 people try to connect to a Mumble server at the same time, some of them might experience a timeout (depending on their client's settings; I don't know what PyMumble uses as a default).

If you want to benchmark how many connections the server can handle, you have to add a small delay in your test script.

With the following script I was able to easily have 500 clients connected to a server hosted on the same machine (localhost):

#!/usr/bin/env python3

import pymumble_py3
import time

server = "localhost"

nick = "t-"
# Connect one client every 0.2 s so authentications don't all arrive at once.
for i in range(500):
    print(i)
    mumble = pymumble_py3.Mumble(server, nick + str(i))
    mumble.start()
    time.sleep(0.2)

time.sleep(10)
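
The pacing idea can be factored into a small helper that also supports starting clients in small batches. This is a sketch, not part of PyMumble: `staggered_start` and its `batch_size` parameter are hypothetical, and `make_client` is any callable returning an object with a `start()` method.

```python
import time


def staggered_start(make_client, count, delay=0.2, batch_size=1, sleep=time.sleep):
    """Start `count` clients, pausing `delay` seconds after every `batch_size`
    starts so that authentications are spread out instead of arriving at once.

    For the scripts in this issue, make_client could be
    lambda i: pymumble_py3.Mumble(server, nick + str(i)).
    """
    clients = []
    for i in range(count):
        client = make_client(i)
        client.start()
        clients.append(client)  # keep references so clients can be inspected later
        # No pause after the last client has been started.
        if (i + 1) % batch_size == 0 and i + 1 < count:
            sleep(delay)
    return clients
```

With `delay=0.2` and `batch_size=1` this matches the pacing of the script above, so starting 500 clients involves about 100 seconds of deliberate waiting. Passing `sleep` as a parameter keeps the helper testable without real delays.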

AmrAshraf commented 3 years ago

Sorry, but I had also added that delay, and all clients authenticated fine. I'll try again soon, though. Which OS version and which Murmur version are you using?

Krzmbrzl commented 3 years ago

I am using KDE Neon 5.21 (built on Ubuntu 20.04) and was using a server version compiled from the current master branch (aka: 1.4.0 snapshot)

AmrAshraf commented 3 years ago

@Krzmbrzl You are right. The problem was in the Node.js clients. I managed to open 900 users, using 7 GB of RAM for the client bots. The Python code that works:

import pymumble_py3
import time

pwd = "pass"  # password
server = "ip"

# Start up to 997 bots, one every 0.2 s.
for x in range(1, 998):
    print(x)
    nick = "t-" + str(x)
    mumble = pymumble_py3.Mumble(server, nick, password=pwd)
    mumble.start()
    time.sleep(0.2)

# Keep the script running so the bots stay connected.
time.sleep(10000)

Thanks guys for your help.

Krzmbrzl commented 3 years ago

Alright then I'll close this issue as resolved :+1:

davidebeatrici commented 3 years ago

7 GB of RAM is quite a lot though.

AmrAshraf commented 3 years ago

> 7 GB of RAM is quite a lot though.

That isn't Murmur's memory. It's the memory needed to run the Python script above.
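
The reported figures allow a back-of-the-envelope estimate of the client-side cost. A minimal sketch, assuming "7 GB" means 7 GiB, that the memory is spread evenly over the bots, and taking the reported 900 users at face value (the loop itself would start up to 997); `mib_per_client` is a hypothetical helper.

```python
def mib_per_client(total_gib, clients):
    """Average memory per bot in MiB, given the total used by the whole bot script."""
    return total_gib * 1024 / clients


# Figures reported in this issue: about 7 GB of RAM for about 900 Python bots.
print(round(mib_per_client(7, 900), 1))  # 8.0
```

Roughly 8 MiB per Python bot is a client-side number; it says nothing about Murmur's own memory use.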

davidebeatrici commented 3 years ago

Oh, I see.

mumble-voip/mumble, issue #4955