jamulussoftware / jamulus

Jamulus enables musicians to perform real-time jam sessions over the internet.
https://jamulus.io

At about 70 client connections the client/server do not show/list all clients anymore #547

Closed corrados closed 4 years ago

corrados commented 4 years ago

See the report in this post: https://github.com/corrados/jamulus/issues/455#issuecomment-681572514

corrados commented 4 years ago

I did a quick test where I created 100 artificial clients with the following code in socket.cpp:

    // server:
    int iCurChanID;

    // TEST: simulate 100 clients by incrementing the source port
    for ( int i = 0; i < 100; i++ )
    {
        if ( i > 0 )
        {
            RecHostAddr.iPort++;
        }

        if ( pServer->PutAudioData ( vecbyRecBuf, iNumBytesRead, RecHostAddr, iCurChanID ) )
        {
            // we have a new connection, emit a signal
            emit NewConnection ( iCurChanID, RecHostAddr );

            // this was an audio packet, start server if it is in sleep mode
            if ( !pServer->IsRunning() )
            {
                // (note that Qt will delete the event object when done)
                QCoreApplication::postEvent ( pServer,
                    new CCustomEvent ( MS_PACKET_RECEIVED, 0, 0 ) );
            }
        }
    }

But I cannot reproduce the issue. I could see all 100 faders in the client, and the server table was also complete. Maybe the problem was caused by the CPU being at 100%, which can cause weird effects.

storeilly commented 4 years ago

We were doing undocumented tests last night on a server I compiled to allow 255 clients and had similar issues: on one of my clients I could see 102 clients, while the other was stuck at 53. I was watching htop and no CPU (of 4) went over 50%. I plan to repeat this test properly, with properly recorded results, over the next few days. Audio at 102 clients was badly broken but intelligible.

maallyn commented 4 years ago

I just completed the following test and observation.

First, I started with 30 connections on the server. The client could see those 30 connections.

Then I added another 20 clients slowly, so as not to overrun the server with fast connection requests; I added one every 5 seconds.

Then someone else came on (a client with a different name).

However, with 50 connections, I could not see that person on my Jamulus client (Windows 10), although I could see him on the master list as viewed in my connection status panel.

I then reduced the number of connections (by stopping each one) until I got down to about 25. At that point I could see all of them without having to scroll sideways, but I still could not see the one client I was talking with.

I then stopped and re-started the client and I did see him.

This suggests to me that there may be some data stuck in the client that had to be reset upon a restart of the client.

Throughout all of this, the sound from that one client was very good despite the fact that I could not see his fader on my panel.

Throughout all of this, there were only occasional instances when any of the four dedicated CPUs went over fifty percent.

corrados commented 4 years ago

@softins Maybe the issue is related to https://github.com/corrados/jamulus/issues/255. The client list for the audio mixer board gets quite big in case of 100 clients (similar to the server list). Is it possible that you support us with a wireshark analysis of that situation? Maybe if a test is running with more than 50 clients, you could log onto that session and check the protocol messages sent to your client?

softins commented 4 years ago

Sure, I'd be happy to help, although not available today (Sat).

softins commented 4 years ago

Just a couple of general comments before I disappear for the day:

corrados commented 4 years ago

So it looks to me more likely that something got out of sync between the protocol and the GUI.

The protocol transmits messages one by one, in the order they are scheduled. If one message does not make it through to the client (for example because of fragmentation), the protocol mechanism gets stuck retrying that one message and does not process any further messages. I think that would explain the described behaviour (or rather, I guess it would; it would be good to get proof by checking the network traffic with Wireshark).
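
To make the failure mode concrete, here is a minimal sketch of such a stop-and-wait queue (the names and structure are invented for illustration; this is not the actual Jamulus protocol code):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Minimal sketch of a stop-and-wait message queue: only the message at the
// head of the queue is (re)transmitted until its acknowledgement arrives,
// so one message that never gets acknowledged blocks every message queued
// behind it.
struct CSendQueue
{
    std::deque<uint8_t> vecMsgIds;   // pending message IDs (1-byte counter)
    uint8_t             iCounter = 0;

    void Enqueue() { vecMsgIds.push_back ( iCounter++ ); } // wraps at 256

    // the message actually put on the wire is always the head of the queue
    bool GetMessageToSend ( uint8_t& iId ) const
    {
        if ( vecMsgIds.empty() )
        {
            return false;
        }
        iId = vecMsgIds.front();
        return true;
    }

    // an acknowledgement only releases the queue if it matches the head;
    // if the head's ack never arrives, later messages are never sent
    void OnAck ( uint8_t iId )
    {
        if ( !vecMsgIds.empty() && ( vecMsgIds.front() == iId ) )
        {
            vecMsgIds.pop_front();
        }
    }
};
```

Because only the head of the queue is ever transmitted, a single message whose acknowledgement never arrives stalls the entire client list and fader update stream behind it.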

maallyn commented 4 years ago

Okay, thank you for the suggestion, Simon. Sometime next week when I have time, I will set everything up again, but install Wireshark on both the server running the clients and the Jamulus server itself, and somehow capture the screens and record them as a video. I have a question: can I attach a video, or at least a link to a video, to this ticket?

Or better yet, can I set up a jitsi or zoom session and show the stuff live?

corrados commented 4 years ago

Thanks for offering the test. I guess it would be easier if you talk to softins when you start your test, so that he can connect to your server and capture the Wireshark output on his side. So you should do the test when both you and softins have time for it.

softins commented 4 years ago

Hi Mark, there's no Simon on this thread. I'm Tony (also on Facebook). Happy to liaise with you in the week. I'm on UK time, I think you are EDT? If you haven't seen it, check out my repo https://github.com/softins/jamulus-wireshark

maallyn commented 4 years ago

I'm embarrassed that I made this mistake. I keep thinking that Corrados is Simon. Thank you for straightening me out. If you are in UK, then the best time for me would be my morning, which is your late afternoon. So, if I try to do this around nine in the morning Pacific U.S. Time, that would be six in the evening for you, is that an okay time?

corrados commented 4 years ago

I was looking at the original error description again. Here's what I found:

And the client slider panel stopped showing sliders after about 60 or so connections. However, the total count (the number indicated adjacent to the server name in the master server listing) showed the full total (81), as did the total number at the top of the client's panel.

I checked in the client code that the total count (in this case 81) is only shown if a client list protocol message for the audio mixer board was correctly and completely received. So maybe the issue is not related to #255. But I think it still makes sense to check the protocol under this stress situation, to verify that everything works as expected.

softins commented 4 years ago

If you are in UK, then the best time for me would be my morning, which is your late afternoon. So, if I try to do this around nine in the morning Pacific U.S. Time, that would be six in the evening for you, is that an okay time?

Except for the week straddling Oct/Nov (we end DST a week earlier than the US), we are 8 hours ahead of Pacific time, so your 9am is UK 5pm. That would be fine for me. I normally go to eat about 6pm or so. I think any day this week works for me. You can find me on Discord as Tony M or on Facebook Messenger as Tony Mountifield

maallyn commented 4 years ago

I completed a test with 60 real connections (not using Volker's modification to the socket.cpp file). Those 60 connections were made from another server in the cloud.

Then I made a connection from my client, which is under a different name and could be identified on the mixer panel.

I confirm, with the latest download for Windows 10, that it could be seen at the far right end of the mixer, with the bottom slider all the way to the right.

Unfortunately, I cannot create over 60 clients on an Ubuntu server, because the JACK software starts to break down; that is an unrelated issue.

corrados commented 4 years ago

We were doing undocumented tests last night on a server I've compiled to allow 255 clients and had similar issues, on one of my clients I could see 102 clients while the other was stuck at 53.

@storeilly Today I did some multithreading tests on your "jam mt 26" server and could reproduce the issue. And I think I now know what the issue is. If you quickly start a massive number of clients, we get huge protocol traffic to the clients. For each new client that connects to the server, the complete client list is updated and the mixer levels are updated immediately as well, so there is a massive number of protocol messages in the queue. According to the Jamulus protocol MAIN FRAME design, there is a "1 byte cnt" which identifies the messages in the queue, so we can have 256 different messages. With that many messages transmitted in a very short time, the order of the received network packets can change, and a packet that should be processed 256 protocol messages later may arrive early. Since the counter wraps around, the protocol mechanism thinks it has received the correct message and acknowledges it, even though it was the wrong one (caused by the wrap-around). If that happens, the protocol system gets stuck and no message is delivered anymore.

One solution to the problem would be to use 2 bytes instead of just one for the counter. But that would break compatibility with old Jamulus versions (client and server), which is not good.

@softins Do you think with your wireshark tools, you could prove that my assumption is true?
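
The suspected wrap-around can be shown with a toy model (assumed behaviour based on the description above; `MessageId` and `WouldAcknowledge` are invented names, not Jamulus functions):

```cpp
#include <cassert>
#include <cstdint>

// Toy illustration of the suspected failure: with a 1-byte counter the
// message ID repeats every 256 messages, so a receiver matching on the ID
// alone cannot tell message k from message k + 256.
inline uint8_t MessageId ( unsigned int iMsgIndex )
{
    return static_cast<uint8_t> ( iMsgIndex % 256 ); // counter wraps here
}

// the ack check: a packet is acknowledged if its ID matches the expected
// one, even if it is really a reordered packet from 256 messages later
inline bool WouldAcknowledge ( uint8_t iExpectedId, unsigned int iMsgIndex )
{
    return MessageId ( iMsgIndex ) == iExpectedId;
}
```

With only 256 distinct IDs, an acknowledgement test based on the ID alone falsely accepts a reordered packet from exactly 256 messages later, which matches the stuck-protocol symptom described above.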

storeilly commented 4 years ago

I see from the logs that you were connecting at about 40 per second at one stage. Well done! I don't have the ability to do that! I was working with @brynalf and we were leaving 5 seconds between each new connection; not sure if this helps.

storeilly commented 4 years ago

We were doing undocumented tests last night on a server I've compiled to allow 255 clients and had similar issues, on one of my clients I could see 102 clients while the other was stuck at 53.

@storeilly Today I did some multithreading tests on your "jam mt 26" server and could reproduce the issue. And I think I know now whats the issue....... @softins Do you think with your wireshark tools, you could prove that my assumption is true?

I've just installed tshark on that machine, so if you want to 'hit' it again, just give me a little notice to start the capture and we can send it to @softins or yourself for analysis.

softins commented 4 years ago

Just seen this. Happy to help. I usually capture using tcpdump rather than tshark, and then copy to my PC for viewing.

I have a script that sets up tcpdump to capture just protocol packets without the audio - makes the files a lot smaller!

[root@vps2 ~]# cat capture-jamulus-proto.sh 
#!/bin/sh

DATE=`date '+%Y%m%d-%H%M%S'`
FILE=jamulus-proto-$DATE.pkt

cd /var/tmp

tcpdump -C 128 -i eth0 -nn -p -s0 -w $FILE udp portrange 22120-22139 and '(udp[8:2] == 0 and udp[4:2]-17 == (udp[14]<<8)+udp[13])' </dev/null >/dev/null 2>&1 &

You may need to change the interface name from eth0 depending on your system.

softins commented 4 years ago

@corrados It's certainly possible that the 8-bit sequence number rolled over. Happy to verify that if I can reproduce it or be sent a capture file.

If that is what is happening, then maybe when sending to a client, it could pause before using a sequence number that is still unacked from the previous time, and wait until the ack comes in? This situation would seldom occur in practice, so on the rare occasion it does, a few ms pause would be tolerable, I would think.

I haven't looked at the code, but that is how I would initially approach it.
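
That approach could be sketched like this (hypothetical names; a design sketch under the assumptions above, not the actual Jamulus implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Sketch of the suggested workaround: before assigning a sequence number,
// check whether the same number is still outstanding (sent but not yet
// acknowledged) from the previous wrap; if so, the sender pauses until the
// acknowledgement comes in before reusing it.
struct CSeqAllocator
{
    std::set<uint8_t> outstanding; // sequence numbers sent but not acked
    uint8_t           next = 0;

    // safe to send only if "next" is not still awaiting an ack
    bool CanSend() const { return outstanding.count ( next ) == 0; }

    uint8_t Send() // precondition: CanSend()
    {
        outstanding.insert ( next );
        return next++; // uint8_t wraps at 256
    }

    void OnAck ( uint8_t seq ) { outstanding.erase ( seq ); }
};
```

The pause would only trigger when a full cycle of 256 sequence numbers is in flight which, as noted, should be rare in practice.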

softins commented 4 years ago

The Jamulus dissector for Wireshark is a single .lua file, available at https://github.com/softins/jamulus-wireshark

corrados commented 4 years ago

Thanks for all your support. By looking at the code I have found a possible bug in the protocol mechanism. Hopefully I'll get some time this evening to investigate further. I'll keep you informed of my progress.

maallyn commented 4 years ago

Folks:

Here is what I was finally able to do to overcome the restrictions of running 60 clients on Jackd.

I was able to create a second user on the Ubuntu server that I am using to send the clients to newark-music.allyn.com, and that new user could also send 60 clients over to newark-music, since it had its own instance of jackd, using a separate aloop audio device.

I configured the script to wait 2 seconds between each client invocation so as not to overrun the server.

After about 70 to 80 clients were sent to the server, I noticed that the listing from the master server started to show a big gap of empty lines for newark-music before the listing of the next Jamulus server. The total count listed at the top (next to the server listing itself) did say 100, but apparently not all 100 were listed; those after about 70 or so showed a blank line.

I re-ran the scenario with no delay between client invocations, and the effect was the same. So to me, it does not seem to be a speed-overrun issue.

I also notice that I could not do a systemctl restart of the jamulus server; I had to do a full reboot of the machine.

After I rebooted the machine, the master server's listing of the newark server remained stuck for a full three or four minutes, until it finally reset to 0 at about the time the newark server finished its reboot. That seems to indicate that the master server does not correct the listing until a while after my server is rebooted.

I am wondering if the issue is with the master server being overloaded. I checked the logs on the newark server and I saw no error indications.

All of these tests have been made with no music sent on any of the clients.

I hope this all helps.

Mark Allyn Bellingham, Washington

corrados commented 4 years ago

By looking at the code I have found a possible bug in the protocol mechanism.

Unfortunately it turned out not to be a bug.

since the counter wraps around the protocol mechanism thinks it has received the correct message and acknowledges it

I also do not think that this is the case anymore after I have checked some things today.

I re-ran the scenario but with no delay between client invocations and the effect was the same. So to me, it does not seem to be an overrun with speed issue.

That is interesting. I just ran a set of tests this evening and found the opposite. When I start the clients without a delay, there is a threshold of 58 clients at which the server gets confused and all sorts of strange things happen. If I put a delay of about a second after the creation of each test client, I can start more than 58 and do not see the issue.

I'll further investigate the issue...

maallyn commented 4 years ago

Just out of curiosity, I slowed down the script so that it issued a client connection once every 20 seconds. This got interesting: the missing connections in the listing from the master server got fewer. It made it to about 80 connections (instead of the upper 60s/low 70s). But there was still a gap in the listing, and I could not connect my own client from my PC after we had 90 connections (the server has a capacity of 100).

At 45 connections, I initiated my own connection from my PC and was able to hear myself. However, after about 75 connections, my return sound was very warbly and distorted. I checked htop and found no CPU hitting over 80 percent, and I could see all four CPUs engaged. This is a dedicated-CPU instance on Linode/Newark.

I checked network and disk utilization on the Linode dashboard; the network never hit more than about 7 MB outbound, and disk and memory were only nominal.

I am wondering if we have both a performance issue (the ability to handle fast multiple connections) and a functional one: slowing connection requests to one per 20 seconds reduced, but did not eliminate, the problems.

This entire session would have resulted in too big a file if I had run tcpdump.

I hope this all helps; if there anything I can try to do more, please let me know.

Mark

maallyn commented 4 years ago

How do I get and compile your change? Just a new git sync, or do I need a tag or CONFIG option?

corrados commented 4 years ago

If you are on Git master, a git pull should be sufficient to get the latest changes.

storeilly commented 4 years ago

The network only hit 80MB at about 21:30 last night (the resolution drops on AWS as time progresses). The network capacity is 7TB so I doubt that is the issue. I'll build that commit shortly. Thanks @corrados

maallyn commented 4 years ago

I just tried the tip as of about an hour ago. My server never registered with the master (central) server. Connections stopped at about 74 users, per the label at the top of my client mixer board on my PC. The server log, however, shows all 93 (46 from each of the automated client accounts on my Seattle server, plus one from my desktop in Bellingham). I tried to make a second connection from my PC, but it hangs on trying to connect. After fifteen minutes, my server still had not shown up under All Genres in the listing from the central server. I also looked on jamulus.softins.co.uk and did not find it there either. My own server log in /var/log/syslog shows registration full (which I guess is okay). However, I also noticed a lot of repeats of this in my syslog: "2020-09-04 17:22:15, 67.161.92.199, connected (92)"

I need to let you know that unfortunately, the rest of my Friday is shot. I might not be able to do any more testing until tonight. Also, I did not attempt to do tcpdump as it may be too big. If you have suggestions of whittling down the tcpdump's bulk, I will try to run it.

softins commented 4 years ago

Also, I did not attempt to do tcpdump as it may be too big. If you have suggestions of whittling down the tcpdump's bulk, I will try to run it.

Mark, have a look at my comment further up, where I included my capture-jamulus-proto.sh script, that starts tcpdump to capture only protocol packets, not audio.

maallyn commented 4 years ago

I re-ran the tests with the tcpdump from Tony's script running on the server. New observation: this time I did see the entry in the central server's listing. It turns out the central server's listing says 93 clients. I did see a small space at the end of them all; maybe it missed one client.

My client's window showed the number to be 80-something, not the 93 shown in the central server's listing. I also noticed that the server's entry froze in the central server's list, and that the latency jumped to >500 ms. I could not connect from my PC; it was stuck trying to connect.

I do have a file up on-line at http://allyn.com/jamulus-proto-20200904-202941.pkt for the packet capture

Mark

maallyn commented 4 years ago

Folks: I just did a re-pull from git and a recompile of the server at newark-music.allyn.com. Here is the information you might need:

What I did

    git clone https://github.com/corrados/jamulus.git
    cd jamulus
    git fetch --all --tags

Here is the git log:

##############################################################################

commit ef0c5d7a90aecb01530616277cb07e88556df971
Author: Volker Fischer <corrados@users.noreply.github.com>
Date: Fri Sep 4 22:37:41 2020 +0200

    cleanup

commit bdd2c09b6e1dfa48210c8ad165d60272c0ac2ae9
Author: Volker Fischer <corrados@users.noreply.github.com>
Date: Fri Sep 4 22:16:38 2020 +0200

    make the protocol load lower if the number of clients changes at the server by only sending mute/solo gain updates to the server if necessary

commit c9c5e41456609bb7573d0d6577bf21c7ea4b72d7
Author: Volker Fischer <corrados@users.noreply.github.com>
Date: Fri Sep 4 16:50:16 2020 +0200

    hopefully solves #547 (issue with protocol gets stuck on heavy load)

commit 999d0e32ebfd25178c981a6ea8379b272cad1649
Merge: 85f9c83 2d4683b
Author: Volker Fischer <46655886+corrados@users.noreply.github.com>
Date: Thu Sep 3 23:03:01 2020 +0200

    Merge pull request #566 from tormodvolden/headless

    Do not install desktop file and icons if headless

commit 2d4683bd48e85323c4d0c2d7e876a368e4c19d41
Author: Tormod Volden <debian.tormod@gmail.com>
Date: Wed Sep 2 23:34:16 2020 +0200

    Do not install desktop file and icons if headless

    Signed-off-by: Tormod Volden <debian.tormod@gmail.com>

commit 85f9c830996f26aa550947800fe3b35e9eddd60e
Author: Volker Fischer <corrados@users.noreply.github.com>
Date: Wed Sep 2 21:43:05 2020 +0200

    remove the bIsCallbackAudioInterface from the soundbase (because no interface is using it now)

##############################################################################

Change line 180 in src/global.h to:

    #define MAX_NUM_CHANNELS 250 // max number channels for server

Install the build dependencies and compile:

    sudo apt-get install build-essential qt5-qmake qtdeclarative5-dev libjack-jackd2-dev qt5-default

    qmake "CONFIG+=nosound" Jamulus.pro
    make clean
    make

Jamulus start command:

    systemctl daemon-reload
    systemctl start jamulus

=====================================================================

Server speedtest results as of 4 PM U.S. Pacific Time (Washington State):

    root@bicycle-bellingham:/usr/local/bin# speedtest
    Retrieving speedtest.net configuration...
    Retrieving speedtest.net server list...
    Testing from Linode (172.104.29.25)...
    Selecting best server based on latency...
    Hosted by Kansas Research and Education Network (Wichita, KS) [43.14 km]: 35.756 ms
    Testing download speed... Download: 772.75 Mbit/s
    Testing upload speed... Upload: 86.53 Mbit/s

=====================================================================

Unfortunately, I don't have time to do any stress testing until later tonight. You guys are welcome to try to test this configuration.

The question I have here is: is it safe for me to put the server's login name, password, and root password here for you folks to play with? How safe is github.com/corrados/jamulus/issues as far as posting access information?

maallyn commented 4 years ago

Please also note that this build may also be appropriate for testing on issue 455.

corrados commented 4 years ago

is it safe for me to put the servers' login name and password and root password here for you folks to play with?

No, you should never publish a root password in a public forum.

Thanks for your offer. I just tried out your server to see if I could reproduce the issue, but it was not possible. It seems the delay to Newark is too high, so the issue does not occur. Only with Stefen's server can I see the issue, because the delay there is much smaller.

softins commented 4 years ago

I've updated my tcpdump script to capture fragmented IP packets too (which Wireshark will reassemble for display):

[root@vps2 ~]# cat capture-jamulus-proto.sh
#!/bin/sh

DATE=`date '+%Y%m%d-%H%M%S'`
FILE=jamulus-proto-$DATE.pkt

cd /var/tmp

tcpdump -C 32 -i eth0 -nn -p -s0 -w $FILE '(ip[6:2]&0x3fff) != 0 or (udp portrange 22120-22139 and (udp[8:2] == 0 and udp[4:2]-17 == (udp[14]<<8)+udp[13]))' </dev/null >/dev/null 2>&1 &

Of course, portrange 22120-22139 can be replaced by a specific port designation such as port 22124

maallyn commented 4 years ago

I changed my server configuration and did some more tests.

First of all, I moved the client generation from my Seattle server to a new server located in the same Linode data center as newark-music.allyn.com. It is a 2-processor dedicated server that I intend to rent only for the duration of this issue and performance issue 455. Its name is client.allyn.com.

With this configuration, I can hit the server with client requests far more quickly than with the client-generating server in the Vultr Seattle data center, as I had before.

However, I did notice something interesting. When I deliberately slowed down the generation of clients to one every 20 seconds, I still saw the issue with missing clients (and distorted sound) after about 70 client connections. The display at jamulus.softins.co.uk worked fine with automatic updating until about 75 clients and then it stopped and showed newark-music.allyn.com with no clients.

I did notice that when I rebooted the client generation server and knocked all 92 clients offline, it took about 5 minutes for the server to get rid of them (without having to reboot), and it seemed to recover full operation.

I have also made changes on the server so that I have a script that does a git sync, applies a patch to increase the max clients to 250, and then compiles both the server software and the client software. It is run after logging into newark-music.allyn.com. It also automatically modifies the server description announcement you see in the chat window when you connect.

Unfortunately, I lost track of the patch that changes the client name for each invocation, so all clients are named "No Name".

Although Volker told me that I cannot include passwords here, Tony already has a key for the account maallyn@newark-music.allyn.com. I duplicated that key to maallyn@client.allyn.com, c1@client.allyn.com, and c2@client.allyn.com. The c1 and c2 accounts are used to generate the clients. Because of a limitation of the JACK server, only about 50 clients can be created per account; each account has to have its own invocation of the JACK server (which is done via the vncserver facility). I am giving Tony the okay to install keys for the other developers if they are interested.

storeilly commented 4 years ago

Noticed a test was done last night, so uploaded files to dropbox jamulus-proto-20200907-112301.pkt and jamulus10.log

softins commented 4 years ago

Noticed a test was done last night, so uploaded files to dropbox jamulus-proto-20200907-112301.pkt and jamulus10.log

There was nothing of interest in that packet file. Just a short-lived connection from a client in Malaysia at around 17:43 UTC yesterday.

softins commented 4 years ago

However, I did notice something interesting. When I deliberately slowed down the generation of clients to one every 20 seconds, I still saw the issue with missing clients (and distorted sound) after about 70 client connections. The display at jamulus.softins.co.uk worked fine with automatic updating until about 75 clients and then it stopped and showed newark-music.allyn.com with no clients.

@maallyn Now that's interesting, and I'd like to observe that. I now have a tcpdump on the backend of Jamulus Explorer (capturing specific IP addresses including newark-music), and a corresponding one on newark-music (capturing only traffic with my server and with client.allyn.com). If you can rerun your big test when convenient, I'll look at the traces. Please ping me on discord before you do, so I can make sure I am watching Explorer

softins commented 4 years ago

OK, I have looked at the packet traces from both newark-music and jamulus.softins while you did the test of one new client every 20 seconds. As we saw, jamulus.softins stopped displaying the Version/OS and client list for newark-music once the number of clients reached 62. This is partly due to the design of the jamulus explorer backend: it sends out all its pings to the servers in the server list; when it gets a ping back it sends a version/OS request and a client list request. Because some servers will not respond, it needs to wait until it has received no packets for a certain length of time. This idle timeout is currently 1.5 seconds, which is usually plenty, after which it sends the accumulated data back to the jamulus explorer front end. Increasing the idle timeout makes it take longer for the front end to display when switching genres.

But when the number of clients in the jamulus server reaches a threshold, the delay in responding starts to increase disproportionately, making the replies too late to be caught by the jamulus explorer client. This is what I observed in this test:

Clients | Delay in responding to CLM ping or request
   61   | ~0.8s
   62   | ~2s
   63   | ~4s
   64   | ~6s
   65   | ~8s
   66   | ~10s

I looked at the other traffic at the time, and while a lot of it is taken up with sending level lists and client lists to the connected clients, there are still a lot of gaps, indicating that it is not due to network saturation. It is interesting to see the delay increase so much.

softins commented 4 years ago

@maallyn On client.allyn.com, I have also made an updated version of your junker script as junker2 to give the clients individual names:

#!/bin/bash
for i in {1..46}
do
  sleep 20
  NAME=`echo -n Test $i | base64`
  INIFILE=".jamulus$i.ini"
  echo "<client><name_base64>$NAME</name_base64></client>" >$INIFILE
  /home/maallyn/jamulus/Jamulus -i $INIFILE -j -n --connect 172.104.29.25 >/dev/null 2>&1 &
done
maallyn commented 4 years ago

Thank you, Tony! This saves me from having to dig into the code!

Mark


softins commented 4 years ago

If replying to a github message by email, you need to avoid quoting the message being replied to! I discovered this myself the other day.

maallyn commented 4 years ago

I just did a test using the feature_protosplit branch.

Here is what I did on newark-music.allyn.com to build both the client and the server:

==================================

#!/bin/bash

cd /home/maallyn
rm -rf jamulus
git clone https://github.com/corrados/jamulus.git
cd jamulus
git fetch --all --tags
git tag
git checkout feature_protosplit

# Do any local changes here
git apply /home/maallyn/max-client.patch

# move to client
cd /home/maallyn
ssh maallyn@client.allyn.com rm -rf /home/maallyn/jamulus
rsync -av jamulus maallyn@client.allyn.com:

# do compile on client
ssh maallyn@client.allyn.com rm -f client_compile.sh
rm -f client_compile.sh

cat << 'EOF' > client_compile.sh
#!/bin/bash
cd jamulus
qmake Jamulus.pro
make clean
make
EOF

chmod +x client_compile.sh
scp client_compile.sh maallyn@client.allyn.com:client_compile.sh
ssh maallyn@client.allyn.com /bin/bash client_compile.sh

# do compile here (server, no sound)
cd jamulus
server_qmake="CONFIG+=nosound"
qmake $server_qmake Jamulus.pro
make clean
make

==========================================================

The first set of 46 clients launched okay.

However, after about client 28 of the 2nd set, the server stopped reporting and the output on jamulus.softins.co.uk collapsed.

When I did a killall, the server restored proper operation without having to reboot or restart.

corrados commented 4 years ago

Do you run the test clients and the server on the same PC?

maallyn commented 4 years ago

To Volker: I have two machines in the cloud in the same data center. One runs the server. The other runs the clients which are run via vncserver. Both are in the Linode Newark data center. If you need to have access and look around, I can install your ssh key in them. Tony already has access.

corrados commented 4 years ago

The first set of 46 clients launched okay. [...] However after about client 28 on the 2nd set, the server stopped reporting and the output on jamulus.softins.co.uk collapsed.

That is interesting. In my test today I could run about 70 clients on storeilly's server and it still worked with good audio quality. The question is why your server holds so many fewer clients...

softins commented 4 years ago

The first set of 46 clients launched okay. [...] However after about client 28 on the 2nd set, the server stopped reporting and the output on jamulus.softins.co.uk collapsed.

Note that there are limitations with Jamulus Explorer:

  1. It only uses connectionless (CLM_xxx) messages, and these haven't had the split extension applied yet. But in any case, the back-end has no problem with large UDP fragmented messages.
  2. As mentioned in my comment above, once a Jamulus server has more than a certain number of clients connected (61 in the instance I observed), it starts to take exponentially longer to respond to CLM_xxx messages, which causes the Jamulus Explorer back-end to time out before it receives the replies. I could increase the timeout, but that would make it longer to respond for everyone, as it needs to wait until the timeout to decide there are no more replies to be received.

This exponential delay in the Jamulus server responding is a separate issue that will need investigation at some point.

maallyn commented 4 years ago

Tony: If this is the case, does that explain why the audio degrades after about 60 to 70 client connections? And since most Jamulus use cases (jamming with instruments in small groups) do not approach 60 users, doesn't that mean Jamulus, both client and server as they are now, is perfectly adequate for the vast majority of cases, including our own worldjam, where even the waiting room rarely touches 40 clients?

The one major exception would be for choirs and perhaps a large orchestra.

I have been trying to sell this to my choirs, one of which has 50 voices and the other 130.

However, I am beginning to feel that I should not be pushing my choir members to use a client that looks like a sound mixer and may be intimidating to members of my church / chorus who are techno-phobic and just want something simple and plug-and-play to participate, like Zoom. I have already had strong pushback from other members of my Unitarian Fellowship's audio-visual and tech committee, and I am beginning to agree with them.

If there were a very simple client, without the faders and the VU meters, would the traffic handled by the server be lower, and would the issue you are seeing at 60 to 70 plus connections go away?

softins commented 4 years ago

It may be related. I did some tests yesterday between separate client and server machines on my LAN, while monitoring the interface data rates using SNMP. I found that the bandwidth usage on the server increased linearly with the number of clients, which makes sense, since it sends and receives one stream to/from each client. I only went up to 21 clients, so I wasn't pushing the limits - I would need more client machines than just my Raspberry Pi to really exercise the server.

However, thinking about it, the demands placed on the system by the mixing will increase with the square of the number of clients: with N clients connected, and each client having their own separate mix generated, the server will be producing N mixes, each from N streams. So as the number of clients increases, there will come a point where it quickly degrades and can't keep up. Where that point falls will depend on the power of the hardware. As I understand it (I'm still trying to fully understand the code structure), there is a high-priority thread that handles the audio mixing, and a lower-priority thread that is responsible for handling the protocol. I don't yet know how that is affected by the multi-threading enhancements.

I'm not sure that a simpler client would reduce the traffic enough to be significant, when compared to the bandwidth consumed by the actual audio, and the N² factor on the mixing. However, I can certainly see the benefit of such a client for user friendliness, where the users do not need/want to have their own fine control of a large number of participants.

softins commented 4 years ago

Maybe for a choir kind of usage, we need some kind of architecture where there is a generic mix produced, that can be controlled by one person, but that is then sent identically to most of the clients. That would reduce both the mixing load and the technical usability burden on those users. There could still also be clients that have their own custom mix.

But these would be ideas for Jamulus V4.