jamulussoftware / jamulus

Jamulus enables musicians to perform real-time jam sessions over the internet.
https://jamulus.io

Server performance & optimization #455

Closed WolfganP closed 3 years ago

WolfganP commented 4 years ago

Follows on from https://github.com/corrados/jamulus/issues/339#issuecomment-657076545, to give the discussion a better focus.

So, as the previous issue started to explore multi-threading on the server for better use of resources, I first ran a profiling pass of the app on Debian.

Special build with:

qmake "CONFIG+=nosound headless noupcasename debug" "QMAKE_CXXFLAGS+=-pg" "QMAKE_LFLAGS+=-pg" -config debug Jamulus.pro && make clean && make -j

Then I ran it as below and connected a couple of clients for a few seconds:

./jamulus --nogui --server --fastupdate

Once the clients had disconnected, I gracefully killed the server:

pkill -sigterm jamulus

And finally ran gprof, with the results posted below:

gprof ./jamulus > gprof.txt

https://gist.github.com/WolfganP/46094fd993906321f1336494f8a5faed

It would be interesting if those who observed high CPU usage ran test sessions and collected profiling information as well, to detect bottlenecks and potential code optimizations before embarking on multi-threading work that may require major rewrites.

storeilly commented 4 years ago

Thanks sir, no worries, I'll plan another test this week.. Great stuff

On Mon 24 Aug 2020, 10:39 Volker Fischer, notifications@github.com wrote:

@storeilly https://github.com/storeilly Too bad, I had a bug in the multithreading Jamulus server which caused the server not to process the connected clients' audio correctly. Therefore your test with, e.g., "T# ! 72 clients" unfortunately did not give any useful results with that previous buggy Jamulus code, since the OPUS encoding of the clients was not done at all.

Anyway, I think I have fixed the bug and I have created a new label now: https://github.com/corrados/jamulus/releases/tag/multithreading_testing3

Sorry for the inconvenience.


maallyn commented 4 years ago

Thank you all for the help.

I have just re-compiled with the multithreading-testing3 tag (HEAD at a5186b5) on my 4 CPU server at newark-music.allyn.com in New Jersey, in the Northeast United States. I will make announcements on Facebook to ask people to test it and try to break it.

I do have a question: when I did the git checkout of the multithreading-testing3 tag (after re-doing git clone in a clean directory), I got this message from git; I don't know if this is your message or from GitHub itself.

Note: checking out 'multithreading_testing3'.

You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:

git checkout -b <new-branch-name>

HEAD is now at a5186b5... bug fix

I did not do the git checkout -b command, but I did my compile using the multithreading build option, which seemed to be okay. The message from git did not mention whether it is okay to compile, and since I have no intention of making any changes to the code, I am assuming it's okay for me not to do the git checkout -b command.

melcon commented 4 years ago

I do have a question: when I did the git checkout of the multithreading-testing3 tag (after re-doing git clone in a clean directory), I got this message from git; I don't know if this is your message or from GitHub itself.

$ git fetch --all --tags                     # update your local repo and make sure you have all the tags
$ git checkout tags/multithreading_testing3  # check out the chosen tag

Ref.: https://devconnected.com/how-to-checkout-git-tags/

maallyn commented 4 years ago

Folks: I built and installed multithreading-testing3 on newark-music.allyn.com (a Linode 4 CPU dedicated instance in the Newark, NJ data center). I then used two cloud machines (a Linode in Toronto and a Vultr in Seattle) to hit newark-music.allyn.com with mass-generated clients. With the help of some others, we were able to prove that audio quality was okay with up to about 55 connections; however, that was at a time when newark-music was not a dedicated instance.

After I made newark-music dedicated, I tried to connect 80 clients (40 each from Seattle and Toronto). At about 69 or 70 connections, the master server stopped listing connections (it just had a big white gap after listing about 61 connections), and the client slider panel stopped showing sliders after about 60 or so connections. However, the total count (the number indicated adjacent to the server name in the master server listing) showed the full count (81), as did the total number at the top of the client's panel.

I let it sit there for a while with no music playing in any of the clients. htop showed about 40 to 60 percent CPU on each of the four CPUs on the machine (this is dedicated). I waited about 15 minutes, but the listing on the master server seemed to be stuck with the big white gap after about 60 listed connections, and the client's panel kept showing about 60 faders. At that point I did not bother to try to feed music, but I tried to stop and start the server using systemctl; stopping it took about 5 minutes.

Right now I have rebooted all three machines; newark-music is up and running again and people can use it. I still plan to run a test with other people at 4 PM US Pacific Time on Thursday, August 26th. Please note that I am willing to get up and be available at about 8 AM US Pacific time on Thursday if anyone from the UK team needs my help. I will also note this on the Jamulus World Jam Facebook page.

corrados commented 4 years ago

With the help of some others, we were able to prove that audio quality was okay with up to about 55 connections

That's a new record :-). Thank you for that test. It shows that the new multithreading modifications improved the situation, i.e. we are going in the right direction with the multithreading implementation. My next step is to merge the multithreading branch into master (some slight modifications are still missing before I do that).

At about 69 or 70 connections, the master server stopped listing connections (it just had a big white gap after listing about 61 connections), and the client slider panel stopped showing sliders after about 60 or so connections.

This seems to be a bug in the Jamulus software. I have created a separate Issue for that: https://github.com/corrados/jamulus/issues/547

WolfganP commented 4 years ago

Based on my understanding that Qt-based threads, future synchronization and timers are basically queues, wouldn't it be worthwhile to instrument the code somehow to measure those queues when critical/time-sensitive functions are invoked (i.e. OnTimer), so we can detect processing/cycle overruns?
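For example, something along these lines (just a sketch of the kind of instrumentation I mean, not actual Jamulus code; the class and member names are made up):

#include <QElapsedTimer>
#include <QtDebug>

// Hypothetical cycle monitor: measure how long each OnTimer cycle takes and
// count cycles that exceed the nominal frame period (the "budget").
class CCycleMonitor
{
public:
    explicit CCycleMonitor ( qint64 iPeriodUs ) : iPeriodUs ( iPeriodUs ) { timer.start(); }

    void OnCycleStart() { timer.restart(); }

    void OnCycleEnd()
    {
        const qint64 iElapsedUs = timer.nsecsElapsed() / 1000;

        if ( iElapsedUs > iPeriodUs )
        {
            iOverruns++;
            qWarning() << "cycle overrun:" << iElapsedUs << "us, budget"
                       << iPeriodUs << "us, total overruns" << iOverruns;
        }
    }

private:
    QElapsedTimer timer;
    qint64        iPeriodUs;
    qint64        iOverruns = 0;
};

Wrapping the body of the server's timer callback in OnCycleStart()/OnCycleEnd(), with iPeriodUs set to the frame period (roughly 1333 us for 64 samples or 2667 us for 128 samples at 48 kHz), would show whether the mix/encode loop itself misses its deadline or whether the bottleneck is elsewhere.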

Somehow I have the feeling we're moving somewhat blindly in terms of where to focus improvements, or of understanding where the real performance showstoppers are when processing the load of large ensembles; but that may just be my perception... (I'm still fighting with ALSA to present virtual devices that let me inject real audio into a massive set of clients run in parallel for testing.)

maallyn commented 4 years ago

Folks: Thank you for the feedback. Just for the fun of it, I am now running about 45 connections simultaneously to newark-music.allyn.com as of about 10 AM US Pacific Time, Thursday Aug 27. I intend to leave this up and running until about 4 PM today (Thursday), when I will stop them and let us try our in-person test. They are all playing the same music. So far, if I listen to any one of them, the music is clear. You are welcome to join in; if you want to try your own jamming over and above what I am doing, you can use the SOLO button. I tested it and it seems to work okay.

maallyn commented 4 years ago

To Wolfgan: What I am doing for parallel music is, first of all, to do everything from a cloud server. I first set up vncserver on the server and VNC back to my desktop (my internet connection is slow, so I don't want music pushed through it to many connections). Once vncserver is set up, I set up ALSA with a loopback device (a Linode dedicated cloud instance has no audio/video devices; it is headless).

Here is my alsa module setup:

alias snd-card-0 snd-aloop
options snd-aloop index=0 pcm_substreams=8

Here is my modules.conf file

snd-aloop

Here's the stuff that I install:

apt-get install alsa-tools
apt-get install alsa-utils
apt-get install qjackctl pulseaudio-module-jack
apt-get install alsa-base
apt-get install libasound2
apt-get install tightvncserver
apt-get install -y xfce4 xfce4-goodies
apt-get install -y xubuntu-core^
apt-get install build-essential qt5-qmake qtdeclarative5-dev qt5-default qttools5-dev-tools libjack-jackd2-dev


I then create a script to kick off the clients:

#!/bin/bash

for i in {1..20}
do
  ./llcon-jamulus/Jamulus --connect 172.104.29.25 &
done

Then I go into qjackctl and connect the PulseAudio JACK sink (on the left-hand side of the connect panel) to each Jamulus client (on the right side of the panel).

I hope this helps.

Mark

WolfganP commented 4 years ago

Thx @maallyn, I got to that point as well with snd_aloop, but the problem is that those channels are silent, and no actual audio goes back and forth to the server.

My goal is to define some kind of reproducible recipe for lab testing with standard tools: be able to inject audio into those Jamulus clients by playing a wav file, and get audio back that I can direct to a null sink if I'm not interested in measuring quality, but still have actual "real" sound load through the entire system. I've played a lot with .asoundrc and JACK routing but I wasn't able to make it work... yet :)

storeilly commented 4 years ago

What about a collaboration? I can publish my 7 servers with the different config builds ((fast/no fast) / (multithreaded/single-threaded) / plain 3.5.10), which are on the same machine, and share the htop results over a Zoom screenshare. With enough live clients and dummy clients between us we can gauge the breakdown thresholds. A live edit/recompile is not out of the question either... thoughts?

maallyn commented 4 years ago

I am willing to partake in our test meeting at 4 PM US Pacific time. Or, if you want, I can get on at 1 PM US Pacific, which I believe is 4 PM US east coast or 9 PM London, which, if I am not mistaken, is where you are. I am tied up until 1 PM US Pacific time. If you don't want to do Zoom, I do have a Jitsi video bridge at conference.allyn.com/jamulus

storeilly commented 4 years ago

Servers are now on the jazz central server list. IP address = 52.49.128.29 for scripts; ports are 22124..22140 in the following sequence:

"Jam st 24"   = port 22124; single threaded
"Jam stF 25"  = port 22125; single threaded with fast updates
"Jam mt 26"   = port 22126; multithreading enabled
"Jam mtF 27"  = port 22127; multithreading enabled with fast updates
"Jam 35a 28"  = port 22128; standard release tag 3.5.10
"Jam 35aF 29" = port 22129; standard release tag 3.5.10 with fast updates
"Jam mt 30"   = port 22130; multithreading enabled, no fast updates, MAX_NUM_CHANNELS = 256

maallyn commented 4 years ago

Folks: If you want to poke around with the machine that I am using to create the artificial connections, it is tester.allyn.com and you can ssh into it as user jamulus with password jamulus123. That is a guest account to let you poke around, but it does not have privileges to kill any of the current clients. You could, however, start your own clients from that machine. It is a dedicated 4 CPU Linode at the Toronto data center that I am renting for the next few days.

maallyn commented 4 years ago

FYI, I am now sharing the htop output of newark-music.allyn.com on my jitsi conference server at conference.allyn.com/jamulus

maallyn commented 4 years ago

We had a group of up to 14 live people on newark-music.allyn.com tonight and all went very well. I heard no issues with the audio and I could not discern any latency jittering as people entered and left.

Earlier today, I loaded the system down with about 55 artificial connections (all done from the same source and playing the same music on a loop recording). I did some sound checks by doing a solo between myself and someone else and the sound was fine. I watched the htop and it seems to me that none of the processors ever went over 50 percent or so.

I am not going to try to do the stress testing any more as I feel that the multi-threading patch did work and I did not want to pay for more time on the dedicated server that I was using to send the connections (Linode in Ontario). That server is now gone.

However, for the time being, I plan to leave the conference.allyn.com/jamulus Jitsi meet locked onto a screenshot of htop on newark-music.allyn.com until I go to bed tonight, and I will re-start it tomorrow morning.

Thank you all for the help!

Mark Allyn Bellingham, Washington

WolfganP commented 4 years ago

Good testing @maallyn

Earlier today, I loaded the system down with about 55 artificial connections (all done from the same source and playing the same music on a loop recording). I did some sound checks by doing a solo between myself and someone else and the sound was fine. I watched the htop and it seems to me that none of the processors ever went over 50 percent or so.

Do you mind sharing how you injected the sound into the clients (which player, which sound device used as input/output, ...)? It seems that's the part I'm still missing (maybe it would be easier to document the whole setup in a gist?). Thanks!

38github commented 4 years ago

I tried out what I think was the Newark server a couple of days ago, and it reacted very slowly to lowering volumes, soloing, etc. The changes took up to maybe ten seconds to apply to the sound.

maallyn commented 4 years ago

To Wolfgan: Can we schedule a Jitsi meet so that I could walk through my procedure with you? Or better yet, if any of the others are here, are you all interested in a brief Jitsi or Zoom get together? I am willing.

corrados commented 4 years ago

@storeilly I have just changed how the new multithreading functionality is enabled. It is no longer necessary to re-compile Jamulus to activate it, since the switch is now a simple command line flag instead of a qmake CONFIG flag. The code is available on Git master (https://github.com/corrados/jamulus/commit/7ca52e3adfcf555f82f3003252d25c21713c5102). You can now test the new multithreading functionality by using:

./Jamulus -s -n -T

I hope that this simplifies the multithreading testing.

corrados commented 4 years ago

I tried out what I think was the Newark server a couple of days ago, and it reacted very slowly to lowering volumes, soloing, etc. The changes took up to maybe ten seconds to apply to the sound.

Did you observe this issue when you were the only client on the server, or during the multithreading test when about 50 clients were connected?

38github commented 4 years ago

I tried it when there were at least 25 clients broadcasting (probably) pre-recorded material.


corrados commented 4 years ago

I tried it when there were at least 25 clients broadcasting (probably) pre-recorded material.

Ok, if you have a lot of clients connected, a lot of information must be exchanged via the protocol. If the server has a high ping time of, e.g., > 100 ms, that not only influences the audio but also the speed of the protocol messages. All protocol messages must be acknowledged, so you get a delay of about 200 ms per protocol message. If you, e.g., need to send a gain update for 25 clients, that takes about 5 seconds. If you connect to a server close to you with, e.g., a ping of 15 ms, the same number of protocol messages only requires about 750 ms, so the response is much faster.
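Spelled out (a rough model, assuming the gain messages go out one after another and each message-plus-acknowledgement cycle costs about twice the quoted ping time, which is what the numbers above imply):

$$ t_{\text{total}} \approx N_{\text{messages}} \times 2 \times t_{\text{ping}} $$

$$ 25 \times 2 \times 100\ \text{ms} = 5\ \text{s} \qquad\qquad 25 \times 2 \times 15\ \text{ms} = 750\ \text{ms} $$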

brynalf commented 4 years ago

Tested with:
- server started with -T on a 32 logical processor machine
- fast updates disabled on the server
- buffer delay 128 for all clients
- mono-in/stereo-out, audio quality high for all clients ==> 657 kbps audio stream rate per client
- 100 clients connected (on the same LAN as the server ==> low ping times)

Results:
- Audio quality: good and stable
- Client control responsiveness: fast
- CPU load: no logical processor peak load above 60%

Screenshot from 2020-08-29 16-59-30

brynalf commented 4 years ago

Edited global.h to increase the number of clients above 100. The server broke at 104 clients. CPU load: no logical processor peak load above 65%. Same config as in my last post.

corrados commented 4 years ago

Thank you for your tests. I assume you have 6 physical CPU cores. One core for OPUS decoding and the socket thread leaves 5 cores for mixing and OPUS encoding. Each of these cores gets 20 clients to process, so we are at about 100 clients.

What you could do is to increase this number a bit: https://github.com/corrados/jamulus/blob/master/src/server.cpp#L1023

Maybe try to set iMTBlockSize to 22 or even 25. Maybe you can increase the maximum number of supported clients at the server further with that modification...

Edit: Oops, I thought your machine had 12 logical cores. But reading your initial message again showed that you have 32 logical cores... So my calculation is obviously incorrect. Anyway, maybe changing the number still makes a difference :-)
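To illustrate what this block size controls (a rough sketch of the idea only, not the actual code in server.cpp):

// Sketch: iMTBlockSize is the number of connected channels handed to each
// parallel worker task, so the number of tasks grows with the client count.
//
//   100 clients, iMTBlockSize = 20  ->  5 tasks of 20 clients each
//   100 clients, iMTBlockSize = 25  ->  4 tasks, leaving more cores free for
//                                       OPUS decoding and the socket thread
int NumWorkerTasks ( const int iNumClients, const int iMTBlockSize )
{
    return ( iNumClients + iMTBlockSize - 1 ) / iMTBlockSize; // ceiling division
}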

brynalf commented 4 years ago

Yes, 16 physical CPU cores. I will try your suggestion tomorrow, I hope.

brynalf commented 4 years ago

I have now played around with the iMTBlockSize in the range 10 to 25. The main conclusion is that it does not affect the upper limit for the number of users with ok audio quality. This (subjective) quality limit is constant at approximately 100 users when all clients have settings for 657 kbps, and at 45 users when all clients have settings for 900 kbps.

storeilly commented 4 years ago

I'm sure the client count per thread can be optimised in code given the right metrics (outside my skill set), but looking at the CPU graphs and our results, the CPU never goes above 60% before the audio quality drops. Does this mean that something other than CPU count and power is capping the client count? Is there a pinch point in the process, and what is the maths behind it, please?

corrados commented 4 years ago

I have now played around with the iMTBlockSize in the range 10 to 25. The main conclusion is that it does not affect the upper limit for the number of users with ok audio quality.

That was not expected. Can you please set it to 1 and check if you still get the same number of clients?

100 users when all clients have settings for 444 kbps and at 45 users when all clients have settings for 900 kbps.

444 kbps = mono-in/stereo-out at 128 samples block size? 900 kbps = mono-in/stereo-out at 64 samples block size? Are these the settings you have used? If yes, you could also try the following:

Start the server with -T and also -F, and check if you can now serve more clients at the 900 kbps rate, since with -F the server also runs at a 64 samples block size.

brynalf commented 4 years ago

The 900 kbps case was mono-in/stereo-out at 64 samples block size on the clients and -F on the server. Above 45 users the audio quality is not ok. Above 50 users the server breaks (ping > 500 ms).

The 657 kbps case (corrected from my earlier statement of 444 kbps) was mono-in/stereo-out at 128 samples block size and Audio Quality=high on the clients, without -F on the server ==> 100 clients with ok audio quality. (Also, for the fun of it, I tested the 657 kbps setting (mono-in/stereo-out at 128 samples block size on the clients) with -F on the server ==> 95 clients with ok audio quality.)

I will test iMTBlockSize=1. That will be many threads :)

brynalf commented 4 years ago

Tested with iMTBlockSize=1, mono-in/stereo-out at 128 samples block size and Audio Quality=high on the clients, without -F on the server ==> 90 clients with ok audio quality. All logical processors get their share, as can be seen in the attached image. Different load distribution compared to the corresponding plot in one of my earlier posts in this thread. There seems to be some bottleneck other than processor load.

Additional observation: iMTBlockSize=1 on the server resulted in much more responsive behavior when killing a group of 50 clients on a load computer.

Screenshot from 2020-08-30 22-05-23

corrados commented 4 years ago

That's very interesting. So using a larger MTBlockSize gives us an improvement from about 90 clients to 100. That is less than I had hoped, but at least we get some improvement. Have you tried to do the test without -T? That would be the reference from where we started before the multithreading implementation. To my knowledge the highest number was about 35. Another interesting number would be to use mono for all clients with -T, without -F, and 128 samples. Unfortunately, I do not have as fast a CPU as you have, so I cannot run such tests myself.

WolfganP commented 4 years ago

Nice tests, interesting results @brynalf. Thx!

What puzzles me is that the sound breaks after a certain number of clients are connected, but it no longer seems to be a case of CPU limits/load (performance seems to have improved a lot with the latest multi-threading changes by @corrados; big thanks again).

What's the most likely candidate for impacting performance now? Network I/O, packet processing at app/stack level, memory/stack handling, Qt queues, ???

I still think that some performance instrumentation would help a lot to focus on areas of improvement, and later to validate the effectiveness of different strategies (as suggested a few days ago: https://github.com/corrados/jamulus/issues/455#issuecomment-682015046). Thoughts?

pljones commented 4 years ago

It looks like the network traffic is running at 6.4 MB/s. For a 1 Gb/s interface, that's pretty low -- max would be about 100 MB/s. So at 6.4% of network capacity, the CPU/core load is running around the 25-30% mark. Neither is "high"...

I have to agree with @WolfganP -- it's going to take more detailed metrics.

maallyn commented 4 years ago

Folks, I tried the multithreading method outlined in Volker's suggestion.

I did a git clone of the latest code as of about 9 AM Pacific Time (US) on Monday, August 31:

git clone https://github.com/corrados/jamulus.git

I then did the compile:

cd jamulus/
qmake "CONFIG+=nosound" Jamulus.pro
make clean
make

I then used the following jamulus.service file (note the -T in the command)

[Unit]
Description=Jamulus-Server
After=network.target
[Service]
Type=simple
User=jamulus
NoNewPrivileges=true
ProtectSystem=true
ProtectHome=true
Nice=-20
IOSchedulingClass=realtime
IOSchedulingPriority=0
# This line below is what you want to edit according to your preferences
ExecStart=/usr/local/bin/jamulus/Jamulus --server --nogui -T \
--log /var/log/jamulus/jamulus.log \
--centralserver jamulusallgenres.fischvolk.de:22224 \
--serverinfo "newark-music.allyn.com;Newark, NJ Linode;225" \
--welcomemessage "<h2>This is a Jamulus Server on the Linode cloud in Newark, New Jersey. This is for testing of multithreading. It is not yet stable code, but all are welcome. If you have any questions, please email (Mark Allyn) at allyn@well.com if you have any comments or questions. <p>In addition I am trying another experiment. I have set up a Jitsi Conference server at conference.allyn.com/jamulus-seattle for those of you who want to have video. Jitsi is an open source alternative to Zoom. It sometimes has better latency than Zoom. When you join the Jitsi conference, make sure that you stay muted. You only want video as you will be getting audio through Jamulus.<p>Please give me feedback at allyn@well.com on how you feel about the Jitsi.<p>Thank you.<p>I luv you all!</h2>"\
--numchannels 100
# end of section you might want to alter
Restart=on-failure
RestartSec=30
StandardOutput=journal
StandardError=inherit
SyslogIdentifier=jamulus
[Install]
WantedBy=multi-user.target

Once this is running, I then go to another machine (in the cloud); using vncserver, I run a script to attempt to kick off 80 clients:

for i in {1..20}
do
  sleep 2
  ./llcon-jamulus/Jamulus -n --connect 172.104.29.25 &
done

Note that the -n flag means no GUI.

After about 50 iterations, I cannot add any more clients because I get the following error:

- connect on startup to address: 172.104.29.25
 *** Jamulus, Version 3.5.9
 *** Internet Jam Session Software
 *** Released under the GNU General Public License (GPL)
- no GUI mode chosen
- connect on startup to address: 172.104.29.25
 *** Jamulus, Version 3.5.9
 *** Internet Jam Session Software
 *** Released under the GNU General Public License (GPL)
- no GUI mode chosen
- connect on startup to address: 172.104.29.25
Network Error: Cannot bind the socket (maybe the software is already running).
- no GUI mode chosen
- connect on startup to address: 172.104.29.25
Network Error: Cannot bind the socket (maybe the software is already running).
- no GUI mode chosen
- connect on startup to address: 172.104.29.25
Network Error: Cannot bind the socket (maybe the software is already running).

This seems to happen consistently at around 50 connections, with or without sound.

You can get the tcpdump.out (at the time of the failure) at: allyn.com/tcpdump.out

Mark

corrados commented 4 years ago

After about 50 iterations, I cannot add any more clients

This is because of the following line: https://github.com/corrados/jamulus/blob/master/src/socket.h#L49

#define NUM_SOCKET_PORTS_TO_TRY 50

If you increase this number to, e.g., 200, you should not get this error message anymore.
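For illustration, a minimal sketch of the kind of port-probing loop such a constant typically guards (the names other than NUM_SOCKET_PORTS_TO_TRY are placeholders, not necessarily the actual Jamulus code): each client instance needs its own local UDP port, and once the whole range is taken, binding fails with "Cannot bind the socket".

#include <QHostAddress>
#include <QUdpSocket>

#define NUM_SOCKET_PORTS_TO_TRY 200 // raised from 50 for mass-client testing

// Bind the client's UDP socket to the first free port in a small range, so
// that several client instances on one host get distinct local ports.
static bool BindToFreePort ( QUdpSocket& socket, const quint16 iStartPort )
{
    for ( int i = 0; i < NUM_SOCKET_PORTS_TO_TRY; i++ )
    {
        if ( socket.bind ( QHostAddress::Any, iStartPort + i ) )
        {
            return true; // found a free local port
        }
    }

    return false; // whole range occupied -> "Cannot bind the socket"
}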

corrados commented 4 years ago

Please note that qmake "CONFIG+=nosound" Jamulus.pro is not meant to be used for the client. The "nosound" interface is not synchronized and produces far too much network traffic. Today I implemented some blocking, which does not yet work as expected (the buffer LEDs are still red), but the traffic is much lower and it comes closer to a real audio interface. See my commit: https://github.com/corrados/jamulus/commit/d9d4dfdbf8e97567414bfd287329881715268218

maallyn commented 4 years ago

Volker:

Thank you. I will try the server with NUM_SOCKET_PORTS_TO_TRY increased to 200.

Also, the CONFIG+=nosound was compiled on the server, not the client. I have made no changes to the client compile.

I am running the clients from another Ubuntu server in the cloud using vncserver, so they all have the GUI. I am also pumping audio into each client using Audacity connected to pulse, and then pulse-src connected to each client. I have to set those connections up manually using the qjackctl connection panel. If you know how to script the connections, that would be a big help.

corrados commented 4 years ago

You can take a look at this script: https://github.com/corrados/jamulus/wiki/Tips,-Tricks-&-More#jamulus-client-linux-start-script There are examples for disconnection and connection, e.g.:

jack_disconnect system:capture_1 Jamulus:'input left'
jack_connect gx_head_fx:out_0 Jamulus:'input left'

maallyn commented 4 years ago

Thank you very much!!! I will try this later today.

brynalf commented 4 years ago

Here are the results from the first part of the requested baseline/reference test without multithreading on the same machine as my previous tests.

Tested with:
- multithreading disabled (= not using -T)
- mono-in/stereo-out at 128 samples block size and audio quality high (i.e. 657 kbps) on the clients
- without -F on the server
==> 75 clients with ok audio quality

The audio quality only broke down when the peak load on a logical processor reached 100%. For 75 clients the peak load stayed below 98%, also during connection of the clients, and the audio quality was very good and stable. Attached you find the processor load during 60 seconds for the 75 client case.

Screenshot from 2020-09-01 09-49-21

brynalf commented 4 years ago

I discovered that I made an error that affects the bandwidth calculations. I previously stated that my clients use 444 kbps, when they were in fact using 657 kbps (mono-in/stereo-out with Audio Quality set to High). I have edited my posts above with the correct figure. Sorry for any potential confusion. The main conclusions still stand though :)

brynalf commented 4 years ago

Here are the results from the second part of the requested baseline/reference test without multithreading on the same machine as my previous tests.

Tested with:
- multithreading disabled (= not using -T)
- mono at 128 samples block size and audio quality normal (i.e. 366 kbps) on the clients
- without -F on the server
==> 101 clients with ok audio quality

The audio quality only broke down when the peak load on a logical processor reached 100%. For 101 clients the peak load stayed below 99%, also during connection of the clients. The audio quality was a bit affected while connecting clients above approximately 80, but ok once the clients had stabilized. Attached you find the processor load during 60 seconds for this 101 client case.

Screenshot from 2020-09-01 10-58-54

brynalf commented 4 years ago

Multithreading test as comparison on the same machine as my previous tests.

Tested with:
- multithreading enabled (-T)
- iMTBlockSize=20 (i.e. the default value)
- mono at 128 samples block size and audio quality normal (i.e. 366 kbps) on the clients
- without -F on the server
==> 108 clients with ok audio quality

No logical processor reached above 40%. The audio quality was a bit affected while connecting clients above approximately 80, and very affected while connecting clients above approximately 100, but ok once the clients had stabilized. At 108 clients the ping time seen in the client settings window swings between 0 and 6 ms, compared to the normal 0 to 1 ms for this setup. Attached you find the processor load during 60 seconds for this 108 client case. It has been stable with ok audio quality for 40 minutes.

Screenshot from 2020-09-01 11-39-54

corrados commented 4 years ago

Thank you brynalf for all your testing. Let's put your results in a table:

32 logical processor machine Jamulus server performance

-T  -F  B    mode    #
X   X   64   stereo  45
X   X   128  stereo  95
X       128  stereo  100
X       128  mono    108
        128  stereo  75
        128  mono    101

Conclusions

• The multi-threading mode with 64 samples stereo has a very low number. Would be good to know why the number is so low, this is not expected.

maallyn commented 4 years ago

Folks:

I just discovered a new issue. I don't think it's Jamulus related, but it affects my ability to do large-scale client testing. I have been doing my testing by launching large numbers of connections from a second cloud server (a Vultr instance in Seattle) hitting newark-music.allyn.com (a Linode instance in Newark).

Apparently the JACK sound service for Linux starts to break down when I have more than about 60 connections to it. I have it configured so that Audacity sends music to the pulse-jack sink, and then I connect the pulse-jack source to each Jamulus client. I have to do the routing manually for each of the clients.

What I noticed is that even before I do the routing, JACK breaks down at about 60 connections.

I did try the trick of creating thirty connections, stopping, then manually connecting those to the pulse-jack source and confirming music was getting through (watching the VU lights). After that, I try to create the next batch of 30 connections. It varies, but it seems that at around a total of 60 connections, Jamulus complains that JACK is not running. Jamulus then tries to re-start JACK and fails.

I tried to re-start JACK manually, but it seems that once I do that, I lose all existing Jamulus clients, everything gets into a funny state, and I have to reboot the machine.

I know that this is most likely NOT a Jamulus issue, but it does impact my ability to help you folks.

Is there anyone out there who can help me with the Jamulus / Jack relationship in handling large numbers of connections?

I am asking this here because I don't know if it would be appropriate to ask this type of question on the World Jam Discord Channel as that is World Jam production oriented.

brynalf commented 4 years ago

-T  -F  B    mode    #    bitrate/client [kbps]  bitrate total [Mbps]
X   X   64   stereo  45   900                    41
X   X   128  stereo  95   657                    62
X       128  stereo  100  657                    66
X       128  mono    108  366                    40
        128  stereo  75   657                    49
        128  mono    101  366                    37

It is also clear that it is not the total audio bitrate that is the limiting factor.

pljones commented 4 years ago
  • The multi-threading mode with 64 samples stereo has a very low number. Would be good to know why the number is so low, this is not expected.

Indeed. More investigation around the small buffers / fast update code is needed. People have told me it does seem to cause problems that switching it off fixes, although sometimes switching it on fixes problems. Potentially an old wives' tale, of course.

But seeing something more concrete like this number does indicate it could be a real problem somewhere.

brynalf commented 4 years ago

For the sake of structured problem solving, I also tested buffer=64 and -F without multithreading. In that case I hit the processor limit before hitting our mystery limit (see the last row in the table below).

-T  -F  B    mode    audio quality  #    bitrate/client [kbps]  bitrate total [Mbps]  processor limited
X   X   64   stereo  High           45   900                    41                    N
X   X   128  stereo  High           95   657                    62                    N
X       128  stereo  High           100  657                    66                    N
X       128  mono    Normal         108  366                    40                    N
        128  stereo  High           75   657                    49                    Y
        128  mono    Normal         101  366                    37                    Y
    X   64   stereo  High           44   900                    40                    Y

brynalf commented 4 years ago

Additional observation: when the server breaks (ping > 500 ms) in the non-processor-limited cases, the processor load goes down. Audio is still processed but comes back distorted.