WolfganP commented 4 years ago

Follows from https://github.com/corrados/jamulus/issues/339#issuecomment-657076545 for better focus of the discussion.

So, as the previous issue started to explore multi-threading on the server for better use of resources, I first run a profiling of the app on debian.

Special build with: qmake "CONFIG+=nosound headless noupcasename debug" "QMAKE_CXXFLAGS+=-pg" "QMAKE_LFLAGS+=-pg" -config debug Jamulus.pro && make clean && make -j

Then run as below, and connecting a couple of clients for a few seconds: ./jamulus --nogui --server --fastupdate

Once disconnecting the clients I gracefully killed the server pkill -sigterm jamulus

And finally run gprof, with the results posted below: gprof ./jamulus > gprof.txt

https://gist.github.com/WolfganP/46094fd993906321f1336494f8a5faed

It would be interesting to see those who observed high cpu usage run test sessions and collect profiling information as well to detect bottlenecks and potential code optimizations, before embarking on multi-threading analysis that may require major rewrites.

dingodoppelt commented 4 years ago

Hi, I don't know if this is of much help since I don't know what I'm doing here ;) but I generated the data as instructed on my private cloud server.

https://gist.github.com/dingodoppelt/802c40b1cb13c75d96f38b9604fa22df

cheers, nils

WolfganP commented 4 years ago

Thanks @dingodoppelt Could you please describe the test session/environment? (ie, how many clients connected, which hardware/Operating System you were running the server on, whatever you feel noticeable)

WolfganP commented 4 years ago

@sthenos you mentioned in https://www.facebook.com/groups/507047599870191/?post_id=564455474129403&comment_id=564816464093304 that you're running the server in linux now. Would it be possible for you to run a profiling session in any of the casual/preparation jam sessions with multiple clients to measure REAL server stress? (obviously not during the WJN main event :-)

dingodoppelt commented 4 years ago

I tested with 12 clients connected from my machine with small network buffers enabled on 64 samples buffersize. the server is a cloud hosted kvm virtual rootserver (1fire.hosting) with 2 vcpus, 1 gig of ram on ubuntu 20.04 lowlatency. If i get the opportunity I'll repeat the test on my other server in a real life scenario. cheers, nils

storeilly commented 4 years ago

Are you still interested in these data? I can run a few tests on ubuntu over the weekend. Is there a particular release or tag we should checkout as I tried last week to compile but it froze my server.

pljones commented 4 years ago

One quick comment @WolfganP -- for some reason you build command line (rather than a simple qmake) causes the TARGET = jamulus line to trigger... I don't understand why!

EDIT Dawn strikes... Yes, it does have noupcase in the CONFIG+=... I just couldn't see it...

So if you're on anything but Windows, you'll probably want to mv it back again, otherwise start up scripts, etc, won't work.

Final edit to note: jamulus.drealm.info is running with profiling. I'll leave it up over the weekend so it should amass a fair amount of data. I'll run the gprof on Monday. Obviously a bit more "real world", as I run with logging and recording enabled, so I'm expecting different number...

pljones commented 4 years ago

A different view should come from the Rock and Clasical/Folk/Choir genre servers that I've just updated to r3_5_9 with profiling.

make distclean
qmake "CONFIG+=nosound headless debug" "QMAKE_CXXFLAGS+=-pg" "QMAKE_LFLAGS+=-pg" -config debug Jamulus.pro
make -j
make clean

They probably won't show much OPUS usage but this should show anything that's "weird" with server list server behaviour (although they only have about 20 registering servers, until Default).

I wasn't sure what CONFIG+=debug and -config debug added -- the code appeared to have symbols regardless.

WolfganP commented 4 years ago

@pljones yes, I added debug flags to qmake just to make sure all symbols are included and no stripping is applied. Anyways, a good way to check if symbols were included in the final exec and gprof instrumentation applied is objdump --syms jamulus | grep -i mcount (mcount* being the snippets of code added for profiling instrumentation)

pljones commented 4 years ago

Standard build:

peter@fs-peter:~$ objdump --syms git/Jamulus-wip/Jamulus | grep -i mcount
0000000000000000       F *UND*  0000000000000000              mcount@@GLIBC_2.2.5

This just changing the binary to "Jamulus", IIRC:

peter@fs-peter:~$ objdump --syms git/Jamulus/Jamulus | grep -i mcount
0000000000000000       F *UND*  0000000000000000              mcount@@GLIBC_2.2.5

This was make distclean; qmake "CONFIG+=nosound headless debug" "QMAKE_CXXFLAGS+=-pg" "QMAKE_LFLAGS+=-pg" -config debug Jamulus.pro; make -j; make clean

peter@fs-peter:~$ objdump --syms git/Jamulus-stable/Jamulus | grep -i mcount
0000000000000000       F *UND*  0000000000000000              mcount@@GLIBC_2.2.5

Had a few people tonight noticing additional jitter. Not everyone... Those who noticed - myself included - had just upgraded to 3.5.9. No idea why... (I "fixed" it for the evening by upping my buffer size from 64 to 128.)

14 clients connected to the server and it's looking like this in top:

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
32704 Jamulus  -21   0  320.3m  28.6m   9.1m S 10.8  0.2  80:09.58 Jamulus-server
 1066 root     -51   0    0.0m   0.0m   0.0m S  0.2  0.0  10:15.69 irq/130-enp2s0-
 1070 root     -51   0    0.0m   0.0m   0.0m S  0.1  0.0   3:27.74 irq/132-enp2s0-
 1071 root     -51   0    0.0m   0.0m   0.0m S  0.1  0.0   4:00.96 irq/133-enp2s0-

Mmm, I guess those enp2s0 IRQ handlers are a bit busy as that's the network interface... There are actually five, it seems:

peter@fs-peter:~$ ps axlww -L | grep enp2s0
1     0  1063     2  1063 -51   -      0     0 -      S    ?          0:00 [irq/129-enp2s0]
1     0  1066     2  1066 -51   -      0     0 -      S    ?         10:06 [irq/130-enp2s0-]
1     0  1068     2  1068 -51   -      0     0 -      S    ?          0:01 [irq/131-enp2s0-]
1     0  1070     2  1070 -51   -      0     0 -      S    ?          3:22 [irq/132-enp2s0-]
1     0  1071     2  1071 -51   -      0     0 -      S    ?          3:56 [irq/133-enp2s0-]

but it copes with only three in heavy demand. 129 and 131 seem left out.

Let's see what the gprof looks like in the morning :).

pljones commented 4 years ago

OK, I decided to restart the central servers without profiling before I totally forget, so all the number are now in. Jamulus-Central1 gprof.out Jamulus-Central2 gprof.out jamulus.drealm.info gprof.out

WolfganP commented 4 years ago

OK, I decided to restart the central servers without profiling before I totally forget, so all the number are now in. Jamulus-Central1 gprof.out Jamulus-Central2 gprof.out jamulus.drealm.info gprof.out

Thx for the info @pljones, good to also have some performance info for the Central Server role.

Regarding the info on the audio server role, it seems it confirms the CPU impact of CServer::ProcessData and some Opus routines (I assume that as a result of the mix processing inside CServer::OnTimer), and that make sense (at least to me). On the Opus codec front, I found this article worthy to read: https://freac.org/developer-blog-mainmenu-9/14-freac/257-introducing-superfast-conversions/ (code at https://github.com/enzo1982/superfast#superfast-codecs)

Another item I think needs some attention (or verify is already optimized) is the buffering of audio blocks to avoid unnecessary memcopies. But still reading the code :-)

WolfganP commented 4 years ago

Are you still interested in these data? I can run a few tests on ubuntu over the weekend. Is there a particular release or tag we should checkout as I tried last week to compile but it froze my server.

Of course @storeilly the more information the better to compare performance on diff use cases and verify common patterns of CPU usage, and direct optimization efforts.

storeilly commented 4 years ago

Here is a short test on GCP n1-standard-2 (2 vCPUs, 7.5 GB memory) ubuntu 18.04. A single user connected for a few seconds. I'm running the server overnight with two instances, one on a private central and another on jamulusclassical.fischvolk.de:22524 for more data

gprof.txt

storeilly commented 4 years ago

jamprof01.txt jamprof02.txt

Overnights with 1 or 2 connections... Choir meeting later so will run again after that

WolfganP commented 4 years ago

Thanks @storeilly for the files, but those last 2 indicate a period of app usage extremely short, it doesn't even register significant stats to evaluate (even the cumulative times are in 0.00).

storeilly commented 4 years ago

jamprof03.txt

Oh sorry about that, maybe because I had them running as a service. I saw the message just before the choir meeting so ran this up as a live instance. We only had 8 connections for about 90 mins so I hope it is of some use.

WolfganP commented 4 years ago

Thanks @storeilly, that latest file is more representative of a live session and similar to the others posted previously. Thx a lot for sharing.

corrados commented 4 years ago

For your info: I will change the ProcessData function now to avoid some of the Double2Short calls and have a better clipping behavior.

WolfganP commented 4 years ago

For your info: I will change the ProcessData function now to avoid some of the Double2Short calls and have a better clipping behavior.

Excellent @corrados , we can keep running profiling sessions here and there and measure the improvements. CMovingAv (in src/util.h) is another function that is called very frequently according to the stats, do you mind to check if no unnecessary typing switch in there is performed as well?

Another thing that I couldn't still pay sufficient attention is the management of buffers of audio blocks, to make sure unnecessary memcopies are avoided. Do you recall how is it implemented?

corrados commented 4 years ago

I will do further investigations when I return from my vacation.

storeilly commented 4 years ago

Is there a possibility somebody can build a Windows exe with profiling config? A friend is trying a multi server load test on Windows tomorrow evening?

pljones commented 4 years ago

Is there a possibility somebody can build a Windows exe with profiling config? A friend is trying a multi server load test on Windows tomorrow evening?

https://docs.microsoft.com/en-us/visualstudio/profiling/running-profiling-tools-with-or-without-the-debugger?view=vs-2017 Virtually all the docs refer to working in Visual Studio using graphical tools rather than simple tools like gprof. This was about the closest I could find...

The Windows build doesn't seem to like "CONFIG+=nosound headless"

...src\../windows/sound.h(28): fatal error C1083: Cannot open include file: 'QMessageBox': No such file or directory

Leaving headless out lets the build run. (Though nosound should prevent ...src\../windows/sound.h being used, surely?)

So, Qt Creator under Windows has an "Analyze -> Performance Analyzer" tool. First thing, it kicks off the compile... Before it tries to run, it says Hm. Surely it can check what OS it's running on (i.e. it's Windows!) and disable the menu item entirely? No... So hit OK. It then fails: Yes, quite. And then, just to make sure you know it didn't work:

dingodoppelt commented 4 years ago

https://gist.github.com/dingodoppelt/9fecd468be2176dacd6d6d3ae3d1d078

here is another one. a public server (that hasn't seen too much use) on a 4 core cpu. It ran on the most recent code including the reduction of doubletoshort calls, if that is of any interest here.

corrados commented 4 years ago

Thanks dingodoppelt. In your profile log the ProcessData() function is much lower in the list compared to the profile given by storeilly in jamprof03.txt. So maybe the Double2Short optimization may already have given us a faster Jamulus server.

pljones commented 4 years ago

I've been dabbling with getting my service units to run at real time priority. The 2013 documentation linked from one of the guides is actually out of date. There's no need to fuss with changing cgroups from within the service unit - the latest kernels are quite happy dealing with individual slices.

[Service]
CPUSchedulingPolicy=rr
CPUSchedulingPriority=99
IOSchedulingClass=realtime
IOSchedulingPriority=3

Having said that, I noticed something when checking the status with ps:

peter@fs-peter:~/git/Jamulus-wip$ sudo chrt -r 99 sudo -u peter ./Jamulus -s -n -p 55850 -R /tmp/recording; rm -rf /tmp/recording/  - server mode chosen
- no GUI mode chosen
- selected port number: 55850
- recording directory name: /tmp/recording
Recording state enabled
 *** Jamulus, Version 3.5.10git
 *** Internet Jam Session Software
 *** Released under the GNU General Public License (GPL)

So that starts up the server real time quite happily:

peter@fs-peter:~/git/Jamulus-wip$ ps axwwH -eo user,pid,tid,spid,class,pri,comm,args | sort +5n | grep 'PID\|Jamulus' | grep 'PID\|^peter'
USER       PID   TID  SPID CLS PRI COMMAND         COMMAND
peter    11320 11322 11322 TS   19 Jamulus::CSocke ./Jamulus -s -n -p 55850 -R /tmp/recording
peter    11320 11320 11320 RR  139 Jamulus         ./Jamulus -s -n -p 55850 -R /tmp/recording
peter    11320 11321 11321 RR  139 Jamulus::JamRec ./Jamulus -s -n -p 55850 -R /tmp/recording

What I couldn't follow in the thread flow of control was why CSocket loses real time and yet JamRecorder retains it.

I'm looking to drop the priority of the jam recorder, really - and I'd have thought the socket handling code wanted to retain it?

Here's the patch that names the CSocket thread:

peter@fs-peter:~/git/Jamulus-wip$ git diff
diff --git a/src/server.cpp b/src/server.cpp
index fe9b50a8..ed1bae35 100755
--- a/src/server.cpp
+++ b/src/server.cpp
@@ -58,6 +58,8 @@ CHighPrecisionTimer::CHighPrecisionTimer ( const bool bNewUseDoubleSystemFrameSi
     veciTimeOutIntervals[1] = 1;
     veciTimeOutIntervals[2] = 0;

+    setObjectName ( "Jamulus::CHighPrecisionTimer" );
+
     // connect timer timeout signal
     QObject::connect ( &Timer, &QTimer::timeout,
         this, &CHighPrecisionTimer::OnTimer );
diff --git a/src/socket.h b/src/socket.h
index adc9c67f..b4ec15e9 100755
--- a/src/socket.h
+++ b/src/socket.h
@@ -169,7 +169,9 @@ protected:
     {
     public:
         CSocketThread ( CSocket* pNewSocket = nullptr, QObject* parent = nullptr ) :
-          QThread ( parent ), pSocket ( pNewSocket ), bRun ( true ) {}
+          QThread ( parent ), pSocket ( pNewSocket ), bRun ( true ) {
+        setObjectName ( "Jamulus::CSocketThread" );
+    }

         void Stop()
         {

(I don't know what CHighPrecisionTimer is used for but I didn't see it get a thread - maybe it's for short-lived stuff?)

WolfganP commented 4 years ago

The way I understood the server code, CHighPrecisionTimer is the base for the "realtime" processing of client audio mixes at ProcessData (one of the functions topping the performance charts consistently) via the OnTimer interrupt occurring every 1ms (https://github.com/corrados/jamulus/blob/f67dbd1290a579466ff1f315457ad9090b39747e/src/server.cpp#L792)

That async processing of data via the timer, was probably why the early parallelization test didn't work as intended.

pljones commented 4 years ago

I've also noticed the following:

USER       PID   TID  SPID CLS PRI COMMAND         COMMAND
peter    18959 18962 18962 TS   19 Jamulus::CSocke ./Jamulus -s -p 55850 -R /tmp/recording
peter    18959 18959 18959 RR  139 Jamulus         ./Jamulus -s -p 55850 -R /tmp/recording
peter    18959 18960 18960 RR  139 QXcbEventReader ./Jamulus -s -p 55850 -R /tmp/recording
peter    18959 18961 18961 RR  139 Jamulus::JamRec ./Jamulus -s -p 55850 -R /tmp/recording
peter    18959 18963 18963 RR  139 QDBusConnection ./Jamulus -s -p 55850 -R /tmp/recording

When kicking off the server with the GUI enabled at RT priority, the GUI runs at RT priority, too! That's definitely not wanted.

EDIT:

USER       PID   TID  SPID CLS PRI COMMAND         COMMAND
peter    19817 19820 19820 TS   19 Jamulus::CSocke ./Jamulus -s -p 55850 -R /tmp/recording
peter    19817 19883 19883 TS   19 Jamulus::CHighP ./Jamulus -s -p 55850 -R /tmp/recording
peter    19817 19817 19817 RR  139 Jamulus         ./Jamulus -s -p 55850 -R /tmp/recording
peter    19817 19818 19818 RR  139 QXcbEventReader ./Jamulus -s -p 55850 -R /tmp/recording
peter    19817 19819 19819 RR  139 Jamulus::JamRec ./Jamulus -s -p 55850 -R /tmp/recording
peter    19817 19821 19821 RR  139 QDBusConnection ./Jamulus -s -p 55850 -R /tmp/recording

And with a user connected to start up CHighPrecisionTimer. That also drops out of RT.

pljones commented 4 years ago

Ah yes: CHighPrecisionTimer -> OnTimer. I think I did work that out once.

My "instinct" is that:

RR/FF Real time -> CHighPrecisionTimer (for a steady OnTimer)
RR/FF RT (or very high priority) -> CServer (to process OnTimer), CSocketThread (to process CServer requests)
TS 19 ("normal") -> JamRecorder, GUI (QXcbEventReader, QDBusConnection)

I've still not understood why CSocketThread drops RT and yet I can't explain to Qt that I want JamRecorder to drop RT...

pljones commented 4 years ago

486 relates to the client configured with "nosound" and vaguely relates to performance, so I'll mention it here.

corrados commented 4 years ago

I've still not understood why CSocketThread drops RT

see: https://github.com/corrados/jamulus/blob/master/src/socket.h#L153 and https://github.com/corrados/jamulus/blob/master/src/socket.h#L209

At these places the priority is defined.

pljones commented 4 years ago

Yes - but I don't see why QThread::TimeCriticalPriority means the thread gets TS 19 rather than something actually high. TS 19 is what a login shell or any other "normal" process is scheduled at.

USER       PID   TID  SPID CLS PRI COMMAND         COMMAND
Jamulus   2235  2251  2251 TS   19 QThread         /opt/Jamulus/bin/Jamulus-server -s -n -F -p 54850 -u 30 -e jamulus.fischvolk.de:22224 -m /opt/Jamulus/run/status.html -l /opt/Jamulus/log/Jamulus.log -y /opt/Jamulus/log/history.svg -L -R /opt/Jamulus/run/recording -a jamulus.drealm.info -o jamulus.drealm.info;London;224 -w ...
peter     4053  4053  4053 TS   19 grep            grep --color=auto PID\|bin/Jamulus
Jamulus   2235  2250  2250 RR  139 Jamulus::JamRec /opt/Jamulus/bin/Jamulus-server -s -n -F -p 54850 -u 30 -e jamulus.fischvolk.de:22224 -m /opt/Jamulus/run/status.html -l /opt/Jamulus/log/Jamulus.log -y /opt/Jamulus/log/history.svg -L -R /opt/Jamulus/run/recording -a jamulus.drealm.info -o jamulus.drealm.info;London;224 -w ...

2235 is the main server thread
2251 is the QSocketThread (unpatched Jamulus) and
4053 is a normal user process

Also, if I patch the Jam recorder:

pthJamRecorder->start ( QThread::NormalPriority );

it doesn't drop (but the network thread does - so it's not because of how I'm running the server):

USER       PID   TID  SPID CLS PRI COMMAND         COMMAND
peter     3856  3858  3858 TS   19 Jamulus::CSocke ./Jamulus -s -n -p 55850 -R /tmp/recording
peter     3952  3952  3952 TS   19 grep            grep --color=auto PID\|\./Jamulus
peter     3953  3953  3953 TS   19 grep            grep --color=auto PID\|^peter
peter     3856  3856  3856 RR  139 Jamulus         ./Jamulus -s -n -p 55850 -R /tmp/recording
peter     3856  3857  3857 RR  139 Jamulus::JamRec ./Jamulus -s -n -p 55850 -R /tmp/recording

It's as if passing the priority isn't having the documented effect and something else entirely is happening.

pljones commented 4 years ago

Well... this is ... peculiar. I've got the JamRecorder priority down.

peter@fs-peter:~$ ps axwwH -eo user,pid,spid,class,pri,comm,args | sort +4n | grep 55850
peter    21623 21624 TS   19 JamRecorder     /opt/Jamulus/bin/Jamulus-wip -s -n -F -p 55850 -u 4 -L -R /tmp/recording -a drealm.info Test Server
peter    21623 21625 TS   19 CSocketThread   /opt/Jamulus/bin/Jamulus-wip -s -n -F -p 55850 -u 4 -L -R /tmp/recording -a drealm.info Test Server
peter    21628 21628 TS   19 grep            grep --color=auto 55850
peter    21623 21623 RR  139 Jamulus-wip     /opt/Jamulus/bin/Jamulus-wip -s -n -F -p 55850 -u 4 -L -R /tmp/recording -a drealm.info Test Server

That was done by moving the new instance to the new thread, connecting the signals/slots and calling pthJamRecorder->start ( QThread::NormalPriority ); - in that order.

Moving to the new thread after connecting up and the priority setting was ignored.

However, using pthJamRecorder->start ( QThread::TimeCriticalPriority ); rather than NormalPriority made no difference -- still get TS 19 rather than RR 139.

Not specifying a priority in the start call gets the RR 139 (i.e. putting everything back how I'd had it).

So it definitely looks like passing the priority is behaving ... strangely ...

corrados commented 4 years ago

I need your help. I have done some multithreading implementation for the server using QtConcurrent. I need to see if this is the right path. So if you have a multi-core CPU and would like to do some evaluation for me, please do the following:

git clone https://github.com/corrados/jamulus.git cd jamulus ~~git checkout --track origin/qtconcurrentrun_test~~ qmake "CONFIG+=nosound headless multithreading" make

Please make sure that you see "Project MESSAGE: Multithreading in the server is enabled."

edit: the code is now merged to the master branch, no need to checkout a special branch anymore

WolfganP commented 4 years ago

@corrados I was looking at QtConcurrent examples as well (https://www.bogotobogo.com/Qt/Qt5_QtConcurrent_RunFunction_QThread.php) lately, after looking at how QTimer was implemented (https://doc.qt.io/qt-5/qtimer.html) Basically is a high priority events queue inside the Qt framework, not a hard interrupt timer as the low level ones used in basic/close to the metal apps.

Interesting thou, the official doc says: In multithreaded applications, you can use QTimer in any thread that has an event loop. To start an event loop from a non-GUI thread, use QThread::exec(). Qt uses the timer's thread affinity to determine which thread will emit the timeout() signal. Because of this, you must start and stop the timer in its thread; it is not possible to start a timer from another thread. so that could probably affect the way concurrent Qthreads are created as they probably have to own it's own timer as per the docs?

corrados commented 4 years ago

Good question. But now we have a first implementation and we should use that as a benchmark. I would be very interesting how this new code behaves if a lot of clients are connected and a multi-core CPU is used.

WolfganP commented 4 years ago

Good question. But now we have a first implementation and we should use that as a benchmark. I would be very interesting how this new code behaves if a lot of clients are connected and a multi-core CPU is used.

Perfect! I'll try the new branch separated from the master.

BTW, for stress testing and reproducible benchmarking purposes, is there any quick way you can modify the client code to connect as multiple clients and inject multiple streams? (even if it drops the audio mix back, fake a sine-wave tone or inject an audio file w/o connecting to an actual interface, ...) Anything that we can quickly simulate 2, 4, 12, 36, ... whatever clients actively connected and stress the server easy and in a controlled way...

dingodoppelt commented 4 years ago

one more: same server as last time, this time with multithreading enabled. all clients were on small network buffers. i'll give it another try on 128 samples and with more clients

https://gist.github.com/dingodoppelt/4f81a6dcaca85b119160f41ad0fe370b

corrados commented 4 years ago

Thanks Nils. The problem with the profile is that we cannot see how many CPU cores are actually used and how good the multithreading works. I guess we need some real-world tests where we have multiple musicians connected and check if the server still works correctly (no audio issues, stability) and monitor the CPU load with, e.g., htop to see if the load is equally distributed over all CPU cores.

WolfganP commented 4 years ago

I guess we need some real-world tests where we have multiple musicians connected and check if the server still works correctly (no audio issues, stability) and monitor the CPU load with, e.g., htop to see if the load is equally distributed over all CPU cores.

That's why I suggested in my previous comment some quick and dirty mod to the client where the load can be simulated in lab conditions, so all measurements can be comparable (ie 1, 2, 12, 50 active connected clients; server in mono or multi-thread, different strategies/branches/optimizations). It doesn't need to be pretty :-)

dingodoppelt commented 4 years ago

May this be of any help?

stressedJamServer

corrados commented 4 years ago

That's interesting. You only have three Jamulus threads but since your CPU has four cores you should have at least four Jamulus threads. Can you please make sure that you are running the correct Jamulus binary? It seems the "multithreading" is not enabled in that binary you are running.

corrados commented 4 years ago

For your information: I did some tests today and also cleanup up the code so that it is now ready to be merged to master. So I will merge the code and delete the branch. Then if you want to test that new code, simply clone the latest git code and use the CONFIG multithreaded switch.

pljones commented 4 years ago

Tagging #491 which adds thread names. And drops the priority of the JamRecorder (which was an issue if you're running the main server thread at realtime).

WolfganP commented 4 years ago

May this be of any help?

Good test @dingodoppelt . Do you mind to describe how did you get the 30 clients test? (were they simulated or real clients? were they running in windows, linux; GUI or headless?))

dingodoppelt commented 4 years ago

Good test @dingodoppelt . Do you mind to describe how did you get the 30 clients test? (were they simulated or real clients? were they running in windows, linux; GUI or headless?))

I launched them on Linux via bash script in headless mode.

That's interesting. You only have three Jamulus threads but since your CPU has four cores you should have at least four Jamulus threads. Can you please make sure that you are running the correct Jamulus binary? It seems the "multithreading" is not enabled in that binary you are running.

You're right, I misspelled "multihreading" in the qmake line :) I enabled it now and it really keeps the CPU usage down and nicely balanced. What I found still is that audio quality degrades at a certain amount of clients depending on their usage of small network buffers though the CPU cores aren't maxed out. But I don't know how much I can trust my tests since I'm running all clients on one machine and internet connection.

pljones commented 4 years ago

@dingodoppelt Are you running the server SCHED_RR/SCHED_FIFO or SCHED_OTHER and what priority/nice values?

dingodoppelt commented 4 years ago

@pljones : [Unit] Description=Jamulus-Server After=network.target

[Service] Type=simple User=jamulus Group=nogroup NoNewPrivileges=true ProtectSystem=true ProtectHome=true Nice=-20 IOSchedulingClass=realtime IOSchedulingPriority=0 CPUSchedulingPolicy=rr CPUSchedulingPriority=99

pljones commented 4 years ago

And if you run

ps axwwH -eo user,pid,spid,class,pri,comm,args | sort +4n | grep '^jamulus\|^USER'

What's the output showing now?

dingodoppelt commented 4 years ago

@pljones

USER PID SPID CLS PRI COMMAND COMMAND jamulus 1509 1517 TS 39 QThread /usr/local/bin/Jamulus -s -F -n -u 20 -w <h1 style=text-align:center>FetteHupeBackstage</h1>Willkommen auf dem privaten Jamulus Server der Big Band Fette Hupe aus Hannoverhttps://fettehupe.de -o FetteHupe;Frankfurt;82 jamulus 1509 1509 RR 139 Jamulus /usr/local/bin/Jamulus -s -F -n -u 20 -w <h1 style=text-align:center>FetteHupeBackstage</h1>Willkommen auf dem privaten Jamulus Server der Big Band Fette Hupe aus Hannoverhttps://fettehupe.de -o FetteHupe;Frankfurt;82

dingodoppelt commented 4 years ago

You've lost the formatting and at least one line of output there (PID 1509 / SPID 1517 - so there should be a line for SPID 1509 itself).

this is the exact output of the command you sent. how do I keep the formatting?

jamulussoftware / jamulus

Server performance & optimization #455

486 relates to the client configured with "nosound" and vaguely relates to performance, so I'll mention it here.