jamulussoftware / jamulus

Jamulus enables musicians to perform real-time jam sessions over the internet.
https://jamulus.io

Support large ensembles (> 100 connected clients) #339

Closed jp8 closed 4 years ago

jp8 commented 4 years ago

I would like to open a discussion about improving the Jamulus user experience for large ensembles. My understanding is that the current Jamulus server will use only a single CPU core, and that it generates a personal mix for each connected client.
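
A rough sketch of what that per-client mixing means for every audio block the server processes (my own illustration with placeholder names, not actual Jamulus code):

```cpp
// Conceptual sketch only: BuildPersonalMix / EncodeAndSend are
// placeholders for the real per-client mixing and OPUS encoding steps.
const int NumClients = 120;

void BuildPersonalMix ( int clientIdx ); // apply this client's fader settings
void EncodeAndSend    ( int clientIdx ); // OPUS-encode the mix, send the packet

void OnAudioBlock()
{
    // today this loop runs once per audio block on a single thread
    for ( int i = 0; i < NumClients; i++ )
    {
        BuildPersonalMix ( i );
        EncodeAndSend ( i );
    }
}
```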

One potential solution could be a server mode in which a single shared mix is generated; the server would then have less work to do and could therefore handle more connected clients. I imagine the client occupying the first space on the server would be in control of the mix for all participants.

A second potential solution would be the ability for a server (with mixer controls on the server UI) to also act as a client to another server. In this case all the violins could join server A, all the cellos could join server B, and servers A and B could join server Z. The conductor would connect his client to server Z and have a mixer control for each section. In this solution, larger ensembles would simply require more servers. Delay would be mitigated by having all the servers at the same hosting centre, or indeed on the same multi-core VM, so that the ping time among them is effectively zero.
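
Sketched as a topology, under the assumption that all three servers sit in the same data centre:

```
violins ──► server A ──┐
cellos  ──► server B ──┴──► server Z ◄── conductor's client (one fader per section)
```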

A third potential solution would be to have the server use multiple threads to generate mixes in parallel.

I would appreciate hearing what people think of these approaches, and I would like to hear about any other approaches that people can think of.

corrados commented 4 years ago

How many clients do you have to support? Have you tried it out already?

jp8 commented 4 years ago

I would like to support 120.

And I am trying to think not just about server capacity, but also about the client user interface.

So far the largest group I've been in is 30. The server had 8 CPUs, but my impression is that one of them was at 100% and the others were mostly idle. The user interface would have been improved if it was a grid instead of a single line.

corrados commented 4 years ago

A third potential solution would be to have the server use multiple threads to generate mixes in parallel.

This is the only meaningful solution. The OPUS encoding requires the most processing time, and it could easily be distributed across different CPU cores.
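
Since each client's personal mix touches only that client's buffers, the loop sketched in the opening comment is in principle embarrassingly parallel. A minimal sketch of the idea with OpenMP, reusing the same placeholder names (an illustration, not the eventual implementation):

```cpp
#include <omp.h>

void BuildPersonalMix ( int clientIdx ); // placeholders as in the earlier sketch
void EncodeAndSend    ( int clientIdx ); // the OPUS encode is the expensive part

void OnAudioBlock ( int numClients )
{
    // iterations are independent, so OpenMP can spread them across cores
    #pragma omp parallel for
    for ( int i = 0; i < numClients; i++ )
    {
        BuildPersonalMix ( i );
        EncodeAndSend ( i );
    }
}
```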

I would like to support 120.

The biggest challenge in that case is supporting all 120 musicians in setting up their Jamulus clients correctly. I guess most of them will use ASIO4All together with the laptop's built-in sound card. This will give them bad latencies, and they will not have much fun playing together.

And I am trying to think not just about server capacity, but also about the client user interface.

Are you sure every one of the 120 musicians will take the time to adjust all 120 faders? I don't think so. If you have that many musicians, each of them will have to adjust their input volume so that they all have about the same level. Then there is no need to touch any fader.

corrados commented 4 years ago

I would like to support 120.

If you make it, you should apply for here: https://www.guinnessworldrecords.com/business-marketing-solutions/record-event-formats/online-records ;-)

jp8 commented 4 years ago

Are you sure every one of the 120 musicians will take the time to adjust all 120 faders? I don't think so. If you have that many musicians, each of them will have to adjust their input volume so that they all have about the same level. Then there is no need to touch any fader.

Indeed, one thing I liked about the 'multiple server' approach is that musicians would only see faders for the others in their section; a section leader would control the section's mix sent upstream to the conductor's server, and the conductor would see only one fader per section. Nobody would ever need to deal with 120 faders on a single screen.

corrados commented 4 years ago

the 'multiple server' approach

That is not a good solution in my opinion. You'll add additional latency and you will also have problems with synchronization. You need a single point where all the audio streams are mixed together; otherwise you would have to compensate for the different delays. A single server is the way to go here.

the conductor would see only one fader per section

There exists already a similar feature request: https://github.com/corrados/jamulus/issues/202

jp8 commented 4 years ago

That is not a good solution in my opinion. You'll add additional latency and you will also have problems with synchronization. You need a single point where all the audio streams are mixed together; otherwise you would have to compensate for the different delays. A single server is the way to go here.

What if the up- and downstream between servers was uncompressed audio and didn't go through Opus? Gigabit ethernet in the hosting centre should be able to handle that, I think.
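
Back-of-the-envelope check (my numbers, assuming 48 kHz stereo 16-bit PCM): one uncompressed stream is 48000 × 2 × 16 ≈ 1.5 Mbit/s, so even 120 such streams come to roughly 184 Mbit/s, which would indeed fit comfortably within gigabit ethernet.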

corrados commented 4 years ago

didn't go through Opus?

I am sure that with the OMP (OpenMP) implementation and 8 CPU cores in your server, you will be able to serve 120 clients.

jp8 commented 4 years ago

Yes, you are right about the multi-processing on the server. It leaves me wondering if there is a way to eliminate the faders at the client (maybe showing just a grid of 120 names), with a way for a single sound engineer with a high-resolution screen to adjust a single mix that is heard by everyone.

corrados commented 4 years ago

adjust a single mix that is heard by everyone.

No, that is not possible. Each client has its own mix. And I think that is useful, since in a real orchestra you hear the instruments which are close to you much louder than those which are far away from you. In your personal mix you can configure the same.

sthenos commented 4 years ago

Just for your information: during the World Jam on Saturday I bought a dedicated Google VM with 8 cores, 16 GB RAM and a 200 GB SSD. We were able to reach about 35 people in the room all jamming together. The server was running at 110% CPU utilisation across all 8 cores, but it didn't crash as it had done on the 4-core version I was running the previous week with 28 people.

corrados commented 4 years ago

That is very interesting. The main Jamulus processing routine is a single-core implementation; there is no multi-threading implemented yet. So the question is why you are seeing 8 busy cores when the Jamulus server runs. Maybe the Google servers do something smart with the running applications? Maybe they somehow split the work themselves and distribute it across all available processors. But if this is the case, how do they do it?

It could also be that the CPU monitoring tool simply shows incorrect data...

Anyway, having 35 clients connected to the Jamulus server is very impressive :-).

corrados commented 4 years ago

Another data point: this user reports a CPU usage of only 17% with 35 connected clients: https://sourceforge.net/p/llcon/discussion/musicianslounge/thread/4702d9fae1/#86b5

corrados commented 4 years ago

The user interface would have been improved if it was a grid instead of a single line.

What about a "slim design" of the faders: [screenshot: slim fader mock-up]

JimMooy commented 4 years ago

So here is my screen shot with 35 connected:

[screenshot: mixer view with 35 connected clients]

The audio sounded good for the local-area clients. Only 10 to 15 people were actually making sounds. I am trying to get 50 clients for next week's test of a Linux server on a 20 Mbit/s up / 100 Mbit/s down business modem.

PS: You are a fast reader, Corrados. :)

corrados commented 4 years ago

I did not know that you had a GitHub account and read the issues. Thanks for your screenshot. I have modified it by making all the IP addresses invisible (for privacy reasons).

I am trying to get 50 clients for next week's test of a Linux server on a 20 Mbit/s up / 100 Mbit/s down business modem

Please report here when you have done this test.

JimMooy commented 4 years ago

Oh, thank you! That was a NEWBIE in action. Much appreciated. I'll let you know how it goes. I'm looking to get all Windows users off ASIO4All; nothing but problems with ASIO4All for the novice users. I conduct a 70-piece community college orchestra and a 25-piece big band. We (and every other music educator in the world) are wondering how to rehearse our groups when school begins again in August. Thank you for your hard work.

jp8 commented 4 years ago

I suppose this is an 8-core CPU, so the 13% usage shown for Jamulus represents 100% of a single core (100% / 8 cores = 12.5% ≈ 13%). On a Linux server under a similar load, 'top' would report 100% usage (out of a total possible 800%).

jp8 commented 4 years ago

The user interface would have been improved if it was a grid instead of a single line.

What about a "slim design" of the faders:

It's good, but what about this: [screenshot]

corrados commented 4 years ago

Yes, this would be a possible solution. But I am more a fan of "keep it simple, stupid" and prefer small incremental changes to support a new use case. The "slim fader" would be much easier to implement and would be a straightforward change in the Jamulus software.

corrados commented 4 years ago

I suppose this is an 8-core CPU, so the 13% usage shown for Jamulus represents 100% of a single core.

Let's see what JimMooy reports when he has finished his second test. @JimMooy Maybe next time you could also take a screenshot of the individual CPU core loads, like this: [screenshot: per-core CPU load]

jp8 commented 4 years ago

Yes, this would be a possible solution. But I am more a fan of "keep it simple, stupid" and prefer small incremental changes to support a new use case. The "slim fader" would be much easier to implement and would be a straightforward change in the Jamulus software.

Yes, and for small screens or large groups it will be a very welcome improvement. Do you think we could have the musician's initial (the first letter of their name) instead of the instrument number?

corrados commented 4 years ago

instead of the instrument number?

Are you referring to my screenshot? The number in my screenshot is actually the name; I just used a number as an example, but you could use any number or letter there.

corrados commented 4 years ago

This is what I have so far: [screenshots: four mock-ups of the new skin] It's not ready yet, but it looks promising enough to be included in the Git repo.

corrados commented 4 years ago

I just added the code to the Git master. If you have the possibility to compile the code and want to test it, you can do so now. BTW: I called the new skin "Slim Channel". I am not a native speaker. Would that name be OK, or should we use a different one?

Here is a screenshot of the new implementation: [screenshot: new Slim Channel skin]

jp8 commented 4 years ago

I think you should keep "Slim Channel" as your rap name :)

It looks great on Mac and Linux. It could be slimmed down a bit in the future by not showing the full-length name. And I see the icons vary quite a bit in width.

WolfganP commented 4 years ago

Good work! I would name it "Compact Channel View" or something similar. Also, for this use case it's probably valuable to force the channel widths to the minimum (i.e. in this case, all forced to a width like Eli, Vik or V in the screenshot above) and, if the name is longer, show the details via hover/tooltip.

jp8 commented 4 years ago

Good work! I would name it "Compact Channel View" or something similar.

How about just 'Compact'

corrados commented 4 years ago

That is a good idea. I'll change it to "Compact".

corrados commented 4 years ago

Also, for this use case it's probably valuable to force the channel widths to the minimum (i.e. in this case, all forced to a width like Eli, Vik or V in the screenshot above) and, if the name is longer, show the details via hover/tooltip.

I think that is not necessary. This new view is intended for ensembles, so you have a controlled environment: you know all the people and can tell them to set names with only a few letters and not to pick instrument pictures that are too wide.

JimMooy commented 4 years ago

Would you like me to turn off hyperthreading for the next test?

jp8 commented 4 years ago

My theory is that with hyper-threading turned off, you will be able to support more connected clients.

If your test is to see at which point the server fails, then you could leave hyper-threading off.

If your test is to connect as many clients as possible without a failure, then you could leave hyper-threading on.

corrados commented 4 years ago

I just closed issue https://github.com/corrados/jamulus/issues/375 since the topic is covered here too. For your information: I have just pushed some experimental multithreading code to Git. It does not seem to work as expected, but at least it is a starting point. Maybe someone with multithreading expertise could give me feedback on what could be going wrong.

jp8 commented 4 years ago

I just closed issue #375 since the topic is covered here too. For your information: I have just pushed some experimental multithreading code to Git. It does not seem to work as expected, but at least it is a starting point. Maybe someone with multithreading expertise could give me feedback on what could be going wrong.

Latest client from git seems to have high CPU usage - could it be related to this change?

If so, maybe the new feature could be bypassed when running as a client?

corrados commented 4 years ago

No, it can't. The new change is only enabled if you activate the feature explicitly (i.e. using qmake "CONFIG+=multithreading"). If you do not do that, the test code is completely deactivated. BTW: The new code is only in the server, not in the client anyway.
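
For anyone trying this, a hedged sketch of how such a qmake CONFIG switch typically gates code; the define name below is my guess for illustration, not necessarily what the Jamulus .pro file uses:

```cpp
// A .pro stanza along the lines of
//   multithreading { DEFINES += USE_MULTITHREADING }
// maps qmake "CONFIG+=multithreading" to a preprocessor define, so the
// OpenMP path only exists in builds that explicitly requested it.
void ProcessClient ( int clientIdx ); // hypothetical per-client mix + encode

void ProcessAllClients ( int numClients )
{
#ifdef USE_MULTITHREADING
    #pragma omp parallel for
#endif
    for ( int i = 0; i < numClients; i++ )
    {
        ProcessClient ( i );
    }
}
```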

jp8 commented 4 years ago

OK, must be some other problem with my client.

So far on the server I observe:

- 1 user connected: CPU 230% (out of 400%)
- 2 users connected: CPU 240% (out of 400%)

Is there a way to enable debug logs on the server?

jp8 commented 4 years ago

And the audio is completely garbled as soon as the second person joins.

corrados commented 4 years ago

Thanks for your testing. OK, so now we know that it is not working at all, which is not good. I'll have another look at the code, but these multithreading implementations are not really straightforward...

Regarding logs: no, there are no logs implemented with regard to the multithreading.

WolfganP commented 4 years ago

@corrados is there any way to run the client in a way that no live input is needed? (i.e. reading an audio file of some chosen format and sending it to the server?)

That way it would be much easier to run multiple clients without having to deal with actual audio routing, and to focus on testing network performance, server load, multi-threading strategies and probably more... (it may even allow implementing metronome functions, https://github.com/corrados/jamulus/issues/79, if file looping were allowed)

corrados commented 4 years ago

I just checked a new version into Git. The high CPU usage still seems to be present, but the garbled audio should (hopefully) be solved now: https://github.com/corrados/jamulus/commit/db7a7599b6a6d9e89164fb2d5227e9f35862cc5f

corrados commented 4 years ago

is there any way to run the client in a way that no live input is needed?

There is always a way ;-). But right now this is not implemented/supported in the Jamulus software.

WolfganP commented 4 years ago

is there any way to run the client in a way that no live input is needed?

There is always a way ;-). But right now this is not implemented/supported in the Jamulus software.

I was thinking of something like a pseudo-device "File" to select in the Settings menu, which would open a wav/mp3/whatever is easiest to implement and stream it in a loop once connected. Or is that too difficult to develop?

corrados commented 4 years ago

I do not think it makes sense to implement this. When I do my multi-threading tests, I run multiple Jamulus instances under Windows, which works just fine for me.

corrados commented 4 years ago

As written in https://github.com/corrados/jamulus/issues/375, if I use #pragma omp parallel for in the OnTimer() function of the server, the CPU usage jumps to a high value even if the for-loop does nothing time-consuming. Is there anybody who can help out here?
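
A hedged guess at the cause (I have not profiled the Jamulus build): most OpenMP runtimes default to an active wait policy, where the worker threads spin-wait after a parallel region finishes so the next region can start quickly. A timer callback that opens a region every few milliseconds will therefore show high CPU even with an empty loop body. A minimal reproduction sketch:

```cpp
#include <omp.h>

// Fired from a periodic timer: with the default active wait policy the
// OpenMP workers spin between invocations, so the process shows high
// CPU usage although no real work is done inside the loop.
void OnTimer()
{
    #pragma omp parallel for
    for ( int i = 0; i < 128; i++ )
    {
        // intentionally empty
    }
}
```

If that is the cause, setting the standard environment variable OMP_WAIT_POLICY=passive (or a runtime-specific knob such as GOMP_SPINCOUNT for GCC's libgomp) before starting the server makes idle workers block instead of spin, at the cost of slightly slower region start-up.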

jp8 commented 4 years ago

I just checked a new version into Git. The high CPU usage still seems to be present, but the garbled audio should (hopefully) be solved now: db7a759

Confirmed: the audio is good now for me.

WolfganP commented 4 years ago

I do not think it makes sense to implement this. When I do my multi-threading tests, I run multiple Jamulus instances under Windows, which works just fine for me.

OK, sounds fair. Do you mind detailing how you test with multiple clients so we can help test and reproduce? Do you use the same audio source for all clients? Does each client get a different profile, or just a different --clientname?

corrados commented 4 years ago

Yes, it is the same audio source. That was OK for my tests because I was only interested in the CPU usage while I was working on the OMP implementation.

Do you mind to detail how you test with multiple clients so we can help testing and reproduce?

Well, basically I just started the Jamulus client multiple times. That works with my ASIO driver, fortunately.

jp8 commented 4 years ago

Some observations with the multithreaded server, on two different laptops.

With just one client connected:

- With 4 cores, there are 5 processes: 3 in status 'Running' at around 65-70% CPU each, and 2 in status 'Sleeping'.
- With 8 cores, there are 9 processes: 7 in status 'Running' at around 65-70% CPU each, and 2 in status 'Sleeping'.

On the server with 4 cores, I had 10 people on the server. CPU was around 295%; I suspect that was 3 x 95%, but I wasn't watching closely.

corrados commented 4 years ago

Thanks for the info. The next step is to find out how to reduce the OMP overhead to get the CPU load much lower. Let's see if that is possible...

storeilly commented 4 years ago

Hi Volker, I have a chain of private servers for choral use on AWS and GCP and am wondering... Would it be possible in the interim to link 2 instances on the same physical machine in code and hope the latency between the instances is low enough? In the hope that the instances use different cores. I have a vested interest in this issue as one of my choirs is 75 members due to start back in September.