dingodoppelt closed this issue 3 years ago.
3.6.2 and before
Just to clarify - does that imply you've reproduced this in earlier versions, or just by inference?
@gilgongo: Not everybody was on 3.6.2 when we noticed (around 17 clients on the server), and I didn't ask about every version because I think the code concerning the levels hasn't been touched in the last few releases (correct me if I'm wrong).
I don't think the server has any way to store this information (currently). So the two options to improve this would be:
Also, when you say "restarting" do you mean a kill -s SIGUSR1/2 as part of recording, or a kill -HUP (or stop/start)?
@gilgongo By "restarting" I mean restarting the systemd service: "systemctl restart jamulus.service".

@hoffie I'd go for the client-side implementation, since the client can already send levels but doesn't seem to do so on a restart. Would we need another protocol message signaling a reconnect? I think "restarting" means quitting and starting before the clients time out, so a "restart" scenario isn't handled. Could it be the client doesn't notice it is being reconnected (because it doesn't get disconnected) and therefore doesn't update the fader gains? If the server made a request to the client to send the gains on connect, it should work even on a restart.
UDP is a connectionless protocol. Stopping the server cannot be directly detected by the client. One of two things needs to happen: either the server says "I'm disconnecting you" or the client says "It's been rather a long time since I heard from the server". The former only happens with --discononquit. The latter only triggers after a generous amount of time.
When the server restarts, the server has no way - currently - of knowing what clients were connected. It just waits for them to start talking to it again. Hence the generous amount of time they give it to recover.
What it sounds like we need here then is:
Now, there may be an existing message the server sends to the client when it first hears from it - can't remember... but I don't think the client's prepared to suddenly be told "You need to send state" when it thinks everything's okay.
Regarding "a generous amount of time".... the server is always sending the audio mix... this can serve as a keep-alive... the server is up unless there is no audio mix output
I am assuming the audio mix output exists as long as there is at least one user connected... If it is possible for all users to be quiet and there is then no mix output, the server has to send a periodic quiet-channel keep-alive message, saying the channel is quiet and all is well.
I agree that the server needs a "send state" request if it receives a message from an unexpected client.
Regarding "a generous amount of time".... the server is always sending the audio mix... this can serve as a keep-alive... the server is up unless there is no audio mix output
I'm pretty sure that's what's used -- audio is the only thing that's meant to run constantly. If it stops, you know the server's "not there"... but it could be one packet, two packets... twenty packets... buffer now seriously under-running... Cloud servers sometimes do get swapped out by long-running high-priority tasks on the same metal - you'll see that "ping" suddenly go from 30ms to 500ms (or the maximum reported)... but the client hangs on in there, waiting for things to settle. And often, after a few tens of seconds, they do. The server hasn't noticed anything (maybe a few dropped packets on its side, too, of course).
The problem only comes when the process terminates - i.e. the server stops existing. That doesn't normally happen. It's either a deliberate act (i.e. "stop the server", perhaps as part of "restart the server") or it's due to a crash. Some crash-type failures may result in an automatic restart.
But yes, simple solution: server tells client "No, no, believe me: I don't know you (any more)" and so client sends state.
I'm wondering if we could (ab)use some existing side effect in the protocol to detect a server restart, e.g.:
We should select a max time with no server transmissions to declare the server down and restart. It is almost arbitrary.
@hoffie When the server restarts, there will be no destination to send a restart message unless the server has some persistent data of connected clients just prior to the restart. What information persists across a restart?
When the server restarts, there will be no destination to send a restart message unless the server has some persistent data of connected clients just prior to the restart.
Yes, that's true at the time of the restart (and it will be true for all other proposed detection mechanisms). The reason why it could work is that clients will "reconnect" quickly (technically: they will probably not notice that they had not been connected and will continue to behave as if nothing happened).
We would explicitly be making use of the fact that the server knows that it was restarted. We would only have to make clients recognize this and act like it was a fresh connection.
Have you tried it? In a quick test, restarting a server will cause a new welcome message to be sent to existing clients.
I think there are some things the client "knows" to tell the server on a fresh connect and some things the server knows to tell the client, currently. What we have here is the server "knowing" it's a fresh connect -- but the client not having realised.
So we need a way to make it clear to the client "something happened you weren't expecting" and it then needs to act appropriately.
Looks like there are already several opportunities in the server code which signal the client that it was reconnected (= is considered to be a new client by the server):
Interestingly, some of the messages explicitly allow for getting into an unexpected state: https://github.com/jamulussoftware/jamulus/blob/7d0801a1578afc41803ff30a33327ed5e38d8269/src/server.cpp#L605-L610,619-626
So the problem is reduced -- the server knows to try to poke at the client for the information it needs, it's just the client side. What does it decide to do on a ClientIDMes arriving? Seems to be this... https://github.com/jamulussoftware/jamulus/blob/7d0801a1578afc41803ff30a33327ed5e38d8269/src/client.cpp#L710 which just triggers the UI to do something (probably not resend current values).
Potentially CClient::OnClientIDReceived could do the work, but I'm wary as the style seems not to follow that idiom...
Since this is a very rare event: can't the server just (re)start with all channels at 0% until it gets the actual values from the clients? (assuming the clients will periodically send their levels)
(assuming the clients will periodically send their levels)
They don't.
Another thought after reading some code in this area: Could it be that the necessary code for sending the proper fader levels to the server is there, but is not triggered due to the following mismatch between server and client state?
If this is true, the fix might be a change in CAudioMixerBoard::ApplyNewConClientList. This still has to be verified (Wireshark?) and checked regarding possible side effects.
because its state does not indicate any change
If so, that shouldn't happen - I think any such check could be removed. I don't think that's it, though - I think the client only sends on connect or on change, because the clients have no need to pass each other level information (with the exception of the recent addition of the "has me muted" indicator).
If so, that shouldn't happen - I think any such check could be removed. I don't think that's it, though - I think the client only sends on connect or on change, because the clients have no need to pass each other level information (with the exception of the recent addition of the "has me muted" indicator).
Yeah, my suspicion was that the highlighted change is not properly noticed by the client. The following test seems to contradict my suspicion:
tcpdump -C 32 -i lo -nn -p -s0 -w /tmp/jam.pcap '(ip[6:2]&0x3fff) != 0 or (udp portrange 22120-22139 and (udp[8:2] == 0 and udp[4:2]-17 == (udp[14]<<8)+udp[13]))'
I then analyzed the resulting dump (thanks a lot to @softins for the tcpdump filter example and the wireshark dissector. This is a huge help!).
Feb 14, 2021 12:41:29.978483000 CET 22124 22218 CLIENT_ID (0)
Feb 14, 2021 12:41:29.991517000 CET 22218 22124 CHANNEL_INFOS (Germany, None, Not_Set, Client_B)
Feb 14, 2021 12:41:30.206784000 CET 22218 22124 CHANNEL_INFOS (Germany, None, Not_Set, Client_B)
Feb 14, 2021 12:41:30.219947000 CET 22218 22124 CHANNEL_GAIN ([0] => 100%)
Feb 14, 2021 12:41:30.220012000 CET 22218 22124 CHANNEL_GAIN ([0] => 3%)
Feb 14, 2021 12:41:37.570514000 CET 22124 22217 CLIENT_ID (1)
Feb 14, 2021 12:41:37.584326000 CET 22217 22124 CHANNEL_INFOS (Germany, None, Not_Set, Client_A)
Feb 14, 2021 12:41:37.585077000 CET 22218 22124 CHANNEL_GAIN ([1] => 100%)
Feb 14, 2021 12:41:37.587252000 CET 22218 22124 CHANNEL_GAIN ([1] => 3%)
Feb 14, 2021 12:41:37.781320000 CET 22217 22124 CHANNEL_INFOS (Germany, None, Not_Set, Client_A)
Feb 14, 2021 12:41:37.800390000 CET 22217 22124 CHANNEL_GAIN ([0] => 100%)
Feb 14, 2021 12:41:37.800470000 CET 22217 22124 CHANNEL_GAIN ([0] => 100%)
Feb 14, 2021 12:41:37.800518000 CET 22217 22124 CHANNEL_GAIN ([1] => 24%)
# Server restart
Feb 14, 2021 12:41:48.214739000 CET 22124 22217 CLIENT_ID (0)
Feb 14, 2021 12:41:48.214865000 CET 22124 22218 CLIENT_ID (1)
Feb 14, 2021 12:41:48.215131000 CET 22217 22124 CHANNEL_INFOS (Germany, None, Not_Set, Client_A)
Feb 14, 2021 12:41:48.215287000 CET 22218 22124 CHANNEL_INFOS (Germany, None, Not_Set, Client_B)
Feb 14, 2021 12:41:48.215468000 CET 22217 22124 CHANNEL_GAIN ([0] => 24%)
Feb 14, 2021 12:41:48.215776000 CET 22218 22124 CHANNEL_GAIN ([0] => 3%)
Feb 14, 2021 12:41:48.217994000 CET 22217 22124 CHANNEL_GAIN ([1] => 100%)
Feb 14, 2021 12:41:48.219890000 CET 22218 22124 CHANNEL_GAIN ([1] => 3%)
So... protocol-wise everything seems fine to me in this case: The clients do properly send their intended gain levels to the server. I did not verify how the audio behaves in this case, but I think the test shows that:
Edit: Attached the pcap, in case someone wants to check themselves. jam.pcap.log
Edit 2: Seems like I can reproduce the issue partially: After a server restart, sometimes only one CHANNEL_GAIN message per client can be observed. This would explain the behavior from the issue description although I'm still unsure about the reason. Timing may play a role, repeated restarts may play a role. Will continue investigating as time permits.
By now I'm pretty confident that my hypothesis is correct, i.e. that
So, essentially, this is a race condition:
Two possible fixes come to mind:

- Client-side: make the client (re)send its gain levels whenever CONN_CLIENTS_LIST updates are received from the server. #1009 implements that.
- Server-side: have the server send an empty CONN_CLIENTS_LIST in order to signal a reconnect and make clients reset their local channel list. The next (regular) CONN_CLIENTS_LIST message will then be handled as if those channels were new, i.e. gain levels would be properly updated. #1010 implements that.

I have verified this with the following tests:

Test 1: Vanilla Jamulus client/server, reconnect in initial order (test1-vanilla-reconnect-in-normal-order.pcap.log)
- Block one client's outgoing packets (iptables -I OUTPUT -p udp --sport 22218 -j DROP)
- Unblock them again (iptables -D OUTPUT -p udp --sport 22218 -j DROP)

Test 2: Vanilla Jamulus client/server, reconnect in opposite order (test2-vanilla-reconnect-in-reverse-order.pcap.log)

- Block the other client's outgoing packets (iptables -I OUTPUT -p udp --sport 22217 -j DROP)
- Unblock them again (iptables -D OUTPUT -p udp --sport 22217 -j DROP)

Test 3: Vanilla Jamulus server with modified Jamulus client (PR #1009), reconnect in initial order (test3-modified-client-reconnect-in-normal-order.log)

- Block one client's outgoing packets (iptables -I OUTPUT -p udp --sport 22218 -j DROP)
- Unblock them again (iptables -D OUTPUT -p udp --sport 22218 -j DROP)

Test 4: Vanilla Jamulus client with modified Jamulus server (PR #1010), reconnect in initial order (test4-modified-server-reconnect-in-normal-order.pcap.log)

- Block one client's outgoing packets (iptables -I OUTPUT -p udp --sport 22218 -j DROP)
- Unblock them again (iptables -D OUTPUT -p udp --sport 22218 -j DROP)

To move this forward, we would have to do the following:
I suggest that general discussion on this issue is kept here, while discussion about the correctness, upsides and downsides of the proposals is directed to the relevant PRs linked above.
Great work @hoffie! I agree with the need to do more testing, but I have examined both PRs and the changes look good.
One test is missing from the above: modified Jamulus client (PR #1009) with modified Jamulus server (PR #1010)
For maximum compatibility, my initial view is that both PRs should be applied. That would improve the issue with both older clients on a new server, and with newer clients on an old server. Provided there is no problem with new clients talking to a new server.
One side effect will be that updated gain levels are sent more often (each time a new client list is distributed to the client). I don't think that this will happen really often and I don't think there is a problem if this causes some redundant protocol messages as the overhead should be small.
I would think that the bandwidth consumed in this case is much less than that used by the audio streams, and so not significant.
That's a fair point, @softins, and one I'd not thought of -- having both may well improve matters significantly more than just the one.
Hopefully at some point the "random pan" bug that affects certain combinations of client and server will get squashed in the cross-fire of fixes like this, too.
@softins I have done the test you suggested and it looks good (as expected). When combining both fixes, the client-side fix becomes irrelevant in the scenario. Of course, it will stay relevant when connecting to servers without the fix.
I would think that the bandwidth consumed in this case is much less than that used by the audio streams, and so not significant.
Yeah, I fully agree, just wanted to mention it. :)
Hopefully at some point the "random pan" bug that affects certain combinations of client and server will get squashed in the cross-fire of fixes like this, too.
Don't know what specific bug you refer to, but I want to add that my PRs intend to fix the gain bug mentioned in this issue, but in practice they should do even more: They will force the client to re-trigger sending all mixer properties. This should cover gain, pan, mute, solo and group (the latter two indirectly as they seem to be handled via mute protocol messages?). So, there is a real chance that the bug you are referring to may be fixed by this as well.
I guess there is agreement that more testing is needed, especially for the client-side fix where I'm really unsure about possible side effects. However, I fail to come up with useful test cases. I might just run my own Jamulus stuff with both fixes applied and will try to check if anything breaks.
Test 5: Modified Jamulus client (PR #1009) with modified Jamulus server (PR #1010), reconnect in initial order (test5-modified-server-and-client-reconnect-in-normal-order.pcap.log)

- Block one client's outgoing packets (iptables -I OUTPUT -p udp --sport 22218 -j DROP)
- Unblock them again (iptables -D OUTPUT -p udp --sport 22218 -j DROP)

Sorry to jump on this thread, I have searched for other tickets and maybe this is the closest.
Does the behaviour described mix everyone at 100% or at "new client" level? And is there a separate subject anywhere about setting default new client level to something like 70% instead of 100% (which seems a poor default)?
Does the behaviour described mix everyone at 100% or at "new client" level?
I'm not the expert on this issue, but I think it must be 100%. The "new client" level is a client-side setting, but this issue is that when the server is restarted, it doesn't know each client's fader settings. I assume the 100% is an initialised default within the server before it receives the actual setting from the client.
And is there a separate subject anywhere about setting default new client level to something like 70% instead of 100% (which seems a poor default)?
Don't know, sorry. Raise it as an issue, and hopefully that will attract discussion.
I've just gotten around to finishing tests with real audio.
I ran two Jamulus clients which connected to a local server. One client (A) fed music into the server (and had itself muted in its own mix). I was listening on the other Jamulus client (B). I had set the level of the music-providing client (A) really low in B's mix to spot any differences after restarts.
I have not noticed any downsides of either PRs.
I therefore agree that both PRs should be merged. Anyone hitting this issue can upgrade their server, their client or both to benefit from the fix.
Describe the bug
Restarting the server sets levels to 100%.

To Reproduce
Restart a running server with clients on it and notice how it ignores the set gain levels and mixes everybody at 100%.

Expected behavior
The server should mix everybody at the individually set levels.

Operating system
Linux, Windows, macOS

Version of Jamulus
3.6.2 and before