Closed Hartmnt closed 2 years ago
Looking back at the conversation in https://github.com/mumble-voip/mumble/pull/4147 it seems that the reasoning for clearing last_disconnect
on server start is this:
The last_disconnect
field is normally set whenever a user disconnects from the server. However, when the server shuts down, while the user is still connected, apparently this value does not get set to the time when the server shuts down (I am no longer sure whether this was just an assumption or fact). Thus, the field would contain wrong data and if the user has been connected for a while before the shutdown, the user might get reconnected to Root instead of their old channel, even if the server restarts right after having been shutdown.
Since timing the update of last_disconnected
field with the shutdown (to ensure it gets updated while the respective user object is still valid) appeared to be a bit complicated, we decided that it will be easier to just wipe that value from the DB after a server restart.
So I guess the first issue you found is actually code working as intended (though we should probably add a comment to that line explaining why it is there). Therefore, I think the second way to fix this that you suggested is the way to go here. Do you want to create a PR yourself or do you want me to take care of this?
Therefore, I think the second way to fix this that you suggested is the way to go here. Do you want to create a PR yourself or do you want me to take care of this?
Well, I can certainly create a pull request for a fix, but I think there is still a problem with the situation here...
The
last_disconnect
field is normally set whenever a user disconnects from the server. However, when the server shuts down, while the user is still connected, apparently this value does not get set to the time when the server shuts down (I am no longer sure whether this was just an assumption or fact). Thus, the field would contain wrong data and if the user has been connected for a while before the shutdown, the user might get reconnected to Root instead of their old channel, even if the server restarts right after having been shutdown.Since timing the update of
last_disconnected
field with the shutdown (to ensure it gets updated while the respective user object is still valid) appeared to be a bit complicated, we decided that it will be easier to just wipe that value from the DB after a server restart.
If we fix 2) without touching 1) that means that all clients will connect to the root channel instead of their last channel, if the server was restarted independent of their actual last_disconnect
. Since murmur sets the column to NULL
to prevent this in the first place, we would create the exact opposite problem.
Can we reliably assume that last_disconnect
is valid, if and only if last_disconnect >= last_active
? If the server shuts down, active clients would not update last_disconnect
leaving it at NULL
or < last_active
. Therefore, we could check that condition and decide, if to use last_disconnect
or not? Of course we MUST check this condition before the clients last_active
field is updated or save the old value in memory for later. (Edit: And we would of course not set last_disconnect
to NULL
in that case)
If murmur saves its starting datetime, we can use that to compare to rememberchannelduration
when the above check tells us last_disconnect
is invalid.
I think if we are careful about using the correct values at the correct time, this procedure will not produce errors (?) I would appreciate, if someone would double-check my logical reasoning.
Since murmur sets the column to NULL to prevent this in the first place, we would create the exact opposite problem.
Indeed. However, I think in this case the argument could be made that by restarting the server, you essentially also restart from a clean slate and thus start connecting to Root. It would certainly be not ideal though.
Can we reliably assume that last_disconnect is valid, if and only if last_disconnect >= last_active?
This sounds reasonable to me. I can't come with a scenario in which this assumption would break :thinking:
If murmur saves its starting datetime, we can use that to compare to rememberchannelduration when the above check tells us last_disconnect is invalid.
Do you intend to use the server's starting datetime as a replacement for last_disconnect
?
This then holds the assumption that every case in which last_disconnect
is invalid is due to the user having been connected while the server was shut down... Under normal circumstances this appears as if it should hold :eyes:
However, I think the first thing that should be tested is whether last_disconnect
is really not set properly when the server is shut down. If it isn't then this seems to indicate that the server does not shut down very gracefully and the better fix to this entire issue might be to improve the shutdown procedure to first disconnect all users (properly, using the regular disconnect logic) and only if the server is empty actually shut it down. In this case we should always end up with the properly set last_disconnect
and could just always use that, provided it is not NULL
:thinking:
Do you intend to use the server's starting datetime as a replacement for
last_disconnect
?
Yes, in an ideal world this would be used as replacement for maximum accuracy. For the sake of simplicity, we could also just assume that invalid last_disconnect
values must mean the client was active when the server shut down. But if we have the server start datetime available, i think that should be used instead.
However, I think the first thing that should be tested is whether
last_disconnect
is really not set properly when the server is shut down. If it isn't then this seems to indicate that the server does not shut down very gracefully and the better fix to this entire issue might be to improve the shutdown procedure to first disconnect all users (properly, using the regular disconnect logic) and only if the server is empty actually shut it down. In this case we should always end up with the properly setlast_disconnect
and could just always use that, provided it is notNULL
True, but I think there are reasonable scenarios where we do not expect the last_disconnect
to be updated correctly, even if the server close routine is improved. (SIGKILL for instance)
Here is the neat thing: My logic would work whether the server close routine is improved or not. If last_disconnect
were to always be updated without race conditions or similar considerations, the logic will see that the value is valid and use it.
If there are unexpected scenarios where last_disconnect
is still not updated correctly, the logic would fall back and not use last_disconnect
.
I am willing to work on a pull request for my proposed solution, but for improving the server close logic I think I lack the necessary experience in the mumble code base. Maybe a complete test and audit of the disconnect logic would deserve its own Issue.
Do you want me to start working on the above logic?
Do you want me to start working on the above logic?
Yes, please. In so doing, maybe you could also just test whether last_disconnect
is currently properly set when shutting down the server. If it is not, I might look into improving the shutdown routine myself.
@Krzmbrzl I have two questions regarding the implementation:
1) The previous implementation uses both setTimeSpec(Qt::UTC)
and toTimeSpec(Qt::UTC)
. Is there a reason why now
is copied using the latter? I would rather just use the former method on all local objects. CODE
1) I have searched for possible server starting datetime objects in the existing code and it appears there is none. I would propose adding a public member QDateTime
to the Server
class in Server.cpp
and initializing it to QDateTime::currentDateTime()
as this would work per virtual server instance, right? This member could also be used in the future, if there is any need. Any thoughts?
Edit: Nevermind, I found tUptime
in the server class which can be used to calculate the server start datetime.
Yes, please. In so doing, maybe you could also just test whether
last_disconnect
is currently properly set when shutting down the server. If it is not, I might look into improving the shutdown routine myself.
With the clearLastDisconnect
routine removed:
The server does NOT update last_disconnect
when receiving SIGTERM, SIGKILL or SIGQUIT.
The server does update last_disconnect
when the user manually quits client side or times out.
Do you want me to try more signals or scenarios?
The previous implementation uses both setTimeSpec(Qt::UTC) and toTimeSpec(Qt::UTC). Is there a reason why now is copied using the latter? I would rather just use the former method on all local objects. CODE
No idea, why this was done in this way. Looking at this now, it does indeed appear a bit odd :eyes:
The server does NOT update last_disconnect when receiving SIGTERM, SIGKILL or SIGQUIT. The server does update last_disconnect when the user manually quits client side or times out.
Okay, thanks for testing this. In that case I guess we'll have to look into that eventually...
Do you want me to try more signals or scenarios?
Nah, I think this is sufficient, thanks!
Description
We use the
rememberchannelduration
server option on our servers. Last used channels are supposed to be remembered for 5 minutes for potential connection losses and such, but otherwise clients are expected to connect directly to the root channel. This usually works as expected. However, the murmur server is restarted over night every couple of weeks for maintenance.Expected Behavior: The server restart has no effect on the behavior of
rememberchannelduration
. Clients connect to the root channel, except when they reconnect within the specified grace period.Actual Behavior: On the first connection of a client after the server restarted the remembered channel is used, even if the
rememberchannelduration
has long expired.Steps to reproduce
rememberchannelduration > 0
,rememberchannel = true
and multiple permanent channelsrememberchannelduration
expireMumble version
1.4.0
Mumble component
Server
OS
Linux
Reproducible?
Yes
Additional information
Offending Code: Upon examining the code of the feature pull request #4147 we can see two things which are handled weirdly/wrongly: 1) For a reason I do not fully understand the
last_disconnect
field is set toNULL
when starting murmur. CODE 1) And the logic inServerDB.cpp
is broken whenrememberchannel == true
,rememberchannelduration > 0
andlast_disconnect
is equal toNULL
.rememberchannelduration
is ignored and the server assigns the remembered channel anyway (as long as the channel still exists). CODEProposed Solution: To fix this murmur could 1) not set
last_disconnect
toNULL
when starting. 2) fix the logic inServerDB.cpp
by returning-1
in caselast_disconnect
is equal toNULL
ANDrememberchannelduration > 0
.Or possibly both.
Relevant log output
No response
Screenshots
No response