mumble-voip / mumble

Mumble is an open-source, low-latency, high quality voice chat software.
https://www.mumble.info
Other
6.28k stars 1.11k forks source link

rememberchannelduration option is ignored after restarting murmur #5639

Closed Hartmnt closed 2 years ago

Hartmnt commented 2 years ago

Description

We use the rememberchannelduration server option on our servers. Last used channels are supposed to be remembered for 5 minutes for potential connection losses and such, but otherwise clients are expected to connect directly to the root channel. This usually works as expected. However, the murmur server is restarted over night every couple of weeks for maintenance.

Expected Behavior: The server restart has no effect on the behavior of rememberchannelduration. Clients connect to the root channel, except when they reconnect within the specified grace period.

Actual Behavior: On the first connection of a client after the server restarted the remembered channel is used, even if the rememberchannelduration has long expired.

Steps to reproduce

  1. Start murmur with rememberchannelduration > 0, rememberchannel = true and multiple permanent channels
  2. Make sure the test user is registered
  3. Make sure the channel remember delay works in regular use
  4. Disconnect the user
  5. Restart murmur
  6. Let the rememberchannelduration expire
  7. Reconnect

Mumble version

1.4.0

Mumble component

Server

OS

Linux

Reproducible?

Yes

Additional information

Offending Code: Upon examining the code of the feature pull request #4147 we can see two things which are handled weirdly/wrongly: 1) For a reason I do not fully understand the last_disconnect field is set to NULL when starting murmur. CODE 1) And the logic in ServerDB.cpp is broken when rememberchannel == true, rememberchannelduration > 0 and last_disconnect is equal to NULL. rememberchannelduration is ignored and the server assigns the remembered channel anyway (as long as the channel still exists). CODE

Proposed Solution: To fix this murmur could 1) not set last_disconnect to NULL when starting. 2) fix the logic in ServerDB.cpp by returning -1 in case last_disconnect is equal to NULL AND rememberchannelduration > 0.

Or possibly both.

Relevant log output

No response

Screenshots

No response

Krzmbrzl commented 2 years ago

Looking back at the conversation in https://github.com/mumble-voip/mumble/pull/4147 it seems that the reasoning for clearing last_disconnect on server start is this:

The last_disconnect field is normally set whenever a user disconnects from the server. However, when the server shuts down, while the user is still connected, apparently this value does not get set to the time when the server shuts down (I am no longer sure whether this was just an assumption or fact). Thus, the field would contain wrong data and if the user has been connected for a while before the shutdown, the user might get reconnected to Root instead of their old channel, even if the server restarts right after having been shutdown.

Since timing the update of last_disconnected field with the shutdown (to ensure it gets updated while the respective user object is still valid) appeared to be a bit complicated, we decided that it will be easier to just wipe that value from the DB after a server restart.

So I guess the first issue you found is actually code working as intended (though we should probably add a comment to that line explaining why it is there). Therefore, I think the second way to fix this that you suggested is the way to go here. Do you want to create a PR yourself or do you want me to take care of this?

Hartmnt commented 2 years ago

Therefore, I think the second way to fix this that you suggested is the way to go here. Do you want to create a PR yourself or do you want me to take care of this?

Well, I can certainly create a pull request for a fix, but I think there is still a problem with the situation here...

The last_disconnect field is normally set whenever a user disconnects from the server. However, when the server shuts down, while the user is still connected, apparently this value does not get set to the time when the server shuts down (I am no longer sure whether this was just an assumption or fact). Thus, the field would contain wrong data and if the user has been connected for a while before the shutdown, the user might get reconnected to Root instead of their old channel, even if the server restarts right after having been shutdown.

Since timing the update of last_disconnected field with the shutdown (to ensure it gets updated while the respective user object is still valid) appeared to be a bit complicated, we decided that it will be easier to just wipe that value from the DB after a server restart.

If we fix 2) without touching 1) that means that all clients will connect to the root channel instead of their last channel, if the server was restarted independent of their actual last_disconnect. Since murmur sets the column to NULL to prevent this in the first place, we would create the exact opposite problem.

Can we reliably assume that last_disconnect is valid, if and only if last_disconnect >= last_active? If the server shuts down, active clients would not update last_disconnect leaving it at NULL or < last_active. Therefore, we could check that condition and decide, if to use last_disconnect or not? Of course we MUST check this condition before the clients last_active field is updated or save the old value in memory for later. (Edit: And we would of course not set last_disconnect to NULL in that case)

If murmur saves its starting datetime, we can use that to compare to rememberchannelduration when the above check tells us last_disconnect is invalid.

I think if we are careful about using the correct values at the correct time, this procedure will not produce errors (?) I would appreciate, if someone would double-check my logical reasoning.

Krzmbrzl commented 2 years ago

Since murmur sets the column to NULL to prevent this in the first place, we would create the exact opposite problem.

Indeed. However, I think in this case the argument could be made that by restarting the server, you essentially also restart from a clean slate and thus start connecting to Root. It would certainly be not ideal though.

Can we reliably assume that last_disconnect is valid, if and only if last_disconnect >= last_active?

This sounds reasonable to me. I can't come with a scenario in which this assumption would break :thinking:

If murmur saves its starting datetime, we can use that to compare to rememberchannelduration when the above check tells us last_disconnect is invalid.

Do you intend to use the server's starting datetime as a replacement for last_disconnect? This then holds the assumption that every case in which last_disconnect is invalid is due to the user having been connected while the server was shut down... Under normal circumstances this appears as if it should hold :eyes:

However, I think the first thing that should be tested is whether last_disconnect is really not set properly when the server is shut down. If it isn't then this seems to indicate that the server does not shut down very gracefully and the better fix to this entire issue might be to improve the shutdown procedure to first disconnect all users (properly, using the regular disconnect logic) and only if the server is empty actually shut it down. In this case we should always end up with the properly set last_disconnect and could just always use that, provided it is not NULL :thinking:

Hartmnt commented 2 years ago

Do you intend to use the server's starting datetime as a replacement for last_disconnect?

Yes, in an ideal world this would be used as replacement for maximum accuracy. For the sake of simplicity, we could also just assume that invalid last_disconnect values must mean the client was active when the server shut down. But if we have the server start datetime available, i think that should be used instead.

However, I think the first thing that should be tested is whether last_disconnect is really not set properly when the server is shut down. If it isn't then this seems to indicate that the server does not shut down very gracefully and the better fix to this entire issue might be to improve the shutdown procedure to first disconnect all users (properly, using the regular disconnect logic) and only if the server is empty actually shut it down. In this case we should always end up with the properly set last_disconnect and could just always use that, provided it is not NULL

True, but I think there are reasonable scenarios where we do not expect the last_disconnect to be updated correctly, even if the server close routine is improved. (SIGKILL for instance)

Here is the neat thing: My logic would work whether the server close routine is improved or not. If last_disconnect were to always be updated without race conditions or similar considerations, the logic will see that the value is valid and use it. If there are unexpected scenarios where last_disconnect is still not updated correctly, the logic would fall back and not use last_disconnect.

I am willing to work on a pull request for my proposed solution, but for improving the server close logic I think I lack the necessary experience in the mumble code base. Maybe a complete test and audit of the disconnect logic would deserve its own Issue.

Do you want me to start working on the above logic?

Krzmbrzl commented 2 years ago

Do you want me to start working on the above logic?

Yes, please. In so doing, maybe you could also just test whether last_disconnect is currently properly set when shutting down the server. If it is not, I might look into improving the shutdown routine myself.

Hartmnt commented 2 years ago

@Krzmbrzl I have two questions regarding the implementation:

1) The previous implementation uses both setTimeSpec(Qt::UTC) and toTimeSpec(Qt::UTC). Is there a reason why now is copied using the latter? I would rather just use the former method on all local objects. CODE 1) I have searched for possible server starting datetime objects in the existing code and it appears there is none. I would propose adding a public member QDateTime to the Server class in Server.cpp and initializing it to QDateTime::currentDateTime() as this would work per virtual server instance, right? This member could also be used in the future, if there is any need. Any thoughts?

Edit: Nevermind, I found tUptime in the server class which can be used to calculate the server start datetime.

Hartmnt commented 2 years ago

Yes, please. In so doing, maybe you could also just test whether last_disconnect is currently properly set when shutting down the server. If it is not, I might look into improving the shutdown routine myself.

With the clearLastDisconnect routine removed:

The server does NOT update last_disconnect when receiving SIGTERM, SIGKILL or SIGQUIT.
The server does update last_disconnect when the user manually quits client side or times out.

Do you want me to try more signals or scenarios?

Krzmbrzl commented 2 years ago

The previous implementation uses both setTimeSpec(Qt::UTC) and toTimeSpec(Qt::UTC). Is there a reason why now is copied using the latter? I would rather just use the former method on all local objects. CODE

No idea, why this was done in this way. Looking at this now, it does indeed appear a bit odd :eyes:

The server does NOT update last_disconnect when receiving SIGTERM, SIGKILL or SIGQUIT. The server does update last_disconnect when the user manually quits client side or times out.

Okay, thanks for testing this. In that case I guess we'll have to look into that eventually...

Do you want me to try more signals or scenarios?

Nah, I think this is sufficient, thanks!