raiguard / QuickItemSearch

Factorio mod - quickly search for items in your inventory or connected logistic network.
MIT License
3 stars 0 forks source link

Desync issue in multiplayer when server is stopped while clients are still connected #4

Closed azaghal closed 4 years ago

azaghal commented 4 years ago

If a server is stopped while there are still connected players (via the /quit command in console), and then additional mods are introduced prior to starting it again, connecting clients will fail to connect, and be presented with

Reproduction steps

  1. Start a headless server.

  2. Start a client.

  3. Connect to the server.

  4. Search for items via Quick Item Search.

  5. Stop the server in the server console using the /quit command.

  6. Enable a new mod or disable an existing one on both server and client.

  7. Start the server again.

  8. Connect to the server.

  9. Search for items via Quick Item Search.

  10. Disconnect from the server.

  11. Connect to server once again.

Expected results

  1. In step 4 it is possible to search for items via Quick Item Search.

  2. In step 9 it is possible to search for items via Quick Item Search.

  3. In step 11 it is possible to connect to the server.

Actual results

  1. In step 4 it is possible to search for items via Quick Item Search.

  2. In step 9 it is not possible to search for items via Quick Item Search. User is presented with the error message Cannot open Quick Item Search, dictionary translation has not yet finished. This persists no matter how long you wait.

  3. In step 11 it is not possible to connect to the server. User is presented with the following error:

Cannot join. The following mod event handlers are not identical between you and server. This indicates that the following mods are not multiplayer (save/load) safe. (See the log file for more details):

mod-QuickItemSearch

The log file on client side contains the following error message:

 124.102 Error ClientMultiplayerManager.cpp:1108: mod-QuickItemSearch was registered for the following events when the map was saved but has not re-registered them as a result of loading: on_tick (ID 0)
 124.102 Error ClientMultiplayerManager.cpp:98: MultiplayerManager failed: "" + multiplayer.script-event-mismatch + "
" + "
mod-QuickItemSearch"
 124.102 Info ClientMultiplayerManager.cpp:578: UpdateTick(7964) changing state from(ConnectedLoadingMap) to(Failed)

Additional notes

After doing quite a bit of digging, I ran into an interesting behaviour of Factorio in regards to the LuaPlayer::connected property. Namely, if the server is stopped while there are still clients connected, during the execution of handlers registered via LuaBootstrap::on_configuration_changed their state remains connected.

Unfortunately, this seems to be the intended behaviour, which in my opinion means it must not be relied on during the LuaBootstrap::on_configuration_changed handler execution.

On the Quick Item Search, the code effectively registers the on_tick handler during LuaBootstrap::on_configuration_changed if the player is connected (via player_data.refresh). This works as intended during single player, but causes an issue in multiplayer in above scenario.

The issue in step 9 comes from the fact that LuaPlayer::request_translation can only be used on connected players, and does nothing if player is not connected. Although the property reports player as connected, the player is not really connected at that point. So, in spite of translation requests being sent off, they can never be fulfilled, and the mod remains in the "waiting for translation to finish" state (because it never really receives any translations), and it remains with the registered on_tick handler.

Once the player disconnects in step 10, the game is saved, with the registered on_tick handler.

Upon the player reconnecting, the on_tick handler will not get registered on the client side, but it is registered server side, resulting in the listed error.

P.S. Moved issue as requested :)

azaghal commented 4 years ago

Now, as for resolving this, I tried decoupling a bit the logic for the player_data.refresh to be more granular, but in the end it did not help simply because of how the single player and multiplayer have a bit of a difference in handling the player joins. E.g. in single player it seems like player joins only once during the lifetime of a savegame, so that event cannot be relied on to trigger translation dictionary rebuild.

One thing that popped to my mind to fix this problem is to mark all players as requiring to redo the translations during the LuaBootstrap::on_configuration_changed, then register an initial "one-off" on_tick handler that will essentially trigger the translation process for all currently connected users (to handle the single player mode), deregister itself, and invoke on_tick.update at that point.

But, I'm not really sure if that's a good way to go about it.

raiguard commented 4 years ago

I think this might be solvable if I simply assume that no players are connected during on_config_changed in multiplayer. I'm pretty sure that it's actually impossible. So what I'll do is check if the game is multiplayer, and if it is, always set the translate_on_join flag to true instead of starting translations. This will defer translations for all players until they actually join the game.

If that doesn't work, I did write out logic to defer checking player.connected until the next tick, but it's ugly and an absolute pain, so if this method works I'll just roll with it.

Edit: after some more investigation, I think this method will work just fine!

azaghal commented 4 years ago

Don't want to be the guy, but... I tested the LuaGameScript::is_multiplayer a bit as well. :)

It kinda works, with one caveat - if you have copied the savegame from the server (directly via file or saved the game to client while connected to server), when loading that savegame in the single player mode and on_configuration_changed gets triggered, the function will return true, in spite of the game being single player now. If the game is saved after that point (in single player), it starts returning false, though.

So, just one thing to keep in mind (incidentally, I've often enabled creative mod after playing on servers to try out a couple of things locally in the local save).

Should be a rare situation where it becomes an issue, but just something to keep in mind.

raiguard commented 4 years ago

In that case, does on_player_joined_game never fire? If it doesn't then that's a problem. If it does, then my solution will still work.

And it's no problem, I appreciate the deep dive you're taking. :)

azaghal commented 4 years ago

Doing some extra logging and testing here, it seems to me that if the player is still in the connected status during the on_configuration_changed, the on_player_joined_game never kicks in either.

I was able to reproduce this perhaps a bit more easily by simply saving the game on client side while still connected to the server, then loading this game up in single player mode. Here's a small excerpt from the logs in that case (all on client side):

  49.312 Script @__player-connected__/control.lua:24: [PCon] Entered on_load
  49.314 Script @__player-connected__/control.lua:4: [PCon] Entered on_configuration_changed
  49.314 Script @__player-connected__/control.lua:6: [PCon] Game is multiplayer: true
  49.314 Script @__player-connected__/control.lua:8: [PCon] Iterating over game.players
  49.314 Script @__player-connected__/control.lua:10: [PCon] Connected status for player azaghal is: true

If, instead, I copy over the savegame from server to client when the player is disconnected, I get the following (kinda expected since the server is aware that the player disconnected):

 337.253 Script @__player-connected__/control.lua:24: [PCon] Entered on_load
 337.255 Script @__player-connected__/control.lua:4: [PCon] Entered on_configuration_changed
 337.255 Script @__player-connected__/control.lua:6: [PCon] Game is multiplayer: true
 337.255 Script @__player-connected__/control.lua:8: [PCon] Iterating over game.players
 337.255 Script @__player-connected__/control.lua:10: [PCon] Connected status for player azaghal is: false
 337.270 Script @__player-connected__/control.lua:28: [PCon] Entered on_player_joined_game
 337.270 Script @__player-connected__/control.lua:29: [PCon] Player ID is: 1
 337.270 Script @__player-connected__/control.lua:30: [PCon] Player name is: azaghal

Small control script I've been using for testing all of this:

player_connected = {}

function player_connected.on_configuration_changed(data)
    log("[PCon] Entered on_configuration_changed")

    log("[PCon] Game is multiplayer: " .. serpent.line(game.is_multiplayer()))

    log("[PCon] Iterating over game.players")
    for i, player in pairs(game.players) do
        log("[PCon] Connected status for player " .. player.name .. " is: " .. serpent.line(player.connected))
    end
end

function player_connected.on_init()
    log("[PCon] Entered on_init")

    log("[PCon] Iterating over game.players")
    for i, player in pairs(game.players) do
        log("[PCon] Connected status for player " .. player.name .. " is: " .. serpent.line(player.connected))
    end
end

function player_connected.on_load()
    log("[PCon] Entered on_load")
end

function player_connected.on_player_joined_game(event)
    log("[PCon] Entered on_player_joined_game")
    log("[PCon] Player ID is: " .. serpent.line(event.player_index))
    log("[PCon] Player name is: " .. game.players[event.player_index].name)
end

script.on_configuration_changed(player_connected.on_configuration_changed)
script.on_init(player_connected.on_init)
script.on_load(player_connected.on_load)
script.on_event(defines.events.on_player_joined_game, player_connected.on_player_joined_game)
azaghal commented 4 years ago

Truth be told, I really don't understand how none of this is considered to be a bug in the game, but it is what it is :)

raiguard commented 4 years ago

So, on the multiplayer server, even after on_configuration_changed is over, when the player joins the game, on_player_joined_game never fires?

If that's true then the base game is just straight-up broken in this regard.

azaghal commented 4 years ago

Sorry, maybe I was a bit unclear in the last comment. The on_player_joined_game will not trigger in single-player when loading a savegame that came from a server where the player was still connected when one of the two happened:

  1. Server was stopped with /quit command while player is still connected. Then the savegame file got copied from server to client.

  2. Player saved game via client to his local machine while being still connected to the server.

In multiplayer the on_player_joined_game acts reliably, but in case of single-player there is some weirdness when doing one of the two actions above. This would be normally fine, except that the is_multiplayer() seems to rely on information stored in the savegame - e.g. it does not 100% guarantee the savegame is singleplayer/multiplayer.

raiguard commented 4 years ago

So, after describing my problems in Discord (A.K.A. complaining a lot), turns out that on_player_left_game is indeed supposed to fire in this case. Rseding has fixed it for the next version of the game. When that comes out, it will completely fix this problem, as even though player.connected will be true, on_player_left_game will clean up the incorrectly started translations before they even have a chance to start running.

It also turns out that I accidentally fixed the desync portion of this while diagnosing a completely different problem. That desync affected not just this mod, but Recipe Book and LTN Manager as well.

Thank you again for the detailed bug report! If everyone reported bugs this thoroughly, life would be so much easier... :D

azaghal commented 4 years ago

I am really glad that this ended-up being resolved in the proper way and that Wube recognised that there is an issue - thank you for actually persisting on this one, and I am glad the bug report and additional comments were helpful to you :)