minetest / minetest

Minetest is an open source voxel game-creation platform with easy modding and game creation
https://www.minetest.net/
Other
10.63k stars 2k forks source link

minetest.get_player_information sometimes returns nil (in on_joinplayer, in singleplayer) #9352

Closed paramat closed 3 years ago

paramat commented 4 years ago

I am on engine commit https://github.com/minetest/minetest/commit/b3c245bb46890d2355ba6456062db65eda475e28 The following happens in singleplayer.

I have just started using the new formspec prepend code:

-- GUI related stuff
minetest.register_on_joinplayer(function(player)
    -- Set formspec prepend
    local formspec = [[
            bgcolor[#080808BB;true]
            listcolors[#00000069;#5A5A5A;#141318;#30434C;#FFF] ]]
    local name = player:get_player_name()
    local info = minetest.get_player_information(name)
    if info.formspec_version > 1 then
        formspec = formspec .. "background9[5,5;1,1;gui_formbg.png;true;10]"
    else
        formspec = formspec .. "background[5,5;1,1;gui_formbg.png;true]"
    end
    player:set_formspec_prepend(formspec)

... in a game of my own https://github.com/paramat/minipeli

Now, when starting a new world, MT shuts down with error:

[Server]: static int ModApiServer::l_get_player_information(lua_State*): peer was not
found
ERROR[Main]: ServerError: AsyncErr: ServerThread::run Lua: Runtime error from mod 'gui'
in callback on_joinplayer(): ...ods/minetest/bin/../games/minipeli/mods/gui/init.lua:10:
attempt to index local 'info' (a nil value)

Relevant engine code: https://github.com/minetest/minetest/blob/9acd36bf9981cf9cb8e82785508d6eb7f010cc77/src/script/lua_api/l_server.cpp#L139

int ModApiServer::l_get_player_information(lua_State *L)
{

    NO_MAP_LOCK_REQUIRED;
    const char * name = luaL_checkstring(L, 1);
    RemotePlayer *player = dynamic_cast<ServerEnvironment *>(getEnv(L))->getPlayer(name);
    if (player == NULL) {
        lua_pushnil(L); // no such player
        return 1;
    }

    Address addr;
    try
    {
        addr = getServer(L)->getPeerAddress(player->getPeerId());
    } catch(const con::PeerNotFoundException &) {
        dstream << FUNCTION_NAME << ": peer was not found" << std::endl;
        lua_pushnil(L); // error
        return 1;
    }

Of course, there is no error when using MTGame, which is odd, i cannot see what is different. If it is a case of dependencies, is there any code in MTGame that must be run before 'minetest.get_player_information()'? The formspec prepend code is highly likely to be copied into new games, so i think this is worth fixing. My game might be exposing a weakness in MTGame code that might cause a bug in MTGame in future.

paramat commented 4 years ago

Odd, now i deleted the first world, started a new one and no error. Another behaviour: After the first error, try re-entering the initial world, sometimes it then works. Another behaviour: No initial error at all.

It seems the error is most likely to happen when entering a world after MT startup, however, reproduction is unreliable.

paramat commented 4 years ago

Well, i have a workaround for my game. If i make the mod that contains the formspec prepend code depend on 'player_api', as MTGame default mod does, no error. But i cannot see why this avoids the error.

paramat commented 4 years ago

Ugh, now i cannot make the error happen without my workaround, so i do not know if the workaround ha sany effect.

paramat commented 4 years ago

Closing until it possibly reoccurrs.

paramat commented 4 years ago

Occurred again in my own game. I think we should make the code in MTG more robust in case this occurs in MTG, also to be a good coding example for other games that are highly likely to copy this code from MTG. So moving back to MTG and reopening.

I can make a PR, my suggestion is: if info and info.formspec_version > 1 then

EDIT: Occurred again.

rubenwardy commented 4 years ago

It looks like a race condition where the player isn't available during on join player. This regression has happened a few times, although it's usually due to emerging rather than the peer

Shouldn't this be on the engine repo?

paramat commented 4 years ago

Yes i suspected a race condition. Maybe my game calls this a little earlier because it is such a simple game, maybe this is why MTG does not have this issue. And yes this should be in the engine, my mistake.

SmallJoker commented 4 years ago

I suspect this is caused by wrong RTT calculations.

Please try to figure out which of those checks causes a direct return: https://github.com/minetest/minetest/blob/4eb3baa51ed95d7661bbeecbbdf040d51ac32187/src/script/lua_api/l_server.cpp#L174-L185

sorcerykid commented 4 years ago

I ran into this exact issue a few years ago when developing one of my mods, so it definitely appears to be a race condition that has never been addressed.

https://github.com/sorcerykid/auth_rx/blob/e59d77e153413a5b37cced88e4fef0d9d7044eda/init.lua#L74

I discovered the workaround was simply to wait a single server tick via minetest.after(0, func), and then to query the player information.

thomasrudin commented 4 years ago

Can confirm, had the same issue, but in a globalstep:

https://github.com/minetest-monitoring/monitoring/blob/501902fe2f619ac026633b19d1d37fbc36cdc24a/monitoring/builtin/player_rtt.lua#L27-L29

paramat commented 4 years ago

In globalstep, interesting, does the problem occur only at world startup or at any time?

rubenwardy commented 4 years ago

Attach a debugger. Which early return causes the issue? Why does the early return trigger?

rubenwardy commented 4 years ago

Look for error messages in dstream. This function produces logs on error

paramat commented 4 years ago

Reproduced the bug in my game but: No extra info in verbose debug.txt. No error info when using debug build and gdb, 'bt full' (to display full backtrace) results in 'no stack', huh?

paramat commented 4 years ago

Actually, all the various error reports return dstream << FUNCTION_NAME << ": peer was not found" << std::endl; so the error message would not tell us which ERET function was responsible.

EDIT: Actually, the dstream error message is already documented in the first post.

paramat commented 4 years ago

Ah got it ... by printing "ERET 1/2/3... ok" messages after each ERET. First 2 ERETs work ok, so the fail is: ERET(getServer(L)->getClientConInfo(player->getPeerId(), con::AVG_RTT, &avg_rtt)) Which suggests AVG RTT @SmallJoker

jomunoz commented 4 years ago

I'm experiencing a similar issue using 5.2-dev. However in my case it is calls to minetest.get_connected_players().

There are two mods that exercise the bug consistently those are the orienteering and wielded_light mods. To replicate the issue:

  1. Host a game using one of those mods (or both).
  2. Connect to the host using a different computer.
  3. Walk for a few seconds (or more).
  4. Disconnect from the server.
sfan5 commented 4 years ago

I'm experiencing a similar issue using 5.2-dev. However in my case it is calls to minetest.get_connected_players().

sounds like #9428, in any case it would be useful if you could post the backtrace. Also consider opening a separate issue for it.

sorcerykid commented 4 years ago

It seems to be a different issue, since the crash occurs when disconnecting, which should be long after the environment has been initialized. They are also connecting to the server from a different computer, so it's unlikely to be a of race condition which might happen in singleplayer.

paramat commented 4 years ago

Possible fix is: IRC http://irc.minetest.net/minetest-dev/2020-03-05#i_5648091

20:10 Krock        > getClientConInfo(player->getPeerId(), con::AVG_RTT, &avg_rtt))
20:10 Krock        well, according to paramat's post, the average rtt is broken, and that's a known things
20:10 Krock        -s
20:11 Krock        and the average rtt defaults to -1 at start IIRC
20:12 Krock        connection.h L618
20:12 Krock        which results in the return value -1 which is interpreted as error
SmallJoker commented 4 years ago

Changing the default value to -2 would be a simple fix, but doing it properly requires fixing the RTT calculations entirely (see #7428 -> #8183 for side-effects).

jomunoz commented 4 years ago

I'm experiencing a similar issue using 5.2-dev. However in my case it is calls to minetest.get_connected_players().

sounds like #9428, in any case it would be useful if you could post the backtrace. Also consider opening a separate issue for it.

Created separate issue with backtrace https://github.com/minetest/minetest/issues/9474

paramat commented 4 years ago

I tested SmallJoker's diff:

--- a/src/network/connection.h
+++ b/src/network/connection.h
@@ -612,16 +612,16 @@ class Peer {
        struct rttstats {
            float jitter_min = FLT_MAX;
            float jitter_max = 0.0f;
-           float jitter_avg = -1.0f;
+           float jitter_avg = -2.0f;
            float min_rtt = FLT_MAX;
            float max_rtt = 0.0f;
-           float avg_rtt = -1.0f;
+           float avg_rtt = -2.0f;

            rttstats() = default;
        };

        rttstats m_rtt;
-       float m_last_rtt = -1.0f;
+       float m_last_rtt = -2.0f;

        // current usage count
        unsigned int m_usage = 0;

... and could not reproduce the error on ~30 tries (previously the error occurred far more often than that). I also printed out the formspec version returned, it was 3.

@SmallJoker wrote in IRC that they would merge the fix as a trivial commit, however i can submit it as a PR if desired, as i have it as a PR branch.

Keep this issue open though for a proper fix later.

SmallJoker commented 4 years ago

23c907befea0

paramat commented 4 years ago

A workaround fix has been made for 5.2.0 release, keeping open for a proper fix later.

gpcf commented 4 years ago

This still seems to happen on the linux-forks server, running 5.2.0 of both engine and MTG.

sfan5 commented 4 years ago

Also observed by @VanessaE on her servers with 5.2.0. Since none(?) of the devs could reproduce this, we'll require some assistance to get this properly fixed.

paramat commented 4 years ago

I retested in 5.2.0 using one of my simple games that previously triggered the bug, but have not reproduced a problem in 5.2.0 yet.

sfan5 commented 4 years ago

My theory regarding why it's still crashing in 5.2 is that the player actually disappears this time.

Why? l_get_player_information calls getClientConInfo which reaches into the connection subsystem to retrieve the RTT values and other things. Typically the server receives events from the connection code and only acts on (dis)-connects/commands once a server step. But in this case the connection code deletes the peer from its structures as soon as the relevant packet is received, so the peer can disappear at any time while the Lua or server code runs.

FriendlyGamer commented 4 years ago

As presented by @Lejo1 this issue still have not be fixed as that was done on a 5.3 server with their mod and they said it's this sort of issue. Not the mod itself.

Lejo1 commented 4 years ago

I'm not sure if this is really related to this but it seems so:

2020-07-12 14:01:47: ACTION[Server]: vps_blocker: Kicking bad-ip-player talex [x.x.x.x]

2020-07-12 14:01:47: WARNING[ConnectionSend]: con(4/1) Packet quota used up for peer_id=113, was 256 pkts

2020-07-12 14:01:54: ACTION[Server]: talex [x.x.x.x] joins game. List of players: David FriendlyPerson Pony talex

2020-07-12 14:01:54: ACTION[Server]: vps_blocker: Kicking bad-ip-player talex [x.x.x.x]

2020-07-12 14:01:54: [Server]: static int ModApiServer::l_get_player_information(lua_State*): peer was not found

2020-07-12 14:01:54: ACTION[Main]: Server: Shutting down

2020-07-12 14:01:56: ERROR[Main]: ServerError: AsyncErr: ServerThread::run Lua: Runtime error from mod 'default' in callback on_joinplayer(): ...share/minetest/games/minetest_game/mods/default/init.lua:23: attempt to index local 'info' (a nil value)

2020-07-12 14:01:56: ERROR[Main]: stack traceback: 2020-07-12 14:01:56: ERROR[Main]: ...share/minetest/games/minetest_game/mods/default/init.lua:23: in function <...share/minetest/games/minetest_game/mods/default/init.lua:16>

2020-07-12 14:01:56: ERROR[Main]: /usr/share/minetest/builtin/game/register.lua:429: in function </usr/share/minetest/builtin/game/register.lua:413>

As I now look on this again, I see that I had the same Problem on my server which also uses vps_blocker. Cause I use my own game I could just remove the part of default but there should be a better way to fix that

EDIT: Maybe the Problem is caused by the instant kick after the player joined.

sfan5 commented 4 years ago

If you use minetest.kick_player or any other function that disconnects the peer in a on_joinplayer callback you need to ensure that the rest of the callbacks do not run. So what you are seeing is not an engine bug or minetest_game bug.

The joinplayer callback isn't laid out for this (you cannot e.g. return true to stop later callbacks), so I suggest kicking them at the next server step with minetest.after(0, ...)

paramat commented 4 years ago

https://github.com/minetest/minetest_game/pull/2716#issuecomment-657607227 suggests this bug is still happening for @tenplus1

paramat commented 4 years ago

@Lejo1 have you confirmed this was a mod code error in vps_blocker? Are you now able to avoid the crash?

I am re-opening this as 'unconfirmed bug' as tenplus1 is still experiencing this crash, see https://github.com/minetest/minetest_game/pull/2716 , although there is no reproduction by others and this might be a mod coding error as explained above.

Close in a few months if no reproduction by others.

Lejo1 commented 4 years ago

Yes, I was able to fix it. It's just as sfan5 pointed out a "combination of mods crash"

gpcf commented 4 years ago

It still happens on 5.3.0 with the following traceback (MTG also at 5.3.0):

2020-09-16 21:41:03: ERROR[Main]: ServerError: AsyncErr: ServerThread::run Lua: Runtime error from mod 'default' in callback on_joinplayer(): ...games/minetest/games/minetest_game/mods/default/init.lua:23: attempt to index local 'info' (a nil value)
2020-09-16 21:41:03: ERROR[Main]: stack traceback:
2020-09-16 21:41:03: ERROR[Main]:       ...games/minetest/games/minetest_game/mods/default/init.lua:23: in function <...games/minetest/games/minetest_game/mods/default/init.lua:16>
2020-09-16 21:41:03: ERROR[Main]:       /usr/share/games/minetest/builtin/game/register.lua:429: in function </usr/share/games/minetest/builtin/game/register.lua:413
tenplus1 commented 3 years ago

I've checked each and every mod on server and nothing kicks a player on_joinplayer at all, the only kick command is when a player swears and register_on_chat_message does that... The only way to stop getting the above error was the game#2716 pull which was closed by paramat.

Lejo1 commented 3 years ago

I maybe had a similar problem with antispam. Antispam crashed when on_chat_message was called before ob_joinplayer. This is possible with my mtbot but might also happen when clients are laggy.

So the other way around this could mean to you: Someone sweared before on_joinplayer, was kicked for it and the server crashed on_joinplayer😁

Might not be realistic but we should just keep in mind that ob_joinplayer is NOT always the first callback.

paramat commented 3 years ago

The only way to stop getting the above error was the game#2716 pull

It is not the only way, obviously the best way to fix this is in the engine, which has to be tried first, and we have tried already with partial success. The proposed MTG change is a workaround fix.

game #2716 pull which was closed by paramat.

I need to clarify, as this is worded in a possibly accusative way: Although i did technically 'close it', it was disapproved by multiple core devs, just like my own identical PR #2586 was disapproved by multiple core devs earlier. No need to mention who closed it, it was closed by core dev consensus.

As i wrote in the MTG PR, if we get multiple people reporting continuing problems (we do) and we cannot fix this in time for 5.4.0 then we should probably add the workaround fix to MTG.

I have no idea if the 2 people reporting a continuing problem are experiencing a genuine bug or if it is a mod or game error. This issue certainly deserves attention.

sfan5 commented 3 years ago

Here's a patch to builtin that debugs this error by determining whose callback caused the player to disappear:

diff --git a/builtin/game/register.lua b/builtin/game/register.lua
index 1034d4f2b..2c69a2400 100644
--- a/builtin/game/register.lua
+++ b/builtin/game/register.lua
@@ -421,6 +421,7 @@ function core.run_callbacks(callbacks, mode, ...)
                end
        end
        local ret = nil
+       local arg1 = ({...})[1]
        for i = 1, cb_len do
                local origin = core.callback_origins[callbacks[i]]
                if origin then
@@ -448,6 +449,13 @@ function core.run_callbacks(callbacks, mode, ...)
                elseif mode == 5 and cb_ret then
                        return cb_ret
                end
+               if callbacks == core.registered_on_joinplayers and
+                               -- it seems to work once so test twice
+                               (core.get_player_information(arg1:get_player_name()) == nil or
+                               core.get_player_information(arg1:get_player_name()) == nil) then
+                       error("Peer disappeared mid callback, the culprit is " ..
+                                       dump(debug.getinfo(callbacks[i])))
+               end
        end
        return ret
 end

In principle the same issue also applies to any other callback (though then you wouldn't be seeing the crash in minetest_game's on_joinplayer), if the player is kicked then the peer is deleted* and get_player_information stops working immediately. * The player and its ObjectRef are only deleted after a few(?) server steps.

paramat commented 3 years ago

Another report on IRC http://irc.minetest.net/minetest/2020-09-24#i_5735257 Solution http://irc.minetest.net/minetest/2020-09-24#i_5735309 This was caused by a mod.

My guess is that this bug was fixed in the engine and that any problems are caused by use of mods, but it was not easy to track down the guilty mod, so understandably a few users experienced a problem. I have not experienced the bug in MTG or in any of my simple games since the engine fix.

jamesalexanderstevenson commented 3 years ago

hi paramat

it happened to me a long time ago, and so i added this: https://github.com/jamesalexanderstevenson/jastest/blob/master/mods/default/init.lua#L24

bye

On Fri, Sep 25, 2020 at 6:36 PM Paramat notifications@github.com wrote:

Another report on IRC http://irc.minetest.net/minetest/2020-09-24#i_5735257 Solution http://irc.minetest.net/minetest/2020-09-24#i_5735309 This was caused by a mod.

My guess is that this bug was fixed in the engine and that any problems are caused by use of mods, but it was not easy to track down the guilty mod, so understandably a few users experienced a problem. I have not experienced the bug in MTG or in any of my simple games since the engine fix.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/minetest/minetest/issues/9352#issuecomment-699191792, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQE7PPQVZHUFL4VFO23PAQDSHULPZANCNFSM4KOVKTGA .

paramat commented 3 years ago

Yes that is the exact change me and tenplus1 submitted as PRs.

BuckarooBanzay commented 3 years ago

Can confirm, this (still?) happens with 5.3.0 and the unified_inventory mod here (on_joinplayer): https://github.com/minetest-mods/unified_inventory/blob/341a438267765ee69fc12c2b5ce855a4623f3eeb/internal.lua#L293 It happens every other week or so, i don't have a way to reproduce that.

There is no kick'ing happening in the onjoin callbacks, only in the prejoins, so this can't be it...

gpcf commented 3 years ago

it might be related to a CSM crashing while joining.

sfan5 commented 3 years ago

I hope you mean that on_prejoinplayer returns a disconnection reason, because I doubt kick_player works inside there.

BuckarooBanzay commented 3 years ago

I hope you mean that on_prejoinplayer returns a disconnection reason, because I doubt kick_player works inside there.

Yeah, thats what i meant, sorry :yum:

paramat commented 3 years ago

Have those who are reporting this tried the code above to debug it? See https://github.com/minetest/minetest/issues/9352#issuecomment-698584150

paramat commented 3 years ago

https://github.com/minetest/minetest_game/pull/2805 is another MTG workaround fix that should probably be merged if this problem cannot be fixed in the engine before 5.4.0 release. The first was https://github.com/minetest/minetest_game/pull/2586 and identical https://github.com/minetest/minetest_game/pull/2716

sfan5 commented 3 years ago

Is this issue confirmed in 5.4.0-dev?