Closed paramat closed 3 years ago
Odd, now i deleted the first world, started a new one and no error. Another behaviour: After the first error, try re-entering the initial world, sometimes it then works. Another behaviour: No initial error at all.
It seems the error is most likely to happen when entering a world after MT startup, however, reproduction is unreliable.
Well, i have a workaround for my game. If i make the mod that contains the formspec prepend code depend on 'player_api', as MTGame default mod does, no error. But i cannot see why this avoids the error.
Ugh, now i cannot make the error happen without my workaround, so i do not know if the workaround ha sany effect.
Closing until it possibly reoccurrs.
Occurred again in my own game. I think we should make the code in MTG more robust in case this occurs in MTG, also to be a good coding example for other games that are highly likely to copy this code from MTG. So moving back to MTG and reopening.
I can make a PR, my suggestion is:
if info and info.formspec_version > 1 then
EDIT: Occurred again.
It looks like a race condition where the player isn't available during on join player. This regression has happened a few times, although it's usually due to emerging rather than the peer
Shouldn't this be on the engine repo?
Yes i suspected a race condition. Maybe my game calls this a little earlier because it is such a simple game, maybe this is why MTG does not have this issue. And yes this should be in the engine, my mistake.
I suspect this is caused by wrong RTT calculations.
Please try to figure out which of those checks causes a direct return: https://github.com/minetest/minetest/blob/4eb3baa51ed95d7661bbeecbbdf040d51ac32187/src/script/lua_api/l_server.cpp#L174-L185
I ran into this exact issue a few years ago when developing one of my mods, so it definitely appears to be a race condition that has never been addressed.
https://github.com/sorcerykid/auth_rx/blob/e59d77e153413a5b37cced88e4fef0d9d7044eda/init.lua#L74
I discovered the workaround was simply to wait a single server tick via minetest.after(0, func)
, and then to query the player information.
Can confirm, had the same issue, but in a globalstep:
In globalstep, interesting, does the problem occur only at world startup or at any time?
Attach a debugger. Which early return causes the issue? Why does the early return trigger?
Look for error messages in dstream. This function produces logs on error
Reproduced the bug in my game but: No extra info in verbose debug.txt. No error info when using debug build and gdb, 'bt full' (to display full backtrace) results in 'no stack', huh?
Actually, all the various error reports return dstream << FUNCTION_NAME << ": peer was not found" << std::endl;
so the error message would not tell us which ERET function was responsible.
EDIT: Actually, the dstream error message is already documented in the first post.
Ah got it ... by printing "ERET 1/2/3... ok" messages after each ERET.
First 2 ERETs work ok, so the fail is:
ERET(getServer(L)->getClientConInfo(player->getPeerId(), con::AVG_RTT, &avg_rtt))
Which suggests AVG RTT @SmallJoker
I'm experiencing a similar issue using 5.2-dev. However in my case it is calls to minetest.get_connected_players()
.
There are two mods that exercise the bug consistently those are the orienteering and wielded_light mods. To replicate the issue:
I'm experiencing a similar issue using 5.2-dev. However in my case it is calls to
minetest.get_connected_players()
.
sounds like #9428, in any case it would be useful if you could post the backtrace. Also consider opening a separate issue for it.
It seems to be a different issue, since the crash occurs when disconnecting, which should be long after the environment has been initialized. They are also connecting to the server from a different computer, so it's unlikely to be a of race condition which might happen in singleplayer.
Possible fix is: IRC http://irc.minetest.net/minetest-dev/2020-03-05#i_5648091
20:10 Krock > getClientConInfo(player->getPeerId(), con::AVG_RTT, &avg_rtt))
20:10 Krock well, according to paramat's post, the average rtt is broken, and that's a known things
20:10 Krock -s
20:11 Krock and the average rtt defaults to -1 at start IIRC
20:12 Krock connection.h L618
20:12 Krock which results in the return value -1 which is interpreted as error
Changing the default value to -2
would be a simple fix, but doing it properly requires fixing the RTT calculations entirely (see #7428 -> #8183 for side-effects).
I'm experiencing a similar issue using 5.2-dev. However in my case it is calls to
minetest.get_connected_players()
.sounds like #9428, in any case it would be useful if you could post the backtrace. Also consider opening a separate issue for it.
Created separate issue with backtrace https://github.com/minetest/minetest/issues/9474
I tested SmallJoker's diff:
--- a/src/network/connection.h
+++ b/src/network/connection.h
@@ -612,16 +612,16 @@ class Peer {
struct rttstats {
float jitter_min = FLT_MAX;
float jitter_max = 0.0f;
- float jitter_avg = -1.0f;
+ float jitter_avg = -2.0f;
float min_rtt = FLT_MAX;
float max_rtt = 0.0f;
- float avg_rtt = -1.0f;
+ float avg_rtt = -2.0f;
rttstats() = default;
};
rttstats m_rtt;
- float m_last_rtt = -1.0f;
+ float m_last_rtt = -2.0f;
// current usage count
unsigned int m_usage = 0;
... and could not reproduce the error on ~30 tries (previously the error occurred far more often than that). I also printed out the formspec version returned, it was 3.
@SmallJoker wrote in IRC that they would merge the fix as a trivial commit, however i can submit it as a PR if desired, as i have it as a PR branch.
Keep this issue open though for a proper fix later.
23c907befea0
A workaround fix has been made for 5.2.0 release, keeping open for a proper fix later.
This still seems to happen on the linux-forks server, running 5.2.0 of both engine and MTG.
Also observed by @VanessaE on her servers with 5.2.0. Since none(?) of the devs could reproduce this, we'll require some assistance to get this properly fixed.
I retested in 5.2.0 using one of my simple games that previously triggered the bug, but have not reproduced a problem in 5.2.0 yet.
My theory regarding why it's still crashing in 5.2 is that the player actually disappears this time.
Why? l_get_player_information
calls getClientConInfo
which reaches into the connection subsystem to retrieve the RTT values and other things.
Typically the server receives events from the connection code and only acts on (dis)-connects/commands once a server step. But in this case the connection code deletes the peer from its structures as soon as the relevant packet is received, so the peer can disappear at any time while the Lua or server code runs.
As presented by @Lejo1 this issue still have not be fixed as that was done on a 5.3 server with their mod and they said it's this sort of issue. Not the mod itself.
I'm not sure if this is really related to this but it seems so:
2020-07-12 14:01:47: ACTION[Server]: vps_blocker: Kicking bad-ip-player talex [x.x.x.x]
2020-07-12 14:01:47: WARNING[ConnectionSend]: con(4/1) Packet quota used up for peer_id=113, was 256 pkts
2020-07-12 14:01:54: ACTION[Server]: talex [x.x.x.x] joins game. List of players: David FriendlyPerson Pony talex
2020-07-12 14:01:54: ACTION[Server]: vps_blocker: Kicking bad-ip-player talex [x.x.x.x]
2020-07-12 14:01:54: [Server]: static int ModApiServer::l_get_player_information(lua_State*): peer was not found
2020-07-12 14:01:54: ACTION[Main]: Server: Shutting down
2020-07-12 14:01:56: ERROR[Main]: ServerError: AsyncErr: ServerThread::run Lua: Runtime error from mod 'default' in callback on_joinplayer(): ...share/minetest/games/minetest_game/mods/default/init.lua:23: attempt to index local 'info' (a nil value)
2020-07-12 14:01:56: ERROR[Main]: stack traceback: 2020-07-12 14:01:56: ERROR[Main]: ...share/minetest/games/minetest_game/mods/default/init.lua:23: in function <...share/minetest/games/minetest_game/mods/default/init.lua:16>
2020-07-12 14:01:56: ERROR[Main]: /usr/share/minetest/builtin/game/register.lua:429: in function </usr/share/minetest/builtin/game/register.lua:413>
As I now look on this again, I see that I had the same Problem on my server which also uses vps_blocker. Cause I use my own game I could just remove the part of default but there should be a better way to fix that
EDIT: Maybe the Problem is caused by the instant kick after the player joined.
If you use minetest.kick_player
or any other function that disconnects the peer in a on_joinplayer
callback you need to ensure that the rest of the callbacks do not run. So what you are seeing is not an engine bug or minetest_game bug.
The joinplayer callback isn't laid out for this (you cannot e.g. return true to stop later callbacks), so I suggest kicking them at the next server step with minetest.after(0, ...)
https://github.com/minetest/minetest_game/pull/2716#issuecomment-657607227 suggests this bug is still happening for @tenplus1
@Lejo1 have you confirmed this was a mod code error in vps_blocker? Are you now able to avoid the crash?
I am re-opening this as 'unconfirmed bug' as tenplus1 is still experiencing this crash, see https://github.com/minetest/minetest_game/pull/2716 , although there is no reproduction by others and this might be a mod coding error as explained above.
Close in a few months if no reproduction by others.
Yes, I was able to fix it. It's just as sfan5 pointed out a "combination of mods crash"
It still happens on 5.3.0 with the following traceback (MTG also at 5.3.0):
2020-09-16 21:41:03: ERROR[Main]: ServerError: AsyncErr: ServerThread::run Lua: Runtime error from mod 'default' in callback on_joinplayer(): ...games/minetest/games/minetest_game/mods/default/init.lua:23: attempt to index local 'info' (a nil value)
2020-09-16 21:41:03: ERROR[Main]: stack traceback:
2020-09-16 21:41:03: ERROR[Main]: ...games/minetest/games/minetest_game/mods/default/init.lua:23: in function <...games/minetest/games/minetest_game/mods/default/init.lua:16>
2020-09-16 21:41:03: ERROR[Main]: /usr/share/games/minetest/builtin/game/register.lua:429: in function </usr/share/games/minetest/builtin/game/register.lua:413
I've checked each and every mod on server and nothing kicks a player on_joinplayer at all, the only kick command is when a player swears and register_on_chat_message does that... The only way to stop getting the above error was the game#2716 pull which was closed by paramat.
I maybe had a similar problem with antispam. Antispam crashed when on_chat_message was called before ob_joinplayer. This is possible with my mtbot but might also happen when clients are laggy.
So the other way around this could mean to you: Someone sweared before on_joinplayer, was kicked for it and the server crashed on_joinplayer😁
Might not be realistic but we should just keep in mind that ob_joinplayer is NOT always the first callback.
The only way to stop getting the above error was the game#2716 pull
It is not the only way, obviously the best way to fix this is in the engine, which has to be tried first, and we have tried already with partial success. The proposed MTG change is a workaround fix.
game #2716 pull which was closed by paramat.
I need to clarify, as this is worded in a possibly accusative way: Although i did technically 'close it', it was disapproved by multiple core devs, just like my own identical PR #2586 was disapproved by multiple core devs earlier. No need to mention who closed it, it was closed by core dev consensus.
As i wrote in the MTG PR, if we get multiple people reporting continuing problems (we do) and we cannot fix this in time for 5.4.0 then we should probably add the workaround fix to MTG.
I have no idea if the 2 people reporting a continuing problem are experiencing a genuine bug or if it is a mod or game error. This issue certainly deserves attention.
Here's a patch to builtin that debugs this error by determining whose callback caused the player to disappear:
diff --git a/builtin/game/register.lua b/builtin/game/register.lua
index 1034d4f2b..2c69a2400 100644
--- a/builtin/game/register.lua
+++ b/builtin/game/register.lua
@@ -421,6 +421,7 @@ function core.run_callbacks(callbacks, mode, ...)
end
end
local ret = nil
+ local arg1 = ({...})[1]
for i = 1, cb_len do
local origin = core.callback_origins[callbacks[i]]
if origin then
@@ -448,6 +449,13 @@ function core.run_callbacks(callbacks, mode, ...)
elseif mode == 5 and cb_ret then
return cb_ret
end
+ if callbacks == core.registered_on_joinplayers and
+ -- it seems to work once so test twice
+ (core.get_player_information(arg1:get_player_name()) == nil or
+ core.get_player_information(arg1:get_player_name()) == nil) then
+ error("Peer disappeared mid callback, the culprit is " ..
+ dump(debug.getinfo(callbacks[i])))
+ end
end
return ret
end
In principle the same issue also applies to any other callback (though then you wouldn't be seeing the crash in minetest_game's on_joinplayer), if the player is kicked then the peer is deleted* and get_player_information
stops working immediately.
* The player and its ObjectRef are only deleted after a few(?) server steps.
Another report on IRC http://irc.minetest.net/minetest/2020-09-24#i_5735257 Solution http://irc.minetest.net/minetest/2020-09-24#i_5735309 This was caused by a mod.
My guess is that this bug was fixed in the engine and that any problems are caused by use of mods, but it was not easy to track down the guilty mod, so understandably a few users experienced a problem. I have not experienced the bug in MTG or in any of my simple games since the engine fix.
hi paramat
it happened to me a long time ago, and so i added this: https://github.com/jamesalexanderstevenson/jastest/blob/master/mods/default/init.lua#L24
bye
On Fri, Sep 25, 2020 at 6:36 PM Paramat notifications@github.com wrote:
Another report on IRC http://irc.minetest.net/minetest/2020-09-24#i_5735257 Solution http://irc.minetest.net/minetest/2020-09-24#i_5735309 This was caused by a mod.
My guess is that this bug was fixed in the engine and that any problems are caused by use of mods, but it was not easy to track down the guilty mod, so understandably a few users experienced a problem. I have not experienced the bug in MTG or in any of my simple games since the engine fix.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/minetest/minetest/issues/9352#issuecomment-699191792, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQE7PPQVZHUFL4VFO23PAQDSHULPZANCNFSM4KOVKTGA .
Yes that is the exact change me and tenplus1 submitted as PRs.
Can confirm, this (still?) happens with 5.3.0
and the unified_inventory
mod here (on_joinplayer
):
https://github.com/minetest-mods/unified_inventory/blob/341a438267765ee69fc12c2b5ce855a4623f3eeb/internal.lua#L293
It happens every other week or so, i don't have a way to reproduce that.
There is no kick
'ing happening in the onjoin callbacks, only in the prejoins, so this can't be it...
it might be related to a CSM crashing while joining.
I hope you mean that on_prejoinplayer
returns a disconnection reason, because I doubt kick_player
works inside there.
I hope you mean that on_prejoinplayer returns a disconnection reason, because I doubt kick_player works inside there.
Yeah, thats what i meant, sorry :yum:
Have those who are reporting this tried the code above to debug it? See https://github.com/minetest/minetest/issues/9352#issuecomment-698584150
https://github.com/minetest/minetest_game/pull/2805 is another MTG workaround fix that should probably be merged if this problem cannot be fixed in the engine before 5.4.0 release. The first was https://github.com/minetest/minetest_game/pull/2586 and identical https://github.com/minetest/minetest_game/pull/2716
Is this issue confirmed in 5.4.0-dev?
I am on engine commit https://github.com/minetest/minetest/commit/b3c245bb46890d2355ba6456062db65eda475e28 The following happens in singleplayer.
I have just started using the new formspec prepend code:
... in a game of my own https://github.com/paramat/minipeli
Now, when starting a new world, MT shuts down with error:
Relevant engine code: https://github.com/minetest/minetest/blob/9acd36bf9981cf9cb8e82785508d6eb7f010cc77/src/script/lua_api/l_server.cpp#L139
Of course, there is no error when using MTGame, which is odd, i cannot see what is different. If it is a case of dependencies, is there any code in MTGame that must be run before 'minetest.get_player_information()'? The formspec prepend code is highly likely to be copied into new games, so i think this is worth fixing. My game might be exposing a weakness in MTGame code that might cause a bug in MTGame in future.