theFox6 / working_villages

A mod for minetest that adds villagers performing work
https://content.minetest.net/packages/theFox/working_villages/
MIT License
13 stars 10 forks source link

Error in job_thread: attempt to get length of field 'nodedata' (a nil value) #20

Open nfroeschl opened 2 years ago

nfroeschl commented 2 years ago

MineTest 5.4.1

ServerError: AsyncErr: ServerThread::run Lua: Runtime error from mod 'working_villages' in callback luaentity_Step(): ...ds\working_villages\working_villagers/job_coroutines.lua:42: error in job_thread ...mods\working_villages\working_villagers/jobs/builde 2021-08-02 11:47:11: ERROR[Main]: r.lua:41: attempt to get length of field 'nodedata' (a nil value) 2021-08-02 11:47:11: ERROR[Main]: stack traceback: 2021-08-02 11:47:11: ERROR[Main]: [C]: in function 'error' 2021-08-02 11:47:11: ERROR[Main]: ...ds\working_villages\working_villagers/job_coroutines.lua:42: in function 'resume' 2021-08-02 11:47:11: ERROR[Main]: ...4\bin..\mods\working_villages\working_villagers/api.lua:929: in function <...4\bin..\mods\working_villages\working_villagers/api.lua:914>

ketwaroo commented 10 months ago

+1 I get the same error and that world always crashes on load after that.

AsyncErr: Lua: Runtime error from mod 'default' in callback luaentity_Step(): ...ds\working_villages\working_villagers/job_coroutines.lua:42: error in job_thread Runtime error from mod 'default' in callback item_OnPlace(): ...st\bin\..\games\minetest_game\mods\default/functions.lua:730: attempt to index local 'player' (a nil value)
stack traceback:
    ...st\bin\..\games\minetest_game\mods\default/functions.lua:730: in function 'log_player_action'
    ...netest\bin\..\games\minetest_game\mods\default/trees.lua:575: in function 'sapling_on_place'
    ...netest\bin\..\games\minetest_game\mods\default/nodes.lua:734: in function <...netest\bin\..\games\minetest_game\mods\default/nodes.lua:733>
    [C]: in function 'resume'
    ...ds\working_villages\working_villagers/job_coroutines.lua:35: in function 'resume'
    ...t\bin\..\mods\working_villages\working_villagers/api.lua:953: in function <...t\bin\..\mods\working_villages\working_villagers/api.lua:938>: stack traceback:
    [C]: in function 'place_node'
    ...ods\working_villages\working_villagers/async_actions.lua:223: in function 'place'
    ...s\working_villages\working_villagers/jobs/woodcutter.lua:63: in function <...s\working_villages\working_villagers/jobs/woodcutter.lua:43>
stack traceback:
    [C]: in function 'error'
    ...ds\working_villages\working_villagers/job_coroutines.lua:42: in function 'resume'
    ...t\bin\..\mods\working_villages\working_villagers/api.lua:953: in function <...t\bin\..\mods\working_villages\working_villagers/api.lua:938>

and line 42 in question ..

error("error in job_thread " .. ret[2]..": "..debug.traceback(self.job_thread))

the error() call causes the fatal error itself. Maybe there could be a less forceful way to handle the errors. if I comment out that line in job_coroutines.lua things just sort of work. So there might be a way to handle things without crashing the program every time.

sfence commented 10 months ago

Thanks for reporting @nfroeschl and @ketwaroo .

@nfroeschl Bug will be fixed in the next release.

@ketwaroo Can you tell me which version of working_villages you are using? The lines from your logs go to nonfunctional lines in the Lua source file api.lua in working_villages in the master branch and in the last release v1.5.2 too. So I am not able to see what causing the error.

ketwaroo commented 10 months ago

@sfence I'm using whatever version is currently downloaded from inside minetest 5.7.0, it doesn't specifically mention a version. the modpack.conf says release = 11763

It's literally this line in the master branch https://github.com/theFox6/working_villages/blob/master/working_villagers/job_coroutines.lua#L42

calling error(...) throws an exeption/fatal error which crashes the server.

The functions run by coroutine may fail but that may not necessarily crash the server instance. Maybe simply log the error instead of calling error(). I'm not a lua expert but maybe this could provide a hint http://www.lua.org/manual/5.3/manual.html#2.3

Lua code can explicitly generate an error by calling the error function. If you need to catch errors in Lua, you can use pcall or xpcall to call a given function in protected mode.

Another example of that line causing problems, from the mine test content db page for working_villagers: https://content.minetest.net/threads/4272/#reply-12552 you'll notice that it has stack traceback: [C]: in function 'error' ...ds\working_villages\working_villagers/job_coroutines.lua:42: in common.

sfence commented 10 months ago

@ketwaroo Can you share with me your world? it will be helpful if I can reproduce this.

ketwaroo commented 10 months ago

Can you share with me your world?

I don't know how to do that. And not sure that'll help. I have a lot of different mods running. It could literally be from anything. And given the other errors that other users have seen all point to the same line: https://github.com/theFox6/working_villages/blob/master/working_villagers/job_coroutines.lua#L42 Please look at that file.

As I understand it, working_villager mod registers a function for each villager job. that function gets run by coroutine, which is this I'm guessing https://www.lua.org/pil/9.1.html.

if an error happens while that coroutine is running, the working_villager lua script /job_coroutines.lua#L42 crashes the server whether or not the error is actually serious enough to break the game. From my testing, commenting out that line fixed the issue. There are some extra warnings in the debug log that match the same error but it doesn't break the game and villager behaviour doesn't seem affected. It seems like logging the error instead of calling error() would be a better solution? does that make sense?

sfence commented 10 months ago

I am not the author of this part of the code. So I need to get what causes the error.

By the manual, it looks like a coroutine.resume should return three parameters.

Ok. Are you able to add the command print("ret: "..dump(ret)) before line 42 in job_coroutines.lua and send me the error log with this print output?

sfence commented 10 months ago

@ketwaroo It looks like this problem is more depth. Primary fail happens in the log function in the default mod and it is propagated down. So, line 42 in job_subroutines.lua probably does, what should be done.

Please, update working villages to the latest release from content DB and try if the error is still present. The code that caused it has been changed between the last two releases.

If the error is still present, send me the actual error log, please.

ketwaroo commented 10 months ago

It seems we're in very different timezones.

@sfence Please ask another contributor to this project to review my previous comments.

Primary fail happens in the log function in the default mod and it is propagated down.

because a villager performed an action that is usually associated with a player, like planting a sapling. This is a recoverable error. Writing to log should not crash the whole server. That is my point. Please read the documentation of error() in lua and what it does.

~If I get some time, I'll create a PR with a fix myself.~ See #44. forgot I could fork/edit files directly in browser.

sfence commented 10 months ago

@ketwaroo I am afraid that there is only one active maintainer of this mod at this time. The main problem here is this: "\games\minetest_game\mods\default/functions.lua:730: attempt to index local 'player' (a nil value)" If this error happens outside the lua coroutine it will crash the Minetest server. So, from my view, it is the purpose of the original author to propagate this type of error outside of Lua coroutine with an error command. Definitely, I can mask it by replacing error with Minetest.error for example, but it does not make lumberjack placing sapling work again. But who knows what else it can cause?

So my attempt is not to mask the problem source but to solve the problem source.

ketwaroo commented 10 months ago

My point is that you shouldn't have to crash the game for the user if your villager can't perform a non critical action. This is not good UX.

My point is also exactly, that anything could cause it, because almost all other mods and the base game itself assume that a player is the one performing actions. not a villager. It's impossible to debug all other mods in existence.

It does not crash the server. The coroutine runs like a try/catch block. The fact that my local instance was unblocked and my kids have been playing happily since is proof of that.

If your mod just crashes the game all the time, most people will just think it's broken and remove it without bothering trying to fix it. If you log the error and it shows up as red text in the screen corner, they might eventually report it instead.

ketwaroo commented 10 months ago

@sfence oh cool I saw @theFox6 added better logging and a pause call in 1721deabe7fbaca6b952b067d8943297e56e8ca3. I'm going to close my naive PR.

sfence commented 10 months ago

@ketwaroo Ok. He is the author of the code. So I believe that he knows what is he doing.

theFox6 commented 10 months ago

The reason why the error call was there is to prevent leaving minetest in a broken state and instead closing the game. If I had just logged the error in try/catch style it would have written log messages every world step, taking a toll on mintests performance as well as your harddrives. I figured stopping the villager with the error could be enough to prevent further issues. However this might cause players to ignore the issue at hand and just resume the Job, which can lead to unexpected behavior. Just crashing the game everytime anything like that occurs would leave the world unplayable, so providing plenty of crash messages should be a good compromise.

ketwaroo commented 9 months ago

So an update,

my kids discovered mineclone2 and basically abandoned the base game.. however I have not seen a crash since then.

The one surefire way of triggering a crash with the old code was to try to have the villager walk on an unknown node. i.e. a block that was added by another mod then if you disable the mod, it show up as a suspicious block in game (and basically does not exist in minetest.registered_nodes)

Here's what happens now:

image

hopefully picture got attached correctly.

error log for reference:

ERROR[Server]: [working_villages] error in job_thread ...t\bin\..\mods\working_villages\working_villagers/api.lua:497: attempt to index a nil value: stack traceback:
    ...t\bin\..\mods\working_villages\working_villagers/api.lua: in function 'handle_obstacles'
    ...king_villages\working_villagers/jobs/plant_collector.lua:84: in function <...king_villages\working_villagers/jobs/plant_collector.lua:77>
2023-09-14 13:56:12: ERROR[Server]: [working_villages] error in job_thread ...t\bin\..\mods\working_villages\working_villagers/api.lua:497: attempt to index a nil value: stack traceback:
    ...t\bin\..\mods\working_villages\working_villagers/api.lua: in function 'handle_obstacles'
    ...s\working_villages\working_villagers/jobs/woodcutter.lua:76: in function <...s\working_villages\working_villagers/jobs/woodcutter.lua:69>
2023-09-14 14:02:50: ERROR[Main]: Irrlicht: Invalid size of image for texture.

I don't think this is something that this mod needs to address since the issue is cause by another third party mod. Well, it might. if the node isn't registered, it could be assumed to be walkable. Unless the mod that added it get re-enabled, and turns out it was lava, then you have a well done villager.

But yes, not crashing the game is definitely an improvement.

lmaddox commented 7 months ago

the fix is trivial with some nil checks

Screenshot_2023-12-01_05_44_41