lgi-devs / lgi

Dynamic Lua binding to GObject libraries using GObject-Introspection
MIT License
434 stars 68 forks source link

Errors in callbacks don't terminate the loop #284

Open sclu1034 opened 2 years ago

sclu1034 commented 2 years ago

As the corocbk.rethrow test shows, errors that happen directly inside an idle source stop the current loop and are returned by loop:run().

However, when the error occurs somewhere nested inside callbacks, the callback chain is aborted, but the error is swallowed. In the following minimal example, if the query_info operation fails, the error will be printed as GLib warning, but the loop will not quit immediately. Only once the timeout hits will the loop stop, but report a "successful" true, nil.

The Guide suggests that these errors should either terminate execution or trigger a pcall, neither of which happens here.

local lgi = require("lgi")
local Gio = lgi.Gio
local GLib = lgi.GLib
local File = Gio.File
local loop = GLib.MainLoop()

local co = coroutine.create(function()
    local f = File.new_for_path("./does_not.exist")
    local ok, err = pcall(f.query_info_async, f, "standard::type", 0, GLib.PRIORITY_DEFAULT, nil, function(_, token)
        local _, err = f:query_info_finish(token)
        assert(err == nil)
        loop:quit()
    end)
    print(ok, err)
end)

GLib.idle_add(GLib.PRIORITY_DEFAULT, co)
GLib.timeout_add(GLib.PRIORITY_DEFAULT, 1000, function()
    loop:quit()
end)

local ok, err = pcall(loop.run, loop)
print(ok, err)

Result:

true    nil

(process:17302): Lgi-WARNING **: 11:46:45.041: Error raised while calling 'lgi.cbk (function: 0x55e197403ea0): Gio': test.lua:11: assertion failed!
true    nil

Related: #124.

psychon commented 2 years ago

Well, I set a breakpoint on longjmp and executed the code from corocbk.lua.

Before the longjmp:

Breakpoint 1, __libc_siglongjmp (env=0x7fffffffe3e8, val=1) at ../setjmp/longjmp.c:30
30  ../setjmp/longjmp.c: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0  __libc_siglongjmp (env=0x7fffffffe3e8, val=1) at ../setjmp/longjmp.c:30
#1  0x0000555555560911 in  ()
#2  0x0000555555560510 in  ()
#3  0x000055555555f2d9 in  ()
#4  0x00007ffff7bf09e6 in  () at /usr/lib/x86_64-linux-gnu/lua/5.2/lgi/corelgilua51.so
#5  0x00007ffff7a154cc in  () at /lib/x86_64-linux-gnu/libffi.so.8
#6  0x00007ffff7a15980 in  () at /lib/x86_64-linux-gnu/libffi.so.8
#7  0x00007ffff7a6ebe4 in g_main_dispatch (context=0x555555594ad0) at ../../../glib/gmain.c:3381
#8  g_main_context_dispatch (context=0x555555594ad0) at ../../../glib/gmain.c:4099
#9  0x00007ffff7a6ef88 in g_main_context_iterate (context=0x555555594ad0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4175
#10 0x00007ffff7a6f273 in g_main_loop_run (loop=0x5555555bb760) at ../../../glib/gmain.c:4373
#11 0x00007ffff7a157ea in  () at /lib/x86_64-linux-gnu/libffi.so.8
#12 0x00007ffff7a14923 in  () at /lib/x86_64-linux-gnu/libffi.so.8
#13 0x00007ffff7bf1afc in  () at /usr/lib/x86_64-linux-gnu/lua/5.2/lgi/corelgilua51.so
#14 0x000055555556123e in  ()
#15 0x000055555556c20d in  ()
#16 0x00005555555615e8 in  ()
#17 0x0000555555560b6c in  ()
#18 0x000055555556183f in  ()
#19 0x000055555555ef97 in lua_pcallk ()
#20 0x000055555555b8da in  ()
#21 0x000055555555c5d4 in  ()
#22 0x000055555556123e in  ()
#23 0x00005555555615b2 in  ()
#24 0x0000555555560b6c in  ()
#25 0x000055555556183f in  ()
#26 0x000055555555ef97 in lua_pcallk ()
#27 0x000055555555b67b in  ()
#28 0x00007ffff7c597fd in __libc_start_main (main=
    0x55555555b620, argc=2, argv=0x7fffffffe8f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe8e8) at ../csu/libc-start.c:332
#29 0x000055555555b73a in  ()

After the jump:

(gdb) bt
#0  0x000055555555ef97 in lua_pcallk ()
#1  0x000055555555b8da in  ()
#2  0x000055555555c5d4 in  ()
#3  0x000055555556123e in  ()
#4  0x00005555555615b2 in  ()
#5  0x0000555555560b6c in  ()
#6  0x000055555556183f in  ()
#7  0x000055555555ef97 in lua_pcallk ()
#8  0x000055555555b67b in  ()
#9  0x00007ffff7c597fd in __libc_start_main (main=
    0x55555555b620, argc=2, argv=0x7fffffffe8f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe8e8) at ../csu/libc-start.c:332
#10 0x000055555555b73a in  ()

So, apparently LGI uses longjmp to exit g_main_loop_run. Is that even allowed by GLib?!

For your test case, the warning comes from here: https://github.com/lgi-devs/lgi/blob/4071f902b635d3a7078c65162fce347367b1371d/lgi/callable.c#L1346-L1354

This is the code for calling a function and this seems to turn a Lua error into a GError if the call allows it. Otherwise the error is just printed.

The else case of the above code is for when an error comes from a coroutine and that's the "just longjmp away"-case: https://github.com/lgi-devs/lgi/blob/4071f902b635d3a7078c65162fce347367b1371d/lgi/callable.c#L1373-L1377

So, yeah, dunno what to do here and I doubt what LGI is currently doing is legal / allowed / a good idea...

psychon commented 2 years ago

Oh and for what the guide says: I guess that he guide is outdated for almost eight years now. A little git-digging suggests that #86 changed things so that "call errors" are not simply propagated out. Back then, the coroutine case already existed, but apparently no one though of fixing that so that it also doesn't let the error propagate. And of course no one updated the docs.

sclu1034 commented 2 years ago

86 looks to me like it only added the warning message and that the behaviour didn't change.

That else case happens to do exactly what I need, though, so that might be enough as a workaround for me.

psychon commented 2 years ago

and that the behaviour didn't change.

The change is subtle: Previously, there was a lua_call in line 1176, i.e. an unprotected call. The patch replaces that with a lua_pcall and printing a warning if an error was caught.

so that might be enough as a workaround for me.

I guess you already know that, but just to be sure: To hit that case, you need to provide a coroutine instead of a function as the callback.

(And: I consider this a bug and would want to make the behaviour between both cases the same, but am currently to lazy to do this. Thus, if I ever become non-lazy (I doubt it), your code will break.)