lgi-devs / lgi

Dynamic Lua binding to GObject libraries using GObject-Introspection
MIT License
440 stars 70 forks source link

Error when using UDP socket (libffi7+ specific?) #265

Open v1993 opened 3 years ago

v1993 commented 3 years ago

This is a weird error that seems to occur only with libffi version 7/8, but not version 6 (or could it be another library?). Here's an example that can reproduce it:

local lgi = require 'lgi'
local   GLib,       Gdk,        GObject,        Gio =
        lgi.GLib,   lgi.Gdk,    lgi.GObject,    lgi.Gio

local app = Gio.Application { application_id = 'org.v1993.udpcrash', flags = 'NON_UNIQUE' }

local socket = lgi.Gio.Socket.new('IPV4', 'DATAGRAM', 'UDP')
socket.blocking = false
local sa = lgi.Gio.InetSocketAddress.new(Gio.InetAddress.new_loopback('IPV4'), 3030)
assert(socket:bind(sa, true))

do
    -- To avoid extra allocations
    -- I assume this is long enough
    local buf = require("lgi.core").bytes.new(4096)
    local source = socket:create_source('IN')
    source:set_callback(function()
        print('This will print')
        local len, src = socket:receive_from(buf) -- CRASH!
        print('And this will not')
        if len > 0 then
            print(tostring(buf):sub(1, len))
        end
        return true
    end)

    source:attach(lgi.GLib.MainContext.default())
end

local function exitNormal()
    print('Exiting')
    app:release()
end

function app:on_activate()
    app:hold()
end

GLib.unix_signal_add(GLib.PRIORITY_HIGH, 1, exitNormal)
GLib.unix_signal_add(GLib.PRIORITY_HIGH, 2, exitNormal)
GLib.unix_signal_add(GLib.PRIORITY_HIGH, 15, exitNormal)

app:run({arg[0], ...})

Now, start this simple script and run in another window:

echo 'Hi' > /dev/udp/localhost/3030

This will cause script to crash with following output:

This will print
**
Lgi:ERROR:marshal.c:1197:lgi_marshal_2c_caller_alloc: assertion failed: (size > 0)
Bail out! Lgi:ERROR:marshal.c:1197:lgi_marshal_2c_caller_alloc: assertion failed: (size > 0)
Aborted (core dumped)

This actually causes a crash in my application: https://github.com/v1993/linuxmotehook/issues/5 (thread contains some limited digging into issue), which evidently worked with older versions of stuff. Any clue what's causing it?

psychon commented 3 years ago

Quick first reaction: I need bash -c 'echo 'Hi' > /dev/udp/127.0.0.1/3030' to reproduce this (zsh does not implement /dev/udp and for some reason Lua listens on IPv4, but bash sends using IPv6). With this, I can reproduce the problem.

Now it's time for dinner, sorry.

psychon commented 3 years ago

Oh and: Thanks a lot for this great bug report!

Edit: Shorter reproducer:

local lgi = require 'lgi'
local Gio = lgi.Gio

local socket = lgi.Gio.Socket.new('IPV4', 'DATAGRAM', 'UDP')
--socket.blocking = false
local sa = lgi.Gio.InetSocketAddress.new(Gio.InetAddress.new_loopback('IPV4'), 3030)
assert(socket:bind(sa, true))

print('This will print')
local len, src = socket:receive_from(buf) -- CRASH!
print('And this will not')
v1993 commented 3 years ago

Note: your reduced example seems to never define buf, which (I think) shouldn't be the case. As for "blocking", it does not actually matter for UDP IIRC.

psychon commented 3 years ago

Note: your reduced example seems to never define buf,

And yet it still hits the same assert. Even calling socket:receive_from() does. This made me notice: https://github.com/pavouk/lgi/blob/a3d46f4f3cb1a3c19c61f46b856ee6683a2d57db/lgi/marshal.c#L1193-L1197 Could it be that this only can deal with arguments of type e.g. int[4096], i.e. a buffer of a fixed size? Could this be what the comment refers to?

Actually...The code with the failing assert is trying to allocate a GArray, not a C array. I guess it tries to completely eliminate the [out]-array from the function arguments and turn it into "just a return value".

If I add if(1) return FALSE; to the beginning of lgi_marshal_2c_caller_alloc(), the failed assertion goes away. I didn't yet try if the function actually works correctly / as expected with this change.

Adding an assert(0); to the code in question does not cause a test suite failure, so I am not quite sure why this is here. But it seems weird/fishy and seems to have an incorrect type check. I'll try to look more into it tomorrow.

psychon commented 3 years ago

Random idea was to just skip things here:

diff --git a/lgi/marshal.c b/lgi/marshal.c
index f56b821..fde21a5 100644
--- a/lgi/marshal.c
+++ b/lgi/marshal.c
@@ -1194,7 +1194,18 @@ lgi_marshal_2c_caller_alloc (lua_State *L, GITypeInfo *ti, GIArgument *val,
        elt_size =
          array_get_elt_size (g_type_info_get_param_type (ti, 0), FALSE);
        size = g_type_info_get_array_fixed_size (ti);
-       g_assert (size > 0);
+       // FIXME: This used to be g_assert (size > 0);, but it is
+       // possible that this assert fails, see test
+       // gio.socket_receive_from.
+       // This whole code seems fishy: socket_receive_from() reaches
+       // here with a C array, but this code allocates a GArray.
+       // Most likely a stricter check on the type is required here.
+       // However, I could not find any valid code reaching here and
+       // thus tried to only make "minimal damage".
+       if (size == 0)
+           {
+           break;
+           }

        /* Allocate underlying array.  It is temporary,
           existing only for the duration of the call. */

However, this does not work... Since the buffer argument is [out caller-allocates], the code supposedly tried to completely remove this argument from the argument list and instead allocate its own array.

Calling print(socket:receive_from()) results in the following call according to gdb (with the above patch):

g_socket_receive_from (socket=0x5555557b3280 [GSocket], address=0x7fffffffde18, buffer=0x55555558f2a8 "", size=140737488346664, cancellable=0x0, error=0x7fffffffde98)

So far: No idea where that size comes from, but it seems to be a valid pointer. In hex, it is 0x7fffffffde28, so "almost next to" address.

The print from the above call produces: userdata: 0x556c19392858 Error receiving message: Bad address

So... at least the GError works, but apparently the "normal C return value" was completely lost and instead we get... something.

Sigh. Dunno.

gnulabis commented 3 years ago

And yet it still hits the same assert. Even calling socket:receive_from() does. This made me notice:

https://github.com/pavouk/lgi/blob/a3d46f4f3cb1a3c19c61f46b856ee6683a2d57db/lgi/marshal.c#L1193-L1197

Disclaimer: I know almost nothing about Lua, and certainly nothing about gobject-introspection

But...

I was looking at v1993/linuxmotehook#5 to figure out why that one was not working for me and I came across the same assertion in marshal.c and the comment about only supporting fixed-sized arrays.

The thing is, in other places in the same file, there seems to be some logic to handle the case where g_type_info_get_array_fixed_size() returns a negative value, like this one:

https://github.com/pavouk/lgi/blob/a3d46f4f3cb1a3c19c61f46b856ee6683a2d57db/lgi/marshal.c#L494-L503

So I did a few checks and I saw that the assert is failing because the GITypeInfo *ti that we pass to g_type_info_get_array_fixed_size (ti) is an array indeed, but not of fixed size, so the function returns -1. On the other hand, I tried calling g_type_info_get_array_length(ti) and that one returns 2 (this is with the code from the linuxmotehook project, not your reduced example).

Unfortunately my understanding of the situation stops here. But it seems to me that if we were able to properly allocate based on the information returned by g_type_info_get_array_length(ti), it should work.

v1993 commented 3 years ago

The thing is, it should not be allocating array here at all, it is an output argument. Intended usage is that you pass your own array in and function fills it (returning some other info too).

cipitaua commented 3 years ago

hello, is there ANY workaround? this bug makes linuxmotehook totally unusable!