lgi-devs / lgi

Dynamic Lua binding to GObject libraries using GObject-Introspection
MIT License

Crashes with GTK3 for the last few releases with Lua 5.1 #236

Open Elv13 opened 5 years ago

Elv13 commented 5 years ago

For a while now, less and less of AwesomeWM has been working for me. I am not entirely sure what the issue is, since it involves casting pointers to random (Lua-held) places, and at some point things become invalid. I haven't investigated much either.

Lua 5.1, GTK3 3.24.10, GObject Introspection 1.58.3 and LGI 0.9.0

```lua
#!/usr/bin/env lua
local Gtk, class = require('lgi').require('Gtk'), 'client'
Gtk.init()
window = Gtk.Window {default_width=100, default_height=100, title='title'}
swindow.decorated = false
window:set_wmclass(class, class)
local app = Gtk.Application {}
function app:on_activate()
    window.application = self
    window:show_all()
end
app:run {''}
```
```
==1298== Memcheck, a memory error detector
==1298== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1298== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==1298== Command: lua /tmp/client.lua
==1298== 
==1298== Invalid read of size 8
==1298==    at 0x4885623: lgi_object_2lua (object.c:353)
==1298==    by 0x4885C5F: object_new (object.c:540)
==1298==    by 0x484D118: luaD_precall (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x4857EAC: luaV_execute (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x484D694: luaD_call (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x484C9DD: luaD_rawrunprotected (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x484D82C: luaD_pcall (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x48494E7: lua_pcall (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x1098ED: docall (in /usr/bin/lua)
==1298==    by 0x10A3B5: pmain (in /usr/bin/lua)
==1298==    by 0x484D118: luaD_precall (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x484D663: luaD_call (in /usr/lib64/liblua.so.5.1.5)
==1298==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==1298== 
==1298== 
==1298== Process terminating with default action of signal 11 (SIGSEGV)
==1298==  Access not within mapped region at address 0x4
==1298==    at 0x4885623: lgi_object_2lua (object.c:353)
==1298==    by 0x4885C5F: object_new (object.c:540)
==1298==    by 0x484D118: luaD_precall (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x4857EAC: luaV_execute (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x484D694: luaD_call (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x484C9DD: luaD_rawrunprotected (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x484D82C: luaD_pcall (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x48494E7: lua_pcall (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x1098ED: docall (in /usr/bin/lua)
==1298==    by 0x10A3B5: pmain (in /usr/bin/lua)
==1298==    by 0x484D118: luaD_precall (in /usr/lib64/liblua.so.5.1.5)
==1298==    by 0x484D663: luaD_call (in /usr/lib64/liblua.so.5.1.5)
==1298==  If you believe this happened as a result of a stack
==1298==  overflow in your program's main thread (unlikely but
==1298==  possible), you can try to increase the size of the
==1298==  main thread stack using the --main-stacksize= flag.
==1298==  The main thread stack size used in this run was 8388608.
==1298== 
==1298== HEAP SUMMARY:
==1298==     in use at exit: 1,948,758 bytes in 20,509 blocks
==1298==   total heap usage: 42,381 allocs, 21,872 frees, 4,954,491 bytes allocated
==1298== 
==1298== LEAK SUMMARY:
==1298==    definitely lost: 168 bytes in 17 blocks
==1298==    indirectly lost: 0 bytes in 0 blocks
==1298==      possibly lost: 2,360 bytes in 29 blocks
==1298==    still reachable: 1,907,814 bytes in 20,131 blocks
==1298==                       of which reachable via heuristic:
==1298==                         length64           : 2,440 bytes in 37 blocks
==1298==                         newarray           : 1,808 bytes in 33 blocks
==1298==         suppressed: 0 bytes in 0 blocks
==1298== Rerun with --leak-check=full to see details of leaked memory
==1298== 
==1298== For counts of detected and suppressed errors, rerun with: -v
==1298== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
zsh: segmentation fault  valgrind lua /tmp/client.lua
```

It works fine with a "pure luarocks" stack created with the Lua 5.3 slot (+ LGI 0.9.2). Re-installing the system LGI (0.9.0) with the 5.1 slot brings the breakage back. It used to work fine on 5.1. I always try to use 5.1 because it is the least problematic for development (besides the missing unpack declaration). Is 0.9 the cause, or did "something change" in GObject Introspection?

psychon commented 5 years ago
> ==1298== Invalid read of size 8
> ==1298==    at 0x4885623: lgi_object_2lua (object.c:353)

Uhm. In my version, that line of code is:

```c
*(gpointer *) lua_newuserdata (L, sizeof (obj)) = obj;
```
  1. How can this dereference a NULL pointer?
  2. How can this dereference a NULL pointer for reading?

Oh, wait. You are saying that this happens with version 0.9.0. Why are you using that particular version? In 0.9.0, the code at that line is:

```c
object_type (L, G_TYPE_FROM_INSTANCE (obj));
```

So, apparently G_TYPE_FROM_INSTANCE is being called on a null pointer.

I strongly recommend checking with a newer LGI version first (nothing can be done to change old releases). If it still crashes, could you build LGI with debug symbols and perhaps without optimisations? I can look up how to do that (and how to run a Lua script against an uninstalled LGI version) later if you want.


Your Lua code fails with: attempt to index global 'swindow' (a nil value). Could you fix that? (Well, the fix itself is quite obvious, but could you double-check that the fixed code still crashes for you?)

After the fix, this does not crash here for me (of course it does not).

Uhm... you seem to be running older versions of these libraries than I am.


FYI: I just got back from vacation.

psychon commented 5 years ago

> So, apparently G_TYPE_FROM_INSTANCE is being called on a null pointer.

...which is impossible, because the function checks whether obj is NULL first. So either some memory corruption breaks the expression this macro expands to, `(((GTypeClass*) ((((GTypeInstance*) (obj))->g_class)))->g_type)`, or the debug information is misleading due to compiler optimisations, or... something else.