pkulchenko / wxlua

wxlua: Lua bindings for wxWidgets cross-platform GUI toolkit; supports Lua 5.1, 5.2, 5.3, 5.4, LuaJIT and wxWidgets 3.x
306 stars 59 forks source link

Segmentation faults on xubuntu 18.04 #20

Closed jjvbsag closed 5 years ago

jjvbsag commented 5 years ago

I think this may be related to #14 What I do is running just a little script loading "wx" and nothing else.

require"wx"

This runs fine and without any problem when executed from within ZeroBraneStudio. Then I change this to add the ZBS path and cpath,

package.path = package.path..";/opt/zbstudio/lualibs/?/?.lua;/opt/zbstudio/lualibs/?.lua;/opt/zbstudio/lualibs/?/?/init.lua;/opt/zbstudio/lualibs/?/init.lua"
package.cpath=package.cpath..";/opt/zbstudio/bin/linux/x64/clibs/?.so;/opt/zbstudio/bin/linux/x64/clibs/lib?.so;/opt/zbstudio/bin/linux/x64/clibs/?.so;/opt/zbstudio/bin/linux/x64/clibs/lib?.so"
require"wx"

Still runs fine from within ZBS. BUT from commandline I get Segmentation fault

I made dumps with

LD_PRELOAD=/lib/x86_64-linux-gnu/libSegFault.so lua only-wx.lua 2>lua-dump.txt
LD_PRELOAD=/lib/x86_64-linux-gnu/libSegFault.so luajit only-wx.lua 2>luajit-dump.txt

please find the results attached: lua-dump.txt luajit-dump.txt

To complete the information, this is on Ubuntu 18.04.1 LTS with Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio and LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/

jjvbsag commented 5 years ago

@pkulchenko If you can give me some advice, how to debug this to find the cause, I'm willing to try to investigate in this issue... I have already pulled the github project and did - after installing the required packages - a successful build on linux-x64.

jjvbsag commented 5 years ago

@pkulchenko As I just had my VM running, I did also verify this on my minimal VM setup. For ease of use I did a plain install of ZeroBraneStudioEduPack-1.80-linux.sh into /opt/ then run

/opt/zbstudio/bin/linux/x64/lua test-wx.lua

(so now using the lua coming with zbs) and get the same segfaults.

This is quite annoying to me, because all of my wxLua scripts trigger a Ubuntu Send Bug Report Dialog everytime after finishing. :disappointed:

pkulchenko commented 5 years ago

I use os.exit() at the end of my scripts (including ZBS) to avoid this. Not really a solution, but I haven't been able to track the source of this crash down, which seems to happen in GC cleanup in wxlua.

If you want to try debugging/tracing the execution, it may help (if it's still reproducible).

jjvbsag commented 5 years ago

Thank you for the workaround, at least this let me get rid of the Ubuntu Bug Dialogs. :champagne:

ghost commented 5 years ago

Yeah, I also believe there is a connection between issue #14 and #20.

I have examined the SEGV fault on linux with strace comparing the difference with/without the workaround via os.exit(), the problem seems to trigger at the very end on shutdown of the Lua state:

> strace -f luajit -e 'require("wx")'
...
open("/usr/share/pixmaps/Adwaita/index.theme", O_RDONLY) = -1 ENOENT (No such file or directory)
mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x41399000
mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x4122b000
mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x40bbc000
mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x40cbf000
rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x7fb59eb4f4b0}, {0x403a10, [INT], SA_RESTORER|SA
munmap(0x40cbf000, 131072)              = 0
munmap(0x40bbc000, 131072)              = 0
munmap(0x4122b000, 131072)              = 0
munmap(0x41399000, 131072)              = 0
munmap(0x411f9000, 131072)              = 0
munmap(0x4069c000, 131072)              = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x4069c37f} ---
+++ killed by SIGSEGV +++
Segmentation fault

> strace -f luajit -e 'require("wx");os.exit()'
...
open("/usr/share/pixmaps/Adwaita/index.theme", O_RDONLY) = -1 ENOENT (No such file or directory)
mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x40b0d000
mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x40dee000
mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x4125c000
mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x409ef000
exit_group(0)                           = ?
+++ exited with 0 +++

I guess the workaround is skipping the lua_close call in this case (in certain cases os.exit() may call lua_close, see Lua source). But as the behaviour on other OS types suggest, the problem occurs much earlier. Fortunately I got valgrind to tell us some things (see full log valgr.txt ):

> valgrind luajit -e require("wx")
...
 Invalid read of size 4
    at 0x405E56: ??? (in /home/spr3746/dev/prereq/deps_lnx64/bin/luajit-2.0.5)
    by 0x40A37A: lua_pushlstring (in /home/spr3746/dev/prereq/deps_lnx64/bin/luajit-2.0.5)
    by 0x417E2B: luaL_findtable (in /home/spr3746/dev/prereq/deps_lnx64/bin/luajit-2.0.5)
    by 0x417FBE: luaL_openlib (in /home/spr3746/dev/prereq/deps_lnx64/bin/luajit-2.0.5)
    by 0x61EC622: wxLuaBinding::RegisterBinding(wxLuaState const&) (in /home/spr3746/oslibs/lnx64/wx.so)
    by 0x61FC3AF: wxLuaBinding_wxlua::RegisterBinding(wxLuaState const&) (in /home/spr3746/oslibs/lnx64/wx.so)
    by 0x61EC35B: wxLuaBinding::RegisterBindings(wxLuaState const&) (in /home/spr3746/oslibs/lnx64/wx.so)
    by 0x61FAEE2: wxLuaState::Create(lua_State*, int) (in /home/spr3746/oslibs/lnx64/wx.so)
    by 0x613230A: luaopen_wx (in /home/spr3746/oslibs/lnx64/wx.so)
...

I have the impression that the problem lies somewhere here in the source (luaL_Register is mapped to luaL_openlib somewhere):

wxLua/modules/wxlua/wxlbind.cpp:
...
    // Let Lua create a new table for us and add it to these places.
    // We use an empty luaL_Reg since we just want luaL_register to create the
    // tables for us, but we want to install the elements ourselves since
    // wxLua is too large to follow the luaL_register method without being
    // wasteful of memory and slow down the initialization.
    //    LUA_REGISTRYINDEX["_LOADED"][m_nameSpace] = table
    //    LUA_GLOBALSINDEX[m_nameSpace] = table
    //    LUA_GLOBALSINDEX["package"]["loaded"][m_nameSpace] = table
    static const luaL_Reg wxlualib[] = { {NULL, NULL} };

    wxLuaState::luaL_Register(L, wx2lua(m_nameSpace), wxlualib);

    // luaL_register should have given an error message about why it couldn't
    // create the table for us
    if (!lua_istable(L, -1))
    {
        lua_pop(L, 1); // pop the nil value
        return false;
    }

It seems like wxLua is using a "shortcut" to register the libs here. Unfortunately I am not into C++ and need some help from here on.

@jjvbsag: Are you into C++? That would be cool!

I would be really glad if we could fix this at the root.

jjvbsag commented 5 years ago

@chymanfx Yes, I am (also) in C++, mainly doing embedded software development (about 30 years of experience) Also did recently a huge interfacing from lua to an OpcUa-Client library written in C++.

But I'm short of time. My job is at 150% brain-load in the moment and beside that I'm private into music (2 Bands, 1 Orchestra and 1 Gospelchoir) . So I have to seek, if I can spend time on this. But: Thanks for your investigations, I will have a look at it ASAP.

ghost commented 5 years ago

@jjvbsag Thank you, I will try to narrow it down as good as I can for prepare.

So far I found that the corruption occurs during the wx.so library loading steps. During the initialization the line 832 of wxLua/modules/wxlua/wxlbind.cpp is called a number of times (e.g. for bit32, wx, wxlua):

wxLuaState::luaL_Register(L, wx2lua(m_nameSpace), wxlualib);

One of these times the 2nd argument to luaL_Register receives a corrupt string for the libname from wx2lua(). This leads to at least 4 of the notes in the valgrind run. The faulty string gets copied into the lua state via lua_pushlstring and likely corrupts the lua_status in the lua state, leading to the SEGV.

I will try to narrow down by experiments which of the calls to luaL_Register is causing this.

pkulchenko commented 5 years ago

I'd be interested in the results as well. Here is wxString documentation and details on various wxString conversions.

pkulchenko commented 5 years ago

I wonder if this is relevant: "It should be noted that on destruction wxCharBuffer and its wide character variant delete the c string that hold onto. If you want to get the pointer to the buffer and don't want wxCharBuffer to delete it on destruction, use the member function release to do so." (from https://docs.wxwidgets.org/3.1/overview_bufferclasses.html), as "Invalid read of size 4" message usually points to a reuse of already released memory.

ghost commented 5 years ago

Unfortunately I didn't find a way to influence the LUA_REGISTRYINDEX["_LOADED"] through the lua language to pretend single sub modules like "bit32" are already loaded. But I know that luaL_register/luaL_Register is called 3 times, for: "bit32", "wxlua" and "wx" in that order and one of these three calls create the corruption:

lbitlib.c   (375): luaL_register(L, "bit32", bitlib);     -- this one I feel is quite unlikely the reason
wxlbind.cpp (832): luaL_Register(L, wx2lua(m_nameSpace), wxlualib);  -- with wxT("wxlua") as m_nameSpace
wxlbind.cpp (832): luaL_Register(L, wx2lua(m_nameSpace), wxlualib);  -- with wxT("wx") as m_nameSpace

Guess I don't get any further with my little Ansi C background... :|

jjvbsag commented 5 years ago

For test I have added a little patch into build-linux.sh

perl -pi \
  -e 'print("fprintf(stderr,\"luaI_openlib(libname=\\\"%s\\\")\\n\",libname);") if $.==244;' \
  src/lauxlib.c

this whill show all calls to luaI_openlib (this is where luaL_register finally goes) and then after rebuilding ZBS run

ZBS=/home/GitHub/pkulchenko/ZeroBraneStudio/
cd ${ZBS}
export LUA_PATH="${ZBS}/lualibs/?/?.lua;${ZBS}/lualibs/?.lua;${ZBS}/lualibs/?/?/init.lua;${ZBS}/lualibs/?/init.lua;;"
export LUA_CPATH="${ZBS}bin/linux/x64/clibs/?.so;${ZBS}bin/linux/x64/clibs/lib?.so;${ZBS}bin/linux/x64/clibs/?.so;${ZBS}bin/linux/x64/clibs/lib?.so;;"
/home/GitHub/pkulchenko/ZeroBraneStudio/bin/linux/x64/lua -e'require"wx"'

The segfault occurs, but the output of my fprintf seems to be fine

luaI_openlib(libname="_G")
luaI_openlib(libname="coroutine")
luaI_openlib(libname="package")
luaI_openlib(libname="(null)")
luaI_openlib(libname="table")
luaI_openlib(libname="(null)")
luaI_openlib(libname="io")
luaI_openlib(libname="os")
luaI_openlib(libname="string")
luaI_openlib(libname="math")
luaI_openlib(libname="debug")
luaI_openlib(libname="bit")
luaI_openlib(libname="bit32")
luaI_openlib(libname="wxlua")
luaI_openlib(libname="wxlua")
luaI_openlib(libname="wx")
luaI_openlib(libname="wx")
luaI_openlib(libname="wx")
luaI_openlib(libname="wx")
luaI_openlib(libname="wx")
luaI_openlib(libname="wxaui")
luaI_openlib(libname="wx")
luaI_openlib(libname="wxstc")

I cannot see any corrupted argument to luaI_openlib. I will try to investigate further...

jjvbsag commented 5 years ago

Ok, next test Changed the test script to use efence

export LD_PRELOAD=libefence.so.0.0
/home/GitHub/pkulchenko/ZeroBraneStudio/bin/linux/x64/lua -e'require"wx"'

Now the output is:

  Electric Fence 2.2 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>
luaI_openlib(libname="_G")
luaI_openlib(libname="coroutine")
luaI_openlib(libname="package")
luaI_openlib(libname="(null)")
luaI_openlib(libname="table")
luaI_openlib(libname="(null)")
luaI_openlib(libname="io")
luaI_openlib(libname="os")
luaI_openlib(libname="string")
luaI_openlib(libname="math")
luaI_openlib(libname="debug")
luaI_openlib(libname="bit")
luaI_openlib(libname="bit32")
luaI_openlib(libname="wxlua")
luaI_openlib(libname="wxlua")
luaI_openlib(libname="wx")

***MEMORY-ERROR***: lua[3755]: GSlice: assertion failed: aligned_memory == (gpointer) addr
/pkulchenko/ZeroBraneStudio/bin/linux/x64/lua -e'require"wx"'
>Exit code: 134

so the memory corruption is indeed cause in wx somewhere after the first call with "wx" as libname.

ghost commented 5 years ago

I looked a bit deeper into gdb this weekend and got some interesting new findings:

The "Invalid read of size 4" in the valgrind output occurs after luaL_openlib is called with "wx". That is also confirmed by your findings with Electric Fence, @jjvbsag: thank you!

But that is not the reason for the SEGV fault. I have watched the changes to the Lua state memory in gdb and the above fault doesn't interfere with the Lua state memory. Much later, on shutdown, however, shortly before the end I got a gdb backtrace where Lua closes its state cleaning up everything including its own state (L):

>>> bt
#0  0x0000000000422601 in gc_sweep (g=g@entry=0x400003b8, p=0x40000378, p@entry=0x400003e8, lim=2147477677, lim@entry=2147483392) at lj_gc.c:400
#1  0x00000000004236fb in lj_gc_freeall (g=g@entry=0x400003b8) at lj_gc.c:567
#2  0x000000000040872c in close_state (L=0x40000378) at lj_state.c:159
#3  0x0000000000408ca5 in lua_close (L=<optimized out>) at lj_state.c:258
#4  0x00000000004039b8 in main (argc=3, argv=<optimized out>) at luajit.c:568

So after that the Lua state (L) is completely freed, but then suddenly there is a completely new gdb backtrace when the SEGV fault occurs:

Program received signal SIGSEGV, Segmentation fault.
lua_status (L=0x40000378) at lj_api.c:82
82    return L->status;
>>> bt
#0  lua_status (L=0x40000378) at lj_api.c:82
#1  0x00007ffff6b34224 in wxLuaStateRefData::CloseLuaState(bool) () from /home/.../wx.so
#2  0x00007ffff6b34497 in wxLuaState::Destroy() () from /home/.../wx.so
#3  0x00007ffff6a6ca0e in wxLuaState::~wxLuaState() () from /home/.../wx.so
#4  0x00007ffff7323ff8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007ffff7324045 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007ffff730a837 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x0000000000403a19 in _start ()

That is the "Invalid read of size 1" in the valgrind output. Here the "L->status" is accessed although L is already gone. For some reason something additional is triggered for cleanup. So in essence the Lua state is tried to be cleaned twice. That should not happen and I think the problem is here:

wxLua/modules/wxlua/wxlstate.cpp:
...
402   bool wxLuaStateRefData::CloseLuaState(bool force)
403   {
404       if ((m_lua_State == NULL) || m_wxlStateData->m_is_closing || m_lua_State_coroutine)
405           return true;
406   
407       if (lua_status(m_lua_State) != 0) // lua state is not LUA_OK
408   return true;
...

Shouldn't line 404 return true here when the Lua state is already gone?

Can anyone explain the background for this or why it doesn't return true here?

ghost commented 5 years ago

Does anyone know where the "_start" thread comes from after "main" has been finished?

pkulchenko commented 5 years ago

@chymanfx, thank you for taking look at this; very useful!

Shouldn't line 404 return true here when the Lua state is already gone?

I think it should, but I suspect m_lua_State is not set to NULL, so the check doesn't work.

I wonder if something like this would work better:

diff --git a/wxLua/modules/wxlua/wxlstate.cpp b/wxLua/modules/wxlua/wxlstate.cpp
index 24fe904..56f0d3f 100644
--- a/wxLua/modules/wxlua/wxlstate.cpp
+++ b/wxLua/modules/wxlua/wxlstate.cpp
@@ -401,10 +401,10 @@ wxLuaStateRefData::~wxLuaStateRefData()

 bool wxLuaStateRefData::CloseLuaState(bool force)
 {
-    if ((m_lua_State == NULL) || m_wxlStateData->m_is_closing || m_lua_State_coroutine)
+    if (m_lua_State == NULL || lua_status(m_lua_State) != 0) // lua state is not LUA_OK
         return true;

-    if (lua_status(m_lua_State) != 0) // lua state is not LUA_OK
+    if (m_wxlStateData->m_is_closing || m_lua_State_coroutine)
         return true;

     m_wxlStateData->m_is_closing = true;

wxLuaStateRefData::CloseLuaState should set m_lua_State to NULL, but it looks like it may not be happening for some reason (as there are several returns before that assignment is made), so I wouldn't be surprised if there is some flaw in the logic.

Since you have it all set up already, it may be useful to set a breakpoint at the beginning of CloseLuaState and at the end to see when it's called to close and when it doesn't get to the end (to set the state to NULL).

Does anyone know where the "_start" thread comes from after "main" has been finished?

I don't; I don't see anything in wxlua or wxwidgets that references _start.

pkulchenko commented 5 years ago

On the second thought, the patch is not likely to help, as it seems to be failing in lua_status(m_lua_State) call, as the lua state is already in the "wrong" state.

I wonder why it's closed, but not set to NULL...

pkulchenko commented 5 years ago

@chymanfx, can you try commenting out lines 407-408? They were added to address sigserv on Linux with a "regular" Lua, but maybe it "breaks" something with LuaJIT. See the ticket and the discussion in it, as it seems to be relevant to this one: https://sourceforge.net/p/wxlua/bugs/36/

ghost commented 5 years ago

Thank you for looking into this, the issue discussed at sourceforge looks indeed very similar. That is interesting.

The thing is that m_lua_State lies in/points to a memory area that has been freed already by lua_close in the main thread. Subsequent accesses will also fail. So I guess commenting out lines 407-408 will just move the SEGV to the next access of m_lua_State in that function.

I think the best would be to prevent _start from being called. I suspect that _start comes from a signal handler. See the line from my post from Jan 7:

rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x7fb59eb4f4b0}, {0x403a10, [INT], SA_RESTORER|SA...

It is the first line difference between with and without os.exit(). The question is, where is that handler registered?

ghost commented 5 years ago

I could lift the mist around the _start call: it is the standard way of glibc to start executables, so in essence it is a wrapper around main and nothing special.

That means for our issue here, that after lua_state has been closed by main in the Lua REPL, glibc calls exit() which calls all C++ destructors including wxLuaStateRefData::CloseLuaState that wants to close the lua_state a second time.

The issues discussed at sourceforge is exactly the same issue and is independent of the Lua version used, and the patch proposed there does not solve the problem as we see here. It still accesses a freed address.

The wxlua destructor closing the lua_state makes in my eyes only sense when using wxlua as a standalone application but not when using it as a library.

A clean solution would be to close the lua_state just once. But I have no idea how to implement that.

pkulchenko commented 5 years ago

That means for our issue here, that after lua_state has been closed by main in the Lua REPL, glibc calls exit() which calls all C++ destructors including wxLuaStateRefData::CloseLuaState that wants to close the lua_state a second time.

I think it all makes sense, but then the question is why m_lua_State == NULL check doesn't work, since the state should be set to NULL when it's closed the first time. Can you set a breakpoint on line 489 (m_lua_State = NULL) to see if it's reached the first time?

pkulchenko commented 5 years ago

@chymanfx, I don't see your last comment in the ticket (which I received in the notification), so copying it here just in case, along with my response:


Probably my description was not clear enough, I will be more precise:

When using wxlua as a module, on shutdown, the first time the Lua state is closed in the Lua REPL, e.g. in lua-5.1/src/lua.c:

int main (int argc, char **argv) {
int status;
struct Smain s;
lua_State L = lua_open(); / create state */
if (L == NULL) {
l_message(argv[0], "cannot create state: not enough memory");
return EXIT_FAILURE;
}
s.argc = argc;
s.argv = argv;
status = lua_cpcall(L, &pmain, &s);
report(L, status);
lua_close(L); // <----- here
return (status || s.status) ? EXIT_FAILURE : EXIT_SUCCESS;
}

On this close of Lua state 'm_lua_State' is not set to NULL, but now points to freed space. After main returns in the following line, glib calls exit() (5), which calls the exit handlers (4). The exit handlers all the c++ destructors including (3):

#0 lua_status (L=0x40000378) at lj_api.c:82
#1 0x00007ffff6b34224 in wxLuaStateRefData::CloseLuaState(bool) () from /home/.../wx.so
#2 0x00007ffff6b34497 in wxLuaState::Destroy() () from /home/.../wx.so
#3 0x00007ffff6a6ca0e in wxLuaState::~wxLuaState() () from /home/.../wx.so
#4 0x00007ffff7323ff8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007ffff7324045 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007ffff730a837 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000000000403a19 in _start ()

CloseLuaState(bool) (1) is called the first and only time with 'm_lua_State' pointing to freed space, and boom: SEGV.


I agree, it looks like the state is closed twice (once outside of wxlua and once in the CloseLuaState call), so there is not much we can do about it. I'm thinking that the only way to avoid this would be to remove line 397: CloseLuaState(true); from wxLuaStateRefData::~wxLuaStateRefData() call. This obviously will remove all associated processing, but maybe it doesn't matter at that point.

Another option may be to leverage m_lua_State_static/wxLUASTATE_STATICSTATE value, but I don't see a way to set it for the existing state.

ghost commented 5 years ago

Thank you for formating my post, I had issues with my browser version and github, so I deleted it again to post it later nicely.

Yes, I see. Ideally the distinction between existing state (use as a module) and self created state (standalone use) should be done somewhere in wxLuaState::Create, but I have not found anything close to that around there.

For last resort I also see to remove the CloseLuaState(true); call.

ghost commented 5 years ago

Yeah, 'm_lua_State_static' is exactly what we need! Umm, how about changing line 735:

modules/wxlua/wxlstate.cpp: 734 // we don't want recursion in UnRef and wxlua_garbageCollect 735 if (GetRefData()->GetRefCount() == 1) 736 M_WXLSTATEDATA->CloseLuaState(true);

to:

735 if (M_WXLSTATEDATA->m_lua_State_static == false && GetRefData()->GetRefCount() == 1)

What do you think?

pkulchenko commented 5 years ago

I think it does make sense, but then the same check should be applied on line 397 as well (for CloseLuaState(true);) and if that's the case, we can simply add the check inside CloseLuaState call itself. In fact, it's already there (on line 472), but it happens too late to affect the issue we see.

Are you sure it's going to help, as it will only work if m_lua_State_static is set to true (as in the case of false it's not going to make any difference) and I'm not sure how that value is set in the "regular" wxlua case.

ghost commented 5 years ago

Line 397 has the check already few lines above, so I think my suggestion should be sufficient.

Line 472 makes me wonder, if a better way would be to just call wxLuaState:Destroy() in the Lua code after MainLoop() returns, this way the Lua state would be spared for the REPL lua_close() later on. But I can only find an empty table at: wxlua.wxLuaState

Do you know how to call it?

ghost commented 5 years ago

I tried two changes, but both did not solve the issue:

735 if (M_WXLSTATEDATA->m_lua_State_static == false && GetRefData()->GetRefCount() == 1)

0x00007ffff6b58754 in wxLuaStateRefData::CloseLuaState(bool) () from /home/.../wx.so 0x00007ffff6b596f5 in wxLuaStateRefData::~wxLuaStateRefData() () from /home/.../wx.so 0x00007ffff6b5974b in wxLuaStateRefData::~wxLuaStateRefData() () from /home/.../wx.so 0x00007ffff6d3e188 in wxObject::UnRef() () from /home/.../wx.so 0x00007ffff6a90f3e in wxLuaState::~wxLuaState() () from /home/.../wx.so 0x00007ffff7323ff8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 0x00007ffff7324045 in exit () from /lib/x86_64-linux-gnu/libc.so.6

Here the UnRef() below calls CloseLuaState() later on.

--

732 if (M_WXLSTATEDATA->m_lua_State_static || (m_refData == NULL)) return;

0x00007ffff6b589b2 in wxLuaState::Destroy() () from /home/.../lib/wx.so 0x00007ffff6b58f3d in wxLuaState::Create(lua_State*, int) () from /home/.../lib/wx.so 0x00007ffff6a9083b in luaopen_wx () from /home/.../lib/wx.so ... 0x000000000040d786 in lua_cpcall (L=L@entry=0x40000378, func=func@entry=0x404450 , ud=ud@entry=0x0) at lj_api.c:1074 0x00000000004039a4 in main (argc=3, argv=0x7fffffffe5a8) at luajit.c:566

Here Destroy() fails on state creation already.

pkulchenko commented 5 years ago

Could you try commenting out CloseLuaState(true); on line 397 instead? I think it should resolve the issue and I'm not sure what's going to be harmed by not calling it. I think if the apps needs it, it can make the explicit call, as it's part of the public API.

ghost commented 5 years ago

I tried commenting out CloseLuaState(true); on line 397, but that was not sufficient. I also had to comment out line 145 in modules/wxlua/wxlstate.h:

145 // virtual ~wxLuaState() { Destroy(); }

Both together now avoid the SEGV on shutdown and > luajit -e 'require("wx")' now returns fine. Thank you for your input and help!

pkulchenko commented 5 years ago

@chymanfx, thank you for the update. I'm curious, if you only comment out the line 145, does it fix the issue? I plan on applying this fix, but want to make sure that only the minimally sufficient changes will be applied.

ghost commented 5 years ago

Commenting out line 145 was actually my first try, but unfortunately that isn't sufficient either. There must be other destructors that call CloseLuaState() through line 397.

sonoro1234 commented 5 years ago

I am using wxLua in a lanes lane so the os.exit() workaround is not feasible for me (other lanes would be closed) I am not able to debug the SIGSEGV as it happens after wxLuaStateRefData::CloseLuaState

Thread 1 (Thread 3252.0x1b20):
#0  0x00e65bf0 in ?? ()
#1  0x76007451 in USER32!CallNextHookEx () from C:\Windows\syswow64\user32.dll
#2  0x75ff80a9 in USER32!GetUserObjectInformationW () from C:\Windows\syswow64\user32.dll
#3  0x00030000 in ?? ()
#4  0x75ff8ba1 in USER32!RegisterClassW () from C:\Windows\syswow64\user32.dll
#5  0x77c2013a in ntdll!KiUserCallbackDispatcher () from C:\Windows\SysWOW64\ntdll.dll
#6  0x0028fc48 in ?? ()
#7  0x76000769 in USER32!PeekMessageW () from C:\Windows\syswow64\user32.dll
#8  0x753b602c in TF_GetInputScope () from C:\Windows\syswow64\msctf.dll
#9  0x753b61da in TF_GetInputScope () from C:\Windows\syswow64\msctf.dll
#10 0x753b5b18 in TF_GetInputScope () from C:\Windows\syswow64\msctf.dll
#11 0x77c493c5 in ntdll!RtlIsCurrentThreadAttachExempt () from C:\Windows\SysWOW64\ntdll.dll
#12 0x0028edf8 in ?? ()
#13 0x77c68efe in ntdll!LdrShutdownProcess () from C:\Windows\SysWOW64\ntdll.dll
#14 0x00455290 in ?? ()
#15 0x77c68e4a in ntdll!RtlExitUserProcess () from C:\Windows\SysWOW64\ntdll.dll
#16 0x76f67a3d in KERNEL32!ExitProcess () from C:\Windows\syswow64\kernel32.dll
#17 0x00000000 in ?? ()

Only could check that os.exit() after wx.wxGetApp():MainLoop() avoid the calls to wxLuaStateRefData::CloseLuaState to happen. But may be more functions are avoided (as just returning true at the begining of wxLuaStateRefData::CloseLuaState doesnt avoid the SIGSEGV )

Just checked that calling os.exit() at the end of the main lane (not the wxLua lane) avoids calling wxLuaStateRefData::CloseLuaState and the SIGSEGV

pkulchenko commented 5 years ago

@sonoro1234, can you try the following patch, as suggested by @chymanfx?

diff --git a/wxLua/modules/wxlua/wxlstate.cpp b/wxLua/modules/wxlua/wxlstate.cpp
index 24fe904..8a5264d 100644
--- a/wxLua/modules/wxlua/wxlstate.cpp
+++ b/wxLua/modules/wxlua/wxlstate.cpp
@@ -394,7 +394,7 @@ wxLuaStateRefData::~wxLuaStateRefData()
     wxCHECK_RET((m_lua_State_static == true) || (m_lua_State == NULL),
                 wxT("You must ALWAYS call wxLuaState::Destroy and not wxObject::UnRef"));

-    CloseLuaState(true);
+    // CloseLuaState(true);
     if (m_own_stateData)
         delete m_wxlStateData;
 }
diff --git a/wxLua/modules/wxlua/wxlstate.h b/wxLua/modules/wxlua/wxlstate.h
index 1d8e628..0466745 100644
--- a/wxLua/modules/wxlua/wxlstate.h
+++ b/wxLua/modules/wxlua/wxlstate.h
@@ -142,7 +142,7 @@ public:

     // ALWAYS Destroy() the wxLuaState instead of calling UnRef(), else circular
     //  destruction since ref count goes to 0 before actually destroying the lua_State
-    virtual ~wxLuaState() { Destroy(); }
+    // virtual ~wxLuaState() { Destroy(); }

     // -----------------------------------------------------------------------

If this works, it shouldn't require os.exit() call.

sonoro1234 commented 5 years ago

@pkulchenko Tested the patch with editor.wx.lua still getting SIGSEGV on exit

also 5 threads begun and only 4 exit

Starting program: C:\supercolliderrepos\build_Lua2SCdebug\install\luajit.exe C:\supercolliderrepos\Lua2SC\wxLu
aBundled\wxlua\wxLua\samples\editor.wx.lua
[New Thread 5620.0x15ec]
[New Thread 5620.0x15e8]
[New Thread 5620.0x1608]
[New Thread 5620.0x160c]
[New Thread 5620.0x1610]
[Thread 5620.0x1610 exited with code 0]
[Thread 5620.0x160c exited with code 0]
[Thread 5620.0x1608 exited with code 0]
[Thread 5620.0x15e8 exited with code 0]

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00d95be0 in ?? ()
(gdb) bt
#0  0x00d95be0 in ?? ()
#1  0x76257451 in USER32!CallNextHookEx () from C:\Windows\syswow64\user32.dll
#2  0x762480a9 in USER32!GetUserObjectInformationW () from C:\Windows\syswow64\user32.dll
#3  0x00030000 in ?? ()
#4  0x76248ba1 in USER32!RegisterClassW () from C:\Windows\syswow64\user32.dll
#5  0x77e6013a in ntdll!KiUserCallbackDispatcher () from C:\Windows\SysWOW64\ntdll.dll
#6  0x0028fc48 in ?? ()
#7  0x76250769 in USER32!PeekMessageW () from C:\Windows\syswow64\user32.dll
#8  0x766d602c in TF_GetInputScope () from C:\Windows\syswow64\msctf.dll
#9  0x766d61da in TF_GetInputScope () from C:\Windows\syswow64\msctf.dll
#10 0x766d5b18 in TF_GetInputScope () from C:\Windows\syswow64\msctf.dll
#11 0x77e893c5 in ntdll!RtlIsCurrentThreadAttachExempt () from C:\Windows\SysWOW64\ntdll.dll
#12 0x0028edf8 in ?? ()
#13 0x77ea8efe in ntdll!LdrShutdownProcess () from C:\Windows\SysWOW64\ntdll.dll
#14 0x00415290 in ?? ()
#15 0x77ea8e4a in ntdll!RtlExitUserProcess () from C:\Windows\SysWOW64\ntdll.dll
#16 0x77467a3d in KERNEL32!ExitProcess () from C:\Windows\syswow64\kernel32.dll
#17 0x00000000 in ?? ()
pkulchenko commented 5 years ago

@sonoro1234, since I'm applying other changes, I'd like to apply this fix as well, but it seems like it may not be working for your case. How do you start 5 threads where only 4 is finishing?

sonoro1234 commented 5 years ago

Because thread 5620.0x15ec receives SIGSEGV instead of exiting?

pkulchenko commented 5 years ago

Right, but why is there 5 threads? I don't see anything in the code that would start 5 threads...

sonoro1234 commented 5 years ago

I dont see CreateThread or pthread_create in the source.

I just did C:\supercolliderrepos\build_Lua2SC\install>C:\i686-7.2.0-release-posix-dwarf-rt_v5rev1\mingw32\bin\gdb.exe --args "luajit.exe C:\supercolliderrepos\Lua2SC\wxLuaBundled\wxlua\wxLua\samples\editor.wx.lua"

and then run

May be system functions called by wxWidgets are creating threads, dont you see them in gdb? If I do do return end after the first toolBar:AddTool I get three threads but if I do it before I get only one.

pkulchenko commented 5 years ago

@chymanfx, @jjvbsag, @sonoro1234, I may have a better patch for this issue; at least it eliminates the crash I saw on Windows.

diff --git a/wxLua/modules/wxlua/wxlstate.cpp b/wxLua/modules/wxlua/wxlstate.cpp
index 24fe904..42b7c34 100644
--- a/wxLua/modules/wxlua/wxlstate.cpp
+++ b/wxLua/modules/wxlua/wxlstate.cpp
@@ -729,7 +729,7 @@ bool wxLuaState::IsOk() const

 void wxLuaState::Destroy()
 {
-    if (m_refData == NULL) return;
+    if (m_refData == NULL || M_WXLSTATEDATA->m_lua_State_static) return;

     // we don't want recursion in UnRef and wxlua_garbageCollect
     if (GetRefData()->GetRefCount() == 1)
@@ -741,6 +741,8 @@ void wxLuaState::Destroy()
 bool wxLuaState::CloseLuaState(bool force)
 {
     wxCHECK_MSG(Ok(), false, wxT("Invalid wxLuaState"));
+    if (M_WXLSTATEDATA->m_lua_State_static) return true;
+
     return M_WXLSTATEDATA->CloseLuaState(force);
 }

According to the long comment in wxlstate.h, neither Destroy nor CloseLuaState should be called on static lua states, but I don't see the checks for that anywhere in the code, so the patch above adds them:

    // Destroy the refed data, use this instead of wxObject::UnRef().
    //  Only calls lua_close(L) if this is the last refed state and this was
    //  created without the wxLUASTATE_STATICSTATE flag.
    //  Note: if you have a top level window (wxFrame) open in Lua and exit the
    //  C++ program your program will seem to "hang" because wxApp doesn't
    //  exit with a top level window open. Call CloseLuaState(true) to ensure
    //  all non parented (top level) windows are destroyed.
    //  You must always call CloseLuaState() when you want to close Lua instead
    //  of hoping that when you call Destroy() you have the last refed instance.
    void Destroy();
    // Close the lua_State and if 'force' close all attached wxWindows
    //   if !force then popup a dialog to ask if all wxWindows should be destroyed.
    // Only calls lua_close(L) if this is the last refed state and this was
    //  created without the wxLUASTATE_STATICSTATE flag.
    bool CloseLuaState(bool force);

Please let me know if you have a chance to test this patch against wxwidgets312 branch and it eliminates the crashes on exit that we discussed in this ticket.

pkulchenko commented 5 years ago

One more note: we tried a similar patch in the past, but it was in a slightly different location and there was still UnRef called, which I think was also causing the issue.

pkulchenko commented 5 years ago

BTW, I have not been able to reproduce this issue on 32bit Linux using binaries compiled based on wxwidgets312 branch (from the upgrade branch https://github.com/pkulchenko/ZeroBraneStudio/tree/wxwidgets-upgrade-313), so I can't test the patch there, but I did see an issue on Windows (that I suspect is similar), so I'd be curious to see if you observe the same crash with the binaries I compiled.

sonoro1234 commented 5 years ago

I have checked that with ptrdiff branch the issue with editor.wx.lua crashing on exit dissappears.

I am not sure of which branch is the one that should be tried!!

pkulchenko commented 5 years ago

@sonoro1234, great! do you mean you applied the patch above on top of the ptrdiff branch? That should work too...

sonoro1234 commented 5 years ago

great! do you mean you applied the patch above on top of the ptrdiff branch? That should work too...

Yes, I always worked on ptrdiff branch. My question is: which is the branch I should use now?

pkulchenko commented 5 years ago

I'll be merging wxwidgets312, then ptrdiff, then no-close fix into master. I didn't want to merge until the changes are confirmed working, but so far it looks good, so should be ready to merge.

pkulchenko commented 5 years ago

@jjvbsag, @sonoro1234, I'm closing this, as I expect it to be fixed in the master branch. Please use that branch and feel free to re-open if you notice the same crash happening again.

ghost commented 5 years ago

Humm, with your patch I still have this issue on Linux (tested with 64 bit) while the patch suggested by me does not fix it on Windows (msys/mingw64). The easiest way to test, is to just load and unload the lib with: luajit -e 'require("wx")'

What does "m_lua_State_static" actually indicate? When does this flag get set?

Does glibc skip calling 'exit()' when closing on 32-bit Linux? The issue should be the same in 32bit.

pkulchenko commented 5 years ago

@chymanfx, you are correct, as I have been able to reproduce this issue on 64-bit Ubuntu. I'm not sure why the 32-bit version shows different results for me, but I expect the following patch that I pushed to the master branch to fix the issue:

diff --git a/wxLua/modules/wxlua/wxlstate.cpp b/wxLua/modules/wxlua/wxlstate.cpp
index 42b7c34..abda16d 100644
--- a/wxLua/modules/wxlua/wxlstate.cpp
+++ b/wxLua/modules/wxlua/wxlstate.cpp
@@ -394,7 +394,11 @@ wxLuaStateRefData::~wxLuaStateRefData()
     wxCHECK_RET((m_lua_State_static == true) || (m_lua_State == NULL),
                 wxT("You must ALWAYS call wxLuaState::Destroy and not wxObject::UnRef"));

-    CloseLuaState(true);
+    // only close the state if it's not static,
+    // as when it's static (wx is loaded as a library), it will be closed somewhere else
+    if (!m_lua_State_static)
+        CloseLuaState(true);
+
     if (m_own_stateData)
         delete m_wxlStateData;
 }

Let me know if this fixed the issue for you.

ghost commented 5 years ago

Yes, that looks good, on Ubuntu 16 and Centos 6.10 (both 64-bit) the shutdown is now clean for me.

Thank you for your efforts!

pkulchenko commented 5 years ago

Great; thank you for the update!

pkulchenko commented 4 years ago

@jjvbsag, @ghost, @sonoro1234: FYI, I pushed a patch that fixes double freeing of memory in GC that I picked up from @osch's fork (thanks Oliver!) in 3697eef4.

It's not directly related to the crashes discussed in this ticket, but I thought I'd mention this, as this was the only crash I've seen left when the patches discussed in this ticket are applied (and this crash would usually have wxluaO_deletegcobject close to the top of the stack).