richardhundt / luv

libuv bindings for Lua
Apache License 2.0
118 stars 19 forks

Segmentation fault on pthread_join() #8

Closed: miko closed this issue 11 years ago

miko commented 11 years ago

$ gdb `which lua`
(gdb) run ../examples/timer.lua
Starting program: /usr/bin/lua ../examples/timer.lua
ENTER
tick: 1 0
...
tick: 10 0

Program received signal SIGSEGV, Segmentation fault.
0xb7c4dd1e in pthread_join () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0xb7c4dd1e in pthread_join () from /usr/lib/libpthread.so.0
#1  0xb7c8f462 in cleanup () from ./luv.so
#2  0xb7fed62c in _dl_fini () from /lib/ld-linux.so.2
#3  0xb7d31601 in __run_exit_handlers () from /usr/lib/libc.so.6
#4  0xb7d3168d in exit () from /usr/lib/libc.so.6
#5  0xb7d1960d in __libc_start_main () from /usr/lib/libc.so.6
#6  0x0804ad15 in _start ()

(gdb) run ../examples/stdfh.lua
Starting program: /usr/bin/lua ../examples/stdfh.lua
Hello World!hello thanks, you said:6
Program received signal SIGSEGV, Segmentation fault.
0xb7c4dd1e in pthread_join () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0xb7c4dd1e in pthread_join () from /usr/lib/libpthread.so.0
#1  0xb7c8f462 in cleanup () from ./luv.so
#2  0xb7fed62c in _dl_fini () from /lib/ld-linux.so.2
#3  0xb7d31601 in __run_exit_handlers () from /usr/lib/libc.so.6
#4  0xb7d3168d in exit () from /usr/lib/libc.so.6
#5  0xb7d1960d in __libc_start_main () from /usr/lib/libc.so.6
#6  0x0804ad15 in _start ()

Using Arch Linux 32-bit, glibc 2.16.0, Lua 5.1.5 (the same happens with LuaJIT 2.0.0-beta10). The crash happens at program termination, so all the work is already done, but it is still annoying.

richardhundt commented 11 years ago

Yeah, somebody else reported segfaults on arch too. I'm grabbing a copy of arch now - it's related to the std file pipes which are cleaned up by the GC during exit. I'm working on it now.

richardhundt commented 11 years ago

I removed zmq to see what would happen, and got a slightly different stack trace, which looks more like what's really going on:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6819f5b in pthread_join () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0x00007ffff6819f5b in pthread_join () from /usr/lib/libpthread.so.0
#1  0x00007ffff6c60533 in cleanup () from ./luv.so
#2  0x00007ffff7dee5d7 in _dl_close_worker () from /lib/ld-linux-x86-64.so.2
#3  0x00007ffff7def08c in _dl_close () from /lib/ld-linux-x86-64.so.2
#4  0x00007ffff7de9636 in _dl_catch_error () from /lib/ld-linux-x86-64.so.2
#5  0x00007ffff78de5fc in ?? () from /usr/lib/libdl.so.2
#6  0x00007ffff78de10f in dlclose () from /usr/lib/libdl.so.2
#7  0x000000000041e5a0 in gctm ()
#8  0x0000000000407ff4 in luaD_precall ()
#9  0x0000000000408404 in luaD_call ()
#10 0x0000000000409b2a in GCTM ()
#11 0x000000000040a188 in luaC_callGCTM ()
#12 0x00000000004076ca in luaD_rawrunprotected ()
#13 0x000000000040dece in lua_close ()
#14 0x0000000000403a88 in main ()

Looks like ArchLinux ships with the very latest glibc, which AFAICT has had some hacking going into the ELF dlopen and dlclose code from Red Hat, along with lots of NPTL (Native POSIX Thread Library) changes. So I think something in the pthreads code in glibc is pushing a cleanup handler into _dl_close which does pthread_join(), and that's broken. It's simply the main thread exiting, with no child threads having been created (just pthread being linked to, that's all). I've grabbed the glibc-2.16 sources and tried grokking the code in there, but it's too hairy for me to get into in a short time :(

So I give up. Luv is broken on ArchLinux (and probably any other system with glibc-2.16).
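For readers following the backtraces: frame #1 is `cleanup ()` from `./luv.so`, reached from `_dl_fini` at process exit in one trace and from `_dl_close_worker` on `dlclose()` in the other. That is the pattern a shared-library destructor produces. The sketch below is a hypothetical illustration in C, not code from luv or libuv: the names `workers`, `pool_started`, `pool_init` and `pool_cleanup` are invented here. It shows one way a destructor can end up in `pthread_join()` with no threads created (joining handles that were never initialized) and a guard flag that avoids it. The thread does not say what the libuv commit actually changed, so treat this as an assumption about the general failure mode only.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Hypothetical worker pool; the handles stay zero-initialized until pool_init() runs. */
static pthread_t workers[POOL_SIZE];
static int pool_started = 0;   /* set only after every worker was created */
static int joined_count = 0;   /* bookkeeping so the behaviour can be observed */

static void *worker_main(void *arg) {
  (void)arg;
  return NULL;                 /* a real worker would loop over a job queue */
}

/* Start the workers on demand. Returns 0 on success, -1 on failure. */
int pool_init(void) {
  if (pool_started)
    return 0;
  for (size_t i = 0; i < POOL_SIZE; i++) {
    if (pthread_create(&workers[i], NULL, worker_main, NULL) != 0) {
      while (i > 0)            /* undo the partial start */
        pthread_join(workers[--i], NULL);
      return -1;
    }
  }
  pool_started = 1;
  return 0;
}

/* Runs automatically when the shared object is unloaded or the process exits. */
__attribute__((destructor))
static void pool_cleanup(void) {
  if (!pool_started)
    return;                    /* without this guard, never-created handles would be joined */
  for (size_t i = 0; i < POOL_SIZE; i++)
    if (pthread_join(workers[i], NULL) == 0)
      joined_count++;
  pool_started = 0;
}
```

Passing a `pthread_t` that was never filled in by `pthread_create()` to `pthread_join()` is undefined behaviour, so whether it fails quietly or crashes can differ between C library versions. That would be consistent with the crash only showing up on some systems.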

On Oct 9, 2012, at 9:45 AM, miko wrote: (original report quoted in full; see the first comment above)

dvv commented 11 years ago

confirmed on my setup. but i believe this bug may belong to lua/luajit per se. what would be the native coroutine code for this timer example? would it work on our glibc 2.16 setup?

richardhundt commented 11 years ago

This is my test:

local luv = require("luv")

That's it. Just require and exit. It segfaults every time on ArchLinux. Lua/LuaJIT don't even link to pthread, and something is definitely calling pthread_join().
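The 64-bit backtrace shows the crash being reached through `lua_close()` -> `gctm()` -> `dlclose()`, so the require-and-exit test boils down to loading the shared object and unloading it again. A rough C equivalent, with the interpreter taken out of the picture, might look like the sketch below. The helper name `load_and_unload` is made up for this example, and the library path is whatever the caller passes in. Note the assumption that the module can be loaded outside a Lua host at all: `luv.so` presumably expects the Lua API symbols from the interpreter, so `dlopen()` on it from a plain C program may simply fail.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load a shared object and unload it again right away.
   Returns 0 on success, -1 if either step fails. */
int load_and_unload(const char *path) {
  void *handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
  if (handle == NULL) {
    fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return -1;
  }
  /* Dropping the last reference lets the loader run the library's destructors. */
  if (dlclose(handle) != 0) {
    fprintf(stderr, "dlclose failed: %s\n", dlerror());
    return -1;
  }
  return 0;
}
```

A small `main` that calls `load_and_unload("./luv.so")` under gdb would then show whether the destructor crashes without any Lua code involved. On a glibc of that era the program has to be linked with `-ldl`.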

richardhundt commented 11 years ago

*face palm* There's a commit to libuv from yesterday which fixes the segfault.

try git HEAD now :)

miko commented 11 years ago

Indeed, works for me on archlinux without segfaults now. Thanks

dvv commented 11 years ago

fix confirmed