luvit / luv

Bare libuv bindings for lua
Apache License 2.0

Frequent segfaults in `push_fs_result` #644

Closed happenslol closed 1 year ago

happenslol commented 1 year ago

I'm using the nightly neovim build and am regularly encountering segfaults. I haven't yet fully narrowed down when they occur, but it mostly seems to be when files are changed while the editor is open.

This is what the stack trace looks like:

                #0  0x00007f7c9db4e939 push_fs_result (libluv.so.1 + 0xf939)
                #1  0x00007f7c9db542b3 luv_fs_cb (libluv.so.1 + 0x152b3)
                #2  0x00007f7c9d9333fd uv__work_done (libuv.so.1 + 0xc3fd)
                #3  0x00007f7c9d9370cd uv__async_io.part.0 (libuv.so.1 + 0x100cd)
                #4  0x00007f7c9d94ae6c uv__io_poll (libuv.so.1 + 0x23e6c)
                #5  0x00007f7c9d937a14 uv_run (libuv.so.1 + 0x10a14)
                #6  0x0000000000538948 loop_uv_run (nvim + 0x138948)
                #7  0x0000000000643cad inbuf_poll.lto_priv.0 (nvim + 0x243cad)
                #8  0x0000000000643eed os_inchar (nvim + 0x243eed)
                #9  0x00000000006cdaad state_enter (nvim + 0x2cdaad)
                #10 0x0000000000609214 normal_enter (nvim + 0x209214)
                #11 0x00000000004552d0 main (nvim + 0x552d0)
                #12 0x00007f7c9d73d24e __libc_start_call_main (libc.so.6 + 0x2924e)
                #13 0x00007f7c9d73d309 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29309)
                #14 0x0000000000457585 _start (nvim + 0x57585)

I'm not sure what else to include here, so please tell me if there's any additional information you require.

squeek502 commented 1 year ago

Is this reproducible with default neovim or is it possible it only happens with some combination of plugins?

If possible, knowing the line number of where the crash is happening in push_fs_result would likely be very helpful.

squeek502 commented 1 year ago

Stacktrace looks similar to one in https://github.com/neovim/neovim/issues/21467

happenslol commented 1 year ago

I'm 99% sure it happens when I have neo-tree enabled, but I can't say for certain. However, I've tried disabling their usage of libuv, and the crashes have persisted.

How would I go about getting the line number? Loading the segfault into gdb only gives me the stack trace I posted above. I'm assuming I can only get the line number if debug information is compiled in, and I'm not sure how I would do that in this case.

Edit: Just had a look at that thread. One of the stack traces in there is exactly the same as mine. However, the command posted there, `:lua require("luv").handle_get_type(newproxy())`, also causes a segfault for me, albeit with a different stack trace.

squeek502 commented 1 year ago

Compiling neovim from source with debug info would probably work. You could also try the instructions here if that makes anything easier:

https://github.com/NixOS/nixpkgs/pull/219400#issuecomment-1455150162

EDIT: Just tried it, and compiling neovim from source is pretty painless; you can use CMAKE_BUILD_TYPE=Debug to make sure debug info is available. I wasn't able to reproduce this crash, though.


The `:lua require("luv").handle_get_type(newproxy())` crash is different, and should be fixed by https://github.com/luvit/luv/pull/634

happenslol commented 1 year ago

I'll try that tomorrow and report back. Thanks for the link!

happenslol commented 1 year ago

Alright, that was quite the journey, since the nix build has been broken for a few weeks on NixOS due to treesitter not being up to date in nixpkgs, but I got one step further. libluv still doesn't seem to have debug symbols even though I built neovim-debug, but at least there are some line numbers for the neovim portion now:

#0  0x00007fe385c789ee in push_fs_result () from /nix/store/fsdy4sq9pi4ibp0p6gjzp9lgi5ap77yq-libluv-1.43.0-0/lib/libluv.so.1
#1  0x00007fe385c7e524 in luv_fs_cb () from /nix/store/fsdy4sq9pi4ibp0p6gjzp9lgi5ap77yq-libluv-1.43.0-0/lib/libluv.so.1
#2  0x00007fe385a603c2 in uv__work_done () from /nix/store/avbmp3dcrbzrckrprx48cxx2mwlh825l-libuv-1.44.2/lib/libuv.so.1
#3  0x00007fe385a6409d in uv__async_io.part () from /nix/store/avbmp3dcrbzrckrprx48cxx2mwlh825l-libuv-1.44.2/lib/libuv.so.1
#4  0x00007fe385a780d5 in uv__io_poll () from /nix/store/avbmp3dcrbzrckrprx48cxx2mwlh825l-libuv-1.44.2/lib/libuv.so.1
#5  0x00007fe385a649bc in uv_run () from /nix/store/avbmp3dcrbzrckrprx48cxx2mwlh825l-libuv-1.44.2/lib/libuv.so.1
#6  0x000000000051c267 in loop_uv_run (loop=0x7ed518 <main_loop>, ms=ms@entry=0, once=true) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/event/loop.c:65
#7  loop_poll_events (loop=0x7ed518 <main_loop>, ms=ms@entry=0) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/event/loop.c:87
#8  0x0000000000604b2d in os_breakcheck () at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/os/input.c:197
#9  0x000000000055dc18 in vgetorpeek (advance=140) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/getchar.c:2378
#10 0x000000000055cfae in vpeekc () at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/getchar.c:1635
#11 0x000000000068b029 in state_enter (s=s@entry=0x7ffed28cc4d0) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/state.c:61
#12 0x00000000005d4b26 in normal_enter (cmdwin=false, noexmode=false) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/normal.c:497
#13 0x0000000000456ea8 in main (argc=<optimized out>, argv=<optimized out>) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/main.c:641

squeek502 commented 1 year ago

Unfortunately, the libluv line numbers would be the helpful bit: push_fs_result contains a switch statement, so if the crash is happening in a particular case, that would narrow down the possible reproductions significantly.

If you're using that nixpkgs branch, maybe adding separateDebugInfo = true; to here would give you debug info for libluv? (note that this is a total guess on my part, I have no experience with nixpkgs)

happenslol commented 1 year ago

Mhm, no luck so far I'm afraid. I've tried compiling the debug symbols separately and loading them into gdb, but the nixpkgs version seems to be different since I'm getting bogus line numbers. I'm not too experienced with overriding nixpkgs either, I'll try to get some help on the forums for that. Man, nix is amazing when it works, but it makes things like these so complicated...

Thanks for your patience!

teto commented 1 year ago

Enabling debug symbols can differ between projects, and separateDebugInfo might be one of those cases. If you can point me at instructions for enabling debug symbols in libuv, we can work out how to modify the nix expression together.

happenslol commented 1 year ago

Yeah, libuv didn't have separateDebugInfo, but I managed to enable it myself by overriding libluv in the rust flake and setting the cmake build type as well as dontStrip (that last one took a bit to figure out..), and I have libluv with debug symbols now. Turns out my last crash was so long ago that coredumpctl already cleaned out the stack traces though, so I'll have to wait for the next crash to get you that line number :-P

> Enabling debug symbols can differ between projects, and separateDebugInfo might be one of those cases. If you can point me at instructions for enabling debug symbols in libuv, we can work out how to modify the nix expression together.

Thanks a lot for the offer still! I learned a lot about overriding things in nix, and I can at least do it for separate targets now. My current way would be building libluv by itself with debug symbols, stripping them out using objcopy and then loading them dynamically in coredumpctl with gdb. Writing an overlay to modify the libluv that neovim builds with would probably be a lot easier, but I haven't done a deep dive into how overlays work yet.

zhaozg commented 1 year ago

Pay attention to https://github.com/neovim/neovim/issues/21413#issuecomment-1684564391

squeek502 commented 1 year ago

The lines the backtrace is pointing to:

https://github.com/luvit/luv/blob/e2fbfba499f9481ebef6a8510b526b183233fd63/src/fs.c#L103

https://github.com/luvit/luv/blob/e2fbfba499f9481ebef6a8510b526b183233fd63/src/fs.c#L352

https://github.com/luvit/luv/blob/e2fbfba499f9481ebef6a8510b526b183233fd63/src/fs.c#L377

zhaozg commented 1 year ago

Let's do some analysis.

  1. In the uv.fs_opendir result callback, luv_dir is created with newuserdata, luv_dir->handle->dirents is also created with newuserdata, and luv_dir->dirents_ref is set to reference dirents.
  2. luv_dir->dirents_ref is unreferenced in uv.fs_closedir or luv_fs_dir_gc, which allows dirents to be garbage collected and become invalid.
  3. After fs_readdir is called, luv_dir may be garbage collected before the fs_readdir callback runs.
  4. So we should ref luv_dir in fs_readdir and unref it in the readdir callback, so that the dirents memory is not lost while the request is pending.
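
The lifetime problem in the steps above can be modeled in a few lines of plain C. This is only a simplified sketch, not luv's actual code: `model_dir`, `model_opendir`, `model_gc`, and `run_scenario` are hypothetical names, a plain counter stands in for Lua registry references, and an explicit `model_gc` call stands in for a garbage collection that happens while the readdir request is still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for luv's directory userdata (not the real struct). */
typedef struct {
  int refs;       /* models references that keep the userdata alive */
  char **dirents; /* models the buffer that readdir fills in */
} model_dir;

static model_dir *model_opendir(void) {
  model_dir *d = calloc(1, sizeof *d);
  if (d == NULL) abort();
  d->dirents = calloc(1, sizeof *d->dirents);
  if (d->dirents == NULL) abort();
  return d;
}

/* Simulated GC pass: the buffer is released only when nothing references the dir. */
static void model_gc(model_dir *d) {
  if (d->refs == 0) {
    free(d->dirents); /* free(NULL) is a no-op on a repeated pass */
    d->dirents = NULL;
  }
}

/* Returns true if the dirents buffer is still valid when the callback would run. */
bool run_scenario(bool take_ref) {
  model_dir *d = model_opendir();
  if (take_ref) d->refs++;            /* step 4: ref when readdir is issued */
  model_gc(d);                        /* script dropped its handle; GC runs mid-request */
  bool valid = (d->dirents != NULL);  /* what the readdir callback would see */
  if (take_ref) d->refs--;            /* step 4: unref in the readdir callback */
  model_gc(d);                        /* collection is safe once the request is done */
  free(d);
  return valid;
}
```

In this model, `run_scenario(false)` returns `false` because the buffer is released while the request is still in flight, and `run_scenario(true)` returns `true` because the extra reference delays collection until the callback has finished.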

zhaozg commented 1 year ago

Reproduced

  test("fs.{open,read,close}dir ref check", function(print, p, expect, uv)
    local dir = assert(uv.fs_opendir('.', nil, 50))

    local function readdir_cb(err, dirs)
      assert(not err)
      if dirs then
        p(dirs)
        uv.fs_readdir(dir, readdir_cb)
      else
        assert(uv.fs_closedir(dir)==true)
      end
    end

    uv.fs_readdir(dir, readdir_cb)
    dir = nil
    collectgarbage()
    collectgarbage()
    collectgarbage()

  end, "1.28.0")

squeek502 commented 1 year ago

That reproduction produces a different stack trace for me when I run it via gdb:

#0  0x00007ffff7c321dc in uv__fs_readdir (req=<optimized out>, req=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/libuv/src/unix/fs.c:610
610     dirent->name = uv__strdup(res->d_name);
#1  uv__fs_work (w=<optimized out>) at /home/ryan/Programming/luvit/luv-tmp/deps/libuv/src/unix/fs.c:1709
#2  0x00007ffff7c2a34e in worker (arg=0x0) at /home/ryan/Programming/luvit/luv-tmp/deps/libuv/src/threadpool.c:122
#3  0x00007ffff7be4609 in start_thread (arg=<optimized out>) at pthread_create.c:477

but I think the fix might solve the luv_push_dirent segfault, too (it's likely the same problem; the garbage collection is just happening at a different time).

squeek502 commented 1 year ago

Never mind, the stack trace is the same as the neovim one if I run it with LuaJIT (I was using PUC Lua, since that sometimes makes things easier to debug):

Thread 1 "luajit" received signal SIGSEGV, Segmentation fault.
luv_push_dirent (L=L@entry=0x7ffff7fa9380, ent=0x0, table=table@entry=1) at /home/ryan/Programming/luvit/luv/src/fs.c:121
121   lua_pushstring(L, ent->name);

#0  luv_push_dirent (L=L@entry=0x7ffff7fa9380, ent=0x0, table=table@entry=1) at /home/ryan/Programming/luvit/luv/src/fs.c:121
#1  0x00007ffff7bfb1d8 in push_fs_result (L=L@entry=0x7ffff7fa9380, req=req@entry=0x7ffff7fc84d8) at /home/ryan/Programming/luvit/luv/src/fs.c:371
#2  0x00007ffff7bfb5b1 in luv_fs_cb (req=0x7ffff7fc84d8) at /home/ryan/Programming/luvit/luv/src/fs.c:401
#3  0x00007ffff7c10240 in uv__work_done (handle=0x7ffff7fbc1f0) at /home/ryan/Programming/luvit/luv/deps/libuv/src/threadpool.c:329
#4  0x00007ffff7c1407b in uv__async_io (loop=0x7ffff7fbc140, w=0x7fffffff9580, events=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/libuv/src/unix/async.c:176
#5  0x00007ffff7c25ff3 in uv__io_poll (loop=loop@entry=0x7ffff7fbc140, timeout=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/libuv/src/unix/linux.c:1303
#6  0x00007ffff7c14cc3 in uv_run (loop=0x7ffff7fbc140, mode=mode@entry=UV_RUN_DEFAULT) at /home/ryan/Programming/luvit/luv/deps/libuv/src/unix/core.c:447
#7  0x00007ffff7c0bc00 in luv_run (L=0x7ffff7fa9380) at /home/ryan/Programming/luvit/luv/src/loop.c:36
#8  0x00005555555ca03b in lj_BC_FUNCC () at buildvm_x86.dasc:859
#9  0x00005555555bbe03 in lua_pcall (L=0x7ffff7fa9380, nargs=<optimized out>, nresults=-1, errfunc=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/luajit/src/lj_api.c:1116
#10 0x000055555555c8ab in docall (L=0x7ffff7fa9380, narg=0, clear=0) at /home/ryan/Programming/luvit/luv/deps/luajit/src/luajit.c:122
#11 0x000055555555dbd2 in handle_script (argx=<optimized out>, L=0x7ffff7fa9380) at /home/ryan/Programming/luvit/luv/deps/luajit/src/luajit.c:292
#12 pmain (L=0x7ffff7fa9380) at /home/ryan/Programming/luvit/luv/deps/luajit/src/luajit.c:550
#13 0x00005555555ca03b in lj_BC_FUNCC () at buildvm_x86.dasc:859
#14 0x00005555555bbfa1 in lua_cpcall (L=<optimized out>, func=<optimized out>, ud=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/luajit/src/lj_api.c:1173
#15 0x000055555555c70e in main (argc=2, argv=0x7fffffffda48) at /home/ryan/Programming/luvit/luv/deps/luajit/src/luajit.c:581