luvit / luv

Bare libuv bindings for lua
Apache License 2.0
831 stars 187 forks source link

fix memory leak in fs.scandir sync mode #694

Closed zhaozg closed 9 months ago

zhaozg commented 9 months ago

I found it in long time luvit/webserver. luv test close will free all mems, so not found by valgrind or Asan.

lewis6991 commented 8 months ago

Just to note that this change caused an issue in Neovim: https://github.com/neovim/neovim/issues/27678#issuecomment-1972998750

Application Specific Information:
abort() called

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib                 0x18cde20dc __pthread_kill + 8
1   libsystem_pthread.dylib                0x18ce19cc0 pthread_kill + 288
2   libsystem_c.dylib                      0x18cd25a40 abort + 180
3   libsystem_malloc.dylib                 0x18cc3cb08 malloc_vreport + 908
4   libsystem_malloc.dylib                 0x18cc403f4 malloc_report + 64
5   libsystem_malloc.dylib                 0x18cc54ebc find_zone_and_free + 308
6   nvim                                   0x100eb13b8 uv__fs_scandir_cleanup + 84
7   nvim                                   0x100eb7e68 uv_fs_req_cleanup + 120
8   nvim                                   0x100e03634 luv_fs_gc + 132
9   nvim                                   0x100e42aec lj_BC_FUNCC + 44
10  nvim                                   0x100e44cc8 gc_call_finalizer + 148 (lj_gc.c:521)
11  nvim                                   0x100b7fdb4 nlua_push_Object + 320 (converter.c:781)
12  nvim                                   0x100b7fc48 nlua_push_Dictionary + 332 (converter.c:722)
13  nvim                                   0x100992228 nlua_api_nvim_get_mode + 160 (lua_api_c_bindings.generated.c:6379)
14  nvim                                   0x100e42aec lj_BC_FUNCC + 44
15  nvim                                   0x100e590b4 lua_pcall + 228 (lj_api.c:1150)
squeek502 commented 8 months ago

That makes sense, although I can't quite figure out how to reproduce that particular crash yet.

I believe the real problem here is that luv_fs_scandir is returning a uv_req_t that effectively stores a reference to itself in the LUA_REGISTRYINDEX which means that it will never be garbage collected until the program ends.

This PR tries to workaround that by unrefing it, but that means that the underlying uv_req_t memory can be garbage collected while the Lua userdata still exists, and it could therefore get its gc function called which will try to call uv_fs_req_cleanup on already-freed memory.

So the real fix here needs to be making the return of luv_fs_scandir able to be properly garbage collected.

EDIT: Potential fix: https://github.com/luvit/luv/pull/696