luvit / luv

Bare libuv bindings for lua
Apache License 2.0
823 stars 185 forks source link

pthread_create possible leak #552

Closed squeek502 closed 1 year ago

squeek502 commented 3 years ago

In setting up CI with Github Actions (https://github.com/luvit/luv/pull/551), the valgrind step is reporting leaks (and I can reproduce the leaks locally). It happens with multiple tests (not sure of everything that triggers it yet), and is not only related to the thread bindings. For example, here's the reported leak after running test-fs.lua (with luajit):

==34441== HEAP SUMMARY:
==34441==     in use at exit: 3,012 bytes in 9 blocks
==34441==   total heap usage: 744 allocs, 735 frees, 264,178 bytes allocated
==34441== 
==34441== 1,152 bytes in 4 blocks are possibly lost in loss record 5 of 6
==34441==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==34441==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==34441==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==34441==    by 0x504D322: allocate_stack (allocatestack.c:622)
==34441==    by 0x504D322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==34441==    by 0x5007591: uv_thread_create_ex (thread.c:259)
==34441==    by 0x500761F: uv_thread_create (thread.c:213)
==34441==    by 0x4FF74AF: init_threads (threadpool.c:225)
==34441==    by 0x4FF74AF: init_once (threadpool.c:252)
==34441==    by 0x505547E: __pthread_once_slow (pthread_once.c:116)
==34441==    by 0x500791C: uv_once (thread.c:420)
==34441==    by 0x4FF766C: uv__work_submit (threadpool.c:261)
==34441==    by 0x4FFFFD0: uv_fs_open (fs.c:1886)
==34441==    by 0x4FFFFD0: uv_fs_open (fs.c:1876)
==34441==    by 0x4FF16BA: luv_fs_open (fs.c:475)
==34441==    by 0x170259: lj_BC_FUNCC (buildvm_x86.dasc:851)

This can also be reproduced with a minimal file:

-- test.lua
local uv = require('luv')

uv.fs_stat('README.md', function (err, stat)
  assert(not err, err)
  print(stat.size)
end)

uv.run()

However, I cannot reproduce it with a theoretically similar test case when using Libuv directly:

#include <uv.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>

static void stat_cb(uv_fs_t* req) {
  uv_stat_t* stat = &req->statbuf;
  printf("size %" PRIu64 "\n", stat->st_size);
  uv_fs_req_cleanup(req);
}

int main() {
  uv_fs_t req;
  int ret;

  ret = uv_fs_stat(uv_default_loop(), &req, "CMakeLists.txt", stat_cb);
  printf("stat %d\n", ret);

  ret = uv_run(uv_default_loop(), UV_RUN_DEFAULT);
  printf("run %d\n", ret);

  uv_loop_close(uv_default_loop());
}

I'm really unsure what's happening here, or why it's just showing up now. For now, I've added a suppression for these possible leaks in the CI just to get that passing, but it would be nice to figure out if this is a real leak that we should be addressing.

squeek502 commented 3 years ago

It can be reproduced with a super minimal Lua module that does the same as the bare Libuv test code above:

libuv_test.c

#include <lua.h>
#include <lauxlib.h>
#include <uv.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>

static void stat_cb(uv_fs_t* req) {
  if (req->result < 0) {
    printf("error %ld\n", req->result);
  } else {
    uv_stat_t* stat = &req->statbuf;
    printf("size %" PRIu64 "\n", stat->st_size);
  }
  uv_fs_req_cleanup(req);
}

static int libuv_test(lua_State *L)
{
  uv_fs_t req;
  int ret;

  ret = uv_fs_stat(uv_default_loop(), &req, "test.lua", stat_cb);
  printf("stat %d\n", ret);

  ret = uv_run(uv_default_loop(), UV_RUN_DEFAULT);
  printf("run %d\n", ret);

  uv_loop_close(uv_default_loop());

  return 0;
}

int luaopen_libuv_test(lua_State *L)
{
  lua_pushcfunction(L, libuv_test);
  return 1;
}

test.lua

local libuv_test = require('libuv_test')

libuv_test()

Maybe related to dynamic library loading? Not sure what the difference would be otherwise.

squeek502 commented 3 years ago

Does look to be dl-related. Can create a reproduction from the 'using libuv directly' test by sticking it in a dynamic library:

libuv_dl.c

#include <uv.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>

static void stat_cb(uv_fs_t* req) {
  if (req->result < 0) {
    printf("error %ld\n", req->result);
  } else {
    uv_stat_t* stat = &req->statbuf;
    printf("size %" PRIu64 "\n", stat->st_size);
  }
  uv_fs_req_cleanup(req);
}

void libuv_test(void) {
  uv_fs_t req;
  int ret;

  ret = uv_fs_stat(uv_default_loop(), &req, "test.lua", stat_cb);
  printf("stat %d\n", ret);

  ret = uv_run(uv_default_loop(), UV_RUN_DEFAULT);
  printf("run %d\n", ret);

  uv_loop_close(uv_default_loop());
}

main.c

#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>

int main() {
  void *handle;
  void (*libuv_test_fn)(void);
  char *error;

  handle = dlopen ("./libuv_dl.so", RTLD_NOW|RTLD_LOCAL);
  if (!handle) {
    fputs (dlerror(), stderr);
    exit(1);
  }

  libuv_test_fn = dlsym(handle, "libuv_test");
  if ((error = dlerror()) != NULL)  {
    fputs(error, stderr);
    exit(1);
  }

  (*libuv_test_fn)();

  dlclose(handle);
}

Also possibly of interest:

squeek502 commented 3 years ago

The leak can be fixed in the example in the previous comment (https://github.com/luvit/luv/issues/552#issuecomment-873365524) by creating a thread from main (when it's called doesn't matter). These are probably relevant for what type of thing is possibly happening (as mentioned in the previous comment):

(note: simply linking against pthread is not enough, though, pthread_create/pthread_join has to actually be called to fix the leak)

Not sure exactly what this means or how we should treat this. Still unsure why it wasn't being reported previously, will try testing with Valgrind 3.13.0 (the version that was being used on Travis CI previously) to see if it's reported there. EDIT: Valgrind 3.13.0 has a bug that makes it incompatible with newer versions of glibc, so I'm not able to test this easily.

"fixed" main.c

#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include <pthread.h>

void* thread_func(void * arg)
{
  return NULL;
}

int main(int argc, char **argv) {
  void *handle;
  void (*libuv_test_fn)(void);
  char *error;

  pthread_t thread_id;
  pthread_create(&thread_id, NULL, &thread_func, NULL);
  pthread_join(thread_id, NULL);

  handle = dlopen ("./libuv_dl.so", RTLD_NOW|RTLD_LOCAL);
  if (!handle) {
    fputs (dlerror(), stderr);
    exit(1);
  }

  libuv_test_fn = dlsym(handle, "libuv_test");
  if ((error = dlerror()) != NULL)  {
    fputs(error, stderr);
    exit(1);
  }

  (*libuv_test_fn)();

  dlclose(handle);
}

Valgrind output:

==155368== Command: ./libuv_main
==155368== 
stat 0
size 24
run 0
==155368== 
==155368== HEAP SUMMARY:
==155368==     in use at exit: 0 bytes in 0 blocks
==155368==   total heap usage: 14 allocs, 14 frees, 4,219 bytes allocated
==155368== 
==155368== All heap blocks were freed -- no leaks are possible
zhaozg commented 1 year ago

Maybe fixed in https://github.com/luvit/luv/commit/ccb0578e6acd98a819ed2757ba73d32522fc5f2e#diff-6fe539b06bb5c8ecbb00283c94398c0ae8cb44837c6999c01814f295c3eaced2R300-R312

squeek502 commented 1 year ago

Yeah, seems to be fixed. :+1: