vifino opened this issue 8 years ago (status: Open)
@vifino I don't have musl or Alpine Linux, and I am unlikely to investigate further. The stack trace points to `malloc_usable_size`, so you can compile torch with `-DHAVE_MALLOC_USABLE_SIZE=0` to see whether that fixes things (maybe musl has an incorrect implementation of that function). You'll have to:

```
git clone https://github.com/torch/torch7
cd torch7
# modify this check so it always triggers: https://github.com/torch/torch7/blob/master/lib/TH/CMakeLists.txt#L90-L92
luarocks make rocks/torch-scm-1.rockspec
```
Hope this helps.
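A minimal sketch of why that flag is a useful test. This assumes a simplified allocator wrapper and is not torch7's actual `THGeneral.c`: the wrapper counts live heap bytes and only queries `malloc_usable_size()` when the preprocessor macro `HAVE_MALLOC_USABLE_SIZE` is nonzero. The names `tracked_alloc`, `tracked_free` and `tracked_live_bytes` are hypothetical. Building with `-DHAVE_MALLOC_USABLE_SIZE=0` compiles out every call into `malloc_usable_size()`, so if the segfault disappears, the crash was at least tied to that query.

```c
/* Hypothetical sketch, NOT torch7's code: an allocator wrapper that tracks
 * live heap bytes, calling malloc_usable_size() only when the build sets
 * HAVE_MALLOC_USABLE_SIZE to a nonzero value. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef HAVE_MALLOC_USABLE_SIZE
#define HAVE_MALLOC_USABLE_SIZE 1 /* assumed default: query enabled */
#endif

#if HAVE_MALLOC_USABLE_SIZE
#include <malloc.h> /* glibc and musl declare malloc_usable_size() here */
#endif

static size_t live_bytes = 0; /* bytes currently attributed to live blocks */

/* Size attributed to ptr; always 0 when the query is compiled out. */
static size_t alloc_size(void *ptr) {
#if HAVE_MALLOC_USABLE_SIZE
    return ptr ? malloc_usable_size(ptr) : 0;
#else
    (void)ptr;
    return 0; /* tracking disabled: the allocator is never queried */
#endif
}

void *tracked_alloc(size_t n) {
    void *p = malloc(n);
    live_bytes += alloc_size(p); /* may exceed n: usable size is rounded up */
    return p;
}

void tracked_free(void *p) {
    size_t n = alloc_size(p); /* must be queried BEFORE free() */
    assert(live_bytes >= n);  /* invariant: never subtract more than added */
    live_bytes -= n;
    free(p);
}

size_t tracked_live_bytes(void) { return live_bytes; }
```

Note that calling `malloc_usable_size()` on a pointer that was already freed, or never came from `malloc`, is undefined behavior. So a crash inside it, like the ones in the backtraces in this thread, may also be a symptom of heap corruption elsewhere rather than a bug in the libc itself.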
Hi guys, I may have a similar issue. I started getting segfaults, but I am not sure whether the bug is on my side. It shouldn't be, as I am not using any custom C or CUDA code. Anyway, I am posting my backtrace here, although I am not sure whether it is relevant to this issue, or whether the problem lies in torch7 or elsewhere.
```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe6f0a700 (LWP 6914)]
musable (mem=0x7bffbbb41ba0) at malloc.c:4567
4567 malloc.c: No such file or directory.
(gdb) bt
#0 musable (mem=0x7bffbbb41ba0) at malloc.c:4567
#1 __malloc_usable_size (m=0x7bffbbb41ba0) at malloc.c:4581
#2 0x00007ffff5647bb9 in THFree () from /usr/local/lib/libTH.so
#3 0x00007ffff56713f6 in THFloatTensor_free () from /usr/local/lib/libTH.so
#4 0x00007ffff5cfd05d in torch_FloatTensor_free () from /usr/local/lib/lua/5.1/libtorch.so
#5 0x0000000000475bc9 in lj_BC_FUNCC ()
#6 0x0000000000416e19 in gc_call_finalizer ()
#7 0x0000000000443c56 in gc_finalize ()
#8 0x0000000000443dd3 in gc_onestep ()
#9 0x0000000000444304 in lj_gc_step ()
#10 0x000000000045cea4 in lua_newuserdata ()
#11 0x00007ffff5aa02ac in luaT_pushudata () from /usr/local/lib/libluaT.so
#12 0x00007ffff5cfde83 in torch_FloatTensor___index__ () from /usr/local/lib/lua/5.1/libtorch.so
#13 0x0000000000475bc9 in lj_BC_FUNCC ()
#14 0x00007ffff5a9ea86 in luaT_mt__index () from /usr/local/lib/libluaT.so
#15 0x0000000000475bc9 in lj_BC_FUNCC ()
#16 0x00000000004627c0 in lua_pcall ()
#17 0x00007fffc4d0e607 in newthread () from /usr/local/lib/lua/5.1/libthreads.so
#18 0x00007fffc4d100eb in thread_closure () from /usr/local/lib/lua/5.1/libthreads.so
#19 0x00007ffff7170182 in start_thread (arg=0x7fffe6f0a700) at pthread_create.c:312
#20 0x00007ffff6c8747d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
```
@korkinof, can you check whether preloading a different allocator such as jemalloc helps, even when linking against musl libc?
So I have jemalloc installed from the Ubuntu repos, and I ran this before starting:

```
export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
```

Would that do the trick? Because it didn't help...
I am now running single-threaded data loading. It still crashes, though less often I think, and the backtrace is similar:

```
Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff6f4b760
```
Hello.
Torch seems to have some incompatibilities with musl. One case where it segfaults is when `nn` is required. I am not sure whether this is due to a bug in musl or something else, but I am sure there is some incompatibility that needs fixing.
My testing environment is Alpine Linux v3.3.
If you have Docker installed, you can run `docker run --rm -it vifino/torch`, which will put you right in the `th` REPL. If you want to debug with gdb, run `docker run --rm -v $(pwd):/pwd -it vifino/torch sh` to get a shell, then `apk update && apk add gdb` to update the repository lists and install gdb.

Thanks,
Adrian "vifino" Pistol