Open Pavel-Durov opened 1 year ago
If you increase your ulimit a bit (say by 25%), does the problem go away? I'm wondering if there's a big memory leak or whether we're close to your memory limit and a little (allowable) nondeterminism explains things.
My virtual memory limit was set to unlimited initially:
$ ulimit -v
unlimited
And now I can't reproduce LLVM ERROR: out of memory anymore...
Even when I set a low limit:
$ ulimit -v 100000 && ./src/lua ./tests/bench/mandelbrot.lua
I recommend https://tratt.net/laurie/src/try_repeat/ for this sort of thing: let it run (say) 1000 times. If it gets through 1000 times then maybe the error was transient.
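For anyone without try_repeat to hand, the idea can be roughly sketched as a shell function. This is only an illustration of "run a command up to N times and stop at the first failure"; the name repeat_until_fail is hypothetical and this is not how try_repeat itself is implemented.

```shell
# repeat_until_fail N CMD [ARGS...]   (hypothetical helper, not part of try_repeat)
# Runs CMD up to N times and stops at the first non-zero exit status,
# reporting which iteration failed.
repeat_until_fail() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        if "$@"; then
            :   # this iteration succeeded, keep going
        else
            status=$?
            echo "failed on iteration $i (exit status $status)" >&2
            return "$status"
        fi
        i=$((i + 1))
    done
    echo "all $n iterations succeeded"
}
```

With that defined, something like `repeat_until_fail 1000 ./src/lua ./tests/bench/mandelbrot.lua` would show whether the failure is transient and at which iteration it first appears.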
Yeah, it was easier to reproduce with try_repeat :)
I've run try_repeat 1000 ... and it fails at different iterations, but always with the same error: LLVM ERROR: out of memory
Virtual memory limits tried: 100000 ... 1000000000000000
I'm not sure if ulimit -v is the right ulimit knob? I'm not very au fait with Linux, but on OpenBSD I would tend to fiddle with ulimit -d or ulimit -m. @vext01 Any thoughts on this?
I've not heard of ulimit -v. Usually -d is what you want, as leaks are typically on the heap.
Ok, I'll use -d (data seg size)
I got the same result with ulimit -d. Range used: 100000 ... 1000000000000000
This feels weird: we get an out of memory error message with ulimit -d 1000000000000000, which must surely exceed the actual RAM you have available? I wonder if we're not adjusting whatever factor actually ends up causing yk to think/know that there's no memory left?
I hope it's not me doing something weird here :no_mouth:
Here's how I check which limit is set and run try_repeat:
$ ulimit -d
1000000000000000
$ try_repeat 1000 ./src/lua ./mandelbrot.lua
P4
100 100
[binary PBM image data]
...
[binary PBM image data]LLVM ERROR: out of memory
Allocation failed
Aborted (core dumped)
Hmm, if you gdb this, what allocation point is failing? It could be that we're passing a ludicrously big number to the malloc call such that no limit we pick could deal with it!
Is the ulimit -d <large-number> actually taking effect? Does running ulimit -d afterwards show the number you passed before?
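One way to check this, sketched below, is to set the limit in a subshell, print the value the shell reports, and only then run the program, so the limit that was really in force is visible next to the output. The function name run_with_data_limit is hypothetical, and the sketch assumes a shell such as bash or dash whose ulimit builtin supports -d.

```shell
# run_with_data_limit LIMIT CMD [ARGS...]   (hypothetical helper)
# Sets the data segment limit inside a subshell, so the calling shell's own
# limits stay untouched, reports the limit now in effect, then runs CMD.
run_with_data_limit() {
    limit=$1
    shift
    (
        # Bail out with a distinctive status if the limit cannot be set,
        # e.g. because it is above the hard limit.
        ulimit -d "$limit" || exit 125
        echo "effective ulimit -d: $(ulimit -d)" >&2
        exec "$@"
    )
}
```

For example, `run_with_data_limit 1000000 ./src/lua ./mandelbrot.lua` prints the effective limit on stderr before the benchmark output. If the reported number differs from the one passed in, the limit is not taking effect.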
I think we should wait until we see what size allocation is requested. It's quite possible that data corruption means that we're asking for a ludicrously big size.
Summary of related discussion:
When we run with YKD_SERIALISE_COMPILATION=1, a single thread does the LLVM compilation, so not that much memory is used. When YKD_SERIALISE_COMPILATION is not set, LLVM compilation can run on multiple cores and presumably consume too much RAM.
When we run with YKD_SERIALISE_COMPILATION=1, the benchmark uses floating point, which we don't handle, and we don't warn that we don't handle it.
There's an intrinsic missing the yk_inlined attribute; this attribute tells us whether to expect a frame for the intrinsic call.
We don't support floats in yk.
The mandelbrot.lua benchmark works with serialised compilation (tested 100 times with try_repeat).
The mandelbrot.lua benchmark doesn't work with non-serialised compilation.
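To put numbers on the serialised vs. non-serialised difference, a small counting loop along these lines might help. It is only a sketch: count_failures is a hypothetical helper, and it treats any non-zero exit status as a failure without distinguishing the out of memory abort from other crashes.

```shell
# count_failures RUNS CMD [ARGS...]   (hypothetical helper)
# Runs CMD RUNS times, discarding its output, and prints how many of the
# runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

Running `count_failures 100 env YKD_SERIALISE_COMPILATION=1 ../src/lua ./mandelbrot.lua` and then the same command with `YKD_SERIALISE_COMPILATION=0` gives a failure count for each mode that can be compared directly.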
Example:
YKD_SERIALISE_COMPILATION=0 gdb -batch -ex "r" --ex "bt" --args ../src/lua ./mandelbrot.lua
Backtrace:
[Thread 0x7fffecb986c0 (LWP 1982006) exited]
[New Thread 0x7fffecb986c0 (LWP 1982007)]
While deleting: i32 %
Use still stuck around after Def is destroyed: %.lcssa19253 = phi i32 [ <null operand!>, %guardsuccess274 ], !dbg
Thread 3 "lua" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffee59d6c0 (LWP 1981994)]
0x00007ffff3e39e59 in llvm::TargetIRAnalysis::getDefaultTTI(llvm::Function const&) () from /home/pd/yk-fork/target/debug/ykllvm/lib/../lib/libLLVMAnalysis.so.16git
#0 0x00007ffff3e39e59 in llvm::TargetIRAnalysis::getDefaultTTI(llvm::Function const&) () from /home/pd/yk-fork/target/debug/ykllvm/lib/../lib/libLLVMAnalysis.so.16git
#1 0x00007ffff3e3969c in std::_Function_handler<llvm::TargetTransformInfo (llvm::Function const&), llvm::TargetTransformInfo (*)(llvm::Function const&)>::_M_invoke(std::_Any_data const&, llvm::Function const&) () from /home/pd/yk-fork/target/debug/ykllvm/lib/../lib/libLLVMAnalysis.so.16git
#2 0x00007ffff3e42f76 in llvm::TargetTransformInfoWrapperPass::getTTI(llvm::Function const&) () from /home/pd/yk-fork/target/debug/ykllvm/lib/../lib/libLLVMAnalysis.so.16git
#3 0x00007ffff3ad895b in llvm::AssumptionCacheTracker::getAssumptionCache(llvm::Function&) () from /home/pd/yk-fork/target/debug/ykllvm/lib/../lib/libLLVMAnalysis.so.16git
#4 0x00007ffff31d4b8d in (anonymous namespace)::LoopInstSimplifyLegacyPass::runOnLoop(llvm::Loop*, llvm::LPPassManager&) [clone .part.0] () from /home/pd/yk-fork/target/debug/ykllvm/lib/../lib/libLLVMScalarOpts.so.16git
#5 0x00007ffff3ced0e3 in llvm::LPPassManager::runOnFunction(llvm::Function&) () from /home/pd/yk-fork/target/debug/ykllvm/lib/../lib/libLLVMAnalysis.so.16git
#6 0x00007ffff5a67570 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /home/pd/yk-fork/target/debug/ykllvm/lib/libLLVMCore.so.16git
#7 0x00007ffff3b41c4f in (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) () from /home/pd/yk-fork/target/debug/ykllvm/lib/../lib/libLLVMAnalysis.so.16git
#8 0x00007ffff5a67f77 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /home/pd/yk-fork/target/debug/ykllvm/lib/libLLVMCore.so.16git
#9 0x00007ffff7beee98 in compileIRTrace<std::tuple<llvm::Module*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<llvm::GlobalValue*, void*, std::less<llvm::GlobalValue*>, std::allocator<std::pair<llvm::GlobalValue* const, void*> > >, void*, unsigned long> (*)(llvm::Module*, char**, unsigned long*, unsigned long, char**, void**, unsigned long)> (Func=0x7ffff7bf92d0 <createModule[abi:cxx11](llvm::Module*, char**, unsigned long*, unsigned long, char**, void**, unsigned long)>, FuncNames=0x7fffe80ef580, BBs=0x7fffe80f0ee0, TraceLen=810, FAddrKeys=0x7fffe8132f20, FAddrVals=0x7fffe8132f70, FAddrLen=8, BitcodeData=0x2579b0 <llvm.embedded[module]+8>, BitcodeLen=3622780, DebugInfoFD=-1, DebugInfoPath=0x0) at src/ykllvmwrap.cc:432
#10 0x00007ffff7beecae in __yktracec_irtrace_compile (FuncNames=0x7fffe80ef580, BBs=0x7fffe80f0ee0, TraceLen=810, FAddrKeys=0x7fffe8132f20, FAddrVals=0x7fffe8132f70, FAddrLen=8, BitcodeData=0x2579b0 <llvm.embedded[module]+8>, BitcodeLen=3622780, DebugInfoFD=-1, DebugInfoPath=0x0) at src/ykllvmwrap.cc:450
#11 0x00007ffff7a9d528 in ykrt::compile::jitc_llvm::{impl#0}::compile (self=0x91d720, mt=..., irtrace=...) at ykrt/src/compile/jitc_llvm.rs:38
#12 0x00007ffff7a8fe2d in ykrt::mt::{impl#0}::queue_compile_job::{closure#0} () at ykrt/src/mt.rs:397
#13 0x00007ffff7a5cbbe in core::ops::function::FnOnce::call_once<ykrt::mt::{impl#0}::queue_compile_job::{closure_env#0}, ()> () at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/core/src/ops/function.rs:250
#14 0x00007ffff7a8c888 in alloc::boxed::{impl#47}::call_once<(), (dyn core::ops::function::FnOnce<(), Output=()> + core::marker::Send), alloc::alloc::Global> (self=..., args=()) at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/alloc/src/boxed.rs:2007
#15 0x00007ffff7a780fb in lock_api::mutex::MutexGuard<parking_lot::raw_mutex::RawMutex, alloc::collections::vec_deque::VecDeque<alloc::boxed::Box<(dyn core::ops::function::FnOnce<(), Output=()> + core::marker::Send), alloc::alloc::Global>, alloc::alloc::Global>>::unlocked<parking_lot::raw_mutex::RawMutex, alloc::collections::vec_deque::VecDeque<alloc::boxed::Box<(dyn core::ops::function::FnOnce<(), Output=()> + core::marker::Send), alloc::alloc::Global>, alloc::alloc::Global>, alloc::boxed::Box<(dyn core::ops::function::FnOnce<(), Output=()> + core::marker::Send), alloc::alloc::Global>, ()> (s=0x7fffee59c9e0, f=...) at /home/pd/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.10/src/mutex.rs:567
#16 0x00007ffff7a8e651 in ykrt::mt::{impl#0}::queue_job::{closure#0} () at ykrt/src/mt.rs:168
#17 0x00007ffff7a9ae33 in std::sys_common::backtrace::__rust_begin_short_backtrace<ykrt::mt::{impl#0}::queue_job::{closure_env#0}, ()> (f=...) at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/std/src/sys_common/backtrace.rs:154
#18 0x00007ffff7a89a04 in std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure#0}<ykrt::mt::{impl#0}::queue_job::{closure_env#0}, ()> () at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/std/src/thread/mod.rs:529
#19 0x00007ffff7a959b4 in core::panic::unwind_safe::{impl#23}::call_once<(), std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<ykrt::mt::{impl#0}::queue_job::{closure_env#0}, ()>> (self=...) at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/core/src/panic/unwind_safe.rs:271
#20 0x00007ffff7a87297 in std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<ykrt::mt::{impl#0}::queue_job::{closure_env#0}, ()>>, ()> (data=0x7fffee59cb00) at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/std/src/panicking.rs:526
#21 0x00007ffff7a8758b in __rust_try () from /home/pd/yk-fork/bin/../target/debug/deps/libykcapi.so
#22 0x00007ffff7a8712e in std::panicking::try<(), core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<ykrt::mt::{impl#0}::queue_job::{closure_env#0}, ()>>> (f=...) at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/std/src/panicking.rs:490
#23 0x00007ffff7a89846 in std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<ykrt::mt::{impl#0}::queue_job::{closure_env#0}, ()>>, ()> (f=...) at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/std/src/panic.rs:142
#24 std::thread::{impl#0}::spawn_unchecked_::{closure#1}<ykrt::mt::{impl#0}::queue_job::{closure_env#0}, ()> () at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/std/src/thread/mod.rs:528
#25 0x00007ffff7a5cb0f in core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<ykrt::mt::{impl#0}::queue_job::{closure_env#0}, ()>, ()> () at /rustc/8142a319ed5c1d1f96e5a1881a6546e463b77c8f/library/core/src/ops/function.rs:250
#26 0x00007ffff7c7ee55 in alloc::boxed::{impl#47}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2007
#27 alloc::boxed::{impl#47}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2007
#28 std::sys::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/unix/thread.rs:108
#29 0x00007ffff78a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#30 0x00007ffff79285fc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
SUCCESS - YKD_SERIALISE_COMPILATION=1
ERROR - YKD_SERIALISE_COMPILATION=0
YKLUA - main/810108764e996d727442d6984f7a8318155a08c8
YK - master/a6613631e8e13cfe3ba4cc8ee24df6517831fd6c
$ YKD_SERIALISE_COMPILATION=0 gdb -batch -ex r -ex bt -args ../src/lua -e"_U=true" ./mandelbrot.lua
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff26346c0 (LWP 1662751)]
P4
100 100
[New Thread 0x7ffff1afe6c0 (LWP 1662752)]
[Thread 0x7ffff1afe6c0 (LWP 1662752) exited]
[New Thread 0x7ffff1afe6c0 (LWP 1662753)]
[New Thread 0x7ffff11fc6c0 (LWP 1662754)]
[Thread 0x7ffff11fc6c0 (LWP 1662754) exited]
[New Thread 0x7ffff11fc6c0 (LWP 1662755)]
[New Thread 0x7ffff08fa6c0 (LWP 1662756)]
[Thread 0x7ffff08fa6c0 (LWP 1662756) exited]
[New Thread 0x7ffff08fa6c0 (LWP 1662757)]
[Thread 0x7ffff26346c0 (LWP 1662751) exited]
[New Thread 0x7ffff23f26c0 (LWP 1662758)]
[Thread 0x7ffff23f26c0 (LWP 1662758) exited]
thread '<unnamed>' panicked at ykrt/src/trace/hwt/mod.rs:39:23:
not yet implemented: Unrecoverable("c set errnor 12")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Thread 1 "lua" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007ffff78a9d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007ffff785af32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff7845472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007ffff7cfb817 in std::sys::unix::abort_internal () at library/std/src/sys/unix/mod.rs:370
#5 0x00007ffff7cf8f22 in std::panicking::rust_panic () at library/std/src/panicking.rs:804
#6 0x00007ffff7cf8e17 in std::panicking::rust_panic_with_hook () at library/std/src/panicking.rs:775
#7 0x00007ffff7cf8b51 in std::panicking::begin_panic_handler::{closure#0} () at library/std/src/panicking.rs:631
#8 0x00007ffff7cf7516 in std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::{closure_env#0}, !> () at library/std/src/sys_common/backtrace.rs:170
#9 0x00007ffff7cf8892 in std::panicking::begin_panic_handler () at library/std/src/panicking.rs:619
#10 0x00007ffff7a62eb5 in core::panicking::panic_fmt () at library/core/src/panicking.rs:72
#11 0x00007ffff7ad8a18 in ykrt::trace::hwt::{impl#2}::stop_collector (self=0x922b10) at ykrt/src/trace/hwt/mod.rs:39
#12 0x00007ffff7a6960f in ykrt::mt::MT::control_point (self=0x7fffffffcd28, loc=0x923260, ctrlp_vars=0x7fffffffd198, frameaddr=0x7fffffffe000) at ykrt/src/mt.rs:261
#13 0x00007ffff7a63db7 in ykcapi::__ykrt_control_point (mt=0x91b7e0, loc=0x923260, ctrlp_vars=0x7fffffffd198, frameaddr=0x7fffffffe000) at ykcapi/src/lib.rs:67
#14 0x0000000000870ef3 in luaV_execute (L=<optimized out>, ci=<optimized out>) at lvm.c:1186
#15 0x000000000080246b in ccall (L=0x91ae28, func=<optimized out>, nResults=<optimized out>, inc=<optimized out>) at ldo.c:609
#16 0x0000000000802591 in luaD_callnoyield (L=0x91ae28, func=0x91b940, nResults=-1) at ldo.c:627
#17 0x00000000007ee823 in f_call (L=0x91ae28, ud=<optimized out>) at lapi.c:1041
#18 0x00000000007fc517 in luaD_rawrunprotected (L=0x91ae28, f=0x7ee760 <f_call>, ud=0x7ffff2635308) at ldo.c:144
#19 0x0000000000804d36 in luaD_pcall (L=0x91ae28, func=0x7ee760 <f_call>, u=0x7ffff2635308, old_top=<optimized out>, ef=<optimized out>) at ldo.c:926
#20 0x00000000007ee30c in lua_pcallk (L=0x91ae28, nargs=<optimized out>, nresults=<optimized out>, errfunc=<optimized out>, ctx=<optimized out>, k=<optimized out>) at lapi.c:1067
#21 0x00000000007dff43 in docall (L=0x91ae28, narg=0, nres=-1) at lua.c:160
#22 0x00000000007df644 in handle_script (L=0x91ae28, argv=<optimized out>) at lua.c:255
#23 0x00000000007dd903 in pmain (L=0x91ae28) at lua.c:634
#24 0x00000000008013a5 in precallC (L=0x91ae28, func=<optimized out>, nresults=<optimized out>, f=0x7dd110 <pmain>) at ldo.c:506
#25 0x00000000008019f8 in luaD_precall (L=0x91ae28, func=0x91b900, nresults=1) at ldo.c:572
#26 0x00000000008023af in ccall (L=0x91ae28, func=0x91b900, nResults=1, inc=<optimized out>) at ldo.c:607
#27 0x0000000000802591 in luaD_callnoyield (L=0x91ae28, func=0x91b900, nResults=1) at ldo.c:627
#28 0x00000000007ee823 in f_call (L=0x91ae28, ud=<optimized out>) at lapi.c:1041
#29 0x00000000007fc517 in luaD_rawrunprotected (L=0x91ae28, f=0x7ee760 <f_call>, ud=0x7ffff2635058) at ldo.c:144
#30 0x0000000000804d36 in luaD_pcall (L=0x91ae28, func=0x7ee760 <f_call>, u=0x7ffff2635058, old_top=<optimized out>, ef=<optimized out>) at ldo.c:926
#31 0x00000000007ee30c in lua_pcallk (L=0x91ae28, nargs=<optimized out>, nresults=<optimized out>, errfunc=<optimized out>, ctx=<optimized out>, k=<optimized out>) at lapi.c:1067
#32 0x00000000007dcdd0 in main (argc=<optimized out>, argv=<optimized out>) at lua.c:660
I get errors when running the mandelbrot.lua benchmark with YkLua.
Command:
When running without YKD_SERIALISE_COMPILATION it sometimes completes successfully, but sometimes fails with an "out of memory" error:
YkLua Commit: 6bd6c53b64a8ae95f6a0e404d74c69b30a0a84f5
Yk Commit: 3a9551f961b27798dd86b3cd0e2159cc1cb81f73
Machine: Ubuntu 22.04.2 LTS