waTeim / node-julia

Fast and simple access to Julia embedded in node
MIT License

hang on import #21

Closed by sebastiang 8 years ago

sebastiang commented 8 years ago

I am trying to figure out why an import of a Julia module works on one machine, but hangs on another. While I hunt down what differences might exist between the machine images -- each provisioned the same way -- I thought I'd post the stack trace to see if it inspired any ideas. Node 0.12.7, julia v0.4-rc2. I start the app, it hangs on an import statement, and after a while I press CTRL+C. If I do it quickly, I just get a plain segfault. If I wait a while, I get something like this.

^Cfatal: error thrown and no exception handler available.
InterruptException()
rec_backtrace at /usr/local/lib/julia/libjulia.so (unknown line)
jl_throw at /usr/local/lib/julia/libjulia.so (unknown line)
unknown function (ip: 0x7f32b97d409f)
unknown function (ip: 0x7f32b97d4109)
unknown function (ip: 0x7f32bae13340)
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE at /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
_ZN5JMain12syncQueueGetEv at /MyApp/build/node_modules/node-julia/build/Release/nj.node (unknown line)
_Z8doImportRKN2v820FunctionCallbackInfoINS_5ValueEEE at /MyApp/build/node_modules/node-julia/build/Release/nj.node (unknown line)
_ZN2v88internal25FunctionCallbackArguments4CallEPFvRKNS_20FunctionCallbackInfoINS_5ValueEEEE at node (unknown line)
unknown function (ip: 0x7d9af1)
unknown function (ip: 0x3839292060a2)
unknown function (ip: 0x3839295ac819)
unknown function (ip: 0x38392921e8d5)
unknown function (ip: 0x3839295ac3a2)
unknown function (ip: 0x3839295a4c9f)
unknown function (ip: 0x38392958ff71)
unknown function (ip: 0x38392959a379)
unknown function (ip: 0x38392958ff71)
unknown function (ip: 0x3839292bf54c)
unknown function (ip: 0x3839292bd920)
unknown function (ip: 0x3839292bdbb4)
unknown function (ip: 0x3839292bd920)
unknown function (ip: 0x3839292a3b5e)
unknown function (ip: 0x3839292a614b)
unknown function (ip: 0x383929224ac6)
unknown function (ip: 0x3839292a2c46)
unknown function (ip: 0x38392929d12c)
unknown function (ip: 0x3839292998e0)
unknown function (ip: 0x383929290505)
unknown function (ip: 0x38392928fec4)
unknown function (ip: 0x38392926597f)
unknown function (ip: 0x383929264370)
unknown function (ip: 0x38392921ef40)
unknown function (ip: 0x38392921de90)
_ZN2v88internal9Execution4CallEPNS0_7IsolateENS0_6HandleINS0_6ObjectEEES6_iPS6_b at node (unknown line)
_ZN2v88Function4CallENS_6HandleINS_5ValueEEEiPS3_ at node (unknown line)
_ZN4node15LoadEnvironmentEPNS_11EnvironmentE at node (unknown line)
_ZN4node5StartEiPPc at node (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x617cdf)
unknown function (ip: (nil))

waTeim commented 8 years ago

Strange that this should be caught in jl_throw, since the latest fix should prevent Julia from catching signals in 0.4. Are you using the latest from master or the latest release?

sebastiang commented 8 years ago

The latest release. I will try latest from master.

I’ve found the difference in provisioning. There was a script I wasn’t running on the putative ‘production’ machines which did another incantation of apt-get update after adding some sources to the list. It was there to get a build of chrome for dev machines, but I imagine it must have subtly changed which libraries were being linked. I’ll work to get to the bottom of it.

sebastiang commented 8 years ago

It would appear I have to build the latest build with JL_OPTIONS_HANDLE_SIGNALS_OFF defined?

sebastiang commented 8 years ago

Running with gdb and getting a trace on the error suggests a stack overflow. I bet the error is something on my side -- something not deployed correctly to my target machine. But whatever the problem is, it isn't shown to me, because the calls that look up the error are somehow recursing infinitely.

#0  0x00007ffff6b9dc87 in _IO_vfprintf_internal (s=s@entry=0x7ffff4d8c6d0, format=<optimized out>, 
    format@entry=0x7ffff620919c "could not open file %s", ap=ap@entry=0x7ffff4d8c858) at vfprintf.c:1777
#1  0x00007ffff6bc42a3 in _IO_vasprintf (result_ptr=result_ptr@entry=0x7ffff4d8c800, format=format@entry=0x7ffff620919c "could not open file %s", 
    args=args@entry=0x7ffff4d8c858) at vasprintf.c:62
#2  0x00007ffff588368b in jl_vexceptionf (exception_type=0x7ffded9218d0, fmt=fmt@entry=0x7ffff620919c "could not open file %s", 
    args=args@entry=0x7ffff4d8c858) at builtins.c:56
#3  0x00007ffff5883b98 in jl_errorf (fmt=fmt@entry=0x7ffff620919c "could not open file %s") at builtins.c:73
#4  0x00007ffff58e4e26 in jl_load (fname=0x7ffdef581cb0 "/MyApp/node_modules/node-julia/lib/nj.jl", len=45) at toplevel.c:612
#5  0x00007fffee6dce80 in julia_include_680 () at boot.jl:261
#6  0x00007ffff587c16b in jl_apply (nargs=1, args=0x7ffff4d8ca90, f=<optimized out>) at julia.h:1328
#7  jl_apply_generic (F=0x7ffdef37d370, args=0x7ffff4d8ca90, nargs=<optimized out>) at gf.c:1684
#8  0x00007ffff58e7520 in jl_apply (nargs=1, args=0x7ffff4d8ca90, f=<optimized out>) at julia.h:1328
#9  jl_call1 (f=0x7ffdef37d370, a=0x7ffdee54e760) at jlapi.c:155
#10 0x00007ffff690349f in nj::Kernel::load() () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#11 0x00007ffff6904585 in nj::Kernel::invoke(std::string const&, _jl_value_t*, _jl_value_t*) ()
   from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#12 0x00007ffff69046b0 in nj::Kernel::getError(_jl_value_t*, _jl_value_t*) () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#13 0x00007ffff690f01f in nj::genJuliaError(_jl_value_t*) () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#14 0x00007ffff6911f2e in nj::getJuliaException(_jl_value_t*) () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#15 0x00007ffff6903608 in nj::Kernel::load() () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#16 0x00007ffff6904585 in nj::Kernel::invoke(std::string const&, _jl_value_t*, _jl_value_t*) ()
   from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#17 0x00007ffff69046b0 in nj::Kernel::getError(_jl_value_t*, _jl_value_t*) () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#18 0x00007ffff690f01f in nj::genJuliaError(_jl_value_t*) () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#19 0x00007ffff6911f2e in nj::getJuliaException(_jl_value_t*) () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#20 0x00007ffff6903608 in nj::Kernel::load() () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#21 0x00007ffff6904585 in nj::Kernel::invoke(std::string const&, _jl_value_t*, _jl_value_t*) ()
   from /MyApp/build/node_modules/node-julia/build/Release/nj.node

The cycle (e.g. #17-#21) repeats for as long as I'm willing to page through the backtrace.

sebastiang commented 8 years ago

(gdb) thread apply all bt 8

Thread 3 (Thread 0x7ffff4d8a700 (LWP 8421)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff7704cdc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff68fca1d in JMain::asyncQueueGet() () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#3  0x00007ffff690815f in Trampoline::operator()() () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#4  0x00007ffff7709e40 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff6f1f182 in start_thread (arg=0x7ffff4d8a700) at pthread_create.c:312
#6  0x00007ffff6c4c47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7ffff558b700 (LWP 8420)):
#0  0x00007ffff6b9dc87 in _IO_vfprintf_internal (s=s@entry=0x7ffff4d8c6d0, format=<optimized out>, 
    format@entry=0x7ffff620919c "could not open file %s", ap=ap@entry=0x7ffff4d8c858) at vfprintf.c:1777
#1  0x00007ffff6bc42a3 in _IO_vasprintf (result_ptr=result_ptr@entry=0x7ffff4d8c800, format=format@entry=0x7ffff620919c "could not open file %s", 
    args=args@entry=0x7ffff4d8c858) at vasprintf.c:62
#2  0x00007ffff588368b in jl_vexceptionf (exception_type=0x7ffded9218d0, fmt=fmt@entry=0x7ffff620919c "could not open file %s", 
    args=args@entry=0x7ffff4d8c858) at builtins.c:56
#3  0x00007ffff5883b98 in jl_errorf (fmt=fmt@entry=0x7ffff620919c "could not open file %s") at builtins.c:73
#4  0x00007ffff58e4e26 in jl_load (fname=0x7ffdef581cb0 "/MyApp/node_modules/node-julia/lib/nj.jl", len=45) at toplevel.c:612
#5  0x00007fffee6dce80 in julia_include_680 () at boot.jl:261
#6  0x00007ffff587c16b in jl_apply (nargs=1, args=0x7ffff4d8ca90, f=<optimized out>) at julia.h:1328
#7  jl_apply_generic (F=0x7ffdef37d370, args=0x7ffff4d8ca90, nargs=<optimized out>) at gf.c:1684
(More stack frames follow...)

Thread 1 (Thread 0x7ffff7fea780 (LWP 8416)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff7704cdc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff68fbe0d in JMain::syncQueueGet() () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#3  0x00007ffff693ab99 in doImport(v8::FunctionCallbackInfo<v8::Value> const&) () from /MyApp/build/node_modules/node-julia/build/Release/nj.node
#4  0x00000000007b8e62 in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()
#5  0x00000000007d9af1 in ?? ()
#6  0x00002486aa5060a2 in ?? ()
#7  0x00002486aa506001 in ?? ()
(More stack frames follow...)

waTeim commented 8 years ago

"It would appear I have to build the latest build with JL_OPTIONS_HANDLE_SIGNALS_OFF defined?"

That value is defined in julia.h but only in version 0.4+, thus the ifdef.

waTeim commented 8 years ago

The infinite recursion usually stems from an error while loading nj.jl, since that file acts as the generic error processor. If there's an error in nj.jl, then processing that error loads nj.jl, which causes another error, and so on. Threads 1 and 3 both appear to be blocked in cond_wait, doing nothing except waiting on the result. Thread 2 is where all the action is. If npm install succeeded, then lib/nj.jl should exist, but like you said, maybe something about this install is messed up?
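The cycle described here (an error processor whose own loading fails, which raises a new error for the same processor) can be broken with a re-entrancy guard. The following is only a minimal JavaScript sketch of that general technique; `makeReporter` and `loadHandler` are hypothetical names used for illustration and are not part of node-julia, whose actual error path lives in its C++ code.

```javascript
// Hypothetical sketch: illustrates a re-entrancy guard, not node-julia's API.
function makeReporter(loadHandler) {
  let loading = false; // true while the error handler itself is being loaded

  function report(err) {
    if (loading) {
      // Re-entered while loading the handler: return the raw message
      // instead of asking the handler to process its own load failure.
      return "unprocessed error: " + err.message;
    }
    loading = true;
    try {
      const format = loadHandler(); // may throw, e.g. "could not open file"
      return format(err);
    } catch (loadErr) {
      // The guard is still set here, so this call cannot recurse further.
      return report(loadErr);
    } finally {
      loading = false;
    }
  }

  return report;
}
```

With a guard like this, a handler that cannot be loaded yields the underlying load failure (for example, a "could not open file" message) once, rather than an unbounded chain of nested calls.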

Is this apt-get issue resolved? I've seen lots of problems with Ubuntu's apt-get install node because it puts it in /usr/bin/node and it's waaaay too old, and then n (and probably nvm) puts it in /usr/local/bin, and there are weird circumstances where both versions of node end up getting used simultaneously (especially by node-gyp).
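Both concerns in this comment (whether the installed files are actually present, and which node binary is really running) can be checked before the import is attempted. The sketch below is one possible preflight check using only Node's standard `fs`, `path`, and `process` modules; the helper names `missingFiles` and `nodeInfo` are hypothetical, and the relative paths in the usage note are taken from the traces above rather than from node-julia's documentation.

```javascript
const fs = require("fs");
const path = require("path");

// Return the entries of `relativePaths` that cannot be read under `packageDir`.
function missingFiles(packageDir, relativePaths) {
  return relativePaths.filter((rel) => {
    try {
      fs.accessSync(path.join(packageDir, rel), fs.constants.R_OK);
      return false; // readable, so not missing
    } catch (err) {
      return true; // absent or unreadable
    }
  });
}

// Describe the node binary that is actually executing this script.
function nodeInfo() {
  return { execPath: process.execPath, version: process.version };
}
```

For example, `missingFiles("node_modules/node-julia", ["lib/nj.jl", "build/Release/nj.node"])` would list whichever of the two files seen in the traces is not readable, and logging `nodeInfo()` shows whether the process was started from /usr/bin/node or /usr/local/bin/node.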

sebastiang commented 8 years ago

It's a little hard to figure out what's going on, but it appears that in a tightened-up environment I get UVError exceptions thrown if any precompilation needs to be carried out when calling import through node-julia. If I arrange for all precompilation to happen out of band, then my app starts up properly.
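One way the out-of-band precompilation workaround could be scripted is to run a standalone Julia process that imports each module before the node app starts. This is only a sketch under stated assumptions: that `julia` is on the PATH, that the modules opt in to precompilation so a plain `import` triggers it, and that module names are simple identifiers. `precompileArgs` and `precompile` are hypothetical helpers, not part of node-julia.

```javascript
const { spawnSync } = require("child_process");

// Build the arguments for a one-off Julia process that imports each module.
// Assumption: importing a module that opts in to precompilation compiles it.
function precompileArgs(modules) {
  for (const name of modules) {
    // Conservative check (an assumption; Julia allows more characters).
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
      throw new Error("unexpected module name: " + name);
    }
  }
  return ["-e", modules.map((name) => "import " + name).join("; ")];
}

// Run the precompile step; `juliaBin` is an assumed path to the Julia executable.
function precompile(modules, juliaBin = "julia") {
  const result = spawnSync(juliaBin, precompileArgs(modules), { stdio: "inherit" });
  if (result.error) throw result.error; // e.g. the binary could not be spawned
  if (result.status !== 0) {
    throw new Error("julia exited with status " + result.status);
  }
}
```

Calling `precompile(["MyModule"])` in a deployment step (where `MyModule` stands in for the application's own module) keeps any compilation failure visible on the console instead of surfacing inside the embedded Julia instance.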

sebastiang commented 8 years ago

I assume this is because of a failure to spawn a child julia instance to do the compilation, but just why, or why the error isn't cleanly conveyed, is somewhat beyond me.

sebastiang commented 8 years ago

I also get a hang when a binary dependency is missing (bad deployment on my part), but obviously triggering the stack overflow is problematic; we can't see the underlying error.

sebastiang commented 8 years ago

I'll close this as I have a workaround and can't contribute enough context to reliably reproduce.