tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Invalid Memory Access leads to SIGABRT or SIGSEGV with Tensorflow + nodejs #7719

Open ryanhugh opened 1 year ago

ryanhugh commented 1 year ago

System information

Describe the current behavior

I would like to use @tensorflow/tfjs-node in Node.js to run @vladmandic/face-api and nsfwjs to detect people and identify NSFW content in images. I have it all set up, and everything works perfectly on my local MacBook (macOS Ventura 13.3.1, Apple M2). However, when I deploy this to my production servers, they frequently crash with memory errors (SIGABRT or SIGSEGV).

Usually the memory error is one of these two:

corrupted size vs. prev_size
corrupted size vs. prev_size while consolidating

But I have also seen these:

malloc(): unaligned tcache chunk detected
free(): invalid size
munmap_chunk(): invalid pointer

This kills the entire Node.js process with SIGABRT or SIGSEGV, which is unacceptable in production. I have tried different servers from my hosting provider and different versions of Ubuntu (Ubuntu 22.04 LTS x86_64 and Ubuntu 20.04.6 LTS). I've also tried different @tensorflow/tfjs-node versions (4.1.0 and 4.6.0) and different Node.js versions (16, 18, 20). I'm confident this is an issue with TensorFlow because I captured the log below, in which the process crashes partway through its debug output.

tensorflowdebugmode_upload.log

Describe the expected behavior

Not to crash.

Standalone code to reproduce the issue

I can reproduce the issue very consistently when running @vladmandic/face-api, nsfwjs, and a few other pure-JS packages (e.g. tesseract.js) on images.

Repository with code to reproduce crash: https://github.com/ryanhugh/tensorflow-crash-repro

touploadtogithub.txt

vladmandic commented 1 year ago

my $0.02 since @ryanhugh asked for an opinion ;)

first thing i'd look at is the os system logs, to see where exactly the SIG is happening. it can be in nodejs (not likely), the node bindings, or the bundled tensorflow.so itself.

if it's in the bindings, then the issue belongs here. if it's in tensorflow.so, well, tfjs-node 4.6.0 packages tensorflow 2.9.1 while the latest is 2.12.0, so the first thing i'd look at is how to get that in (which does need updated bindings) to avoid working on something that may or may not have already been solved. so again, it belongs here.
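One way to act on this triage, assuming you already have a gdb backtrace (the `bt` output) as text, is to record which shared object each frame belongs to. The helpers below are a minimal sketch with hypothetical names, not part of tfjs-node or gdb: they read the path gdb prints after `from` (including the wrapped `   from ...` continuation lines) so you can see whether the innermost library frame is in libtensorflow.so.2, in tfjs_binding.node (the bindings), or somewhere else. Other gdb output layouts are not handled.

```javascript
// Hypothetical helper: map each frame of a gdb backtrace to the shared
// object gdb printed after "from", or null if no path was printed.
function framesByLibrary(backtrace) {
  // gdb wraps long frames onto a "   from /path/lib.so" line; re-join them.
  const joined = backtrace.replace(/\n\s+from /g, ' from ');
  const frames = [];
  for (const line of joined.split('\n')) {
    const num = line.match(/^#(\d+)\s/);
    if (!num) continue; // skip prompts, signal messages and other non-frame lines
    const from = line.match(/ from (\S+)\s*$/);
    frames.push({
      frame: Number(num[1]),
      // null: frame from the main executable, or one shown with
      // "at file:line" source info instead of a library path.
      library: from ? from[1].split('/').pop() : null,
    });
  }
  return frames;
}

// Innermost frame that names a shared object, or null if there is none.
function firstLibraryFrame(frames) {
  return frames.find((f) => f.library !== null) ?? null;
}
```

Fed the SIGABRT backtraces later in this thread, `firstLibraryFrame` should return frame #9 in libtensorflow.so.2 (frames #0 to #8 are glibc's abort path and print `at file:line` rather than a library), while the SIGSEGV backtrace should return frame #0 in tfjs_binding.node.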

on a total side note, i've been quite disappointed with the built-in glibc malloc when it gets stretched by ML frameworks. when working with pytorch, i've had much better luck using tcmalloc:

sudo apt install libgoogle-perftools-dev
sudo ldconfig
export LD_PRELOAD=libtcmalloc.so
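It is easy to export `LD_PRELOAD` in one shell and start the service from another, so it may be worth confirming that the preload actually took effect. A minimal, Linux-only sketch (helper names are hypothetical): read `/proc/self/maps` from inside the Node.js process and check whether any mapped file name contains `libtcmalloc`. It assumes the standard maps layout, in which the pathname is the sixth whitespace-separated field.

```javascript
const fs = require('fs');

// True if any mapping in a /proc/<pid>/maps listing belongs to a file
// whose name contains `name` (e.g. "libtcmalloc").
function isLibraryMapped(mapsText, name) {
  return mapsText.split('\n').some((line) => {
    // Fields: address perms offset dev inode [pathname]
    const path = line.trim().split(/\s+/)[5];
    return path !== undefined && path.split('/').pop().includes(name);
  });
}

// Linux-only check on the current process; returns false instead of
// throwing when /proc is unavailable (e.g. on macOS).
function tcmallocLoaded() {
  try {
    return isLibraryMapped(fs.readFileSync('/proc/self/maps', 'utf8'), 'libtcmalloc');
  } catch {
    return false;
  }
}
```

For example, log `tcmallocLoaded()` once at startup of the app launched with the export above; `false` most likely means the preload did not happen.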

ryanhugh commented 1 year ago

Thank you for your help! Much appreciated :)

Which OS system logs do you recommend looking at?

I just ran the command under gdb, caught it when it crashed, and got the output below. If I'm reading it correctly, the issue is in libtensorflow.so.2. I'll look into using the latest TensorFlow version and see if that helps. I'm also trying to put together a repro case that is isolated from the rest of my codebase, but this is a surprisingly hard issue to pin down...

TF backend: tensorflow
got width and height {
  unreliable: true,
  numTensors: 517,
  numDataBuffers: 517,
  numBytes: 111665556
}
corrupted size vs. prev_size while consolidating

Thread 1 "node" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352685568) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352685568) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352685568) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352685568, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff78287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff78896f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff79dbb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff78a0d7c in malloc_printerr (str=str@entry=0x7ffff79de820 "corrupted size vs. prev_size while consolidating") at ./malloc/malloc.c:5664
#7  0x00007ffff78a2fa2 in _int_free (av=0x7ffff7a19c80 <main_arena>, p=0x1d8cf580, have_lock=<optimized out>) at ./malloc/malloc.c:4606
#8  0x00007ffff78a54d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x00007fffa9cbe0f9 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#10 0x00007fffa9cbecfa in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#11 0x00007fffa9dcae60 in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#12 0x00007fffa9927e2c in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#13 0x00007fffa9579473 in dnnl_sgemm ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#14 0x00007fffa62832b9 in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 8, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 8, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#15 0x00007fffa628d095 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#16 0x00007fffa628ef4c in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#17 0x00007fffa628f591 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#18 0x00007fffa628f558 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#19 0x00007fffa62a7fdb in void Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::evalProductImpl<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen
> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, 0>(float*, Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback) const ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#20 0x00007fffa62a8f01 in Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const> const, Eigen::ThreadPoolDevice, true, (Eigen::internal::TiledEvaluation)0>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const> const&, Eigen::ThreadPoolDevice const&) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#21 0x00007fffa8a91bc0 in tensorflow::(anonymous namespace)::LaunchGeneric<Eigen::ThreadPoolDevice, float>::operator()(tensorflow::OpKernelContext*, tensorflow::Tensor const&, tensorflow::Tensor const&, int, int, int, int, tensorflow::Padding const&, std::vector<long, std::allocator<long> > const&, tensorflow::Tensor*, tensorflow::TensorFormat) [clone .isra.0] [clone .constprop.0] ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#22 0x00007fffa8a91e01 in tensorflow::LaunchConv2DOp<Eigen::ThreadPoolDevice, float>::operator()(tensorflow::OpKernelContext*, bool, bool, tensorflow::Tensor const&, tensorflow::Tensor const&, int, int, int, int, tensorflow::Padding const&, std::vector<long, std::allocator<long> > const&, tensorflow::Tensor*, tensorflow::TensorFormat) () from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#23 0x00007fffa8a92d3f in tensorflow::Conv2DOp<Eigen::ThreadPoolDevice, float>::Compute(tensorflow::OpKernelContext*) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#24 0x00007fffb6d76fbb in tensorflow::ThreadPoolDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow_framework.so.2
#25 0x00007fffa4c26187 in tensorflow::KernelAndDeviceOp::Run(tensorflow::ScopedStepContainer*, tensorflow::EagerKernelArgs const&, std::vector<absl::lts_20211102::variant<tensorflow::Tensor, tensorflow::TensorShape>, std::allocator<absl::lts_20211102::variant<tensorflow::Tensor, tensorflow::TensorShape> > >*, tensorflow::CancellationManager*, absl::lts_20211102::optional<tensorflow::EagerFunctionParams> const&, absl::lts_20211102::optional<tensorflow::ManagedStackTrace> const&, tensorflow::CoordinationServiceAgent*) () from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#26 0x00007fffa4bd67c9 in tensorflow::EagerKernelExecute(tensorflow::EagerContext*, absl::lts_20211102::InlinedVector<tensorflow::TensorHandle*, 4ul, std::allocator<tensorflow::TensorHandle*> > const&, absl::lts_20211102::optional<tensorflow::EagerFunctionParams> const&, std::unique_ptr<tensorflow::KernelAndDevice, tensorflow::core::RefCountDeleter> const&, tensorflow::GraphCollector*, tensorflow::CancellationManager*, absl::lts_20211102::Span<tensorflow::TensorHandle*>, absl::lts_20211102::optional<tensorflow::ManagedStackTrace> const&) () from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#27 0x00007fffa4bd7b89 in tensorflow::ExecuteNode::Run() () from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#28 0x00007fffa4c1dd50 in tensorflow::EagerExecutor::SyncExecute(tensorflow::EagerNode*) () from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#29 0x00007fffa4bd1c46 in tensorflow::(anonymous namespace)::EagerLocalExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#30 0x00007fffa4bd22b4 in tensorflow::EagerExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#31 0x00007fffa2fb3ae0 in tensorflow::EagerOperation::Execute(absl::lts_20211102::Span<tensorflow::AbstractTensorHandle*>, int*) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#32 0x00007fffa4c3244a in tensorflow::CustomDeviceOpHandler::Execute(tensorflow::ImmediateExecutionOperation*, tensorflow::ImmediateExecutionTensorHandle**, int*) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#33 0x00007fffa0c8c9f6 in TFE_Execute () from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#34 0x00007ffff7bf674e in tfnodejs::TFJSBackend::ExecuteOp(napi_env__*, napi_value__*, napi_value__*, napi_value__*, napi_value__*) ()
   from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/tfjs_binding.node
#35 0x00007ffff7bf9f7c in tfnodejs::ExecuteOp(napi_env__*, napi_callback_info__*) () from /home/ryan/mycodeproject/node_modules/@tensorflow/tfjs-node/lib/napi-v8/tfjs_binding.node
#36 0x0000000000c2ee99 in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
#37 0x0000000000ec8f0c in v8::internal::InvokeFunctionCallback(v8::FunctionCallbackInfo<v8::Value> const&, void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()

vladmandic commented 1 year ago

since you already have gdb output, there's no point in looking into system logs. and yes, this is deep inside the tensorflow shared library that is prepackaged with tfjs-node. if i'm reading this correctly, it all goes wrong around dnnl_sgemm, which is commonly used when executing matmul operations, and those are pretty much everywhere. beyond that, there's not much i can say. perhaps the devs here can chime in now that this info is available?
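In both SIGABRT traces, dnnl_sgemm is reached from a Conv2D op rather than a MatMul op: frames #21 to #23 go from `Conv2DOp::Compute` through `LaunchConv2DOp` to `LaunchGeneric`, which then runs an Eigen tensor contraction on 2-D `TensorMap`s. Our reading of TensorFlow's CPU Conv2D kernel is that `LaunchGeneric` reduces some convolutions, such as a 1x1 filter with stride 1, to a plain matrix multiplication, and the 2-D operand types in the trace are consistent with that path. The sketch below (plain JavaScript, no tfjs dependency, hypothetical function names) illustrates the reduction: with the filter in TensorFlow's [height, width, in, out] layout, a 1x1 convolution over an NHWC tensor equals the product of an [n*h*w, cin] matrix and a [cin, cout] matrix.

```javascript
// Naive 1x1 convolution, stride 1, on a flat NHWC input.
// input: length n*h*w*cin; filter: [1, 1, cin, cout] flattened (length cin*cout).
function conv1x1(input, n, h, w, cin, filter, cout) {
  const out = new Float32Array(n * h * w * cout);
  for (let b = 0; b < n; b++) {
    for (let y = 0; y < h; y++) {
      for (let x = 0; x < w; x++) {
        const pixel = (b * h + y) * w + x; // flat index of this spatial position
        for (let co = 0; co < cout; co++) {
          let acc = 0;
          for (let ci = 0; ci < cin; ci++) {
            acc += input[pixel * cin + ci] * filter[ci * cout + co];
          }
          out[pixel * cout + co] = acc;
        }
      }
    }
  }
  return out;
}

// Plain row-major matmul: a is [m, k], b is [k, p], result is [m, p].
function matmul(a, b, m, k, p) {
  const c = new Float32Array(m * p);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < p; j++) {
      let acc = 0;
      for (let t = 0; t < k; t++) acc += a[i * k + t] * b[t * p + j];
      c[i * p + j] = acc;
    }
  }
  return c;
}
```

Calling `conv1x1(input, n, h, w, cin, filter, cout)` and `matmul(input, filter, n * h * w, cin, cout)` on the same flat arrays returns identical results, which is why an optimized runtime can hand such a convolution straight to a GEMM routine like sgemm.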

ryanhugh commented 1 year ago

Here's another stack trace:


Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x00007ffff7e64eba in tfnodejs::TFJSBackend::InsertHandle(TFE_TensorHandle*) () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/tfjs_binding.node
(gdb) bt
#0  0x00007ffff7e64eba in tfnodejs::TFJSBackend::InsertHandle(TFE_TensorHandle*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/tfjs_binding.node
#1  0x00007ffff7e654d6 in tfnodejs::TFJSBackend::GenerateOutputTensorInfo(napi_env__*, TFE_TensorHandle*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/tfjs_binding.node
#2  0x00007ffff7e6801f in tfnodejs::TFJSBackend::ExecuteOp(napi_env__*, napi_value__*, napi_value__*, napi_value__*, napi_value__*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/tfjs_binding.node
#3  0x00007ffff7e6b5f0 in tfnodejs::ExecuteOp(napi_env__*, napi_callback_info__*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/tfjs_binding.node
#4  0x0000000000c2ee99 in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
#5  0x0000000000f1470f in v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) ()
#6  0x0000000000f14f7d in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, unsigned long*, int) ()
#7  0x0000000000f15445 in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#8  0x000000000191cdf6 in Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit ()

ryanhugh commented 1 year ago

Here's a reproduction repository too: https://github.com/ryanhugh/tensorflow-crash-repro

And another stack trace:

Thread 1 "node" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352685568) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352685568) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352685568) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352685568, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff78287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff78896f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff79dbb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff78a0d7c in malloc_printerr (str=str@entry=0x7ffff79de820 "corrupted size vs. prev_size while consolidating") at ./malloc/malloc.c:5664
#7  0x00007ffff78a2fa2 in _int_free (av=0x7ffff7a19c80 <main_arena>, p=0x1785cba0, have_lock=<optimized out>) at ./malloc/malloc.c:4606
#8  0x00007ffff78a54d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x00007fffb9cbe0f9 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#10 0x00007fffb9cbecfa in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#11 0x00007fffb9dcae60 in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#12 0x00007fffb9927e2c in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#13 0x00007fffb9579473 in dnnl_sgemm ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#14 0x00007fffb62832b9 in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 8, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 8, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#15 0x00007fffb628d095 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#16 0x00007fffb628ef4c in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#17 0x00007fffb628f591 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#18 0x00007fffb628f558 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#19 0x00007fffb62a7fdb in void Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::evalProductImpl<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen
> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, 0>(float*, Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback) const ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#20 0x00007fffb62a8f01 in Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const> const, Eigen::ThreadPoolDevice, true, (Eigen::internal::TiledEvaluation)0>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const> const&, Eigen::ThreadPoolDevice const&) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#21 0x00007fffb8a91bc0 in tensorflow::(anonymous namespace)::LaunchGeneric<Eigen::ThreadPoolDevice, float>::operator()(tensorflow::OpKernelContext*, tensorflow::Tensor const&, tensorflow::Tensor const&, int, int, int, int, tensorflow::Padding const&, std::vector<long, std::allocator<long> > const&, tensorflow::Tensor*, tensorflow::TensorFormat) [clone .isra.0] [clone .constprop.0] ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#22 0x00007fffb8a91e01 in tensorflow::LaunchConv2DOp<Eigen::ThreadPoolDevice, float>::operator()(tensorflow::OpKernelContext*, bool, bool, tensorflow::Tensor const&, tensorflow::Tensor const&, int, int, int, int, tensorflow::Padding const&, std::vector<long, std::allocator<long> > const&, tensorflow::Tensor*, tensorflow::TensorFormat) () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#23 0x00007fffb8a92d3f in tensorflow::Conv2DOp<Eigen::ThreadPoolDevice, float>::Compute(tensorflow::OpKernelContext*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#24 0x00007fffaeb76fbb in tensorflow::ThreadPoolDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow_framework.so.2
#25 0x00007fffb4c26187 in tensorflow::KernelAndDeviceOp::Run(tensorflow::ScopedStepContainer*, tensorflow::EagerKernelArgs const&, std::vector<absl::lts_20211102::variant<tensorflow::Tensor, tensorflow::TensorShape>, std::allocator<absl::lts_20211102::variant<tensorflow::Tensor, tensorflow::TensorShape> > >*, tensorflow::CancellationManager*, absl::lts_20211102::optional<tensorflow::EagerFunctionParams> const&, absl::lts_20211102::optional<tensorflow::ManagedStackTrace> const&, tensorflow::CoordinationServiceAgent*) () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#26 0x00007fffb4bd67c9 in tensorflow::EagerKernelExecute(tensorflow::EagerContext*, absl::lts_20211102::InlinedVector<tensorflow::TensorHandle*, 4ul, std::allocator<tensorflow::TensorHandle*> > const&, absl::lts_20211102::optional<tensorflow::EagerFunctionParams> const&, std::unique_ptr<tensorflow::KernelAndDevice, tensorflow::core::RefCountDeleter> const&, tensorflow::GraphCollector*, tensorflow::CancellationManager*, absl::lts_20211102::Span<tensorflow::TensorHandle*>, absl::lts_20211102::optional<tensorflow::ManagedStackTrace> const&) () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#27 0x00007fffb4bd7b89 in tensorflow::ExecuteNode::Run() () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#28 0x00007fffb4c1dd50 in tensorflow::EagerExecutor::SyncExecute(tensorflow::EagerNode*) () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#29 0x00007fffb4bd1c46 in tensorflow::(anonymous namespace)::EagerLocalExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#30 0x00007fffb4bd22b4 in tensorflow::EagerExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#31 0x00007fffb2fb3ae0 in tensorflow::EagerOperation::Execute(absl::lts_20211102::Span<tensorflow::AbstractTensorHandle*>, int*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#32 0x00007fffb4c3244a in tensorflow::CustomDeviceOpHandler::Execute(tensorflow::ImmediateExecutionOperation*, tensorflow::ImmediateExecutionTensorHandle**, int*) ()
   from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#33 0x00007fffb0c8c9f6 in TFE_Execute () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/../../deps/lib/libtensorflow.so.2
#34 0x00007ffff7e67fcb in tfnodejs::TFJSBackend::ExecuteOp(napi_env__*, napi_value__*, napi_value__*, napi_value__*, napi_value__*) () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/tfjs_binding.node
#35 0x00007ffff7e6b5f0 in tfnodejs::ExecuteOp(napi_env__*, napi_callback_info__*) () from /home/ryanhughes/tensorflow-crash-repro/node_modules/@tensorflow/tfjs-node/lib/napi-v8/tfjs_binding.node
#36 0x0000000000c2ee99 in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
#37 0x0000000000ec8f0c in v8::internal::InvokeFunctionCallback(v8::FunctionCallbackInfo<v8::Value> const&, void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()
ryanhugh commented 1 year ago

@mattsoulanille Let me know if you can look into this. Thank you!

muhammadsohaib60 commented 1 year ago

Hello, it is easy. Can I help?

muhammadsohaib60 commented 1 year ago

1) First, make sure the dependencies are on the latest versions of TensorFlow.js. Use a package manager like npm or yarn to update the packages to their latest compatible versions.
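Before upgrading, it can help to confirm that the `@tensorflow/*` packages declared in package.json are not pinned to different releases. The sketch below assumes all `@tensorflow/*` dependencies are meant to be on the same version; `checkTfjsVersions` is a hypothetical helper, not part of TensorFlow.js, and it only looks at the declared ranges, not the installed tree (`npm ls @tensorflow/tfjs-core` shows which copies are actually installed).

```javascript
// Hypothetical helper (not a TensorFlow.js API): given the "dependencies"
// object from package.json, collect the @tensorflow/* packages and check
// whether they all point at the same version once a leading ^, ~, = or v
// is stripped. Ranges such as ">=4.0.0" or "4.x" are compared as plain
// strings and are not interpreted as semver ranges.
function checkTfjsVersions(dependencies = {}) {
  const tfPackages = {};
  for (const [name, range] of Object.entries(dependencies)) {
    if (name.startsWith('@tensorflow/')) {
      tfPackages[name] = String(range).replace(/^[\^~=v]+/, '');
    }
  }
  const distinct = new Set(Object.values(tfPackages));
  return { packages: tfPackages, consistent: distinct.size <= 1 };
}

// Example usage (assumes it runs next to the project's package.json):
// console.log(checkTfjsVersions(require('./package.json').dependencies));
```

If `consistent` comes back `false`, aligning those packages to a single release is a reasonable first step before digging into the native crash.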

muhammadsohaib60 commented 1 year ago

2) Second, set breakpoints, inspect variables, and step through the code to identify the exact location where the segmentation fault occurs.
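Because the crash happens inside native code (libtensorflow.so.2, per the backtrace above), a JavaScript breakpoint will not stop at it, and the signal still kills the whole process. One way to narrow it down is to run each job in its own child Node.js process and record which input ends in a signal. This is a minimal sketch using Node's built-in `child_process.spawnSync`; `runIsolated` and `findCrashingInputs` are hypothetical helper names, and the inline `-e` source stands in for whatever script actually runs face-api / nsfwjs on one image.

```javascript
const { spawnSync } = require('child_process');

// Run a snippet of JavaScript in a separate Node.js process and report how
// it exited. A native crash (SIGABRT/SIGSEGV) in the child shows up as a
// non-null `signal` instead of taking down the calling process.
function runIsolated(source, { timeoutMs = 60000 } = {}) {
  const result = spawnSync(process.execPath, ['-e', source], {
    encoding: 'utf8',
    timeout: timeoutMs,             // on timeout the child gets SIGTERM
    maxBuffer: 64 * 1024 * 1024,    // room for verbose debug logging
  });
  return {
    ok: result.status === 0 && result.signal === null,
    status: result.status,          // exit code, or null if killed by a signal
    signal: result.signal,          // e.g. 'SIGABRT' or 'SIGSEGV', or null
    stderr: result.stderr,
  };
}

// Run the same job once per input and collect the inputs that did not exit
// cleanly, together with the exit code or signal they produced.
function findCrashingInputs(inputs, makeSource) {
  const crashes = [];
  for (const input of inputs) {
    const r = runIsolated(makeSource(input));
    if (!r.ok) crashes.push({ input, status: r.status, signal: r.signal });
  }
  return crashes;
}

// Example usage (detect.js is a hypothetical script exporting a function
// that runs the models on one image path):
// findCrashingInputs(['a.jpg', 'b.jpg'], (file) =>
//   `require('./detect.js')(${JSON.stringify(file)})`);
```

`spawnSync` blocks the event loop, which is fine for a one-off diagnostic script; inside a running server the asynchronous `spawn` or `fork` would be the non-blocking equivalent. Once a crashing input is known, running just that case under gdb gives a much shorter path to a backtrace.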

ryanhugh commented 1 year ago

@mattsoulanille Let me know if you can look into this. Thank you!

ryanhugh commented 1 year ago

Hey all, I just got this code working by rearranging some Node.js code and uploaded it to my repo (https://github.com/ryanhugh/tensorflow-crash-repro). I'm no longer blocked by this issue, but it still seems like there is a memory corruption bug in these bindings, in TensorFlow, or in Node.js. I'll let someone from the TensorFlow team investigate.