[Open] BlobMaster41 opened this issue 2 months ago
After some more investigation, I can confirm that this is the problem. I recompiled napi from source and removed the entire custom_gc function, and the program no longer segfaults.
After ~2 million calls, my program now leaks about 53 GB of memory. It is not perfect, but at least it prevents the fatal segfault.
This is a critical issue; I cannot identify the cause of this behavior.
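As a back-of-the-envelope sanity check (my numbers, not measured precisely), the leak rate works out to roughly 26.5 kB per call:

```rust
fn main() {
    // Rough estimate: ~53 GB leaked over ~2 million calls.
    let leaked_bytes = 53.0e9_f64;
    let calls = 2.0e6_f64;
    let per_call_kb = leaked_bytes / calls / 1e3;
    assert!((per_call_kb - 26.5).abs() < 0.1);
    println!("~{per_call_kb:.1} kB leaked per call");
}
```

That is far larger than the request buffers themselves, which suggests whole JS references are being retained rather than collected.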
You may take a look at my project source:
It should be fixed in 3.0.0-alpha, can you upgrade and test again? @BlobMaster41
Will try it asap, I need to make it compatible with NAPI 3.
Hey @Brooooooklyn, I have converted my VM to napi3 and I'm having a problem I don't understand. It's coming from napi3.
When calling a threadsafe function from Rust, I get the following result no matter what I do:
Error { status: "Ok", reason: "" } Error calling tsfn function: Ok
You may take a look at my implementation:
https://github.com/btc-vision/op-vm/pull/118
It is critical that I find the cause of this problem before going into production.
Thanks for your help!
@BlobMaster41 how can I reproduce it in your project? Can you give me the reproduction steps?
Hey @Brooooooklyn, yes.
You can easily replicate the problem by following these steps.
One test fails with:
[OPNetUnit DEBUG]: Running test: Call depth tests - should fail to do more nested calls than the maximum allowed
ExitData::to_napi_value
[GenericExternalFunction] Executing with data: [0, 0, 0, 0, 1, 248, 240, 149, 0, 0, 0, 1, 119, 29, 114, 31, 164, 121, 175, 78, 70, 168, 44, 87, 112, 196, 115, 227, 100, 118, 243, 46, 77, 231, 51, 215, 18, 218, 214, 95, 143, 146, 48, 208, 0, 0, 0, 8, 97, 245, 7, 2, 0, 0, 0, 199]
Error { status: "Ok", reason: "" }
Error calling tsfn function: Ok
Error: RuntimeError:
at <unnamed> (<module>[79]:0x1b32)
at <unnamed> (<module>[81]:0x1e17)
at <unnamed> (<module>[69]:0x187d)
Caused by:
Root cause: RuntimeStringError { details: "" }
That's the problem. I added println! calls that log the response returned by tsfn.call_async.
You can find it in the op-vm project:
let fut = async move {
  println!(
    "[GenericExternalFunction] Executing with data: {:?}",
    request.buffer
  );

  let promise = tsfn.call_async(Ok(request)).await;
  let promise = match promise {
    Ok(promise) => promise,
    Err(e) => {
      println!("{:?}", e);
      println!("Error calling tsfn function: {}", e);
      return Err(RuntimeError::new(e.reason));
    }
  };

  let buffer = promise.await.map_err(|e| {
    println!("Error awaiting promise: {}", e);
    RuntimeError::new(e.reason)
  })?;

  Ok(buffer.to_vec())
};

runtime.block_on(fut)
}
If another problem arises after fixing this issue, let me know; I don't know if the tests will still pass under napi3.
Good news: I located the cause of the Ok().
The problem comes from threadsafe_function.rs.
Here is the code that has the problem:
if let ThreadsafeFunctionCallVariant::WithCallback = call_variant {
  // throw Error in JavaScript callback
  let callback_arg = if status == sys::Status::napi_pending_exception {
    let mut exception = ptr::null_mut();
    status = unsafe { sys::napi_get_and_clear_last_exception(raw_env, &mut exception) };
    let mut error_reference = ptr::null_mut();
    unsafe { sys::napi_create_reference(raw_env, exception, 1, &mut error_reference) };
    Err(Error {
      maybe_raw: error_reference,
      maybe_env: raw_env,
      raw: true,
      status: Status::from(status),
      reason: "".to_owned(),
    })
  } else {
    unsafe { Return::from_napi_value(raw_env, return_value) }
  };
  if let Err(err) = callback(callback_arg, Env::from_raw(raw_env)) {
    unsafe { sys::napi_fatal_exception(raw_env, JsError::from(err).into_value(raw_env)) };
  }
}
status
}
If you log

Error {
  maybe_raw: error_reference,
  maybe_env: raw_env,
  raw: true,
  status: Status::from(status),
  reason: "".to_owned(),
}

it will always log "Ok" for some reason. But if you do:
if let ThreadsafeFunctionCallVariant::WithCallback = call_variant {
  // throw Error in JavaScript callback
  let callback_arg = if status == sys::Status::napi_pending_exception {
    let mut exception = ptr::null_mut();
    unsafe { sys::napi_get_and_clear_last_exception(raw_env, &mut exception) };
    let mut error_ref = ptr::null_mut();
    status = unsafe { sys::napi_create_reference(raw_env, exception, 1, &mut error_ref) };
    let err: Error = unsafe { JsUnknown::from_raw_unchecked(raw_env, exception) }.into();
    println!("callback error: {}", err);
    let err = Error {
      maybe_raw: error_ref,
      maybe_env: raw_env,
      raw: true,
      status: Status::from(status),
      reason: String::new(),
    };
    Err(err)
  } else {
    unsafe { Return::from_napi_value(raw_env, return_value) }
  };
  if let Err(err) = callback(callback_arg, Env::from_raw(raw_env)) {
    println!("callback returned error: {:?}", err);
    unsafe { sys::napi_fatal_exception(raw_env, JsError::from(err).into_value(raw_env)) };
  }
}
status
}
Now, you can see this in console:
callback error: GenericFailure, TypeError: Cannot read properties of null (reading 'buffer')
This makes me think that something is wrong in the snippet I sent for napi 3. Is there a problem in the error management?
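For what it's worth, the always-Ok status can be reproduced without napi at all. A minimal, self-contained sketch of the ordering issue (plain Rust stand-ins, not napi types): `status` starts as the pending-exception code, but the call that clears the exception returns its own success code and overwrites `status` before the Error is built.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    PendingException,
}

// stand-in for sys::napi_get_and_clear_last_exception, which succeeds
// here and therefore returns its own Ok status code
fn get_and_clear_last_exception() -> Status {
    Status::Ok
}

// mirrors the control flow of the branch above
fn error_status_after_clear(mut status: Status) -> Status {
    if status == Status::PendingException {
        // the success code of the *clearing* call replaces the
        // pending-exception code before it is snapshotted into the Error
        status = get_and_clear_last_exception();
    }
    status
}

fn main() {
    let s = error_status_after_clear(Status::PendingException);
    assert_eq!(s, Status::Ok); // the Error ends up carrying status Ok
    println!("Error would be built with status: {s:?}");
}
```

This matches what the logs show: the Error's status field reflects the last sys call's return code, not the original pending exception.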
If I do
if let ThreadsafeFunctionCallVariant::WithCallback = call_variant {
  // throw Error in JavaScript callback
  let callback_arg = if status == sys::Status::napi_pending_exception {
    let mut exception = ptr::null_mut();
    unsafe { sys::napi_get_and_clear_last_exception(raw_env, &mut exception) };
    let mut error_ref = ptr::null_mut();
    status = unsafe { sys::napi_create_reference(raw_env, exception, 1, &mut error_ref) };
    let err: Error = unsafe { JsUnknown::from_raw_unchecked(raw_env, exception) }.into();
    Err(err)
  } else {
    unsafe { Return::from_napi_value(raw_env, return_value) }
  };
  if let Err(err) = callback(callback_arg, Env::from_raw(raw_env)) {
    println!("callback returned error: {:?}", err);
    unsafe { sys::napi_fatal_exception(raw_env, JsError::from(err).into_value(raw_env)) };
  }
}
status
}
I get the error in JS as well, but I don't know if this is valid, since I didn't write napi. There could be another error if something else goes wrong.
On another note, once I applied this patch and corrected the JS error, I can run my unit tests.
One observation I already have:
It hangs for a couple of seconds when the execution is completed.
I didn't have this issue on napi 2.7.
I will check to see if it still segfaults.
I can confirm it still segfaults.
I will investigate with gdb and report the new trace.
@BlobMaster41 I cannot get npm run test-contract to work
Hey @Brooooooklyn, sorry! I had another local dependency I forgot to change. I pushed to the same branch again for the "unit-test-framework" repo. It should work now; please run npm i and try again!
Please note that you will run into the napi3 error-handling issue explained in my previous comments.
I patched this problem locally with what I mentioned in my comments. I don't know if this is the correct solution, but napi3 hangs for some reason. You will see what I mean once you have it working.
--- ANOTHER ISSUE, UNRELATED TO THE NAPI3 ERROR ISSUE (it still segfaults under napi3 as well) ---
To check the hanging, please switch to the following branches (I fixed the problem the merge/napi3 branch had; I had a JS bug, which is why I was unable to upgrade to napi3, but hitting it led me to find another problem in napi3):
op-vm -> error/fix-napi3-error-handling
unit-test-framework -> napi3-fix-test
You should see the hang after running the same tests. For some reason, it hangs longer on Intel than on AMD.
I collected the backtrace on NAPI3:
Thread 19 "node" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe87fff700 (LWP 1707879)]
0x0000000001363f23 in v8::internal::GlobalHandles::Destroy(unsigned long*) ()
(gdb) bt
#0 0x0000000001363f23 in v8::internal::GlobalHandles::Destroy(unsigned long*) ()
#1 0x0000000000ee9f32 in v8impl::Reference::~Reference() ()
#2 0x0000000000ef511f in napi_delete_reference ()
#3 0x00007ffdbd5eecf5 in ?? () from /root/op-vm/op-vm.linux-x64-gnu.node
#4 0x0000000000f111c9 in v8impl::(anonymous namespace)::ThreadSafeFunction::AsyncCb(uv_async_s*) ()
#5 0x0000000001ca72f3 in uv__async_io (loop=0x7ffe87ffe9c8, w=<optimized out>, events=<optimized out>) at ../deps/uv/src/unix/async.c:176
#6 0x0000000001cbce64 in uv__io_poll (loop=loop@entry=0x7ffe87ffe9c8, timeout=<optimized out>) at ../deps/uv/src/unix/linux.c:1564
#7 0x0000000001ca8017 in uv_run (loop=0x7ffe87ffe9c8, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:458
#8 0x0000000000e526d6 in node::SpinEventLoopInternal(node::Environment*) ()
#9 0x000000000109ab47 in node::worker::Worker::Run() ()
#10 0x000000000109acf9 in node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::{lambda(void*)#1}::_FUN(void*) ()
#11 0x00007ffff7c51609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007ffff7b76353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
@BlobMaster41 I just remembered, this error is not related to NAPI-RS; it occurs when using a hard-linked .node file in Node.js.
I've upgraded to the NAPI-RS beta in your project; here is the PR: https://github.com/btc-vision/op-vm/pull/121
beta.2 enhanced the error statuses and messages:
Hey @Brooooooklyn, thanks for this patch about errors. I have a working branch that you can try:
op-vm -> error/fix-napi3-error-handling
unit-test-framework -> napi3-fix-test
I noticed that after switching from napi 2.7 to napi3 it now hangs. When the program is done running, it takes about ~10 seconds for it to fully stop. On napi 2.7 I had the same issue and resolved it by calling abort on the tsfn, but on NAPI3 I think the drop is now automatic?
Either way, when using napi3, it now hangs.
After running intensive tests, it still segfaults.
What I am wondering is: what if two libraries use napi? Could that be the cause of the issue?
Our program uses op-vm and https://github.com/btc-vision/rust-merkle-tree to generate Merkle trees, and that library also uses napi.
The segfault happens in op-vm, but could something from rust-merkle-tree break the GC collection of op-vm?
I will provide a new gdb dump using napi3.beta2.
@BlobMaster41 If you don't want the ThreadsafeFunction to block your program, you can declare the ThreadsafeFunction as Weak by changing Weak to true here:
Our program use op-vm and https://github.com/btc-vision/rust-merkle-tree to generate merkle tree and this library also use napi.
No, it shouldn't be a problem; We have several servers that depend on 4 to 5 NAPI-RS libraries and have been running stably for several years.
Your segfault is caused by the hardlink and the dlopen cache, as described in https://stackoverflow.com/questions/45954861/how-to-circumvent-dlopen-caching#:~:text=5-,POSIX%20says%3A,-Only%20a%20single
When you declare a dependency like this, npm creates a hardlink of the project in node_modules:
If you want to avoid the segfault, you can copy the dist manually into node_modules under unit-test-framework rather than declaring it as a file:.. protocol dependency.
Hey, thanks for responding. The ../op-vm is only for dev. In production it's set to @btc-vision/op-vm, and it still segfaults.
I don't think that's the issue..
Could it be caused because we are using it in worker threads?
Could it be caused because we are using it in worker threads?
Yes, it could be. The biggest problem here is that I can't reliably reproduce this segfault, so I can't debug it.
Also, which version of Node is the program that encountered the segfault running on, and what operating system and CPU model are you using?
Hey @Brooooooklyn, the segfault happens on my PC and on my servers, which use very different CPUs; I'm on Intel and my servers are on various AMD models.
To reproduce the segfault, I would have to explain how to do it. It's in the opnet-node project, but it requires some specific configuration. Do you have a chat like Telegram or Discord I can reach you on?
@BlobMaster41 I can confirm it was caused by worker_threads; I can reproduce it in the unit test here: https://github.com/napi-rs/napi-rs/blob/main/examples/napi/__tests__/worker-thread.spec.ts#L55
If I change the unit test from worker_threads to normal Node.js main-thread code, the issue is gone.
Maybe related: https://github.com/nodejs/node/issues/55706
@Brooooooklyn What's the next step from here? I need worker threads..
OK, I tried some things out and noticed that if I change everything from Buffer to Uint8Array, it does not segfault, but Node.js crashes instead:
#
# Fatal error in , line 0
# Check failed: node->IsInUse().
#
#
#
#FailureMessage Object: 0x7fe477fba1e0
----- Native stack trace -----
1: 0xfe3191 [node]
2: 0x279da3b V8_Fatal(char const*, ...) [node]
3: 0x1363ff9 v8::internal::GlobalHandles::Destroy(unsigned long*) [node]
4: 0xe51672 node::CallbackScope::~CallbackScope() [node]
5: 0xf1123a [node]
6: 0x1ca72f3 [node]
7: 0x1cbce64 [node]
8: 0x1ca8017 uv_run [node]
9: 0xe526d6 node::SpinEventLoopInternal(node::Environment*) [node]
10: 0x109ab47 node::worker::Worker::Run() [node]
11: 0x109acf9 [node]
12: 0x7fe9818a9609 [/lib/x86_64-linux-gnu/libpthread.so.0]
13: 0x7fe9817ce353 clone [/lib/x86_64-linux-gnu/libc.so.6]
Trace/breakpoint trap (core dumped)
Even converting everything to strings and sending strings from Node.js to napi and back results in a fatal segfault after a while.
Another, older alternative to worker_threads is cluster. Do you think cluster could work instead? I could switch my code to use cluster instead of worker_threads temporarily until this is resolved.
@BlobMaster41 I created a simple example to demonstrate how to maintain API consistency while avoiding the use of ThreadsafeFunction in worker_threads: https://github.com/Brooooooklyn/threadsafe_function_in_woker_threads_workaround
@BlobMaster41 can you try napi@3.0.0-beta.4? I've made some workarounds for the ThreadsafeFunction usages.
Thanks! Give me 1h and I'll try that. Hopefully it resolves the issue :D
Give me a bit; I have so many errors to fix..
@BlobMaster41 there is a breaking change here: https://github.com/napi-rs/napi-rs/pull/2672. It's about the ThreadsafeFunction signature.
Ya, just fixed everything. I'm trying now.
Hey @Brooooooklyn, https://github.com/btc-vision/op-vm/actions/runs/15458050738/job/43514025151?pr=122 do you know why that is?
@Brooooooklyn Still segfaults, sadly...
This time I got a super long backtrace.
#0 0x0000000001363f23 in v8::internal::GlobalHandles::Destroy(unsigned long*) ()
#1 0x0000000000ee9f32 in v8impl::Reference::~Reference() ()
#2 0x0000000000ef511f in napi_delete_reference ()
#3 0x00007ffd6f259dac in <napi::bindgen_runtime::js_values::buffer::Buffer as core::ops::drop::Drop>::drop (self=0x7ffcb7ff9ad0) at src/bindgen_runtime/js_values/buffer.rs:346
#4 0x00007ffd6f25906b in core::ptr::drop_in_place<napi::bindgen_runtime::js_values::buffer::Buffer> () at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/ptr/mod.rs:523
#5 0x00007ffd6f156cb3 in core::ptr::drop_in_place<core::option::Option<napi::bindgen_runtime::js_values::buffer::Buffer>> () at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/ptr/mod.rs:523
#6 0x00007ffd6f1f284e in op_vm::interfaces::napi::js_contract_manager::ContractManager::instantiate (self=0x7ed939e54d40, reserved_id=..., address=..., bytecode=..., used_gas=..., max_gas=..., memory_pages_used=...,
network=op_vm::interfaces::napi::bitcoin_network_request::BitcoinNetworkRequest::Testnet, is_debug_mode=false, return_proofs=false) at src/interfaces/napi/js_contract_manager.rs:393
#7 0x00007ffd6f0de07b in op_vm::interfaces::napi::js_contract_manager::__napi_impl_helper_ContractManager_0::_napi_internal_register_instantiate::{{closure}} (cb=...) at src/interfaces/napi/js_contract_manager.rs:208
#8 0x00007ffd6f0cf9ed in core::result::Result<T,E>::and_then (self=..., op=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/result.rs:1353
#9 0x00007ffd6f1f6874 in op_vm::interfaces::napi::js_contract_manager::__napi_impl_helper_ContractManager_0::_napi_internal_register_instantiate (env=0x7ffca02a3df0, cb=0x7ffcb7ff9e00) at src/interfaces/napi/js_contract_manager.rs:208
#10 0x0000000000ee9da5 in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
#11 0x00007ffc9fe0f745 in ?? ()
#12 0x00007ffcb7ff9e80 in ?? ()
#13 0x00007ffcb7ff9eb8 in ?? ()
#14 0x0000000000000009 in ?? ()
#15 0x0000000000000080 in ?? ()
#16 0x00007ffcb7ff9e40 in ?? ()
#17 0x0000000000000006 in ?? ()
#18 0x00007ffcb7ff9f60 in ?? ()
#19 0x00007ffc8040c39f in ?? ()
#20 0x000039cfea2a8819 in ?? ()
#21 0x00007ffca0002000 in ?? ()
#22 0x00003e9da8d80069 in ?? ()
#23 0x00003e9da8d80069 in ?? ()
#24 0x000014769610e4e9 in ?? ()
#25 0x00003e9da8d80069 in ?? ()
#26 0x000039cfea2a8819 in ?? ()
#27 0x0000370477847539 in ?? ()
#28 0x0000370477845891 in ?? ()
#29 0x0000370477847609 in ?? ()
#30 0x0000370477847689 in ?? ()
#31 0x00003704778476c1 in ?? ()
#32 0x00003704778476f1 in ?? ()
#33 0x0000000100000000 in ?? ()
#34 0x00003e9da8d800d9 in ?? ()
#35 0x00003e9da8d800d9 in ?? ()
#36 0x0000370477847689 in ?? ()
#37 0x00003704778476c1 in ?? ()
#38 0x0000370477847609 in ?? ()
#39 0x0000370477845891 in ?? ()
#40 0x00003704778476f1 in ?? ()
#41 0x0000370477847539 in ?? ()
#42 0x0000000100000000 in ?? ()
#43 0x000039cfea2a8819 in ?? ()
#44 0x000035b75c5c2ab1 in ?? ()
#45 0x0000000000000001 in ?? ()
#46 0x000014769610efd1 in ?? ()
#47 0x000035b75c5c2ab1 in ?? ()
#48 0x00007ffcb7ff9fc8 in ?? ()
#49 0x00007ffc804db824 in ?? ()
#50 0x0000370477847411 in ?? ()
#51 0x00003e9da8d80109 in ?? ()
#52 0x00007ffc9fe0a702 in ?? ()
#53 0x000019172f762d19 in ?? ()
#54 0x000039cfea2a8819 in ?? ()
#55 0x000039cfea2a8819 in ?? ()
#56 0x0000370477847411 in ?? ()
#57 0x00002d839551da39 in ?? ()
#58 0x0000000000000002 in ?? ()
#59 0x000035b75c5d4019 in ?? ()
#60 0x00002d839551da39 in ?? ()
#61 0x00007ffcb7ffa050 in ?? ()
#62 0x00007ffc804cd9b3 in ?? ()
#63 0x00003704778466d1 in ?? ()
#64 0x0000370477846db1 in ?? ()
#65 0x0000000000000022 in ?? ()
#66 0x0000370477846bd1 in ?? ()
#67 0x00003704778466d1 in ?? ()
#68 0x00002d839551e109 in ?? ()
#69 0x0000370477846db1 in ?? ()
#70 0x00002d839551e109 in ?? ()
#71 0x0000370477845a51 in ?? ()
#72 0x0000370477845b71 in ?? ()
#73 0x0000370477846d59 in ?? ()
#74 0x000035b75c5c3db9 in ?? ()
#75 0x0000000000000002 in ?? ()
#76 0x000035b75c5c3df1 in ?? ()
#77 0x00002d839551e109 in ?? ()
#78 0x00007ffcb7ffa098 in ?? ()
#79 0x00007ffc9fe4c9c3 in ?? ()
#80 0x000021d3fb382501 in ?? ()
#81 0x000019172f7792c1 in ?? ()
#82 0x00007ffc9ff2bc8b in ?? ()
#83 0x00007ffcd80a8220 in ?? ()
#84 0x0000000000000002 in ?? ()
#85 0x0000370477845fd1 in ?? ()
#86 0x0000370477845fa9 in ?? ()
#87 0x00007ffcb7ffa0d0 in ?? ()
#88 0x00007ffc9ff2b275 in ?? ()
#89 0x000030689c5c1189 in ?? ()
#90 0x00003704778466d1 in ?? ()
#91 0x0000370477845fa9 in ?? ()
#92 0x00003e9da8d80069 in ?? ()
#93 0x0000000000000022 in ?? ()
#94 0x00007ffcb7ffa138 in ?? ()
#95 0x00007ffc9fe3c919 in ?? ()
#96 0x00007ff9b0026050 in ?? ()
#97 0x00007ffca02a3df0 in ?? ()
#98 0x0000000000000054 in ?? ()
#99 0x00007ffcd80ff260 in ?? ()
#100 0x0000000000000054 in ?? ()
#101 0x00003e9da8d80069 in ?? ()
#102 0x0000370477845fa9 in ?? ()
#103 0x0000000000000001 in ?? ()
#104 0x000030689c5c1231 in ?? ()
#105 0x00007ffca0016120 in ?? ()
#106 0x0000000000000022 in ?? ()
#107 0x00007ffcb7ffa1a0 in ?? ()
#108 0x00007ffc9fe0b403 in ?? ()
#109 0x0000000000000000 in ?? ()
Interestingly enough, it seems to come from
Do we have the actual cause of the segfault now?
@Brooooooklyn I think I'm onto something:
https://github.com/btc-vision/napi3-rs/commit/cac0a906b043d368b7f929433ca9d1b92a025caa
It hasn't crashed in the ~30 minutes I've been spamming a bunch of drop and ref calls in threads. Please take a look.
And... boom. After a LOT of patching, it now PANICs, no longer SEGFAULTs.
Panic occurred: PanicHookInfo { payload: Any { .. }, location: Location { file: "/root/.cargo/git/checkouts/napi3-rs-870ea236a9e912ac/c196918/crates/napi/src/bindgen_runtime/js_values/buffer.rs", line: 29, col: 7 }, can_unwind: true, force_no_backtrace: false }
here:
#[cfg(all(debug_assertions, not(windows)))]
#[inline]
pub fn register_backing_ptr(ptr: *mut u8) {
  if ptr.is_null() {
    return; // 0-length buffers use NULL
  }
  BUFFER_DATA.with(|buffer_data| {
    let mut set = buffer_data.lock().unwrap();
    if !set.insert(ptr) {
      panic!(
        "Share the same data between different buffers is not allowed, \
        see: https://github.com/nodejs/node/issues/32463#issuecomment-631974747"
      );
    }
  });
}
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7a79859 in __GI_abort () at abort.c:79
#2 0x00007ffd4da0e3fa in std::sys::pal::unix::abort_internal () at library/std/src/sys/pal/unix/mod.rs:367
#3 0x00007ffd4da0b21f in std::panicking::rust_panic () at library/std/src/rt.rs:50
#4 0x00007ffd4da0afd2 in std::panicking::rust_panic_with_hook () at library/std/src/panicking.rs:856
#5 0x00007ffd4da0ac46 in std::panicking::begin_panic_handler::{{closure}} () at library/std/src/panicking.rs:697
#6 0x00007ffd4da09959 in std::sys::backtrace::__rust_end_short_backtrace () at library/std/src/sys/backtrace.rs:168
#7 0x00007ffd4da0a90d in rust_begin_unwind () at library/std/src/panicking.rs:695
#8 0x00007ffd4ce9be90 in core::panicking::panic_fmt () at library/core/src/panicking.rs:75
#9 0x00007ffd4d0585fc in napi::bindgen_runtime::js_values::buffer::register_backing_ptr::{{closure}} (buffer_data=0x7ffcd8152a90) at src/bindgen_runtime/js_values/buffer.rs:29
#10 0x00007ffd4d05057c in std::thread::local::LocalKey<T>::try_with (self=0x7ffd4dd7fbc0, f=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/thread/local.rs:310
#11 0x00007ffd4d050234 in std::thread::local::LocalKey<T>::with (self=0x7ffd4dd7fbc0, f=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/thread/local.rs:274
#12 0x00007ffd4cf12e73 in napi::bindgen_runtime::js_values::buffer::register_backing_ptr (ptr=0x7ffa04008b50 "cs\235\\") at /root/.cargo/git/checkouts/napi3-rs-870ea236a9e912ac/c196918/crates/napi/src/bindgen_runtime/js_values/buffer.rs:26
#13 0x00007ffd4ce9cc0b in napi::bindgen_runtime::js_values::buffer::BufferSlice::copy_from (env=0x7ffcefff9448, data=...) at /root/.cargo/git/checkouts/napi3-rs-870ea236a9e912ac/c196918/crates/napi/src/bindgen_runtime/js_values/buffer.rs:233
#14 0x00007ffd4cf24bdc in <op_vm::domain::runner::exit_data::ExitData as napi::bindgen_runtime::js_values::ToNapiValue>::to_napi_value (env_raw=0x7ffcd808ed00, val=...) at src/domain/runner/exit_data.rs:40
#15 0x00007ffd4cf484b5 in napi::env::Env::spawn_future::{{closure}} (env=0x7ffcd808ed00, val=<error reading variable: Cannot access memory at address 0x0>) at /root/.cargo/git/checkouts/napi3-rs-870ea236a9e912ac/c196918/crates/napi/src/env.rs:1166
#16 0x00007ffd4cf59ed0 in napi::tokio_runtime::SendableResolver<Data,R>::resolve (self=..., env=0x7ffcd808ed00, data=...) at /root/.cargo/git/checkouts/napi3-rs-870ea236a9e912ac/c196918/crates/napi/src/tokio_runtime.rs:193
#17 0x00007ffd4cf59c0b in napi::tokio_runtime::execute_tokio_future::{{closure}}::{{closure}} (env=...) at /root/.cargo/git/checkouts/napi3-rs-870ea236a9e912ac/c196918/crates/napi/src/tokio_runtime.rs:234
#18 0x00007ffd4cee2e68 in napi::js_values::deferred::napi_resolve_deferred::{{closure}} (resolver=...) at /root/.cargo/git/checkouts/napi3-rs-870ea236a9e912ac/c196918/crates/napi/src/js_values/deferred.rs:247
#19 0x00007ffd4cfc180b in core::result::Result<T,E>::and_then (self=..., op=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/result.rs:1353
#20 0x00007ffd4cee20f1 in napi::js_values::deferred::napi_resolve_deferred (env=0x7ffcd808ed00, _js_callback=0x0, context=0x7ffcd8e3b690, data=0x7fc784002200)
at /root/.cargo/git/checkouts/napi3-rs-870ea236a9e912ac/c196918/crates/napi/src/js_values/deferred.rs:245
#21 0x0000000000f111c9 in v8impl::(anonymous namespace)::ThreadSafeFunction::AsyncCb(uv_async_s*) ()
#22 0x0000000001ca72f3 in uv__async_io (loop=0x7ffcefffe9c8, w=<optimized out>, events=<optimized out>) at ../deps/uv/src/unix/async.c:176
#23 0x0000000001cbce64 in uv__io_poll (loop=loop@entry=0x7ffcefffe9c8, timeout=<optimized out>) at ../deps/uv/src/unix/linux.c:1564
#24 0x0000000001ca8017 in uv_run (loop=0x7ffcefffe9c8, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:458
#25 0x0000000000e526d6 in node::SpinEventLoopInternal(node::Environment*) ()
#26 0x000000000109ab47 in node::worker::Worker::Run() ()
#27 0x000000000109acf9 in node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::{lambda(void*)#1}::_FUN(void*) ()
#28 0x00007ffff7c51609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x00007ffff7b76353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
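In isolation, the debug check that panicked behaves like this simplified stand-in (no napi types; `Registry` and its methods are hypothetical names for illustration, and it returns Err instead of panicking so the collision is observable):

```rust
use std::collections::HashSet;
use std::sync::Mutex;

// Each live buffer's backing data pointer must be unique: two JS buffers
// sharing one backing allocation would be finalized twice.
struct Registry {
    live: Mutex<HashSet<usize>>,
}

impl Registry {
    fn new() -> Self {
        Registry { live: Mutex::new(HashSet::new()) }
    }

    fn register(&self, ptr: usize) -> Result<(), &'static str> {
        if ptr == 0 {
            return Ok(()); // 0-length buffers use NULL
        }
        let mut set = self.live.lock().unwrap();
        if !set.insert(ptr) {
            return Err("sharing the same data between different buffers is not allowed");
        }
        Ok(())
    }

    fn unregister(&self, ptr: usize) {
        self.live.lock().unwrap().remove(&ptr);
    }
}

fn main() {
    let reg = Registry::new();
    assert!(reg.register(0x1000).is_ok());
    // a second buffer over the same backing data is rejected
    assert!(reg.register(0x1000).is_err());
    // once the first buffer is dropped, the pointer may be reused
    reg.unregister(0x1000);
    assert!(reg.register(0x1000).is_ok());
    println!("backing-pointer registry behaves as expected");
}
```

So the panic above means the same backing pointer was registered twice without an intervening unregister, i.e. a buffer's drop/finalize did not run before the pointer was reused.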
New segfault again... it's so random..
#0 0x0000000000ef9fdf in std::pair<std::__detail::_Node_iterator<v8impl::RefTracker*, true, false>, bool> std::_Hashtable<v8impl::RefTracker*, v8impl::RefTracker*, std::allocator<v8impl::RefTracker*>, std::__detail::_Identity, std::equal_to<v8impl::RefTracker*>, std::hash<v8impl::RefTracker*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_emplace<v8impl::RefTracker*&>(std::integral_constant<bool, true>, v8impl::RefTracker*&) ()
#1 0x0000000000f16d4f in node_napi_env__::EnqueueFinalizer(v8impl::RefTracker*) ()
#2 0x0000000000ef95b6 in node_api_post_finalizer ()
#3 0x00007ffd4f25ab7f in <napi::bindgen_runtime::js_values::buffer::Buffer as core::ops::drop::Drop>::drop (self=0x7ed4e49df910) at src/bindgen_runtime/js_values/buffer.rs:404
#4 0x00007ffd4f277a6b in core::ptr::drop_in_place<napi::bindgen_runtime::js_values::buffer::Buffer> () at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/ptr/mod.rs:523
#5 0x00007ffd4f13ee46 in <op_vm::interfaces::napi::external_functions::generic_external_function::GenericExternalFunction<napi::bindgen_runtime::js_values::promise::Promise<napi::bindgen_runtime::js_values::buffer::Buffer>> as op_vm::interfaces::napi::external_functions::external_function::ExternalFunction>::execute::{{closure}} () at src/interfaces/napi/external_functions/generic_external_function.rs:76
#6 0x00007ffd4f174bae in tokio::runtime::park::CachedParkThread::block_on::{{closure}} () at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/park.rs:284
#7 0x00007ffd4f16ef97 in tokio::task::coop::with_budget (budget=..., f=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/task/coop/mod.rs:167
#8 tokio::task::coop::budget (f=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/task/coop/mod.rs:133
#9 tokio::runtime::park::CachedParkThread::block_on (self=0x7ed4e49dfd87, f=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/park.rs:284
#10 0x00007ffd4f1a9c82 in tokio::runtime::context::blocking::BlockingRegionGuard::block_on (self=0x7ed4e49e0000, f=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/context/blocking.rs:66
#11 0x00007ffd4f13e2a1 in tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}} (blocking=0x7ed4e49e0000) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/scheduler/multi_thread/mod.rs:87
#12 0x00007ffd4f1af819 in tokio::runtime::context::runtime::enter_runtime (handle=0x7ffcd887d550, allow_block_in_place=true, f=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/context/runtime.rs:65
#13 0x00007ffd4f13df50 in tokio::runtime::scheduler::multi_thread::MultiThread::block_on (self=0x7ffcd887d528, handle=0x7ffcd887d550, future=...)
at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/scheduler/multi_thread/mod.rs:86
#14 0x00007ffd4f10bde7 in tokio::runtime::runtime::Runtime::block_on_inner (self=0x7ffcd887d520, future=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/runtime.rs:358
#15 0x00007ffd4f10d8ae in tokio::runtime::runtime::Runtime::block_on (self=0x7ffcd887d520, future=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/runtime.rs:330
#16 0x00007ffd4f13d0bf in <op_vm::interfaces::napi::external_functions::generic_external_function::GenericExternalFunction<napi::bindgen_runtime::js_values::promise::Promise<napi::bindgen_runtime::js_values::buffer::Buffer>> as op_vm::interfaces::napi::external_functions::external_function::ExternalFunction>::execute (self=0x7ffcd8156dc8, data=..., runtime=0x7ffcd887d520) at src/interfaces/napi/external_functions/generic_external_function.rs:78
#17 0x00007ffd4f119654 in <op_vm::interfaces::napi::external_functions::storage_load_external_function::StorageLoadExternalFunction as op_vm::interfaces::napi::external_functions::external_function::ExternalFunction>::execute (self=0x7ffcd8156dc8,
data=..., runtime=0x7ffcd887d520) at src/interfaces/napi/external_functions/storage_load_external_function.rs:45
#18 0x00007ffd4f1b3351 in op_vm::domain::runner::import_functions::storage_load_import::StorageLoadImport::execute (context=..., key_ptr=18352, result_ptr=37936) at src/domain/runner/import_functions/storage_load_import.rs:31
#19 0x00007ffd4f0f351a in core::ops::function::Fn::call () at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/ops/function.rs:79
#20 0x00007ffd4f1d51a0 in wasmer::backend::sys::entities::function::gen_fn_callback_s2::func_wrapper::{{closure}}::{{closure}} () at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmer-6.0.1/src/backend/sys/entities/function/mod.rs:600
#21 0x00007ffd4f0f5e49 in core::ops::function::FnOnce::call_once () at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/ops/function.rs:250
#22 0x00007ffd4f141601 in <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panic/unwind_safe.rs:272
#23 0x00007ffd4f1a3f84 in std::panicking::try::do_call (data=0x7ed4e49e0ed0 "\300\250\032L\375\177") at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panicking.rs:587
#24 0x00007ffd4f17bd7b in __rust_try () from /root/op-vm/op-vm.linux-x64-gnu.node
#25 0x00007ffd4f17b8ca in std::panicking::try (f=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panicking.rs:550
#26 std::panic::catch_unwind (f=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panic.rs:358
#27 0x00007ffd4f1d4663 in wasmer::backend::sys::entities::function::gen_fn_callback_s2::func_wrapper::{{closure}} () at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmer-6.0.1/src/backend/sys/entities/function/mod.rs:591
#28 0x00007ffd4f1f72b0 in wasmer_vm::trap::traphandlers::on_host_stack::{{closure}} () at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmer-vm-6.0.1/src/trap/traphandlers.rs:1015
#29 0x00007ffd4f140928 in <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panic/unwind_safe.rs:272
#30 0x00007ffd4f1a4e1a in std::panicking::try::do_call (data=0x7ed4e49e10c8 "\300\250\032L\375\177") at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panicking.rs:587
#31 0x00007ffd4f17bd7b in __rust_try () from /root/op-vm/op-vm.linux-x64-gnu.node
#32 0x00007ffd4f17a88e in std::panicking::try (f=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panicking.rs:550
#33 std::panic::catch_unwind (f=...) at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panic.rs:358
#34 0x00007ffd4f183f88 in corosensei::unwind::catch_unwind_at_root (f=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/corosensei-0.2.1/src/unwind.rs:228
#35 0x00007ffd4f19a40b in corosensei::coroutine::on_stack::wrapper (ptr=0x7ffd4c1aa660 "\300\250\032L\375\177") at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/corosensei-0.2.1/src/coroutine.rs:568
#36 <signal handler called>
#37 0x00007ffd4f96327f in corosensei::arch::x86_64::on_stack (arg=0x7ffd4c1aa660 "\300\250\032L\375\177", stack=..., f=0x7ffd4f19a3b0 <corosensei::coroutine::on_stack::wrapper>)
at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/corosensei-0.2.1/src/unwind.rs:137
#38 0x00007ffd4f199b83 in corosensei::coroutine::on_stack (stack=..., f=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/corosensei-0.2.1/src/coroutine.rs:581
#39 0x00007ffd4f198bc0 in corosensei::coroutine::Yielder<Input,Yield>::on_parent_stack (self=0x7ffd4c1aaff0, f=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/corosensei-0.2.1/src/coroutine.rs:535
#40 0x00007ffd4f1f58c2 in wasmer_vm::trap::traphandlers::on_host_stack (f=...) at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmer-vm-6.0.1/src/trap/traphandlers.rs:1013
#41 0x00007ffd4f1d3ea7 in wasmer::backend::sys::entities::function::gen_fn_callback_s2::func_wrapper (env=0x7ffcd8902ac0, A1=18352, A2=37936)
at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmer-6.0.1/src/backend/sys/entities/function/mod.rs:590
#42 0x00007ffeac0ba093 in ?? ()
#43 0x0000000000000000 in ?? ()
My program segfaults fatally after ~30 minutes with a reference-deletion problem. Here is the full backtrace from gdb:
It seems there is a problem in an async function cleaning up a buffer object in custom_gc:
https://github.com/napi-rs/napi-rs/blob/napi%402.16.17/crates/napi/src/bindgen_runtime/module_register.rs#L602
I have no clue how to replicate this easily, but it does happen.