zeromq / zeromq.js

:zap: Node.js bindings to the ØMQ library
http://zeromq.github.io/zeromq.js/
MIT License

Abort (core dumped) problem #296

Open chengsu opened 5 years ago

chengsu commented 5 years ago

In my test code, I use a connection pool to send 300 requests in parallel, but a few minutes later I get an "Abort (core dumped)" message and my program exits. I used lldb to analyse the core dump file and got this:

(lldb) target create "node" --core "core"
Core file '/home/chengjiabin/Documents/verifier2/core' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'node', stop reason = signal SIGABRT
  * frame #0: 0x00007f779a0dde97 libc.so.6`__GI_raise(sig=2) at raise.c:51
    frame #1: 0x00007f779a0df801 libc.so.6`__GI_abort at abort.c:79
    frame #2: 0x0000000000a476aa node`uv_async_send at async.c:153
    frame #3: 0x0000000000a47680 node`uv_async_send(handle=<unavailable>) at async.c:65
    frame #4: 0x00007f779354ea8d zmq.node`zmq::Socket::OutgoingMessage::BufferReference::FreeCallback(void*, void*) + 13
    frame #5: 0x00007f7793568f14 zmq.node`___lldb_unnamed_symbol697$$zmq.node + 218
    frame #6: 0x00007f77935b7d1e zmq.node`___lldb_unnamed_symbol2795$$zmq.node + 168
    frame #7: 0x00007f77935ab9c2 zmq.node`___lldb_unnamed_symbol2639$$zmq.node + 618
    frame #8: 0x00007f77935abbdf zmq.node`___lldb_unnamed_symbol2640$$zmq.node + 109
    frame #9: 0x00007f77935a6b99 zmq.node`___lldb_unnamed_symbol2575$$zmq.node + 323
    frame #10: 0x00007f7793571066 zmq.node`___lldb_unnamed_symbol942$$zmq.node + 108
    frame #11: 0x00007f7793569e96 zmq.node`___lldb_unnamed_symbol733$$zmq.node + 90
    frame #12: 0x00007f7793566755 zmq.node`___lldb_unnamed_symbol637$$zmq.node + 81
    frame #13: 0x00007f77935656b2 zmq.node`___lldb_unnamed_symbol568$$zmq.node + 586
    frame #14: 0x00007f77935657ca zmq.node`___lldb_unnamed_symbol569$$zmq.node + 24
    frame #15: 0x00007f7793586130 zmq.node`___lldb_unnamed_symbol1566$$zmq.node + 342
    frame #16: 0x00007f779a4976db libpthread.so.0`start_thread + 219
    frame #17: 0x00007f779a1c088f libc.so.6`__GI___clone at clone.S:9
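
Frames #4 and #16 suggest that `FreeCallback(void*, void*)` ran on a thread other than the Node.js main thread (the stack bottoms out in `start_thread`) and that it called `uv_async_send` to notify the event loop. The following is a minimal sketch of that kind of cross-thread hand-off, not the zeromq.js implementation. `ReleaseInbox`, `free_callback`, and `run_handoff` are hypothetical names, and a mutex-protected inbox stands in for the libuv async handle. It illustrates the pattern only and does not reproduce the crash.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a loop-side async handle: a thread-safe inbox
// that a foreign thread posts to and the owner thread drains later.
class ReleaseInbox {
public:
    // Safe to call from any thread (plays the role of uv_async_send).
    void post(void* data) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(data);
    }

    // Meant to be called on the owner thread only.
    // Returns how many buffers were handed back.
    std::size_t drain() {
        std::vector<void*> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        // A real binding would drop its reference to each buffer here.
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<void*> pending_;
};

// Same two-pointer shape as libzmq's zmq_free_fn: (data, hint).
// The hint carries the inbox, so the callback touches no loop-owned state.
void free_callback(void* data, void* hint) {
    assert(hint != nullptr);
    static_cast<ReleaseInbox*>(hint)->post(data);
}

// Simulates an I/O thread that finishes with `count` buffers,
// then drains the inbox on the calling (owner) thread.
std::size_t run_handoff(std::size_t count) {
    ReleaseInbox inbox;
    std::vector<char> buffers(count);
    std::thread io_thread([&buffers, &inbox] {
        for (char& b : buffers) {
            free_callback(&b, &inbox);
        }
    });
    io_thread.join();
    return inbox.drain();
}
```

In this sketch the inbox outlives the I/O thread because `run_handoff` joins before returning. If the notification target were torn down while callbacks could still fire, the hand-off would no longer be safe.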

FanAs commented 5 years ago

Subscribed to this issue. We downgraded to version 4.6.0 and it works perfectly fine. With 5.1.0 we hit random SIGABRTs without any trace information. It doesn't depend on old_memory size or connection pooling. It happens on any Linux-based OS and in Docker. We couldn't reproduce it on macOS.

It's a show stopper, please react.

FanAs commented 5 years ago

This error affects all 5.* versions.

FanAs commented 5 years ago

int result = uv_async_send(static_cast<BufferReference*>(bufref)->async_);

Should be

int result = uv_async_send(&static_cast<BufferReference*>(bufref)->async_);
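
Whether the extra `&` is correct depends on how `async_` is declared, which this thread does not show. `uv_async_send` takes a pointer to the handle, so only one of the two forms can type-check for a given declaration. A minimal sketch with stand-in types follows. `fake_async_t`, `fake_async_send`, `RefWithPointer`, and `RefWithValue` are hypothetical and are not taken from zeromq.js or libuv.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for uv_async_t; the real type lives in libuv.
struct fake_async_t { int sends; };

// Same parameter shape as uv_async_send: it takes a pointer to the handle.
int fake_async_send(fake_async_t* handle) {
    assert(handle != nullptr);
    handle->sends += 1;
    return 0;
}

// Variant A (assumption): the reference stores a pointer to the handle.
struct RefWithPointer { fake_async_t* async_; };

// Variant B (assumption): the reference embeds the handle by value.
struct RefWithValue { fake_async_t async_; };

// With a pointer member, the member itself is the argument.
int notify_pointer(void* bufref) {
    return fake_async_send(static_cast<RefWithPointer*>(bufref)->async_);
}

// With a value member, the address of the member is the argument.
int notify_value(void* bufref) {
    return fake_async_send(&static_cast<RefWithValue*>(bufref)->async_);
}

// The mismatched forms are rejected at compile time, not at run time.
static_assert(!std::is_convertible<fake_async_t**, fake_async_t*>::value,
              "&async_ on a pointer member does not type-check");
static_assert(!std::is_convertible<fake_async_t, fake_async_t*>::value,
              "async_ on a value member does not type-check");
```

Under these assumptions, if the original line compiled, `async_` was presumably already a pointer and adding `&` would fail to compile. The suggested change would only apply if `async_` were a value member.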

abhassaroha commented 5 years ago

On macOS the process doesn't crash, but the I/O thread crashes and sockets stop sending or receiving messages. I have been able to reproduce it by running a process with about 200-300 connections for around 30 minutes. I can share more details if needed. It happens on the latest versions, where the new buffer allocation logic was auto-enabled.

noahw3 commented 5 years ago

Have also run into this. Similar behavior to what @FanAs describes - the process exits with a random abort and without any trace information.

Witnessed on 5.1.0. Doesn't occur in 4.6.0. Unfortunately, there's no prebuilt Node 10 version on 4.6.0 so it's not as simple as rolling back.

noahw3 commented 5 years ago

Potentially caused by https://github.com/libuv/libuv/issues/2226?

Edit: Seems unrelated, I ran the example from that issue and it manifests differently (segfault rather than hitting the explicit abort at https://github.com/libuv/libuv/blob/89cbbc895bb459c6ab5319fa86a32a9fecbf8744/src/unix/async.c#L153).
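
For readers unfamiliar with that code path, here is a minimal sketch of the general shape of a wake-up write that ends in an explicit abort. It assumes that libuv's Unix `uv_async_send` finishes by writing to a wake-up descriptor and aborts on any error other than "would block", which is what the linked line appears to show. `WakeResult` and `try_wake` are hypothetical names, and the sketch returns a value where the real code would call `abort()`. It does not claim to identify the cause of this issue.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical outcome type; the real code returns nothing and may abort.
enum class WakeResult { Delivered, AlreadyPending, WouldAbort };

// Writes one byte to a wake-up descriptor:
// - retries when interrupted by a signal (EINTR),
// - treats a full pipe on a non-blocking descriptor as "wake-up already pending",
// - reports every other failure as the case where an explicit abort() would fire.
WakeResult try_wake(int fd) {
    const char byte = 0;
    ssize_t r;
    do {
        r = write(fd, &byte, 1);
    } while (r == -1 && errno == EINTR);

    if (r == 1) {
        return WakeResult::Delivered;
    }
    if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return WakeResult::AlreadyPending;
    }
    return WakeResult::WouldAbort;  // assumption: the abort() branch
}
```

In this sketch, writing to a descriptor that has already been closed fails with `EBADF` and lands in the `WouldAbort` branch, whereas a segfault such as the one in the linked libuv issue would come from a different kind of failure.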

JamesYYang commented 5 years ago

@FanAs Have you resolved this issue?

mahan commented 5 years ago

Thanks for the report @FanAs

Same issue here - 5.1.0 (seemingly) random "Abort (core dumped)"

Reverted to 4.6.0 and I can send millions of messages without any glitches.

gabrieldodan commented 5 years ago

Same issue #333

gabrieldodan commented 5 years ago

Downgraded to 4.6.0, everything works well now. Thanks @FanAs

rolftimmermans commented 4 years ago

We recently released the 6.0 beta. It features a new API that addresses some fundamental issues with the previous API, and it also fixes a number of stability bugs. To make upgrading easier, it includes a compatibility layer for versions 4.x/5.x. It would be great if you could give the latest version a spin to see if it solves this particular issue. If you run into any problems with it, feel free to report them here or in a new issue.

NathanRSmith commented 4 years ago

This seems to still be an issue in 5.2.0.

aminya commented 1 month ago

v6 was released. Please try again with the latest version, and report back if the issue still persists. https://github.com/zeromq/zeromq.js/releases/tag/v6.0.0