joyent / libuv

Go to
https://github.com/libuv/libuv
3.27k stars 654 forks source link

src/uv-common.c:92: uv_err_name: Assertion `0' failed. #206

Closed tommypowerz closed 12 years ago

tommypowerz commented 12 years ago

my node server.js is breaking from time to time with this error:

src/uv-common.c:92: uv_err_name: Assertion `0' failed.

it uses socket.io and get many connections...

is there a way to try/catch this somehow in my server?

i'm totally lost.. there are no more informations when it crashes....

thanks in advance for your help guys..

bnoordhuis commented 12 years ago

Try this:

gdb --args node server.js
break uv-common.c:92
run
# wait until breakpoint is hit
backtrace

Post the backtrace here.

bnoordhuis commented 12 years ago

Oh, and make sure you build Node with ./configure --debug (and substitute node with node_g). The release build doesn't have debug symbols so the backtrace won't be very enlightening.

tommypowerz commented 12 years ago

ok i will try that and post what i find out...

how do i (and substitute node with node_g). ?

node_g server.js ?

or gdb --args node server.js ?

tommypowerz commented 12 years ago

Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7fabbdf6c6f0 (LWP 18634)] 0x00007fabbc7b275b in write () from /lib/libpthread.so.0

(gdb) backtrace

0 0x00007fabbc7b275b in write () from /lib/libpthread.so.0

1 0x0000000000803a5a in uv__write (stream=0x1d16208) at src/unix/stream.c:354

2 0x00000000008048b3 in uv_write (req=0x1cd9d88, stream=0x1d16208, bufs=0x7fffa250af30,

bufcnt=1, cb=0x54a320 <node::StreamWrap::AfterWrite(uv_write_s*, int)>)
at src/unix/stream.c:730

3 0x000000000054a678 in node::StreamWrap::Write (args=)

at ../src/stream_wrap.cc:242

4 0x00000817031d9b2a in ?? ()

5 0x00007fffa250afd0 in ?? ()

6 0x00007fffa250afd8 in ?? ()

7 0x0000000000000001 in ?? ()

8 0x0000000000000000 in ?? ()

then i start it over and get:

events.js:48 throw arguments[1]; // Unhandled 'error' event ^ Error: accept ENOTCONN at errnoException (net_uv.js:561:11) at TCP.onconnection (net_uv.js:707:17)

Program exited with code 01.

backtrace gives no stack

tommypowerz commented 12 years ago

another one: Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7fceb97926f0 (LWP 23027)] 0x00007fceb7fd875b in write () from /lib/libpthread.so.0 (gdb) backtrace

0 0x00007fceb7fd875b in write () from /lib/libpthread.so.0

1 0x0000000000803a5a in uv__write (stream=0xf025f8) at src/unix/stream.c:354

2 0x00000000008048b3 in uv_write (req=0xf22e78, stream=0xf025f8, bufs=0x7fffa2762830,

bufcnt=1, cb=0x54a320 <node::StreamWrap::AfterWrite(uv_write_s*, int)>)
at src/unix/stream.c:730

3 0x000000000054a678 in node::StreamWrap::Write (args=)

at ../src/stream_wrap.cc:242

4 0x000001eedc0ee7ca in ?? ()

5 0x00007fffa27628d0 in ?? ()

6 0x00007fffa27628d8 in ?? ()

7 0x0000000000000001 in ?? ()

8 0x0000000000000000 in ?? ()

tommypowerz commented 12 years ago

now i compiled the v.0.4.12 stable version with --debug.. works longer. but comes up with this:

[New Thread 0x7f2a70fbd6f0 (LWP 25400)]

Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7f2a70fbd6f0 (LWP 25400)] 0x00007f2a6f803720 in __write_nocancel () from /lib/libpthread.so.0 (gdb) backtrace

0 0x00007f2a6f803720 in __write_nocancel () from /lib/libpthread.so.0

1 0x000000000052ba8d in Write (args=@0x7fffd46912e0) at ../src/node_net.cc:988

2 0x0000000000582ec3 in v8::internal::Builtin_HandleApiCall ()

3 0x00007f2a4b28c24a in ?? ()

4 0x0000000000beafc0 in ?? ()

5 0x00007f2a4b28c1c1 in ?? ()

6 0x00007fffd4691360 in ?? ()

7 0x00007fffd46913d0 in ?? ()

8 0x00007f2a4b2c5e30 in ?? ()

9 0x00007f2a4b242b31 in ?? ()

10 0x00000acd00000000 in ?? ()

11 0x0000000000000000 in ?? ()

thejh commented 12 years ago

$ gdb --args node /usr/local/bin/coffee proxy GNU gdb (GDB) Fedora (7.3-43.fc15) Copyright (C) 2011 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i686-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /usr/local/bin/node...done. (gdb) break uv-common.c:92 Breakpoint 1 at 0x8409807: file src/uv-common.c, line 92. (gdb) handle SIGPIPE nostop noprint pass Signal Stop Print Pass to program Description SIGPIPE No No Yes Broken pipe (gdb) run Starting program: /usr/local/bin/node /usr/local/bin/coffee proxy [Thread debugging using libthread_db enabled] [New Thread 0xb7fb7b70 (LWP 20914)] [New Thread 0xb777bb70 (LWP 20915)]

Breakpoint 1, uv_err_name (err=...) at src/uv-common.c:92
92        assert(0);
(gdb) backtrace
#0  uv_err_name (err=...) at src/uv-common.c:92
#1  0x081509dd in node::SetErrno (code=104) at ../src/node.cc:1103
#2  0x08173004 in node::StreamWrap::AfterWrite (req=0x69, status=-1) at ../src/stream_wrap.cc:269
#3  0x084083a3 in uv__write_callbacks (stream=0x86a1dd0) at src/unix/stream.c:432
#4  0x08408938 in uv__stream_io (loop=0x8642040, watcher=0x86a1e30, revents=2) at src/unix/stream.c:550
#5  0x0840db2f in ev_invoke_pending (loop=0x8642040) at src/unix/ev/ev.c:2143
#6  0x0840e6e9 in ev_run (loop=0x8642040, flags=0) at src/unix/ev/ev.c:2519
#7  0x08400702 in uv_run (loop=0x8641dc0) at src/unix/core.c:188
#8  0x08153d8c in node::Start (argc=3, argv=0x69) at ../src/node.cc:2549
#9  0x0814cf1b in main (argc=3, argv=0xbffff3a4) at ../src/node_main.cc:25
(gdb)
tommypowerz commented 12 years ago

another one node v0.4.12

Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7fe5dda596f0 (LWP 28196)] 0x00007fe5dc29f75b in write () from /lib/libpthread.so.0 (gdb) backtrace

0 0x00007fe5dc29f75b in write () from /lib/libpthread.so.0

1 0x000000000052ba8d in Write (args=@0x7fff8b391510) at ../src/node_net.cc:988

2 0x0000000000582ec3 in v8::internal::Builtin_HandleApiCall ()

3 0x00007fe5b7d2824a in ?? ()

4 0x00007fe5b7683f09 in ?? ()

5 0x00007fe5b7d281c1 in ?? ()

6 0x00007fff8b391590 in ?? ()

7 0x00007fff8b391600 in ?? ()

8 0x00007fe5b7d61e30 in ?? ()

9 0x00007fe5b7cdeb31 in ?? ()

10 0x0000015800000000 in ?? ()

11 0x0000000000000000 in ?? ()

tommypowerz commented 12 years ago

Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7fa7f67de6f0 (LWP 32186)] 0x00007fa7f5024720 in __write_nocancel () from /lib/libpthread.so.0 (gdb) backtrace

0 0x00007fa7f5024720 in __write_nocancel () from /lib/libpthread.so.0

1 0x000000000052ba8d in Write (args=@0x7fffa80535c0) at ../src/node_net.cc:988

2 0x0000000000582ec3 in v8::internal::Builtin_HandleApiCall ()

3 0x00007fa7d0aae24a in ?? ()

4 0x00007f0100000001 in ?? ()

5 0x00007fa7d0aae1c1 in ?? ()

6 0x00007fffa8053640 in ?? ()

7 0x00007fffa80536b0 in ?? ()

8 0x00007fa7d0aead10 in ?? ()

9 0x00007fa7d0a66bc1 in ?? ()

10 0x0000017800000000 in ?? ()

11 0x0000000000000000 in ?? ()

thejh commented 12 years ago

From the kernel sources:

include/asm-generic/errno.h:#define     ECONNRESET      104     /* Connection reset by peer */

As I can sometimes reproduce the error by disconnecting and then reconnecting the network interface while streaming stuff from nodejs, this seems like it could be it.

thejh commented 12 years ago

Do those SIGPIPE errors really matter? I used handle SIGPIPE nostop noprint pass to get the interesting data... but then again, I don't know much about C or gdb or anything from that topic.

tommypowerz commented 12 years ago

ok trying it also with handle sigpipe...

thejh commented 12 years ago

Actually, I'm not so sure about that disconnecting and then reconnecting the network interface thing being the cause, it randomly crashes now without me manually disrupting it, too.

alexandertrefz commented 12 years ago

I got that error on 0.5.9 with a very tiny socket.io server, too. So i guess it is triggered by something in socket.io. (there are no other dependencies and i never assert anything in the other 30 lines that power the server)

bnoordhuis commented 12 years ago

Can one you guys run your script through strace (i.e. strace node script.js) and post the output?

alexandertrefz commented 12 years ago

Here you go. This was run on: Linux live-059 2.6.32-33-server # 72-Ubuntu SMP Fri Jul 29 21:21:55 UTC 2011 x86_64 GNU/Linux with this script

bnoordhuis commented 12 years ago

@atrefz: Thanks. I'm afraid I can't reproduce it on the current master (1b55010) with the script you linked.

The two possible culprits I see in the strace log are ECONNRESET and EPIPE but both are handled in libuv. Can you try this?

gdb --args ./node_g script.js
break uv-common.c:92
run
# wait until breakpoint
print err

EDIT: break uv-common.c:94 if you upgrade to master.

alexandertrefz commented 12 years ago

do i need to recompile node with special parameters if i want to do this?

bnoordhuis commented 12 years ago

./configure --debug && make <- builds node_g

ryanrolds commented 12 years ago
Breakpoint 1, uv_err_name (err=...) at src/uv-common.c:92
92            assert(0);
(gdb) print err
$1 = {code = 104, sys_errno_ = 32767}
(gdb) backtrace
#0  uv_err_name (err=...) at src/uv-common.c:92
#1  0x000000000061143a in node::SetErrno (code=104) at ../src/node.cc:1103
#2  0x0000000000642245 in node::StreamWrap::AfterWrite (req=0xfd3dc8, status=-1) at ../src/stream_wrap.cc:335
#3  0x0000000000a0d7d1 in uv__write_callbacks (stream=0x11f2c50) at src/unix/stream.c:464
#4  0x0000000000a0e159 in uv__stream_io (loop=0xf5b560, watcher=0x11f2cf8, revents=2) at src/unix/stream.c:666
#5  0x0000000000a140d8 in ev_invoke_pending (loop=0xf5b560) at src/unix/ev/ev.c:2143
#6  0x0000000000a14e9a in ev_run (loop=0xf5b560, flags=0) at src/unix/ev/ev.c:2519
#7  0x0000000000a04f6f in uv_run (loop=0xf5b240) at src/unix/core.c:188
#8  0x0000000000618a33 in node::Start (argc=2, argv=0x7fffffffe658) at ../src/node.cc:2578
#9  0x0000000000610200 in main (argc=2, argv=0x7fffffffe658) at ../src/node_main.cc:25

Little context, this only happens when running ab -k -n 10000 -c 5000 http://localhost:3000/tobi.jpeg. Non-keepalive works fine. The code being performance tested is Connect's staticCache. It should be serving the requested jpeg from memory.

bnoordhuis commented 12 years ago

@ryanrolds: Thanks. Not sure how that can happen. Will investigate.

ryanrolds commented 12 years ago

If you need more information, just ask.

ryanrolds commented 12 years ago

Confirming that I'm experiencing the same issue in 0.5.10rc2.

tojocky commented 12 years ago

Hello,

With v0.5.11-pre is the same problem, but with src/uv-common.c:94.

The result of gdbs is: Breakpoint 1, uv_err_name (err=...) at src/uv-common.c:94 warning: Source file is more recent than executable. 94 assert(0); (gdb) backtrace

0 uv_err_name (err=...) at src/uv-common.c:94

1 0x0815a665 in node::SetErrno (err=...) at ../src/node.cc:1098

2 0x0817d81c in node::StreamWrap::AfterWrite (req=0x69, status=-1)

at ../src/stream_wrap.cc:337

3 0x0845264d in uv__write_callbacks (stream=0x8a17e58)

at src/unix/stream.c:476

4 0x08452f70 in uv__stream_io (loop=0x86631a0, watcher=0x8a17ebc, revents=2)

at src/unix/stream.c:677

5 0x0845878d in ev_invoke_pending (loop=0x86631a0) at src/unix/ev/ev.c:2143

6 0x08459347 in ev_run (loop=0x86631a0, flags=0) at src/unix/ev/ev.c:2519

7 0x0844a292 in uv_run (loop=0x8662f20) at src/unix/core.c:188

8 0x0815d9ec in node::Start (argc=3, argv=0x69) at ../src/node.cc:2532

9 0x081566bb in main (argc=3, argv=0xbffff0e4) at ../src/node_main.cc:25

(gdb)

JerrySievert commented 12 years ago

confirming in 0.6

JerrySievert commented 12 years ago

the error code failing is 104, which under linux is connection reset:

#define ECONNRESET      104     /* Connection reset by peer */

adding an additional reset to the uv_error_code enum in uv.h:

  UV_ECONNRESET2 = 104

and adding a case to uv_err_name in uv-common.c:

  case UV_ECONNRESET2: return "ECONNRESET2";

fixes the assertion failure and allows node to recover correctly.

connection reset appears to occur under heavy load.

ry commented 12 years ago

We should fprintf(stderr) the bad error code

JerrySievert commented 12 years ago

@ry did that, error code was 104 (linux connection reset). i added it to the maps, but the patch got rejected.

olostan commented 12 years ago

Looks like same issue on ubuntu 11.10, node 0.6.1-pre (updated from master branch and recompiled).

node: src/uv-common.c:58: uv_err_name: Assertion `0' failed. Segmentation fault

ry commented 12 years ago

this issue was fixed in fd2b04d784d6cc10ca985a5548835310622bda50

pyrostrex commented 12 years ago

node: src/uv-common.c:58: uv_err_name: Assertion `0' failed.

still happening with node 0.6.1 cluster used.

quexer commented 12 years ago

same problem with @pyrostrex.

node 0.6.1 on REHL 5.3 node cluster is used.

devdazed commented 12 years ago

I also got a broken pipe:

Program received signal SIGPIPE, Broken pipe.
0x00002aaaac22a6bb in write () from /lib64/libpthread.so.0
(gdb) backtrace
#0  0x00002aaaac22a6bb in write () from /lib64/libpthread.so.0
#1  0x00000000007fd375 in uv__write (stream=0xc955c0) at src/unix/stream.c:400
#2  0x00000000007fe5ef in uv_write2 (req=0xcaa958, stream=0xc955c0, bufs=0x7fff3601eb70, bufcnt=1, send_handle=0x0, cb=0x53a000 <node::StreamWrap::AfterWrite(uv_write_s*, int)>) at src/unix/stream.c:881
#3  0x00000000007fe680 in uv_write (req=0xcaa958, stream=0xc955c0, bufs=0x7fff3601eb70, bufcnt=1, cb=0x53a000  <node::StreamWrap::AfterWrite(uv_write_s*, int)>) at src/unix/stream.c:902
#4  0x000000000053a672 in node::StreamWrap::Write (args=@0x7fff3601ebc0) at ../src/stream_wrap.cc:313
#5  0x00001e33f5cd116a in ?? ()
#6  0x00007fff3601ec18 in ?? ()
#7  0x00007fff3601ec20 in ?? ()
#8  0x0000000000000001 in ?? ()
#9  0x0000000000000000 in ?? ()
ry commented 12 years ago

SIGPIPE should be ignored... https://github.com/joyent/node/blob/5c3954aa6fb2e0b23a7bcfec6fc4636d5d152333/src/node.cc#L2380

bnoordhuis commented 12 years ago

I also got a broken pipe:

   Program received signal SIGPIPE, Broken pipe.    0x00002aaaac22a6bb in write () from /lib64/libpthread.so.0    (gdb) backtrace    #0  0x00002aaaac22a6bb in write () from /lib64/libpthread.so.0    #1  0x00000000007fd375 in uv__write (stream=0xc955c0) at src/unix/stream.c:400    #2  0x00000000007fe5ef in uv_write2 (req=0xcaa958, stream=0xc955c0, bufs=0x7fff3601eb70, bufcnt=1, send_handle=0x0, cb=0x53a000 <node::StreamWrap::AfterWrite(uv_write_s, int)>) at src/unix/stream.c:881    #3  0x00000000007fe680 in uv_write (req=0xcaa958, stream=0xc955c0, bufs=0x7fff3601eb70, bufcnt=1, cb=0x53a000  <node::StreamWrap::AfterWrite(uv_write_s, int)>) at src/unix/stream.c:902    #4  0x000000000053a672 in node::StreamWrap::Write (args=@0x7fff3601ebc0) at ../src/stream_wrap.cc:313    #5  0x00001e33f5cd116a in ?? ()    #6  0x00007fff3601ec18 in ?? ()    #7  0x00007fff3601ec20 in ?? ()    #8  0x0000000000000001 in ?? ()    #9  0x0000000000000000 in ?? ()

It's an artefact of how gdb reports some signals: it reports them even if the program ignores them. handle SIGPIPE nostop noprint makes gdb ignore them as well.

pyrostrex commented 12 years ago

It turns out, it is not because of cluster. I tried to turn off cluster but it still raise up.

bnoordhuis commented 12 years ago

@ry Are you okay with applying the patch below as a band aid? It doesn't fix the root issue (unmapped errors) but at least it gives people a chance to handle the error themselves.

diff --git a/src/uv-common.c b/src/uv-common.c
index 3143bd2..9ace0f3 100644
--- a/src/uv-common.c
+++ b/src/uv-common.c
@@ -55,8 +55,7 @@ const char* uv_err_name(uv_err_t err) {
   switch (err.code) {
     UV_ERRNO_MAP(UV_ERR_NAME_GEN)
     default:
-      assert(0);
-      return NULL;
+      return "Unknown system error";
   }
 }
 #undef UV_ERR_NAME_GEN
ry commented 12 years ago

I want to root cause this. They shouldn't be seeing this error now that all of the errnos are mapped.

MagicalTux commented 12 years ago

Confirmed to be still happening with nodeJS 0.6.1:

node: src/uv-common.c:58: uv_err_name: Assertion `0' failed. Aborted

I'd suggest adding a way to know which errno caused this. I'll manually patch my nodejs to show it for now.

EDIT: added a fprintf right before the assert(0) to show the error when the thing crashes, should have it happen within ~24 hours or so.

May I suggest the list of errno not to be made manually but generated automatically from the system's includes?

ry commented 12 years ago

please insert a printf there and tell us the code:

diff --git a/src/uv-common.c b/src/uv-common.c
index 3143bd2..1ee7d42 100644
--- a/src/uv-common.c
+++ b/src/uv-common.c
@@ -55,6 +55,7 @@ const char* uv_err_name(uv_err_t err) {
   switch (err.code) {
     UV_ERRNO_MAP(UV_ERR_NAME_GEN)
     default:
+      printf("bad err.code = %d\n", err.code);
       assert(0);
       return NULL;
   }

you'll need to do make clean all

tojocky commented 12 years ago

In future, do not debug hard, maybe you put this patch in the repository? Will be simple to communicate only error code without using gdb.

devdazed commented 12 years ago

bad err.code = 104 node: src/uv-common.c:59: uv_err_name: Assertion `0' failed. Aborted

devdazed commented 12 years ago

here is from the bt full:

node: src/uv-common.c:59: uv_err_name: Assertion `0' failed.

Program received signal SIGABRT, Aborted.
0x00002aaaac468265 in raise () from /lib64/libc.so.6
(gdb) bt full
#0  0x00002aaaac468265 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002aaaac469d10 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00002aaaac4616e6 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#3  0x00000000007ff40f in uv_err_name (err={code = 104, sys_errno_ = 0}) at src/uv-common.c:59
        __PRETTY_FUNCTION__ = "uv_err_name"
#4  0x000000000051dda9 in node::SetErrno (err=<value optimized out>) at ../src/node.cc:902
        scope = {isolate_ = 0xc3e280, prev_next_ = 0xc75ed0, prev_limit_ = 0xc77bb0, is_closed_ = false}
#5  0x000000000053a145 in node::StreamWrap::AfterWrite (req=<value optimized out>, status=-1) at ../src/stream_wrap.cc:358
        req_wrap = (WriteWrap *) 0xca0590
        wrap = <value optimized out>
        scope = {isolate_ = 0xc3e280, prev_next_ = 0xc75ed0, prev_limit_ = 0xc77bb0, is_closed_ = false}
        argv = {{<v8::Handle<v8::Value>> = {val_ = 0x7fff85d00bc0}, <No data fields>}, {<v8::Handle<v8::Value>> = {val_ = 0x804ef7}, <No data fields>}, {<v8::Handle<v8::Value>> = {val_ = 0x0}, <No data fields>},
  {<v8::Handle<v8::Value>> = {val_ = 0xca0f80}, <No data fields>}}
        __PRETTY_FUNCTION__ = "static void node::StreamWrap::AfterWrite(uv_write_t*, int)"
#6  0x00000000007fd737 in uv__write_callbacks (stream=0xca2c90) at src/unix/stream.c:494
        callbacks_made = 0
        q = (ngx_queue_t *) 0xca05c0
        req = (uv_write_t *) 0xca0598
        __PRETTY_FUNCTION__ = "uv__write_callbacks"
#7  0x00000000007fdfa6 in uv__stream_io (loop=0xc3c000, watcher=0xca2d38, revents=2) at src/unix/stream.c:695
        stream = (uv_stream_t *) 0xca2c90
        __PRETTY_FUNCTION__ = "uv__stream_io"
#8  0x000000000080367c in ev_invoke_pending (loop=0xc3c000) at src/unix/ev/ev.c:2143
        p = (ANPENDING *) 0xce8f60
        pri = 2
#9  0x0000000000803e46 in ev_run (loop=0xc3c000, flags=0) at src/unix/ev/ev.c:2519
        __PRETTY_FUNCTION__ = "ev_run"
#10 0x00000000007f52ce in uv_run (loop=0xc3bce0) at src/unix/core.c:192
No locals.
#11 0x000000000051f107 in node::Start (argc=<value optimized out>, argv=0xc3d340) at ../src/node.cc:2478
        handle_scope = {isolate_ = 0xc3e280, prev_next_ = 0x0, prev_limit_ = 0x0, is_closed_ = false}
        process = <value optimized out>
#12 0x00002aaaac455994 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#13 0x000000000051b3f9 in _start ()
No symbol table info available.
ry commented 12 years ago

Updated Node v0.6 branch with this fix in joyent/node@44314ccf48332c91db57a53d5fc0e3de48642681

daniel-j commented 12 years ago

Got this on node v0.6.1:

PROCESS TITLE: src/uv-common.c:58: uv_err_name: Assertion `0' failed. Avbruten (SIGABRT)

where PROCESS TITLE is the process.title value.

thejh commented 12 years ago

@daniel-j Yes, use the newest node v0.6.

veeru-as commented 12 years ago

I am getting this often on my node.js v0.6.17

bnoordhuis commented 12 years ago

@veeru-as On what platform? If you're on a UNIX, can you try this:

$ gdb -q --args node script.js
> run
# wait for assertion to hit
> backtrace full

Please post the output.