Closed tommypowerz closed 12 years ago
Try this:
gdb --args node server.js
break uv-common.c:92
run
# wait until breakpoint is hit
backtrace
Post the backtrace here.
Oh, and make sure you build Node with ./configure --debug
(and substitute node with node_g). The release build doesn't have debug symbols so the backtrace won't be very enlightening.
ok i will try that and post what i find out...
how do i (and substitute node with node_g). ?
node_g server.js ?
or gdb --args node server.js ?
Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7fabbdf6c6f0 (LWP 18634)] 0x00007fabbc7b275b in write () from /lib/libpthread.so.0
(gdb) backtrace
bufcnt=1, cb=0x54a320 <node::StreamWrap::AfterWrite(uv_write_s*, int)>)
at src/unix/stream.c:730
at ../src/stream_wrap.cc:242
then i start it over and get:
events.js:48 throw arguments[1]; // Unhandled 'error' event ^ Error: accept ENOTCONN at errnoException (net_uv.js:561:11) at TCP.onconnection (net_uv.js:707:17)
Program exited with code 01.
backtrace gives no stack
another one: Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7fceb97926f0 (LWP 23027)] 0x00007fceb7fd875b in write () from /lib/libpthread.so.0 (gdb) backtrace
bufcnt=1, cb=0x54a320 <node::StreamWrap::AfterWrite(uv_write_s*, int)>)
at src/unix/stream.c:730
at ../src/stream_wrap.cc:242
now i compiled the v.0.4.12 stable version with --debug.. works longer. but comes up with this:
[New Thread 0x7f2a70fbd6f0 (LWP 25400)]
Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7f2a70fbd6f0 (LWP 25400)] 0x00007f2a6f803720 in __write_nocancel () from /lib/libpthread.so.0 (gdb) backtrace
$ gdb --args node /usr/local/bin/coffee proxy GNU gdb (GDB) Fedora (7.3-43.fc15) Copyright (C) 2011 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i686-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /usr/local/bin/node...done. (gdb) break uv-common.c:92 Breakpoint 1 at 0x8409807: file src/uv-common.c, line 92. (gdb) handle SIGPIPE nostop noprint pass Signal Stop Print Pass to program Description SIGPIPE No No Yes Broken pipe (gdb) run Starting program: /usr/local/bin/node /usr/local/bin/coffee proxy [Thread debugging using libthread_db enabled] [New Thread 0xb7fb7b70 (LWP 20914)] [New Thread 0xb777bb70 (LWP 20915)]
Breakpoint 1, uv_err_name (err=...) at src/uv-common.c:92
92 assert(0);
(gdb) backtrace
#0 uv_err_name (err=...) at src/uv-common.c:92
#1 0x081509dd in node::SetErrno (code=104) at ../src/node.cc:1103
#2 0x08173004 in node::StreamWrap::AfterWrite (req=0x69, status=-1) at ../src/stream_wrap.cc:269
#3 0x084083a3 in uv__write_callbacks (stream=0x86a1dd0) at src/unix/stream.c:432
#4 0x08408938 in uv__stream_io (loop=0x8642040, watcher=0x86a1e30, revents=2) at src/unix/stream.c:550
#5 0x0840db2f in ev_invoke_pending (loop=0x8642040) at src/unix/ev/ev.c:2143
#6 0x0840e6e9 in ev_run (loop=0x8642040, flags=0) at src/unix/ev/ev.c:2519
#7 0x08400702 in uv_run (loop=0x8641dc0) at src/unix/core.c:188
#8 0x08153d8c in node::Start (argc=3, argv=0x69) at ../src/node.cc:2549
#9 0x0814cf1b in main (argc=3, argv=0xbffff3a4) at ../src/node_main.cc:25
(gdb)
another one node v0.4.12
Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7fe5dda596f0 (LWP 28196)] 0x00007fe5dc29f75b in write () from /lib/libpthread.so.0 (gdb) backtrace
Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0x7fa7f67de6f0 (LWP 32186)] 0x00007fa7f5024720 in __write_nocancel () from /lib/libpthread.so.0 (gdb) backtrace
From the kernel sources:
include/asm-generic/errno.h:#define ECONNRESET 104 /* Connection reset by peer */
As I can sometimes reproduce the error by disconnecting and then reconnecting the network interface while streaming stuff from nodejs, this seems like it could be it.
Do those SIGPIPE errors really matter? I used handle SIGPIPE nostop noprint pass
to get the interesting data... but then again, I don't know much about C or gdb or anything from that topic.
ok trying it also with handle sigpipe...
Actually, I'm not so sure about that disconnecting and then reconnecting the network interface thing being the cause, it randomly crashes now without me manually disrupting it, too.
I got that error on 0.5.9 with a very tiny socket.io server, too. So i guess it is triggered by something in socket.io. (there are no other dependencies and i never assert anything in the other 30 lines that power the server)
Can one you guys run your script through strace (i.e. strace node script.js
) and post the output?
Here you go. This was run on: Linux live-059 2.6.32-33-server # 72-Ubuntu SMP Fri Jul 29 21:21:55 UTC 2011 x86_64 GNU/Linux with this script
@atrefz: Thanks. I'm afraid I can't reproduce it on the current master (1b55010) with the script you linked.
The two possible culprits I see in the strace log are ECONNRESET and EPIPE but both are handled in libuv. Can you try this?
gdb --args ./node_g script.js
break uv-common.c:92
run
# wait until breakpoint
print err
EDIT: break uv-common.c:94
if you upgrade to master.
do i need to recompile node with special parameters if i want to do this?
./configure --debug && make
<- builds node_g
Breakpoint 1, uv_err_name (err=...) at src/uv-common.c:92
92 assert(0);
(gdb) print err
$1 = {code = 104, sys_errno_ = 32767}
(gdb) backtrace
#0 uv_err_name (err=...) at src/uv-common.c:92
#1 0x000000000061143a in node::SetErrno (code=104) at ../src/node.cc:1103
#2 0x0000000000642245 in node::StreamWrap::AfterWrite (req=0xfd3dc8, status=-1) at ../src/stream_wrap.cc:335
#3 0x0000000000a0d7d1 in uv__write_callbacks (stream=0x11f2c50) at src/unix/stream.c:464
#4 0x0000000000a0e159 in uv__stream_io (loop=0xf5b560, watcher=0x11f2cf8, revents=2) at src/unix/stream.c:666
#5 0x0000000000a140d8 in ev_invoke_pending (loop=0xf5b560) at src/unix/ev/ev.c:2143
#6 0x0000000000a14e9a in ev_run (loop=0xf5b560, flags=0) at src/unix/ev/ev.c:2519
#7 0x0000000000a04f6f in uv_run (loop=0xf5b240) at src/unix/core.c:188
#8 0x0000000000618a33 in node::Start (argc=2, argv=0x7fffffffe658) at ../src/node.cc:2578
#9 0x0000000000610200 in main (argc=2, argv=0x7fffffffe658) at ../src/node_main.cc:25
Little context, this only happens when running ab -k -n 10000 -c 5000 http://localhost:3000/tobi.jpeg
. Non-keepalive works fine. The code being performance tested is Connect's staticCache. It should be serving the requested jpeg from memory.
@ryanrolds: Thanks. Not sure how that can happen. Will investigate.
If you need more information, just ask.
Confirming that I'm experiencing the same issue in 0.5.10rc2.
Hello,
With v0.5.11-pre is the same problem, but with src/uv-common.c:94.
The result of gdbs is: Breakpoint 1, uv_err_name (err=...) at src/uv-common.c:94 warning: Source file is more recent than executable. 94 assert(0); (gdb) backtrace
at ../src/stream_wrap.cc:337
at src/unix/stream.c:476
at src/unix/stream.c:677
(gdb)
confirming in 0.6
the error code failing is 104, which under linux is connection reset:
#define ECONNRESET 104 /* Connection reset by peer */
adding an additional reset to the uv_error_code enum in uv.h:
UV_ECONNRESET2 = 104
and adding a case to uv_err_name in uv-common.c:
case UV_ECONNRESET2: return "ECONNRESET2";
fixes the assertion failure and allows node to recover correctly.
connection reset appears to occur under heavy load.
We should fprintf(stderr)
the bad error code
@ry did that, error code was 104 (linux connection reset). i added it to the maps, but the patch got rejected.
Looks like same issue on ubuntu 11.10, node 0.6.1-pre (updated from master branch and recompiled).
node: src/uv-common.c:58: uv_err_name: Assertion `0' failed. Segmentation fault
this issue was fixed in fd2b04d784d6cc10ca985a5548835310622bda50
node: src/uv-common.c:58: uv_err_name: Assertion `0' failed.
still happening with node 0.6.1 cluster used.
same problem with @pyrostrex.
node 0.6.1 on REHL 5.3 node cluster is used.
I also got a broken pipe:
Program received signal SIGPIPE, Broken pipe.
0x00002aaaac22a6bb in write () from /lib64/libpthread.so.0
(gdb) backtrace
#0 0x00002aaaac22a6bb in write () from /lib64/libpthread.so.0
#1 0x00000000007fd375 in uv__write (stream=0xc955c0) at src/unix/stream.c:400
#2 0x00000000007fe5ef in uv_write2 (req=0xcaa958, stream=0xc955c0, bufs=0x7fff3601eb70, bufcnt=1, send_handle=0x0, cb=0x53a000 <node::StreamWrap::AfterWrite(uv_write_s*, int)>) at src/unix/stream.c:881
#3 0x00000000007fe680 in uv_write (req=0xcaa958, stream=0xc955c0, bufs=0x7fff3601eb70, bufcnt=1, cb=0x53a000 <node::StreamWrap::AfterWrite(uv_write_s*, int)>) at src/unix/stream.c:902
#4 0x000000000053a672 in node::StreamWrap::Write (args=@0x7fff3601ebc0) at ../src/stream_wrap.cc:313
#5 0x00001e33f5cd116a in ?? ()
#6 0x00007fff3601ec18 in ?? ()
#7 0x00007fff3601ec20 in ?? ()
#8 0x0000000000000001 in ?? ()
#9 0x0000000000000000 in ?? ()
SIGPIPE should be ignored... https://github.com/joyent/node/blob/5c3954aa6fb2e0b23a7bcfec6fc4636d5d152333/src/node.cc#L2380
I also got a broken pipe:
Program received signal SIGPIPE, Broken pipe. 0x00002aaaac22a6bb in write () from /lib64/libpthread.so.0 (gdb) backtrace #0 0x00002aaaac22a6bb in write () from /lib64/libpthread.so.0 #1 0x00000000007fd375 in uv__write (stream=0xc955c0) at src/unix/stream.c:400 #2 0x00000000007fe5ef in uv_write2 (req=0xcaa958, stream=0xc955c0, bufs=0x7fff3601eb70, bufcnt=1, send_handle=0x0, cb=0x53a000 <node::StreamWrap::AfterWrite(uv_write_s, int)>) at src/unix/stream.c:881 #3 0x00000000007fe680 in uv_write (req=0xcaa958, stream=0xc955c0, bufs=0x7fff3601eb70, bufcnt=1, cb=0x53a000 <node::StreamWrap::AfterWrite(uv_write_s, int)>) at src/unix/stream.c:902 #4 0x000000000053a672 in node::StreamWrap::Write (args=@0x7fff3601ebc0) at ../src/stream_wrap.cc:313 #5 0x00001e33f5cd116a in ?? () #6 0x00007fff3601ec18 in ?? () #7 0x00007fff3601ec20 in ?? () #8 0x0000000000000001 in ?? () #9 0x0000000000000000 in ?? ()
It's an artefact of how gdb reports some signals: it reports them even if the program ignores them. handle SIGPIPE nostop noprint
makes gdb ignore them as well.
It turns out, it is not because of cluster. I tried to turn off cluster but it still raise up.
@ry Are you okay with applying the patch below as a band aid? It doesn't fix the root issue (unmapped errors) but at least it gives people a chance to handle the error themselves.
diff --git a/src/uv-common.c b/src/uv-common.c
index 3143bd2..9ace0f3 100644
--- a/src/uv-common.c
+++ b/src/uv-common.c
@@ -55,8 +55,7 @@ const char* uv_err_name(uv_err_t err) {
switch (err.code) {
UV_ERRNO_MAP(UV_ERR_NAME_GEN)
default:
- assert(0);
- return NULL;
+ return "Unknown system error";
}
}
#undef UV_ERR_NAME_GEN
I want to root cause this. They shouldn't be seeing this error now that all of the errnos are mapped.
Confirmed to be still happening with nodeJS 0.6.1:
node: src/uv-common.c:58: uv_err_name: Assertion `0' failed. Aborted
I'd suggest adding a way to know which errno caused this. I'll manually patch my nodejs to show it for now.
EDIT: added a fprintf right before the assert(0) to show the error when the thing crashes, should have it happen within ~24 hours or so.
May I suggest the list of errno not to be made manually but generated automatically from the system's includes?
please insert a printf there and tell us the code:
diff --git a/src/uv-common.c b/src/uv-common.c
index 3143bd2..1ee7d42 100644
--- a/src/uv-common.c
+++ b/src/uv-common.c
@@ -55,6 +55,7 @@ const char* uv_err_name(uv_err_t err) {
switch (err.code) {
UV_ERRNO_MAP(UV_ERR_NAME_GEN)
default:
+ printf("bad err.code = %d\n", err.code);
assert(0);
return NULL;
}
you'll need to do make clean all
In future, do not debug hard, maybe you put this patch in the repository? Will be simple to communicate only error code without using gdb.
bad err.code = 104 node: src/uv-common.c:59: uv_err_name: Assertion `0' failed. Aborted
here is from the bt full:
node: src/uv-common.c:59: uv_err_name: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00002aaaac468265 in raise () from /lib64/libc.so.6
(gdb) bt full
#0 0x00002aaaac468265 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00002aaaac469d10 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00002aaaac4616e6 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#3 0x00000000007ff40f in uv_err_name (err={code = 104, sys_errno_ = 0}) at src/uv-common.c:59
__PRETTY_FUNCTION__ = "uv_err_name"
#4 0x000000000051dda9 in node::SetErrno (err=<value optimized out>) at ../src/node.cc:902
scope = {isolate_ = 0xc3e280, prev_next_ = 0xc75ed0, prev_limit_ = 0xc77bb0, is_closed_ = false}
#5 0x000000000053a145 in node::StreamWrap::AfterWrite (req=<value optimized out>, status=-1) at ../src/stream_wrap.cc:358
req_wrap = (WriteWrap *) 0xca0590
wrap = <value optimized out>
scope = {isolate_ = 0xc3e280, prev_next_ = 0xc75ed0, prev_limit_ = 0xc77bb0, is_closed_ = false}
argv = {{<v8::Handle<v8::Value>> = {val_ = 0x7fff85d00bc0}, <No data fields>}, {<v8::Handle<v8::Value>> = {val_ = 0x804ef7}, <No data fields>}, {<v8::Handle<v8::Value>> = {val_ = 0x0}, <No data fields>},
{<v8::Handle<v8::Value>> = {val_ = 0xca0f80}, <No data fields>}}
__PRETTY_FUNCTION__ = "static void node::StreamWrap::AfterWrite(uv_write_t*, int)"
#6 0x00000000007fd737 in uv__write_callbacks (stream=0xca2c90) at src/unix/stream.c:494
callbacks_made = 0
q = (ngx_queue_t *) 0xca05c0
req = (uv_write_t *) 0xca0598
__PRETTY_FUNCTION__ = "uv__write_callbacks"
#7 0x00000000007fdfa6 in uv__stream_io (loop=0xc3c000, watcher=0xca2d38, revents=2) at src/unix/stream.c:695
stream = (uv_stream_t *) 0xca2c90
__PRETTY_FUNCTION__ = "uv__stream_io"
#8 0x000000000080367c in ev_invoke_pending (loop=0xc3c000) at src/unix/ev/ev.c:2143
p = (ANPENDING *) 0xce8f60
pri = 2
#9 0x0000000000803e46 in ev_run (loop=0xc3c000, flags=0) at src/unix/ev/ev.c:2519
__PRETTY_FUNCTION__ = "ev_run"
#10 0x00000000007f52ce in uv_run (loop=0xc3bce0) at src/unix/core.c:192
No locals.
#11 0x000000000051f107 in node::Start (argc=<value optimized out>, argv=0xc3d340) at ../src/node.cc:2478
handle_scope = {isolate_ = 0xc3e280, prev_next_ = 0x0, prev_limit_ = 0x0, is_closed_ = false}
process = <value optimized out>
#12 0x00002aaaac455994 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#13 0x000000000051b3f9 in _start ()
No symbol table info available.
Updated Node v0.6 branch with this fix in joyent/node@44314ccf48332c91db57a53d5fc0e3de48642681
Got this on node v0.6.1:
PROCESS TITLE: src/uv-common.c:58: uv_err_name: Assertion `0' failed. Avbruten (SIGABRT)
where PROCESS TITLE is the process.title value.
@daniel-j Yes, use the newest node v0.6.
I am getting this often on my node.js v0.6.17
@veeru-as On what platform? If you're on a UNIX, can you try this:
$ gdb -q --args node script.js
> run
# wait for assertion to hit
> backtrace full
Please post the output.
my node server.js is breaking from time to time with this error:
src/uv-common.c:92: uv_err_name: Assertion `0' failed.
it uses socket.io and get many connections...
is there a way to try/catch this somehow in my server?
i'm totally lost.. there are no more informations when it crashes....
thanks in advance for your help guys..