Closed lomaster1 closed 1 year ago
Here is a stack which I catch with debug node:
Core was generated by `/var/lib/nave/installed/10.16.3/bin/node --expose-internals --expose-gc --max-h'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f51f196d428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f51f2de0740 (LWP 11051))]
(gdb) bt
#0 0x00007f51f196d428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f51f196f02a in __GI_abort () at abort.c:89
#2 0x0000000001ab1288 in node::Abort () at ../src/node.cc:1043
#3 0x0000000001ab1f15 in node::OnFatalError (location=0x35e6798 "v8::HandleScope::CreateHandle()", message=0x35e6768 "Cannot create a handle without a HandleScope") at ../src/node.cc:1351
#4 0x0000000001e2f7d2 in v8::Utils::ReportApiFailure (location=0x35e6798 "v8::HandleScope::CreateHandle()", message=0x35e6768 "Cannot create a handle without a HandleScope")
at ../deps/v8/src/api.cc:435
#5 0x0000000001e1ccbb in v8::Utils::ApiCheck (condition=false, location=0x35e6798 "v8::HandleScope::CreateHandle()", message=0x35e6768 "Cannot create a handle without a HandleScope")
at ../deps/v8/src/api.h:139
#6 0x00000000024ed4ce in v8::internal::HandleScope::Extend (isolate=0x52265e0) at ../deps/v8/src/handles.cc:69
#7 0x0000000001e05092 in v8::internal::HandleScope::CreateHandle (isolate=0x52265e0, value=0x15fd40874939) at ../deps/v8/src/handles-inl.h:169
#8 0x0000000001e323cd in v8::HandleScope::CreateHandle (isolate=0x52265e0, value=0x15fd40874939) at ../deps/v8/src/api.cc:1144
#9 0x0000000001a4d698 in v8::Local<v8::Object>::New (isolate=0x52265e0, that=0x9981950) at ../deps/v8/include/v8.h:9378
#10 0x0000000001a4c40c in v8::Local<v8::Object>::New (isolate=0x52265e0, that=...) at ../deps/v8/include/v8.h:9369
#11 0x0000000001a4c9c8 in node::WeakPersistentToLocal<v8::Object> (isolate=0x52265e0, persistent=...) at ../src/util-inl.h:190
#12 0x0000000001a4adb0 in node::PersistentToLocal<v8::Object> (isolate=0x52265e0, persistent=...) at ../src/util-inl.h:173
#13 0x0000000001a49a94 in node::BaseObject::object (this=0x8429340) at ../src/base_object-inl.h:65
#14 0x0000000001b20b9c in node::http2::Http2Scope::Http2Scope (this=0x7ffe4352c470, session=0x8429340) at ../src/node_http2.cc:87
#15 0x0000000001b20b20 in node::http2::Http2Scope::Http2Scope (this=0x7ffe4352c470, stream=0x142524e0) at ../src/node_http2.cc:70
#16 0x0000000001b2ab8f in node::http2::Http2Stream::FlushRstStream (this=0x142524e0) at ../src/node_http2.cc:2237
#17 0x0000000001b27edb in node::http2::Http2Session::ClearOutgoing (this=0x8429340, status=0) at ../src/node_http2.cc:1708
#18 0x0000000001b27818 in node::http2::Http2Session::OnStreamAfterWrite (this=0x8429340, w=0x5f82900, status=0) at ../src/node_http2.cc:1616
#19 0x0000000001a6dadb in node::StreamResource::EmitAfterWrite (this=0x1785b080, w=0x5f82900, status=0) at ../src/stream_base-inl.h:135
#20 0x0000000001a6dbf0 in node::WriteWrap::OnDone (this=0x5f82900, status=0) at ../src/stream_base-inl.h:430
#21 0x0000000001a9da21 in node::StreamReq::Done (this=0x5f82900, status=0, error_str=0x0) at ../src/stream_base-inl.h:442
#22 0x0000000001c991b4 in node::TLSWrap::InvokeQueued (this=0x1785af90, status=0, error_str=0x0) at ../src/tls_wrap.cc:96
#23 0x0000000001c9a472 in node::TLSWrap::EncOut (this=0x1785af90) at ../src/tls_wrap.cc:295
#24 0x0000000001c9ab2d in node::TLSWrap::OnStreamAfterWrite (this=0x1785af90, req_wrap=0x82c1bd0, status=0) at ../src/tls_wrap.cc:368
#25 0x0000000001a6dadb in node::StreamResource::EmitAfterWrite (this=0x6bfee40, w=0x82c1bd0, status=0) at ../src/stream_base-inl.h:135
#26 0x0000000001a6dbf0 in node::WriteWrap::OnDone (this=0x82c1bd0, status=0) at ../src/stream_base-inl.h:430
#27 0x0000000001a9da21 in node::StreamReq::Done (this=0x82c1bd0, status=0, error_str=0x0) at ../src/stream_base-inl.h:442
#28 0x0000000001a9dd7e in node::JSStream::Finish<node::WriteWrap> (args=...) at ../src/js_stream.cc:159
#29 0x0000000001f37714 in v8::internal::FunctionCallbackArguments::Call (this=0x7ffe4352ccb0, handler=0x17d923ae2849) at ../deps/v8/src/api-arguments-inl.h:94
#30 0x0000000001f3a53a in v8::internal::(anonymous namespace)::HandleApiCallHelper<false> (isolate=0x52265e0, function=..., new_target=..., fun_data=..., receiver=..., args=...)
at ../deps/v8/src/builtins/builtins-api.cc:109
#31 0x0000000001f38430 in v8::internal::Builtin_Impl_HandleApiCall (args=..., isolate=0x52265e0) at ../deps/v8/src/builtins/builtins-api.cc:139
#32 0x0000000001f381b1 in v8::internal::Builtin_HandleApiCall (args_length=7, args_object=0x7ffe4352ced8, isolate=0x52265e0) at ../deps/v8/src/builtins/builtins-api.cc:127
#33 0x000003d4db841d64 in ?? ()
#34 0x000003d4db841cc1 in ?? ()
#35 0x00007ffe4352ce80 in ?? ()
#36 0x0000000000000006 in ?? ()
#37 0x00007ffe4352cf08 in ?? ()
#38 0x000003d4dc23f41a in ?? ()
#39 0x00001d2549c826f1 in ?? ()
#40 0x000017d923ae3599 in ?? ()
#41 0x0000000700000000 in ?? ()
#42 0x00001d2549c82801 in ?? ()
#43 0x0000000000000000 in ?? ()
@lomaster1 Thanks for the report! I assume this would fix the debug mode crash, but maybe not the original crash? Can you verify?
diff --git a/src/node_http2.cc b/src/node_http2.cc
index 6bd282593e28..0ba88946fc5c 100644
--- a/src/node_http2.cc
+++ b/src/node_http2.cc
@@ -2100,6 +2100,7 @@ void Http2Stream::SubmitRstStream(const uint32_t code) {
void Http2Stream::FlushRstStream() {
if (IsDestroyed())
return;
+ HandleScope handle_scope(env()->isolate());
Http2Scope h2scope(this);
CHECK_EQ(nghttp2_submit_rst_stream(**session_, NGHTTP2_FLAG_NONE,
id_, code_), 0);
@addaleax thanks for the fix. I applied it and got the same error as in production.
Core was generated by `/var/lib/nave/installed/10.16.3/bin/node --expose-internals --expose-gc --max-h'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000001b27c83 in node::http2::Http2Session::MaybeStopReading (this=0x6975610) at ../src/node_http2.cc:1673
1673 stream_->ReadStop();
[Current thread is 1 (Thread 0x7f39abdd0740 (LWP 20545))]
(gdb) bt
#0 0x0000000001b27c83 in node::http2::Http2Session::MaybeStopReading (this=0x6975610) at ../src/node_http2.cc:1673
#1 0x0000000001b2864f in node::http2::Http2Session::SendPendingData (this=0x6975610) at ../src/node_http2.cc:1815
#2 0x0000000001b27974 in node::http2::Http2Session::<lambda(node::Environment*, void*)>::operator()(node::Environment *, void *) const (__closure=0x0, env=0x7ffc13154330, data=0x6975610)
at ../src/node_http2.cc:1662
#3 0x0000000001b279cb in node::http2::Http2Session::<lambda(node::Environment*, void*)>::_FUN(node::Environment *, void *) () at ../src/node_http2.cc:1663
#4 0x0000000001a78744 in node::Environment::<lambda()>::operator()(void) const (__closure=0x7ffc13154140) at ../src/env.cc:500
#5 0x0000000001a78993 in node::Environment::RunAndClearNativeImmediates (this=0x7ffc13154330) at ../src/env.cc:516
#6 0x0000000001a78acb in node::Environment::CheckImmediate (handle=0x7ffc13154498) at ../src/env.cc:536
#7 0x0000000001d1fbee in uv__run_check (loop=0x4af9120 <default_loop_struct>) at ../deps/uv/src/unix/loop-watcher.c:67
#8 0x0000000001d166b7 in uv_run (loop=0x4af9120 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:365
#9 0x0000000001abbc11 in node::Start (isolate=0x5f045e0, isolate_data=0x5f6ce10, args=std::vector of length 2, capacity 5 = {...}, exec_args=std::vector of length 3, capacity 4 = {...})
at ../src/node.cc:2895
#10 0x0000000001abbf91 in node::Start (event_loop=0x4af9120 <default_loop_struct>, args=std::vector of length 2, capacity 5 = {...}, exec_args=std::vector of length 3, capacity 4 = {...})
at ../src/node.cc:2994
#11 0x0000000001ab960d in node::Start (argc=5, argv=0x5f02030) at ../src/node.cc:3053
#12 0x0000000002e8ebc0 in main (argc=5, argv=0x7ffc13154fc8) at ../src/node_main.cc:124
@lomaster1 Can you print the contents of stream_
on the crashing line? If that’s not NULL, is there any chance you could run the program under valgrind? That might give a better error than gdb here if this is e.g. use-after-free.
@addaleax I added this before stream_->ReadStop();
(also I make listener_ and other fields public)
fprintf(stderr, "XXX this %p -> stream_ is not null: %d\n", this, stream_!=nullptr);
fprintf(stderr, "XXX this %p -> stream_ %p: listener %p, bR%d, bW%d\n", this, stream_, stream_->listener_, stream_->bytes_read_, stream_->bytes_written_);
And got crash with stack:
Core was generated by `/var/lib/nave/installed/10.16.3/bin/node --expose-internals --expose-gc --max-h'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000001b27cb5 in node::http2::Http2Session::MaybeStopReading (this=0x9c8f8c0) at ../src/node_http2.cc:1674
1674 fprintf(stderr, "XXX this %p -> stream_ %p: listener %p, bR%d, bW%d\n", this, stream_, stream_->listener_, stream_->bytes_read_, stream_->bytes_written_);
[Current thread is 1 (Thread 0x7f10ecb58740 (LWP 24690))]
(gdb) bt
#0 0x0000000001b27cb5 in node::http2::Http2Session::MaybeStopReading (this=0x9c8f8c0) at ../src/node_http2.cc:1674
#1 0x0000000001b286d9 in node::http2::Http2Session::SendPendingData (this=0x9c8f8c0) at ../src/node_http2.cc:1817
#2 0x0000000001b27974 in node::http2::Http2Session::<lambda(node::Environment*, void*)>::operator()(node::Environment *, void *) const (__closure=0x0, env=0x7ffeca078e50, data=0x9c8f8c0)
at ../src/node_http2.cc:1662
#3 0x0000000001b279cb in node::http2::Http2Session::<lambda(node::Environment*, void*)>::_FUN(node::Environment *, void *) () at ../src/node_http2.cc:1663
#4 0x0000000001a78744 in node::Environment::<lambda()>::operator()(void) const (__closure=0x7ffeca078c60) at ../src/env.cc:500
#5 0x0000000001a78993 in node::Environment::RunAndClearNativeImmediates (this=0x7ffeca078e50) at ../src/env.cc:516
#6 0x0000000001a78acb in node::Environment::CheckImmediate (handle=0x7ffeca078fb8) at ../src/env.cc:536
#7 0x0000000001d1fc78 in uv__run_check (loop=0x4af9120 <default_loop_struct>) at ../deps/uv/src/unix/loop-watcher.c:67
#8 0x0000000001d16741 in uv_run (loop=0x4af9120 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:365
#9 0x0000000001abbc11 in node::Start (isolate=0x63b35e0, isolate_data=0x641be10, args=std::vector of length 2, capacity 5 = {...}, exec_args=std::vector of length 3, capacity 4 = {...})
at ../src/node.cc:2895
#10 0x0000000001abbf91 in node::Start (event_loop=0x4af9120 <default_loop_struct>, args=std::vector of length 2, capacity 5 = {...}, exec_args=std::vector of length 3, capacity 4 = {...})
at ../src/node.cc:2994
#11 0x0000000001ab960d in node::Start (argc=5, argv=0x63b1030) at ../src/node.cc:3053
#12 0x0000000002e8ebc0 in main (argc=5, argv=0x7ffeca079ae8) at ../src/node_main.cc:124
In logs:
XXX this 0x9c8f8c0 -> stream_ is not null: 1
XXX this 0x9c8f8c0 -> stream_ 0x9be5d30: listener 0x9c8f8f0, bR40, bW42
XXX this 0x9c8f8c0 -> stream_ is not null: 1
XXX this 0x9c8f8c0 -> stream_ 0x9be5d30: listener 0x9c8f8f0, bR49, bW308
XXX this 0x9c8f8c0 -> stream_ is not null: 1
XXX this 0x9c8f8c0 -> stream_ 0x9be5d30: listener 0x9c8f8f0, bR34751, bW334
XXX this 0x9c8f8c0 -> stream_ is not null: 1
XXX this 0x9c8f8c0 -> stream_ 0x9be5d30: listener 0x9c8f8f0, bR83060, bW360
XXX this 0x9c8f8c0 -> stream_ is not null: 0
That means that stream_ is null in some cases.
I didn't use valgrind before. All with valgrind will be work too slow then debug node. To reproduce the bug I send a lot of requests to service.
Okay, thanks for the info so far – that’s definitely helpful.
Some times I got this error:
TypeError: Cannot read property 'finishWrite' of null
at JSStreamWrap.finishWrite (internal/wrap_js_stream.js:195:12)
at Immediate.setImmediate (internal/wrap_js_stream.js:218:12)
at runCallback (timers.js:705:18)
at tryOnImmediate (timers.js:676:5)
at processImmediate (timers.js:658:5)
on the same steps when I'm repeating segfaults. Maybe it's related to issue.
Some additional information: we're using TlsSocket around our own Duplex stream. We wrote own protocol to send/receive data from our proxy server (Http2->TlsSocket->Duplex->Proxy). I send a lot of requests to the service and server CPU is about 95-100% (maybe it's because node is in debug). and all works slow. I noticed when Duplex closed, for example, by timeout, this error or segfault happened. There are also caught errors in the log:
Error [ERR_STREAM_WRITE_AFTER_END]: write after end
at writeAfterEnd (_stream_writable.js:248:12)
at Duplex.Writable.write (_stream_writable.js:296:5) // it's Duplex stream to proxy
at Duplex.fake_socket.write (/var/product/module.js:694:31) // our module
at JSStreamWrap.doWrite (internal/wrap_js_stream.js:160:19)
at JSStream.onwrite (internal/wrap_js_stream.js:20:57)
Some additional information: we're using TlsSocket around our own Duplex stream. We wrote own protocol to send/receive data from our proxy server (Http2->TlsSocket->Duplex->Proxy)
That makes it likely that looking at your code would be very helpful – would you be willing to share it, or at least the parts that implement that Duplex stream?
Some additional information: we're using TlsSocket around our own Duplex stream. We wrote own protocol to send/receive data from our proxy server (Http2->TlsSocket->Duplex->Proxy)
That makes it likely that looking at your code would be very helpful – would you be willing to share it, or at least the parts that implement that Duplex stream?
I can't share the code. Duplex stream is a stream in multiplexing WS connection (like as http2 streams). It has a lot of code.
New information: we have a timer to cancel a long query and we're calling http2stream.close(http2.constants.NGHTTP2_CANCEL);
if timeout. There are no segfaults without this close call. UPD: there is one crash for many tries (>6).
Also, I checked destroyed, aborted and closed flags for http2stream and session before calling the close method and got a segfault. The same segfault if call the close method without arguments.
In production, this solution (no http2stream.close if timeout) worked not good - segfaults returned. UPD: not so many as before
I created wrapper (new duplex) around our duplex stream to the proxy and segfaults gone. I'm passing with a wrapper to tls socket and then tls socket to http2.
class TunnelStreamWrapper extends Duplex {
constructor(source, opt){
super(opt);
this.source = source;
this._source_on_data = this._source_on_data.bind(this);
this._source_on_error = this._source_on_error.bind(this);
this._source_on_close = this._source_on_close.bind(this);
this.source.on('data', this._source_on_data);
this.source.on('error', this._source_on_error);
this.source.once('close', this._source_on_close);
}
_source_on_data(d){ this.push(d); }
_source_on_error(e){ this.emit('error', e); }
_source_on_close(){
// setImmediate important to prevent error 'Cannot read
// property 'finishWrite' of null' and crash
setImmediate(this.destroy.bind(this));
}
_write(chunk, encoding, cb){
if (this.source.writable)
this.source.write(chunk, encoding, cb);
else
cb(new Error('Source socket is not writable'));
}
_read(size){}
_destroy(e, cb){
this.source.off('data', this._source_on_data);
this.source.off('error', this._source_on_error);
this.source.destroy();
this.source = undefined;
cb(e);
}
}
Also, I have been testing with emitClose: false
for our Duplex to proxy (used without this wrapper) and segfaults gone too.
When I added this.emit('close')
to wrapper when source stream emitted close, segfaults returned.
I'm having a similar-looking segfault issue in my server, only when using HTTP/2. @lomaster1 / @addaleax did you ever get to the bottom of this?
Reproducible for me in node 10.15.3, 10.21.0 and 12.18.3.
I'm also wrapping streams up as sockets in some cases in my app, using the built-in _stream_wrap
, but I've now disabled that everywhere and I'm still seeing this behaviour, so I'm fairly confident it's unrelated.
Some example traces, via segfault-handler:
.../node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3127)[0x7fdb5a5fb127]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7fdb735f58a0]
node(_ZN4node7TLSWrap6EncOutEv+0x170)[0xa09010]
node(_ZN4node7TLSWrap7DoWriteEPNS_9WriteWrapEP8uv_buf_tmP11uv_stream_s+0x2c7)[0xa0a6c7]
node(_ZN4node5http212Http2Session15SendPendingDataEv+0x4ce)[0x93b5ae]
node(_ZN4node5http212Http2Session5CloseEjb+0xda)[0x93c4fa]
node[0xb62a3f]
node(_ZN2v88internal21Builtin_HandleApiCallEiPPNS0_6ObjectEPNS0_7IsolateE+0xb9)[0xb635a9]
[0xcec6c2dbe1d]
[1] 30818 segmentation fault (core dumped) node ./bin/run start
PID 26794 received SIGSEGV for address: 0x20
.../node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3127)[0x7f7dd00b4127]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f7dd1ebc8a0]
node(_ZN4node10StreamBase5WriteEP8uv_buf_tmP11uv_stream_sN2v85LocalINS5_6ObjectEEE+0x2b)[0xad97cb]
node(_ZN4node7TLSWrap6EncOutEv+0x2a7)[0xb53e07]
node(_ZN4node7TLSWrap7DoWriteEPNS_9WriteWrapEP8uv_buf_tmP11uv_stream_s+0x438)[0xb554d8]
node(_ZN4node5http212Http2Session15SendPendingDataEv+0x4ec)[0xa40b4c]
node(_ZN4node5http212Http2Session5CloseEjb+0x26e)[0xa413be]
node[0xbf0239]
node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0xb7)[0xbf2027]
node[0x13cf119]
[1] 26794 segmentation fault (core dumped) node ./bin/run start
PID 32363 received SIGSEGV for address: 0x20
.../node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3127)[0x7fc4c85f7127]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7fc4d0d428a0]
node(_ZN4node10StreamBase5WriteEP8uv_buf_tmP11uv_stream_sN2v85LocalINS5_6ObjectEEE+0x2b)[0xad97cb]
node(_ZN4node7TLSWrap6EncOutEv+0x2a7)[0xb53e07]
node(_ZN4node7TLSWrap7DoWriteEPNS_9WriteWrapEP8uv_buf_tmP11uv_stream_s+0x438)[0xb554d8]
node(_ZN4node5http212Http2Session15SendPendingDataEv+0x4ec)[0xa40b4c]
node(_ZN4node5http212Http2Session5CloseEjb+0x26e)[0xa413be]
node[0xbf0239]
node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0xb7)[0xbf2027]
node[0x13cf119]
Any clues would be much appreciated! I'm still doing some fairly funky networking tricks in places (unwrapping CONNECT requests, peeking at first bytes of connections), I'm going to remove each of those one by one and see if I can hunt down a clearer culprit.
For my case, all the code involved is open-source (see https://github.com/httptoolkit/httpolyglot/blob/master/lib/index.js and https://github.com/httptoolkit/mockttp/tree/h2, especially http-combo-server) but it's tricky to set up - currently my most reliable option for reproducing this is by proxying connections from Tiktok on my Android phone for 2 minutes :smile:.
Ok, I'm not sure if I've narrowed down the root cause, but I've found a reliable workaround.
Currently, I accept CONNECT requests over HTTP/1, and then call server.emit('connection', socket)
after the initial setup for my HTTP/2 server, effectively passing the tunnelled connection right back to myself. It sounds like @lomaster1 might be doing something similar. That looks like this:
h1Server.on('connect', (req, socket) => {
socket.write('HTTP/' + req.httpVersion + ' 200 OK\r\n\r\n', 'utf-8', () => {
h2Server.emit('connection', socket);
});
});
If I add const SocketWrapper = require('_stream_wrap')
and change the above to h2Server.emit('connection', new SocketWrapper(socket))
, the segfault here disappears completely, I can't reproduce it at all.
Not sure why, I can only guess that using the stream wrapper moves it to the simpler (and slower) stream path, avoiding the underlying bug. Regardless of the root cause, that seems to be sufficient as a workaround for me for now.
I'm now fairly sure this was fixed by https://github.com/nodejs/node/pull/41185 in v16.14+ and this can be closed. I no longer see this issue in production despite heavy use of all the above.
You're almost certainly right. Thanks, closing.
I got it on production (many times per day):
Can you help me investigate why it happens? What can I do to provide you with information to fix it? I can't use debug node in production, because it's very slow.