Open kotsss opened 5 years ago
How easy is this to reproduce for you? Can you modify the Node.js source code? It might be helpful to add an assertion in doClose()
that checks that this._handle
is still alive at that point, so that the crash is observable more easily?
Putting a modified Node.js compilation on the server and waiting to reproduce was something we raised as an option but were hopping to for it to be plan B. Plan A was to find way (using this post) to alter the existing version:
But if you're saying that compiling is the way then I'll get working on that.
Ok, I've got a modified node running and got new stacktrace:
AssertionError [ERR_ASSERTION]: Handle null - very bad
Error
at JSStreamWrap.doClose (internal/wrap_js_stream.js:214:9) // assert(_this.handle)
at JSStreamWrap.handle.close (internal/wrap_js_stream.js:39:12)
at TLSWrap.close (_tls_wrap.js:395:23)
at TLSSocket.Socket._destroy (net.js:595:18)
at TLSSocket._destroy (/var/agent/util/net.js:379:37) // net.Socket.prototype hook for stats
at TLSSocket.destroy (internal/streams/destroy.js:32:8)
at JSStreamWrap.TLSSocket.wrap.once (_tls_wrap.js:317:35)
at Object.onceWrapper (events.js:277:13)
at JSStreamWrap.emit (events.js:189:13)
at _handle.close (net.js:597:12)
at Immediate.setImmediate (internal/wrap_js_stream.js:231:7) // cb();
at runCallback (timers.js:705:18)
at tryOnImmediate (timers.js:676:5)
at processImmediate (timers.js:658:5)
Async
at JSStreamWrap.doClose (internal/wrap_js_stream.js:212:30) // save first doClose stack on 'this'
at JSStreamWrap.handle.close (internal/wrap_js_stream.js:39:12)
at JSStreamWrap.Socket._destroy (net.js:595:18)
at JSStreamWrap._destroy (/var/agent/util/net.js:379:37) // net.Socket.prototype hook for stats
at JSStreamWrap.destroy (internal/streams/destroy.js:32:8)
at Duplex.JSStreamWrap.stream.once (internal/wrap_js_stream.js:76:12)
at Object.onceWrapper (events.js:277:13)
at Duplex.emit (events.js:189:13)
at emitCloseNT (internal/streams/destroy.js:59:8)
at process._tickCallback (internal/process/next_tick.js:63:19)
So TLSWrap._parent points to Socket._handle that shows how the invalid call is possible, but this doesn't reproduce to a million other tls sockets used by the app.
Sounds like a race condition of some sorts, but it's beyond my code understanding to figure it out.
Edit: increased Error.stackTraceLimit and got full stack Edit2: add async jump to get more stack Edit3: managed to catch a non-crashing doClose stack for reference
STREAM_WRAP 13998: doClose stack
Error
at JSStreamWrap.doClose (internal/wrap_js_stream.js:212:30)
at JSStreamWrap.handle.close (internal/wrap_js_stream.js:39:12)
at JSStreamWrap.Socket._destroy (net.js:595:18)
at JSStreamWrap._destroy (/var/agent/util/net.js:379:37)
at JSStreamWrap.destroy (internal/streams/destroy.js:32:8)
at TLSWrap.close (_tls_wrap.js:393:29)
at TLSSocket.Socket._destroy (net.js:595:18)
at TLSSocket._destroy (/var/agent/util/net.js:379:37)
at TLSSocket.destroy (internal/streams/destroy.js:32:8)
at TLSSocket.onConnectEnd (_tls_wrap.js:1095:10)
at Object.onceWrapper (events.js:277:13)
at TLSSocket.emit (events.js:194:15)
at endReadableNT (_stream_readable.js:1125:12)
at process._tickCallback (internal/process/next_tick.js:63:19)
We've been able to narrow the crash down to HTTP2 client sockets by correlating http2 usage and crash frequency. We're trying to reproduce issue in our tests, no luck yet. I'll be happy to run patches on the code to provide more information.
Will appreciate if someone can take a look at the stacktraces I provided and/or give more debugging instructions.
looks like difference between good and bad case is the former is handling end
while the later, close
event, but definitely not enough to make any conclusions.
cc @nodejs/streams
Since upgrading to Node 10 from Node 8 we've been seeing the following stack appear. I'm having trouble tracking down what's causing it - but could it be related to this issue?
{
"date": "Thu Feb 06 2020 12:05:47 GMT+1100 (Australian Eastern Daylight Time)",
"process": {
"pid": 339,
"uid": 496,
"gid": 494,
"cwd": "/var/app/current",
"execPath": "/node-install/node-v10.18.1-linux-x64/bin/node",
"version": "v10.18.1",
"argv": [
"/node-install/node-v10.18.1-linux-x64/bin/node",
"/var/app/current/packages/app/lib/server.js"
],
"memoryUsage": {
"rss": 217214976,
"heapTotal": 111587328,
"heapUsed": 85340080,
"external": 62857751
}
},
"os": {
"loadavg": [
0,
0,
0
],
"uptime": 89121
},
"trace": [
{
"column": 23,
"file": "_tls_wrap.js",
"function": "TLSWrap.close",
"line": 395,
"method": "close",
"native": false
},
{
"column": 18,
"file": "net.js",
"function": "TLSSocket.Socket._destroy",
"line": 605,
"method": "_destroy",
"native": false
},
{
"column": 8,
"file": "internal/streams/destroy.js",
"function": "TLSSocket.destroy",
"line": 37,
"method": "destroy",
"native": false
},
{
"column": 20,
"file": "events.js",
"function": "Object.onceWrapper",
"line": 286,
"method": "onceWrapper",
"native": false
},
{
"column": 13,
"file": "events.js",
"function": "TLSSocket.emit",
"line": 198,
"method": "emit",
"native": false
},
{
"column": 23,
"file": "domain.js",
"function": "TLSSocket.EventEmitter.emit",
"line": 466,
"method": "emit",
"native": false
},
{
"column": 14,
"file": "_stream_writable.js",
"function": "finishMaybe",
"line": 646,
"method": null,
"native": false
},
{
"column": 5,
"file": "_stream_writable.js",
"function": "ShutdownWrap.stream._final [as callback]",
"line": 624,
"method": "_final [as callback]",
"native": false
},
{
"column": 8,
"file": "net.js",
"function": "ShutdownWrap.afterShutdown [as oncomplete]",
"line": 374,
"method": "afterShutdown [as oncomplete]",
"native": false
}
],
"stack": [
"TypeError: this._parent.close is not a function",
" at TLSWrap.close (_tls_wrap.js:395:23)",
" at TLSSocket.Socket._destroy (net.js:605:18)",
" at TLSSocket.destroy (internal/streams/destroy.js:37:8)",
" at Object.onceWrapper (events.js:286:20)",
" at TLSSocket.emit (events.js:198:13)",
" at TLSSocket.EventEmitter.emit (domain.js:466:23)",
" at finishMaybe (_stream_writable.js:646:14)",
" at ShutdownWrap.stream._final [as callback] (_stream_writable.js:624:5)",
" at ShutdownWrap.afterShutdown [as oncomplete] (net.js:374:8)"
],
"level": "error",
"message": "uncaughtException: this._parent.close is not a function"
}
@omgaz have you been able to reproduce this reliably? could you post a script to reproduce?
@omgaz have you been able to reproduce this reliably? could you post a script to reproduce?
Not yet, it's proving taxing tracking down what could potentially be trying to end this connection. I'll update when/if I can repro the script.
I've also been seeing a lot of this, with HTTP/2 traffic on both node v10.15.1 and v12.8.3, but I've just been testing v14.6.0 and it doesn't seem to reproduce at all there.
I haven't been able to create an isolated example for this, but believe I'm getting this when using custom certificate authorities.
I've debated whether this belongs in node/help, but due to potential of being a node bug I've decided to post here (apologize in advance if it turns out otherwise).
We have a crash happening on our production servers a few times a day (especially during high load times):
I was unable to see any possible code path that could cause
this._handle
to be null before the call todoClose
as it's only triggered by callingthis._handle.close()
. Async paths printing would have come very useful here, however, this is node 10.x.I've done grep on all our code as well as all node-modules for anything that could relate to 'games' with
"_handle = "
ordoClose
, nothing pop'ed up.I've put DEBUG_NODE=net,stream_wrap on one of the servers to avoid overall network degradation (waiting for crash to re-occur).
Note: server is a large traffic networking proxy application involving http/s/2.
I would like to give more info (specifically
setImmediate
async callstack), however, I'm unsure how to continue, would appreciate any advice.