Open zzz08900 opened 3 years ago
The same also happens with our dcfjs v2 (https://github.com/dcfjs/dcf2). We were not able to find an easy way to reproduce the problem; the crash is only observed under heavy computation load, i.e. after 40 minutes or more on 4 CPU cores.
But if any additional information related to dcf/dcf2 is needed, please let me know and I'll provide as much info as I can.
Node.js 15.x has a newer V8 than 14.x etc. Any chance you could test with 15.x and see if the problem persists?
@nodejs/v8
Hi, looks like this could be a missing write barrier. V8 has some command line flags that might help reproduce this bug: --stress-incremental-marking, --stress-scavenge and/or --verify-heap. Also try to reproduce this on the latest version of Node/V8; it could be that this is already fixed.
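For anyone who wants to try the flags above, here is a minimal sketch of launching a worker script with them from Node.js. The helper names (stressArgs, runWorkerUnderStress) and the script path are hypothetical. To my knowledge --stress-scavenge expects an integer value and --verify-heap is only present in builds compiled with heap verification (such as debug builds), so check `node --v8-options` on your own binary first.

```javascript
const { spawn } = require('child_process');

// Build the argument list for a stress run.
// Assumption: --stress-scavenge takes an integer (a percentage of new-space capacity);
// the default of 100 here is an arbitrary illustrative value.
// Assumption: --verify-heap is only accepted by builds with heap verification enabled.
function stressArgs(script, { scavenge = 100, verifyHeap = false } = {}) {
  const flags = ['--stress-incremental-marking', `--stress-scavenge=${scavenge}`];
  if (verifyHeap) flags.push('--verify-heap');
  return [...flags, script];
}

// Hypothetical launcher: runs `script` with the current Node.js binary and the
// stress flags, forwarding stdout/stderr so a V8 fatal error message stays visible.
function runWorkerUnderStress(script, options) {
  return spawn(process.execPath, stressArgs(script, options), { stdio: 'inherit' });
}
```

Usage would be something like runWorkerUnderStress('./worker.js', { verifyHeap: true }) on a debug build, where './worker.js' stands in for whatever starts the computation.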
Thanks for the heads up, I'll be trying it out.
Just ran a quick stress test (30-some minutes) with NodeJS 15.4.0. No crash was observed :) I'll set up a more serious torture test and see if everything really holds up.
Nope, the problem persists. But in most cases NodeJS 15.4.0 fails straight away with signal 11/code 139. I'm building a debug build of NodeJS 15.4.0 now and will post the error message later.
Hi, did you try to add some of the flags mentioned above to reproduce the crash faster? As soon as we have a reliable and fast way to reproduce the bug locally, I can try to start investigating it.
Thanks in advance, I'll experiment with them later. I was hoping to get the whole thing sorted out with an upgrade to NodeJS 15 :(
Hi, we just found that triggering GC manually every once in a while (almost) fixed the crash on nodeJS 14.15.1. Is there any way of knowing which line of JS code was being executed just before nodeJS crashed?
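For reference, a minimal sketch of what "triggering GC manually every once in a while" can look like. It assumes the worker is started with --expose-gc so that global.gc exists; the helper name makePeriodicGc and the interval are hypothetical and not taken from dcf. As noted above, this only makes the crash less likely, it does not remove the underlying problem.

```javascript
// Start the process with: node --expose-gc worker.js
// Without --expose-gc, globalThis.gc is undefined and the helper does nothing.

// Returns a function that triggers a collection once every `every` calls.
// `gc` is injectable so the helper can be exercised without --expose-gc.
function makePeriodicGc(every = 1000, gc = globalThis.gc) {
  let count = 0;
  return function maybeCollect() {
    count += 1;
    // Skip when gc is unavailable or this call is not a multiple of `every`.
    if (typeof gc !== 'function' || count % every !== 0) return false;
    gc();
    return true;
  };
}
```

A worker loop would then call the returned function after each unit of work, e.g. const maybeCollect = makePeriodicGc(500); and maybeCollect(); after every task.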
/ping @nodejs/diagnostics
@zzz08900 You can try setting up your system so that a core dump is produced when the process crashes, and use https://github.com/nodejs/llnode to load the core dump along with the Node.js executable to get the JS stack trace at the time of the crash.
@zzz08900 I understand this thread has not been active for a while. I hope you have tried using the latest node version, which might have addressed the issue.
What steps will reproduce the bug?
We made this Distributed Computing Framework for Node.js (https://github.com/dcfjs/dcf) with nodeJS 8 and everything was fine. But when we tried to upgrade to newer LTS versions of node, the worker process (the node process that actually does all the computation, which involves reconstructing objects/functions from strings) started crashing under heavy computation load.
We tried a few versions from the 10.xx, 12.xx and 14.xx lines, but all of them crash with error messages related to the scavenger.
We tried a debug build of nodeJS 14.15.1; it always crashes with the same error message: # Fatal error in ../deps/v8/src/heap/scavenger-inl.h, line 376 # Debug check failed: Heap::InFromPage(object).
The complete error message is attached below.
How often does it reproduce? Is there a required condition?
It depends. On nodeJS 14.15.1: a crash is more likely when running more workers than the number of CPU cores on the host machine, usually within 10 minutes. It's less likely when running the worker processes with the parallel scavenger disabled and single-threaded GC (usually I can get away with 20 to 30 minutes under heavy computation load, but eventually one of the workers will crash).
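A minimal sketch of how workers could be started with the parallel scavenger disabled and single-threaded GC. The exact flags used in the tests are not stated above; --no-parallel-scavenge and --single-threaded-gc are my assumption of the matching V8 flags, and gcExecArgv / startWorker are hypothetical helper names.

```javascript
const { fork } = require('child_process');

// Translate GC options into V8 flags for a child process.
// Assumption: these are the V8 flags corresponding to "parallel scavenger
// disabled" and "single-threaded GC"; verify with `node --v8-options`.
function gcExecArgv({ parallelScavenge = true, singleThreadedGc = false } = {}) {
  const argv = [];
  if (!parallelScavenge) argv.push('--no-parallel-scavenge');
  if (singleThreadedGc) argv.push('--single-threaded-gc');
  return argv;
}

// Hypothetical launcher: forks `modulePath` as a worker with the chosen GC flags.
function startWorker(modulePath, gcOptions) {
  return fork(modulePath, [], { execArgv: gcExecArgv(gcOptions) });
}
```

For example, startWorker('./worker.js', { parallelScavenge: false, singleThreadedGc: true }) would start one worker in the less crash-prone configuration described above, with './worker.js' as a placeholder path.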
What is the expected behavior?
Does not crash.
What do you see instead?
Full error message from nodeJS 14.15.1 debug build attached below:
Additional information
The test machine has an i7-7820HQ CPU and 32 GB of memory. The debug build didn't change any default compilation flags.