nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.5k stars 29.02k forks source link

OOM does not report JS stacktrace or it's minimal #33532

Closed zen0wu closed 1 year ago

zen0wu commented 4 years ago

What steps will reproduce the bug?

Here's the code

function oom() {
const objs = []
for (let i = 0; i < 1000000; i++) {
    objs.push({
        x: 1,
        y: "abc".repeat(i)
    })
}
}

function caller() {
  oom()
}

caller()

How often does it reproduce? Is there a required condition?

node --max-old-space-size=128 test.js

What is the expected behavior?

When the process OOMs, it should print a useful JS stacktrace.

What do you see instead?

Node 12.16.2

<--- Last few GCs --->
re[71186:0x110008000]      501 ms: Mark-sweep 123.7 (125.8) -> 123.5 (127.3) MB, 12.4 / 0.0 ms  (+ 71.2 ms in 33 steps since start of marking, biggest step 5.5 ms, walltime since start of marking 84 ms) (average mu = 0.195, current mu = 0.012) finalize incr[71186:0x110008000]      611 ms: Mark-sweep 125.8 (127.5) -> 125.8 (129.0) MB, 12.3 / 0.0 ms  (+ 93.4 ms in 34 steps since start of marking, biggest step 4.7 ms, walltime since start of marking 110 ms) (average mu = 0.113, current mu = 0.037) finalize inc

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x10097d5b9]
Security context: 0x3ce0613008d1 <JSObject>
    1: oom [0x3ce013fb8d99] [/Users/zeno/Developer/XXX/Source/test.js:~1] [pc=0x386e93984c64](this=0x3ce013f803f9 <JSGlobal Object>)
    2: caller [0x3ce013fb8e11] [/Users/zeno/Developer/XXX/Source/test.js:12] [bytecode=0x3ce05a1541e9 offset=5](this=0x3ce013f803f9 <JSGlobal Object>)
    3: /* anonymous */ [0x3ce013fb8e51] [/Users/zeno/Developer/XXX/Sou...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x1010248bd node::Abort() (.cold.1) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
 2: 0x100084c4d node::FatalError(char const*, char const*) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
 3: 0x100084d8e node::OnFatalError(char const*, char const*) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
 4: 0x100186477 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
 5: 0x100186417 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
 6: 0x1003141c5 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
 7: 0x100315a3a v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
 8: 0x10031246c v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
 9: 0x10031026e v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
10: 0x10030f2b1 v8::internal::Heap::HandleGCRequest() [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
11: 0x1002d4551 v8::internal::StackGuard::HandleInterrupts() [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
12: 0x10063e79c v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
13: 0x10097d5b9 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit [/Users/zeno/.nvm/versions/node/v12.16.2/bin/node]
14: 0x386e93984c64

Node 14.3.0

<--- Last few GCs --->

[68996:0x108008000]      772 ms: Mark-sweep 127.8 (129.0) -> 126.4 (128.7) MB, 212.8 / 0.0 ms  (+ 0.0 ms in 83 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 224 ms) (average mu = 0.157, current mu = 0.097) allocation f[68996:0x108008000]      997 ms: Mark-sweep 127.4 (128.7) -> 127.3 (129.7) MB, 224.1 / 0.0 ms  (+ 0.0 ms in 15 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 225 ms) (average mu = 0.076, current mu = 0.005) allocation f

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x100bc99f3 node::Abort() (.cold.1) [/usr/local/bin/node]
 2: 0x100081835 node::FatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x10008199e node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 4: 0x100184f95 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0x100184f3f v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 6: 0x1002a4ff9 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 7: 0x1002a635c v8::internal::Heap::MarkCompactPrologue() [/usr/local/bin/node]
 8: 0x1002a3d43 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
 9: 0x1002a250e v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
10: 0x1002aa4d0 v8::internal::Heap::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
11: 0x1002aa526 v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
12: 0x100288579 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
13: 0x1004ebf2a v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
14: 0x1007542b9 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit [/usr/local/bin/node]
15: 0x10075512d Builtins_StringAdd_CheckNone [/usr/local/bin/node]
16: 0x1007ab726 Builtins_StringRepeat [/usr/local/bin/node]

Additional information

As shown above, in Node 12, even though there's a stack trace from JS world, but it's very minimal because it only shows 3 frames, which is generally not too useful.

Since the process is going to abort, I don't think it's a big deal to extend the length of JS stack traces to provide more useful info. In fact I think the minimal stack trace is not really intentional.

With Node 14, it does not even show the stack trace anymore, which I believe should be a bug.

bnoordhuis commented 4 years ago

The JS stack trace was removed in v8/v8@be5dd77215df4963aa4d63f970d807c565f45be6 because collecting it can require memory when none is available.

I think it should be safe to re-enable it again when --noenable_lazy_source_positions is passed on the command line because then no additional memory is needed.

That flag has an adverse effect on runtime performance, however.

zen0wu commented 4 years ago

Oh it's an upstream change, good to know!

Regardless, I should we should fix it from the node side. Stack trace is very valuable when debugging hard issues, and also 3 lines is not super helpful.

Memory wise, it shouldn't be that big of a concern, especially in this case, since the process is going away. In my understanding this happens when node's heap limit is hit, but not the process is killed by the OOM killer, right? If it's the OOM killer, node doesn't even have a chance to print anything. So allocating like 32Kb of memory for the stack trace in this case shouldn't be that big of a deal.

bnoordhuis commented 4 years ago

Unfortunately, it's not nearly as simple as that. For example, an out-of-memory condition can also happen when V8 cannot mmap the memory it needs.

What's more, computing source positions means re-parsing the source code of the functions in the call stack, and that scales with the size of the functions, the number of variables, and so on.

Expectation management: I'm not going to work on this (no time) but I'm leaving this open in case someone else wants to, or wants to take this up with V8.

zen0wu commented 4 years ago

@bnoordhuis Sounds good, thanks for the clarification.

Unfortunately, it's not nearly as simple as that. For example, an out-of-memory condition can also happen when V8 cannot mmap the memory it needs.

That all makes perfect sense. What I was saying is we should make the best effort to print the stack trace if possible.

What's more, computing source positions means re-parsing the source code of the functions in the call stack, and that scales with the size of the functions, the number of variables, and so on.

Yeah that does sound very involved. Understandable.

Also I wonder if it's possible to just expand the size of stacktrace (to be more than 512 bytes) in node 12, that could be beneficial and sounds easy enough. Downside is it's not future proof.

gireeshpunathil commented 3 years ago

@zen0wu - the only use case (IMO) where the current stack trace is usable in an OOM site is when the current code creates a large object. In all other cases, the OOM is a result of cumulative allocations in the past, for which the present stack trace has nothing to offer. Am I missing anything?

gregmartyn commented 3 years ago

I'd imagine the stack trace's usefulness would be correlated with how bad the memory leak is. The stack trace would be useful if the code is allocating a large object, yes, (100% useful) but also if it is allocating lots of tiny objects in a loop, (100%) and it'd also be useful probabilistically: if the offender is doing 75% of the allocations, there's a 75% chance the stack trace would be useful.

bnoordhuis commented 1 year ago

I'm going to close this. There's been no movement in the last 1.5 years and it's an upstream issue so there isn't anything actionable.