nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.66k stars 29.09k forks source link

parallel/test-worker-nearheaplimit-deadlock is flaky #50761

Open joyeecheung opened 10 months ago

joyeecheung commented 10 months ago

Refs: https://github.com/nodejs/reliability/issues/720

It seems the failure started about 3 days ago, around the time that https://github.com/nodejs/node/pull/48655 landed. I cannot reproduce it locally however.

One peculiar bit I noticed from the error logs:

16:17:07      1: 0x104c0fb6c node::Abort() [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node]
16:17:07      2: 0x104c0fd6c node::ModifyCodeGenerationFromStrings(v8::Local<v8::Context>, v8::Local<v8::Value>, bool) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node]
16:17:07      3: 0x104daf0fc v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node]
16:17:07      4: 0x104f8fb50 v8::internal::Heap::stack() [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node]
16:17:07      5: 0x104fab008 v8::internal::LocalFactory::NumberToStringCacheHash(v8::internal::Tagged<v8::internal::Smi>) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node]
See full stack trace ``` 16:17:07 not ok 3261 parallel/test-worker-nearheaplimit-deadlock 16:17:07 --- 16:17:07 duration_ms: 193.09600 16:17:07 severity: crashed 16:17:07 exitcode: -6 16:17:07 stack: |- 16:17:07 16:17:07 <--- Last few GCs ---> 16:17:07 16:17:07 [15425:0x128098000] 20 ms: Mark-Compact 1.7 (6.0) -> 1.7 (6.0) MB, 1.12 / 0.00 ms (average mu = 0.312, current mu = 0.000) background allocation failure; GC in old space requested 16:17:07 [15425:0x128098000] 21 ms: Mark-Compact 1.7 (6.0) -> 1.7 (6.0) MB, 1.42 / 0.00 ms (average mu = 0.208, current mu = 0.055) background allocation failure; GC in old space requested 16:17:07 16:17:07 16:17:07 <--- JS stacktrace ---> 16:17:07 16:17:07 FATAL ERROR: LocalHeap: allocation failed Allocation failed - JavaScript heap out of memory 16:17:07 ----- Native stack trace ----- 16:17:07 16:17:07 1: 0x104c0fb6c node::Abort() [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 2: 0x104c0fd6c node::ModifyCodeGenerationFromStrings(v8::Local, v8::Local, bool) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 3: 0x104daf0fc v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 4: 0x104f8fb50 v8::internal::Heap::stack() [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 5: 0x104fab008 v8::internal::LocalFactory::NumberToStringCacheHash(v8::internal::Tagged) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 6: 0x104f641c8 v8::internal::FactoryBase::NewByteArray(int, v8::internal::AllocationType) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 7: 0x104e01058 v8::internal::Handle v8::internal::baseline::BytecodeOffsetTableBuilder::ToBytecodeOffsetTable(v8::internal::LocalIsolate*) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 8: 0x104e00f2c v8::internal::baseline::BaselineCompiler::Build(v8::internal::LocalIsolate*) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 9: 0x104e17050 v8::internal::GenerateBaselineCode(v8::internal::Isolate*, v8::internal::Handle) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 10: 0x104e657e8 v8::internal::Compiler::CompileSharedWithBaseline(v8::internal::Isolate*, v8::internal::Handle, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 11: 0x104e65a9c v8::internal::Compiler::CompileBaseline(v8::internal::Isolate*, v8::internal::Handle, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 12: 0x104dfaf3c v8::internal::baseline::BaselineBatchCompiler::EnqueueFunction(v8::internal::Handle) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 13: 0x104f1ce68 v8::internal::TieringManager::OnInterruptTick(v8::internal::Handle, v8::internal::CodeKind) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 14: 0x105338bd0 v8::internal::(anonymous namespace)::BytecodeBudgetInterruptWithStackCheck(v8::internal::Isolate*, v8::internal::Arguments<(v8::internal::ArgumentsType)0>&, v8::internal::CodeKind) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 15: 0x105337694 v8::internal::Runtime_BytecodeBudgetInterruptWithStackCheck_Ignition(int, unsigned long*, v8::internal::Isolate*) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 16: 0x1056d3954 Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 17: 0x1057aa5c4 Builtins_JumpLoopHandler [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 18: 0x105648b84 Builtins_InterpreterEntryTrampoline [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 19: 0x15000bb84 16:17:07 20: 0x15000ce8c 16:17:07 21: 0x105648b84 Builtins_InterpreterEntryTrampoline [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 22: 0x105648b84 Builtins_InterpreterEntryTrampoline [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 23: 0x1056468ac Builtins_JSEntryTrampoline [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 24: 0x105646594 Builtins_JSEntry [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 25: 0x104ef1740 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 26: 0x104ef0cc0 v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle, v8::internal::Handle, int, v8::internal::Handle*) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 27: 0x104dc8b28 v8::Function::Call(v8::Local, v8::Local, int, v8::Local*) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 28: 0x104bf2a50 node::builtins::BuiltinLoader::CompileAndCall(v8::Local, char const*, node::Realm*) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 29: 0x104c822d4 node::PrincipalRealm::BootstrapRealm() [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 30: 0x104c80298 node::Realm::RunBootstrapping() [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 31: 0x104b40d98 node::CreateEnvironment(node::IsolateData*, v8::Local, std::__1::vector, std::__1::allocator >, std::__1::allocator, std::__1::allocator > > > const&, std::__1::vector, std::__1::allocator >, std::__1::allocator, std::__1::allocator > > > const&, node::EnvironmentFlags::Flags, node::ThreadId, std::__1::unique_ptr >) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 32: 0x104cc486c node::worker::Worker::Run() [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 33: 0x104cc7f50 node::worker::Worker::StartThread(v8::FunctionCallbackInfo const&)::$_3::__invoke(void*) [/Users/iojs/build/workspace/node-test-commit-osx-arm/nodes/osx11/out/Release/node] 16:17:07 34: 0x181467878 _pthread_start [/usr/lib/system/libsystem_pthread.dylib] 16:17:07 35: 0x1814625e0 thread_start [/usr/lib/system/libsystem_pthread.dylib] 16:17:07 ... ```

It seems strange that the ModifyCodeGenerationFromStrings callback can get called from FatalProcessOutOfMemory. Even if this gets called for OOM somehow, that shouldn't do much unless source maps are enabled, in which case it would call into JavaScript and that's definitely going to crash in OOM.

legendecas commented 10 months ago

I think the incorrect backtrace is a problem related to calling backtrace in a [[noreturn]] function: https://github.com/nodejs/node/pull/50849. As a matter of fact, FatalProcessOutOfMemory didn’t call ModifyCodeGenerationFromStringCallback after all.

mhdawson commented 10 months ago

again - https://ci.nodejs.org/job/node-test-commit-osx-arm/nodes=osx11/14638/

joyeecheung commented 6 months ago

This still shows up in the CI occasionally. I happened to get a stack trace locally:

FATAL ERROR: LocalHeap: allocation failed Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0x100bd9dec node::OOMErrorHandler(char const*, v8::OOMDetails const&) [/Users/joyee/projects/node/out/Release/node]
 2: 0x100d9b028 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/Users/joyee/projects/node/out/Release/node]
 3: 0x100f92f0c v8::internal::Heap::stack() [/Users/joyee/projects/node/out/Release/node]
 4: 0x100fae760 v8::internal::LocalFactory::NumberToStringCacheHash(v8::internal::Tagged<v8::internal::Smi>) [/Users/joyee/projects/node/out/Release/node]
 5: 0x100f66da8 v8::internal::FactoryBase<v8::internal::LocalFactory>::NewByteArray(int, v8::internal::AllocationType) [/Users/joyee/projects/node/out/Release/node]
 6: 0x100df091c v8::internal::Handle<v8::internal::ByteArray> v8::internal::baseline::BytecodeOffsetTableBuilder::ToBytecodeOffsetTable<v8::internal::LocalIsolate>(v8::internal::LocalIsolate*) [/Users/joyee/projects/node/out/Release/node]
 7: 0x100df07fc v8::internal::baseline::BaselineCompiler::Build(v8::internal::LocalIsolate*) [/Users/joyee/projects/node/out/Release/node]
 8: 0x100e06588 v8::internal::GenerateBaselineCode(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SharedFunctionInfo>) [/Users/joyee/projects/node/out/Release/node]
 9: 0x100e5a450 v8::internal::Compiler::CompileSharedWithBaseline(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SharedFunctionInfo>, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [/Users/joyee/projects/node/out/Release/node]
10: 0x100e5a740 v8::internal::Compiler::CompileBaseline(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [/Users/joyee/projects/node/out/Release/node]
11: 0x100dea938 v8::internal::baseline::BaselineBatchCompiler::EnqueueFunction(v8::internal::Handle<v8::internal::JSFunction>) [/Users/joyee/projects/node/out/Release/node]
12: 0x100f17c1c v8::internal::TieringManager::OnInterruptTick(v8::internal::Handle<v8::internal::JSFunction>, v8::internal::CodeKind) [/Users/joyee/projects/node/out/Release/node]
13: 0x10135496c v8::internal::(anonymous namespace)::BytecodeBudgetInterrupt(v8::internal::Isolate*, v8::internal::Arguments<(v8::internal::ArgumentsType)0>&, v8::internal::CodeKind) [/Users/joyee/projects/node/out/Release/node]
14: 0x101353150 v8::internal::Runtime_BytecodeBudgetInterrupt_Ignition(int, unsigned long*, v8::internal::Isolate*) [/Users/joyee/projects/node/out/Release/node]
15: 0x1017dc594 Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit [/Users/joyee/projects/node/out/Release/node]
16: 0x1018bb764 Builtins_ReturnHandler [/Users/joyee/projects/node/out/Release/node]
17: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
18: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
19: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
20: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
21: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
22: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
23: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
24: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
25: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
26: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
27: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
28: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
29: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
30: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
31: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
32: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
33: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
34: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
35: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
36: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
37: 0x1017549b0 Builtins_InterpreterEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
38: 0x1017526cc Builtins_JSEntryTrampoline [/Users/joyee/projects/node/out/Release/node]
39: 0x1017523b4 Builtins_JSEntry [/Users/joyee/projects/node/out/Release/node]
40: 0x100eeb340 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/Users/joyee/projects/node/out/Release/node]
41: 0x100eeac64 v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) [/Users/joyee/projects/node/out/Release/node]
42: 0x100db5038 v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) [/Users/joyee/projects/node/out/Release/node]
43: 0x100af58e8 node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) [/Users/joyee/projects/node/out/Release/node]
44: 0x100b0c4d4 node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) [/Users/joyee/projects/node/out/Release/node]
45: 0x100c1d794 node::worker::MessagePort::OnMessage(node::worker::MessagePort::MessageProcessingMode) [/Users/joyee/projects/node/out/Release/node]
46: 0x101733780 uv__async_io [/Users/joyee/projects/node/out/Release/node]
47: 0x10174671c uv__io_poll [/Users/joyee/projects/node/out/Release/node]
48: 0x101733d18 uv_run [/Users/joyee/projects/node/out/Release/node]
49: 0x100af6048 node::SpinEventLoopInternal(node::Environment*) [/Users/joyee/projects/node/out/Release/node]
50: 0x100c98e1c node::worker::Worker::Run() [/Users/joyee/projects/node/out/Release/node]
51: 0x100c9c31c node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::$_3::__invoke(void*) [/Users/joyee/projects/node/out/Release/node]
52: 0x1ac1dbfa8 _pthread_start [/usr/lib/system/libsystem_pthread.dylib]
53: 0x1ac1d6da0 thread_start [/usr/lib/system/libsystem_pthread.dylib]
joyeecheung commented 6 months ago

The worker heap size before message port creation is roughly 2.2MB. I wonder if we should bump it up a little to avoid failures like this. The near heap limit callback is only invoked during GC, but if we just set the heap size to 0, chances are the heap limit is going to be hit straight away and V8 won't even be able to do a GC and will just go straight to OOM.

On the other hand we could probably also just...schedule a throw on the main thread in the OOM handler of the worker?

joyeecheung commented 6 months ago

Hmm actually not, because the OOM handler doesn't allow returning. I guess we could just bump the limit up a little, and/or ask V8 to not crash the entire process when it's only the V8 heap running out?

joyeecheung commented 6 months ago

I wonder if we should just allow the test to crash. In that case, V8 runs out of memory because it's performing some operation that forbids GC and therefore can't invoke our callbacks. Technically that means ERR_WORKER_OUT_OF_MEMORY is just not going to work 100% of the time, but for user heaps maybe it's less likely for the crash to occur.

joyeecheung commented 5 months ago

This may be potentially fixed by https://chromium-review.googlesource.com/c/v8/v8/+/5352288