[Open] Gabba90 opened this issue 1 month ago
Interestingly, we just had all of our production services crash simultaneously, running on AWS EC2 instances with Node v20 and v22, all with the same error.
The other information provided by Gabba holds true for us, but it affected everything out of the blue after all of our AWS k8s nodes were restarted by AWS at the same time.
We're still investigating the cause and a solution, but this issue being created just a few hours before our incident seems like a suspicious coincidence; it makes me wonder what's going on here.
We also run Java services and they were affected as well, so I don't believe this is an issue with Node.js or V8 itself.
The image where our issue started: v1.30.4-eks-16b398d
The Java error was less verbose, but looks like:
Error occurred during initialization of VM
Failed to mark memory page as executable - check if grsecurity/PaX is enabled
NodeJS Error
#
# Fatal error in , line 0
# Check failed: 12 == (*__errno_location ()).
#
#
#
#FailureMessage Object: 0xffffee1205a0
1: 0xceb064 [node]
2: 0x1f43eb0 V8_Fatal(char const*, ...) [node]
3: 0x1f4e5e8 v8::base::OS::SetPermissions(void*, unsigned long, v8::base::OS::MemoryPermission) [node]
4: 0x10e1974 v8::internal::MemoryAllocator::SetPermissionsOnExecutableMemoryChunk(v8::internal::VirtualMemory*, unsigned long, unsigned long, unsigned long) [node]
5: 0x10e1cb4 v8::internal::MemoryAllocator::AllocateAlignedMemory(unsigned long, unsigned long, unsigned long, v8::internal::AllocationSpace, v8::internal::Executability, void*, v8::internal::VirtualMemory*) [node]
6: 0x10e1eb8 v8::internal::MemoryAllocator::AllocateUninitializedChunkAt(v8::internal::BaseSpace*, unsigned long, v8::internal::Executability, unsigned long, v8::internal::PageSize) [node]
7: 0x10e2488 v8::internal::MemoryAllocator::AllocatePage(v8::internal::MemoryAllocator::AllocationMode, v8::internal::Space*, v8::internal::Executability) [node]
8: 0x10f6e78 v8::internal::PagedSpaceBase::TryExpandImpl() [node]
9: 0x10f98c0 [node]
10: 0x10f9e54 v8::internal::PagedSpaceBase::RefillLabMain(int, v8::internal::AllocationOrigin) [node]
11: 0x1070988 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
12: 0x1050230 v8::internal::Factory::CodeBuilder::AllocateInstructionStream(bool) [node]
13: 0x1050604 v8::internal::Factory::CodeBuilder::BuildInternal(bool) [node]
14: 0xed068c v8::internal::baseline::BaselineCompiler::Build(v8::internal::LocalIsolate*) [node]
15: 0xee2a04 v8::internal::GenerateBaselineCode(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SharedFunctionInfo>) [node]
16: 0xf3c1b0 v8::internal::Compiler::CompileSharedWithBaseline(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SharedFunctionInfo>, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [node]
17: 0xf3c734 v8::internal::Compiler::CompileBaseline(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [node]
18: 0xece3cc v8::internal::baseline::BaselineBatchCompiler::CompileBatch(v8::internal::Handle<v8::internal::JSFunction>) [node]
19: 0xf4708c v8::internal::Compiler::Compile(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [node]
20: 0x14587a8 v8::internal::Runtime_CompileLazy(int, unsigned long*, v8::internal::Isolate*) [node]
21: 0x1862a84 [node]
I am also facing the same issue with all the node pods. I am on v1.30.4-eks-16b398d too.
Hi! This appears to be a duplicate of https://github.com/nodejs/help/issues/4465. Is that not the case?
It could be considered a duplicate, but I think this issue is still valuable from a discoverability point of view. I do not believe this is an issue with Node.js in any way, but rather an EKS image release that just hit AWS, since the Java services we run are also affected.
I think this is the appropriate place for the ticket: https://github.com/aws/eks-distro/issues/3370
That being said, it's really hard to say what the exact root cause is right now.
We have moved off Bottlerocket and onto AL2 in order to work around this. Our nodes were running the image v1.30.4-eks-16b398d when things went bad.
Interlinking for future discoverability https://github.com/bottlerocket-os/bottlerocket/issues/4260#issuecomment-2434318801
Relevant excerpt from the strace log:
1740 mprotect(0x84c0000, 536870912, PROT_READ|PROT_WRITE|PROT_EXEC) = -1 EACCES (Permission denied)
1740 write(2, "\n\n#\n# Fatal error in , line 0\n# ", 32) = 32
1740 write(2, "Check failed: 12 == (*__errno_lo"..., 43) = 43
1740 write(2, "\n#\n#\n#\n#FailureMessage Object: 0"..., 45) = 45
1740 write(2, "\n", 1) = 1
1740 write(2, "----- Native stack trace -----\n\n", 32) = 32
1740 futex(0x7fde3fb9b1f0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
1740 write(2, " 1: 0x107e621 [node]\n", 22) = 22
1740 write(2, " 2: 0x2aba423 V8_Fatal(char cons"..., 48) = 48
1740 write(2, " 3: 0x2ac5066 v8::base::OS::SetP"..., 104) = 104
1740 write(2, " 4: 0x14c1bfc v8::internal::Code"..., 97) = 97
1740 write(2, " 5: 0x155982f v8::internal::Heap"..., 73) = 73
1740 write(2, " 6: 0x149ac92 v8::internal::Isol"..., 142) = 142
1740 write(2, " 7: 0x19ee994 v8::internal::Snap"..., 80) = 80
1740 write(2, " 8: 0x1315af6 v8::Isolate::Initi"..., 93) = 93
1740 write(2, " 9: 0xed9a18 node::NewIsolate(v8"..., 163) = 163
1740 write(2, "10: 0x1043a6d node::NodeMainInst"..., 530) = 530
1740 write(2, "11: 0xf95806 node::Start(int, ch"..., 45) = 45
1740 write(2, "12: 0x7fde3f9bb24a [/lib/x86_64"..., 54) = 54
1740 write(2, "13: 0x7fde3f9bb305 __libc_start_"..., 71) = 71
1740 write(2, "14: 0xecff4e _start [node]\n", 27) = 27
1740 --- SIGTRAP {si_signo=SIGTRAP, si_code=SI_KERNEL, si_addr=NULL} ---
So clearly, the memory-protection call did not succeed. This is not a Node.js issue: something at a lower level is restricting the memory permissions available to the process, and Node.js cannot function under those restrictions.
Yeah, this is almost certainly the MemoryDenyWriteExecute systemd setting; it enforces W^X for memory pages. That's good hardening for most apps, but Node.js, and probably most JIT environments, aren't compatible with it out of the box.
Running with --jitless or --noconcurrent_sparkplug may help.
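If MemoryDenyWriteExecute does turn out to be the culprit on a systemd-managed host, one possible workaround is a drop-in override for the affected service (a sketch only; "myapp.service" is a placeholder name, and relaxing W^X weakens a deliberate security hardening measure):

```ini
# /etc/systemd/system/myapp.service.d/override.conf  (unit name is hypothetical)
[Service]
# Let the V8 (or JVM) JIT map pages as writable and executable again.
MemoryDenyWriteExecute=no
```

Apply it with `systemctl daemon-reload` followed by a restart of the service.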
Hey everyone, this is a reminder that "me too" comments only add more noise to this already noisy issue. Please refrain from commenting unless you have something to add to the conversation.
@RedYetiDev - which comment are you referring to as a "me too" comment?
This is more of a general statement, I'm not directing this at anyone specific
Then please refrain from making general statements without reason; it confused me.
Version
v23.0.0
Platform
Subsystem
No response
What steps will reproduce the bug?
By repeatedly running Node, for example by simply installing the dependencies of a project through the command
node $(which npm) install
in a while loop (see the script used for testing here). For instance:
How often does it reproduce? Is there a required condition?
Very often on specific platforms such as the one shown above.
What is the expected behavior? Why is that the expected behavior?
Node should not fail and crash.
What do you see instead?
At some point Node crashes, giving the following error, extracted from here:
Additional information
Similar issue https://github.com/nodejs/help/issues/4465.
The output of strace can be found here.