[Open] Gabba90 opened this issue 1 month ago
Interestingly, we just had all of our production services crash simultaneously, running on AWS EC2 instances with Node v20 and v22, all with the same error.
The other information provided by Gabba holds true for us, but it affected everything out of the blue after all of our AWS k8s nodes were restarted by AWS at the same time.
We're still investigating the cause and a solution, but this issue being created just a few hours before our incident seems like a suspicious coincidence; it makes me wonder what's going on here.
We also run Java services and they were affected as well, so I don't believe this is an issue with Node.js or V8 itself.
The image where our issue started: v1.30.4-eks-16b398d
The Java error was less verbose, but looks like:
Error occurred during initialization of VM
Failed to mark memory page as executable - check if grsecurity/PaX is enabled
NodeJS Error
#
# Fatal error in , line 0
# Check failed: 12 == (*__errno_location ()).
#
#
#
#FailureMessage Object: 0xffffee1205a0
1: 0xceb064 [node]
2: 0x1f43eb0 V8_Fatal(char const*, ...) [node]
3: 0x1f4e5e8 v8::base::OS::SetPermissions(void*, unsigned long, v8::base::OS::MemoryPermission) [node]
4: 0x10e1974 v8::internal::MemoryAllocator::SetPermissionsOnExecutableMemoryChunk(v8::internal::VirtualMemory*, unsigned long, unsigned long, unsigned long) [node]
5: 0x10e1cb4 v8::internal::MemoryAllocator::AllocateAlignedMemory(unsigned long, unsigned long, unsigned long, v8::internal::AllocationSpace, v8::internal::Executability, void*, v8::internal::VirtualMemory*) [node]
6: 0x10e1eb8 v8::internal::MemoryAllocator::AllocateUninitializedChunkAt(v8::internal::BaseSpace*, unsigned long, v8::internal::Executability, unsigned long, v8::internal::PageSize) [node]
7: 0x10e2488 v8::internal::MemoryAllocator::AllocatePage(v8::internal::MemoryAllocator::AllocationMode, v8::internal::Space*, v8::internal::Executability) [node]
8: 0x10f6e78 v8::internal::PagedSpaceBase::TryExpandImpl() [node]
9: 0x10f98c0 [node]
10: 0x10f9e54 v8::internal::PagedSpaceBase::RefillLabMain(int, v8::internal::AllocationOrigin) [node]
11: 0x1070988 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
12: 0x1050230 v8::internal::Factory::CodeBuilder::AllocateInstructionStream(bool) [node]
13: 0x1050604 v8::internal::Factory::CodeBuilder::BuildInternal(bool) [node]
14: 0xed068c v8::internal::baseline::BaselineCompiler::Build(v8::internal::LocalIsolate*) [node]
15: 0xee2a04 v8::internal::GenerateBaselineCode(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SharedFunctionInfo>) [node]
16: 0xf3c1b0 v8::internal::Compiler::CompileSharedWithBaseline(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SharedFunctionInfo>, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [node]
17: 0xf3c734 v8::internal::Compiler::CompileBaseline(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [node]
18: 0xece3cc v8::internal::baseline::BaselineBatchCompiler::CompileBatch(v8::internal::Handle<v8::internal::JSFunction>) [node]
19: 0xf4708c v8::internal::Compiler::Compile(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Compiler::ClearExceptionFlag, v8::internal::IsCompiledScope*) [node]
20: 0x14587a8 v8::internal::Runtime_CompileLazy(int, unsigned long*, v8::internal::Isolate*) [node]
21: 0x1862a84 [node]
I am also facing the same issue with all the node pods. I am on v1.30.4-eks-16b398d too.
Hi! This appears to be a duplicate of https://github.com/nodejs/help/issues/4465. Is that not the case?
It could be considered a duplicate, but I think this issue is still valuable from a discoverability point of view. I do not believe this is an issue with Node.js in any way, but rather an EKS image release that just hit AWS, since the Java services we run are also affected.
I think this is the appropriate place for the ticket: https://github.com/aws/eks-distro/issues/3370
That being said, it's really hard to say what the exact root cause is right now.
We have moved off Bottlerocket and onto AL2 in order to work around this. Our nodes were running the image v1.30.4-eks-16b398d when things went bad.
Interlinking for future discoverability https://github.com/bottlerocket-os/bottlerocket/issues/4260#issuecomment-2434318801
Relevant excerpt from the strace log:
1740 mprotect(0x84c0000, 536870912, PROT_READ|PROT_WRITE|PROT_EXEC) = -1 EACCES (Permission denied)
1740 write(2, "\n\n#\n# Fatal error in , line 0\n# ", 32) = 32
1740 write(2, "Check failed: 12 == (*__errno_lo"..., 43) = 43
1740 write(2, "\n#\n#\n#\n#FailureMessage Object: 0"..., 45) = 45
1740 write(2, "\n", 1) = 1
1740 write(2, "----- Native stack trace -----\n\n", 32) = 32
1740 futex(0x7fde3fb9b1f0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
1740 write(2, " 1: 0x107e621 [node]\n", 22) = 22
1740 write(2, " 2: 0x2aba423 V8_Fatal(char cons"..., 48) = 48
1740 write(2, " 3: 0x2ac5066 v8::base::OS::SetP"..., 104) = 104
1740 write(2, " 4: 0x14c1bfc v8::internal::Code"..., 97) = 97
1740 write(2, " 5: 0x155982f v8::internal::Heap"..., 73) = 73
1740 write(2, " 6: 0x149ac92 v8::internal::Isol"..., 142) = 142
1740 write(2, " 7: 0x19ee994 v8::internal::Snap"..., 80) = 80
1740 write(2, " 8: 0x1315af6 v8::Isolate::Initi"..., 93) = 93
1740 write(2, " 9: 0xed9a18 node::NewIsolate(v8"..., 163) = 163
1740 write(2, "10: 0x1043a6d node::NodeMainInst"..., 530) = 530
1740 write(2, "11: 0xf95806 node::Start(int, ch"..., 45) = 45
1740 write(2, "12: 0x7fde3f9bb24a [/lib/x86_64"..., 54) = 54
1740 write(2, "13: 0x7fde3f9bb305 __libc_start_"..., 71) = 71
1740 write(2, "14: 0xecff4e _start [node]\n", 27) = 27
1740 --- SIGTRAP {si_signo=SIGTRAP, si_code=SI_KERNEL, si_addr=NULL} ---
So clearly, the memory-protection call did not succeed. This is not a Node.js issue: something at a lower level is restricting the memory permissions available to the process, and Node.js cannot function under those restrictions.
Yeah, this is almost certainly the MemoryDenyWriteExecute systemd setting; it enforces W^X for memory pages. That's good hardening for most apps, but Node.js, and probably most JIT environments, aren't compatible with it out of the box.
Running with --jitless or --noconcurrent_sparkplug may help.
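If MemoryDenyWriteExecute does turn out to be the culprit on a systemd-managed host, one possible workaround is a drop-in override for the affected service (a sketch only; "myapp.service" is a placeholder name, and relaxing W^X weakens a deliberate security hardening measure):

```ini
# /etc/systemd/system/myapp.service.d/override.conf  (unit name is hypothetical)
[Service]
# Let the V8 (or JVM) JIT map pages as writable and executable again.
MemoryDenyWriteExecute=no
```

Apply it with `systemctl daemon-reload` followed by a restart of the service.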
Hey everyone, this is a reminder that "me too" comments only add more noise to this already noisy issue. Please refrain from commenting unless you have something to add to the conversation.
@RedYetiDev - which comment are you referring to as a "me too" comment?
This is more of a general statement, I'm not directing this at anyone specific
Then please refrain from making general statements without reason; it confused me.
Version
v23.0.0
Platform
Subsystem
No response
What steps will reproduce the bug?
By repeatedly running Node, for example by simply installing the dependencies of a project through the command
node $(which npm) install
in a while loop (see the script used for testing here). For instance:
How often does it reproduce? Is there a required condition?
Very often on specific platforms such as the one shown above.
What is the expected behavior? Why is that the expected behavior?
Node should not fail and crash.
What do you see instead?
At some point Node crashes, giving the following error, extracted from here:
Additional information
Similar issue https://github.com/nodejs/help/issues/4465.
The output of strace can be found here.