nodejs / help

OOM - Segmentation fault (not ulimit, not cgroups, not max-space, not exhausted RAM) #4474

Open riverego opened 2 months ago

riverego commented 2 months ago

Node.js Version

v22.7.0 and earlier

NPM Version

v10.8.2 and earlier

Operating System

Linux ip-10-8-1-229 6.1.0-23-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux

Subsystem

Other

Description

On my own computer the code behaves as expected: it crashes when the --max-old-space-size limit is reached, around 32G.

But on Outscale cloud VMs it always runs out of memory at around 20G.

The problem occurs on every image I have tested (Debian 12, Debian 11 and Ubuntu 20, all stock Outscale images), with the same result on VMs with 64 GB and 128 GB of RAM and on every Node.js version tested (22, 20 and 16).

$ cat /proc/<pid>/limits
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             257180               257180               processes
Max open files            1048576              1048576              files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       257180               257180               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I checked the ulimits and the cgroups (and even when a cgroup limit makes the OOM killer terminate a process, that does not produce a segfault), and found nothing. I also set a fixed 50G value in the ulimits, in case "unlimited" was hiding a low default value, and got the same result.
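The same limits can also be read from inside the Node.js process itself, which rules out the shell and the node process seeing different values. A minimal sketch, assuming the Linux /proc/self/limits layout shown above (fixed-width columns separated by at least two spaces); parseLimits is a hypothetical helper, not a Node.js API:

```javascript
// Sketch: parse /proc/<pid>/limits into { name: { soft, hard, units } }.
// Assumes the Linux layout: a header row, then columns separated by 2+ spaces.
const fs = require('node:fs');

function parseLimits(text) {
  const limits = {};
  for (const line of text.split('\n')) {
    if (!line.startsWith('Max ')) continue; // skips the header row and blank lines
    // Split on runs of 2+ spaces so multi-word names like "Max address space" stay intact.
    const [name, soft, hard, units = ''] = line.trim().split(/\s{2,}/);
    limits[name] = { soft, hard, units };
  }
  return limits;
}

// Usage: inspect the limits of the current Node.js process (Linux only).
if (fs.existsSync('/proc/self/limits')) {
  const limits = parseLimits(fs.readFileSync('/proc/self/limits', 'utf8'));
  for (const key of ['Max address space', 'Max data size', 'Max resident set']) {
    console.log(key, limits[key]);
  }
}
```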

I tried the values 0, 1 and 2 for /proc/sys/vm/overcommit_memory, with the same result. I tried recompiling Node.js on the VM: same result. I have exhausted ChatGPT's ideas.

I thought it might be a host-level limit applied to processes, so I tried this:
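For reference, the three overcommit_memory values behave differently, and in strict mode (2) the effective ceiling is CommitLimit in /proc/meminfo rather than the RAM size. A small sketch, assuming Linux; describeOvercommit and commitStats are hypothetical helper names:

```javascript
// Sketch: report the kernel overcommit policy and commit accounting (Linux only).
// Mode meanings follow the kernel's vm.overcommit_memory documentation.
const fs = require('node:fs');

const OVERCOMMIT_MODES = {
  0: 'heuristic overcommit (default): obviously excessive requests are refused',
  1: 'always overcommit: allocations are never refused up front',
  2: 'strict accounting: total commit is capped at CommitLimit',
};

function describeOvercommit(raw) {
  const mode = Number.parseInt(String(raw).trim(), 10);
  return OVERCOMMIT_MODES[mode] ?? `unknown mode ${String(raw).trim()}`;
}

// Pull CommitLimit and Committed_AS (both in kB) out of /proc/meminfo text.
function commitStats(meminfo) {
  const pick = (key) => {
    const m = meminfo.match(new RegExp(`^${key}:\\s+(\\d+) kB`, 'm'));
    return m ? Number(m[1]) : undefined;
  };
  return { commitLimitKb: pick('CommitLimit'), committedKb: pick('Committed_AS') };
}

if (fs.existsSync('/proc/sys/vm/overcommit_memory')) {
  console.log(describeOvercommit(fs.readFileSync('/proc/sys/vm/overcommit_memory', 'utf8')));
  console.log(commitStats(fs.readFileSync('/proc/meminfo', 'utf8')));
}
```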

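#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void){
        const size_t oneG = (size_t)1024 * 1048576;
        size_t maxMem = 17 * oneG;          /* the first request is 18G */
        void *memPointer = NULL;
        /* Grow the request by 1G per step until malloc() fails, touching
           each successful allocation so its pages are really committed. */
        do{
                if(memPointer != NULL){
                        printf("Max Tested Memory = %zu\n", maxMem);
                        memset(memPointer, 0, maxMem);
                        free(memPointer);
                }
                maxMem += oneG;
                memPointer = malloc(maxMem);
        }while(memPointer != NULL);
        maxMem -= oneG;
        printf("Max Usable Memory approx = %zu\n", maxMem);

        /* Hold the largest size that succeeded for 30 s so it can be observed. */
        memPointer = malloc(maxMem);
        if(memPointer == NULL){
                perror("malloc");
                return 1;
        }
        memset(memPointer, 1, maxMem);
        sleep(30);
        free(memPointer);

        return 0;
}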

But this program can use memory up to the VM's RAM size (64G or 128G) without any problem. The same goes for the stress command:

stress -m 1 --vm-bytes 32G --vm-keep
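A Node.js counterpart of the C probe could help narrow things down: Buffers live outside the V8 JavaScript heap, so if Node.js itself can allocate up to the VM's RAM this way while the array repro still dies near 20G, that would point at something specific to V8 heap growth rather than a per-process cap. A sketch under that assumption; probeOffHeap and its parameters are hypothetical:

```javascript
// Sketch: off-heap counterpart of the C malloc probe (hypothetical helper).
// Buffers are backed by memory outside the V8 JavaScript heap.
function probeOffHeap(chunkBytes, maxChunks) {
  const chunks = [];
  try {
    while (chunks.length < maxChunks) {
      // Buffer.alloc zero-fills, so the pages are touched, much like memset() in the C probe.
      chunks.push(Buffer.alloc(chunkBytes));
    }
  } catch (err) {
    // A failed allocation is typically reported as a RangeError. If the kernel
    // OOM killer acts first, the process gets SIGKILL and never reaches this.
    console.error(`allocation stopped: ${err.message}`);
  }
  return chunks.length * chunkBytes;
}

// Small values for a quick run; on the VM something like
// probeOffHeap(1024 ** 3, 120) would try up to 120 GiB in 1 GiB steps.
const total = probeOffHeap(8 * 1024 * 1024, 4);
console.log(`allocated ${total} bytes off-heap`);
```

Allocating in fixed chunks mirrors the C loop's 1G steps while keeping each single Buffer well under Node's per-Buffer size limit.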

So I'm running out of ideas: I can't figure out what makes Node.js run out of memory at around 20G on these VMs.

I hope someone here has a clue about what is happening.

Thank you.

Minimal Reproduction

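// Keep allocating ~10M-element arrays of strings until the process dies.
const fill = new Array(1000).fill('o').join('') // a 1000-character string
const bufs = []                                 // retains every array, so nothing can be collected
let i = 0
while (true) {
  ++i
  bufs.push(Array.from({ length: 10*1024 * 1024 }, (_, idx) => idx + fill))
  // console.log(i)
}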

The code only needs to reach the point where it runs out of memory.
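To compare how fast the heap grows on the laptop and on the VM without waiting for the crash, a bounded, scaled-down variant of the repro can log V8's heap statistics after each iteration. A sketch; sampleHeapGrowth and its length/iterations parameters are hypothetical, while v8.getHeapStatistics() is the standard node:v8 API:

```javascript
// Sketch: bounded, scaled-down repro that records V8 heap usage per iteration.
// `length` and `iterations` are hypothetical knobs; the original uses
// length = 10 * 1024 * 1024 and loops until the process dies.
const v8 = require('node:v8');

function sampleHeapGrowth(length, iterations) {
  const fill = 'o'.repeat(1000); // same 1000-character string as the repro
  const bufs = [];               // kept alive while sampling, as in the repro
  const samples = [];
  for (let iter = 1; iter <= iterations; iter++) {
    bufs.push(Array.from({ length }, (_, idx) => idx + fill));
    const { used_heap_size, total_heap_size } = v8.getHeapStatistics();
    samples.push({ iter, usedMB: used_heap_size / 2 ** 20, totalMB: total_heap_size / 2 ** 20 });
  }
  return { samples, retained: bufs.length };
}

const { samples } = sampleHeapGrowth(100_000, 5);
for (const s of samples) {
  console.log(`iter ${s.iter}: used ${s.usedMB.toFixed(1)} MB, total ${s.totalMB.toFixed(1)} MB`);
}
```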

Output

$ node --max-old-space-size=32000 --trace-gc index.js
[...traces]
[12808:0x6f27120]   146468 ms: Scavenge 19279.2 (19571.3) -> 19263.9 (19571.3) MB, 50.10 / 0.00 ms  (average mu = 0.831, current mu = 0.831) allocation failure;
[12808:0x6f27120]   146787 ms: Scavenge 19317.6 (19610.3) -> 19302.1 (19610.5) MB, 35.85 / 0.00 ms  (average mu = 0.831, current mu = 0.831) allocation failure;
Segmentation fault
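Since the crash happens around 19.3G, well under the 32000 MB requested, it may be worth confirming that V8 actually picked up the flag on the VM. A minimal sketch using v8.getHeapStatistics(); heapLimitMB is a hypothetical helper, and heap_size_limit is expected to be close to, typically slightly above, the --max-old-space-size value because it also counts the young generation:

```javascript
// Sketch: check that --max-old-space-size reached V8 (hypothetical helper name).
const v8 = require('node:v8');

function heapLimitMB() {
  // heap_size_limit covers the old generation plus the young generation,
  // so it is expected to sit slightly above the --max-old-space-size value.
  return v8.getHeapStatistics().heap_size_limit / 2 ** 20;
}

// Usage on the VM: node --max-old-space-size=32000 check-limit.js
console.log(`heap_size_limit: ${heapLimitMB().toFixed(0)} MB`);
console.log(`flags: ${process.execArgv.join(' ') || '(none)'}`);
```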

gireeshpunathil commented 2 months ago

I guess this applies here, since V8 array size is limited: https://stackoverflow.com/questions/70746898/why-cannot-v8-nodejs-allocate-a-max-size-array-if-sufficient-memory . Can you please examine the stack trace from a core file generated with ulimit -c unlimited?
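A sketch of how the repro could be run with core dumps enabled while also recording which signal ended it, which helps separate a segfault (SIGSEGV) from a V8 abort (SIGABRT) or an OOM-killer kill (SIGKILL). It assumes a POSIX /bin/sh; runWithCoreDumps and classifyExit are hypothetical helpers, and the signal-to-cause mapping is a heuristic:

```javascript
// Sketch: run a script with core dumps enabled and report how it died.
// Assumes a POSIX /bin/sh; where the core file lands is decided by
// /proc/sys/kernel/core_pattern, not by this script.
const { spawn } = require('node:child_process');

// Heuristic mapping from the terminating signal to a likely cause.
function classifyExit(code, signal) {
  switch (signal) {
    case 'SIGSEGV': return 'segmentation fault (as in this issue)';
    case 'SIGABRT': return 'abort(), e.g. a V8 fatal error such as heap OOM';
    case 'SIGKILL': return 'killed, e.g. by the kernel or cgroup OOM killer';
    case null: return `exited normally with code ${code}`;
    default: return `terminated by ${signal}`;
  }
}

function runWithCoreDumps(nodeArgs) {
  return new Promise((resolve, reject) => {
    // `sh -c CMD arg0 args...` sets $0 to arg0 and "$@" to the rest;
    // exec replaces the shell, so node's terminating signal reaches us.
    const child = spawn('/bin/sh',
      ['-c', 'ulimit -c unlimited; exec "$0" "$@"', process.execPath, ...nodeArgs],
      { stdio: 'inherit' });
    child.on('error', reject);
    child.on('exit', (code, signal) => resolve({ code, signal, cause: classifyExit(code, signal) }));
  });
}

// Usage: node run-repro.js --max-old-space-size=32000 --trace-gc index.js
if (process.argv.length > 2) {
  runWithCoreDumps(process.argv.slice(2)).then((r) => console.log(r));
}
```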

riverego commented 2 months ago

Hello, and thank you for your answer. No, it's not the array size limit. Each iteration creates an array of 10,485,760 entries, and each one is allocated successfully. It takes fewer than 100 iterations to reach the OOM.

Moreover, this test reaches the old-space limit on my own computer. The crash point varies from run to run; it really looks like a ulimit being hit. But no limit is set, and the C program can malloc up to the RAM limit. I just can't see what could be capping Node.js memory :/
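The numbers quoted here are easy to check: each array is tiny compared with the ECMAScript cap on array length (2 ** 32 - 1), and bufs itself only ever holds one entry per iteration. A quick sketch of that arithmetic, which also allocates one array of that size to show it succeeds on its own:

```javascript
// Sketch: the arithmetic behind "it is not the array size limit".
// ECMAScript caps an array's length at 2 ** 32 - 1; each iteration of the
// repro builds one array of 10 * 1024 * 1024 elements.
const perIteration = 10 * 1024 * 1024;
const specMaxLength = 2 ** 32 - 1;

console.log(`elements per array: ${perIteration}`); // elements per array: 10485760
console.log(`fraction of max length: ${(perIteration / specMaxLength).toFixed(4)}`);

// bufs holds one entry per iteration, so fewer than 100 entries at the crash.
// A single array of this length allocates without hitting any length limit.
const arr = Array.from({ length: perIteration }, (_, idx) => idx);
console.log(`one array allocated, length ${arr.length}`);
```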

gireeshpunathil commented 2 months ago

@riverego I was referring to bufs in your code, which grows without bound.

But you say it holds fewer than 100 entries when the OOM is hit, so apparently that is not the cause.

I guess there is a limit on the number of maps (object shape descriptions) in V8, but I am not sure of it; also, that cannot explain why it works on one system and not on another.

For these reasons, I would still recommend that you run with ulimit -c unlimited and look at the stack trace in the core file to see why the allocation failed (and, of course, the console output too).
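Before reproducing the crash with ulimit -c unlimited, it helps to know where the kernel will put the core: /proc/sys/kernel/core_pattern either names a file or, when it starts with '|', pipes the core to a helper such as systemd-coredump. A sketch; describeCorePattern is a hypothetical helper:

```javascript
// Sketch: find out where a core dump will go before reproducing the crash.
// Per core(5): a pattern starting with '|' pipes the core to a helper program;
// otherwise it is a file name, relative to the crashing process's working
// directory unless it is absolute.
const fs = require('node:fs');

function describeCorePattern(pattern) {
  const p = pattern.trim();
  if (p.startsWith('|')) {
    return `piped to helper: ${p.slice(1).split(/\s+/)[0]}`;
  }
  return p.startsWith('/')
    ? `written to file matching ${p}`
    : `written to ${p} (relative to the process's working directory)`;
}

if (fs.existsSync('/proc/sys/kernel/core_pattern')) {
  console.log(describeCorePattern(fs.readFileSync('/proc/sys/kernel/core_pattern', 'utf8')));
}
```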