
Node.js process rss keeps rising when heapUsed/heapTotal does not (native memory leak)? #1518

Closed. eladnava closed this issue 6 years ago

eladnava commented 6 years ago

Hi all! First, thanks so much for reading this and helping me out.

I've been attempting to fix a stubborn memory leak for a while now and thought that I did manage to finally get it (#1484) but it appears to have resurfaced in the form of a slowly increasing rss while heapUsed and heapTotal remain stable:

[Screenshot (2018-09-24): rss climbing steadily while heapUsed and heapTotal stay flat]

This graph is from a two-day period. rss does appear to more or less stabilize towards the end of the graph, which seems to be when the system runs low on memory (free -m reports 283 MB free). But my Node.js process does eventually (after 3-4 days) get terminated by the OOM killer:

Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075062] node invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075066] node cpuset=/ mems_allowed=0
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075073] CPU: 1 PID: 1374 Comm: node Not tainted 4.4.0-1066-aws #76-Ubuntu
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075075] Hardware name: Amazon EC2 t3.large/, BIOS 1.0 10/16/2017
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075167] Mem-Info:
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075172] active_anon:1818480 inactive_anon:20540 isolated_anon:0
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075172]  active_file:33 inactive_file:2 isolated_file:0
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075172]  unevictable:913 dirty:0 writeback:0 unstable:0
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075172]  slab_reclaimable:26440 slab_unreclaimable:92429
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075172]  mapped:1490 shmem:20587 pagetables:10451 bounce:0
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075172]  free:19960 free_pcp:251 free_cma:0
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075177] Node 0 DMA free:15904kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isola$
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075184] lowmem_reserve[]: 0 2961 7850 7850 7850
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075188] Node 0 DMA32 free:44296kB min:24756kB low:30944kB high:37132kB active_anon:2757604kB inactive_anon:20004kB active_file:128kB inactive_file:8kB u$
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075194] lowmem_reserve[]: 0 0 4888 4888 4888
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075198] Node 0 Normal free:19640kB min:40648kB low:50808kB high:60972kB active_anon:4516316kB inactive_anon:62156kB active_file:4kB inactive_file:0kB un$
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075204] lowmem_reserve[]: 0 0 0 0 0
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075207] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075220] Node 0 DMA32: 669*4kB (UME) 325*8kB (UME) 379*16kB (UME) 155*32kB (ME) 110*64kB (ME) 66*128kB (UME) 39*256kB (UME) 5*512kB (UME) 0*1024kB 0*2048$
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075233] Node 0 Normal: 790*4kB (UME) 133*8kB (UME) 374*16kB (UME) 89*32kB (ME) 45*64kB (UME) 20*128kB (UE) 5*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096$
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075246] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075248] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075250] 21239 total pagecache pages
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075252] 0 pages in swap cache
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075254] Swap cache stats: add 0, delete 0, find 0/0
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075255] Free swap  = 0kB
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075257] Total swap = 0kB
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075258] 2070936 pages RAM
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075259] 0 pages HighMem/MovableOnly
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075260] 53159 pages reserved
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075262] 0 pages cma reserved
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075263] 0 pages hwpoisoned
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075335] [ 1368]   999  1368   221212     4138      71      39        0             0 node
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075337] [ 1374]   999  1374   318078    24407     244     158        0             0 node
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075340] [ 1384]   999  1384  1186799   897156    4819    3874        0             0 node
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075343] [ 1389]   999  1389  1151235   877122    4679    3755        0             0 node
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075345] [26992]  1000 26992   300537     9880      87      51        0             0 node
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.075348] Out of memory: Kill process 1384 (node) score 448 or sacrifice child
Sep 23 17:10:48 ip-172-31-62-67 kernel: [309972.077965] Killed process 1384 (node) total-vm:4747196kB, anon-rss:3588624kB, file-rss:0kB

Since Node.js is the only process on the server consuming significant resources, I assumed the growing rss was simply because no other process was demanding that memory, so the OS had no reason to reclaim it. But since the OOM killer is invoked, that must not be the case.

Here's the second half of that graph zoomed in (after system memory becomes low):

[Screenshot (2018-09-24): zoomed-in second half of the graph, rss flattening out as free memory runs low]

The instance in use is an AWS t3.large with 8GB of RAM.
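
For reference, the graph above presumably comes from periodic samples of process.memoryUsage(); a minimal sketch of that kind of sampler (the real monitoring code may differ) looks roughly like this:

// memory-sampler.js - log process memory stats once a minute (values in MB)
const toMB = (bytes) => Math.round(bytes / 1024 / 1024);

setInterval(() => {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  console.log(
    `${new Date().toISOString()} rss=${toMB(rss)} heapTotal=${toMB(heapTotal)} ` +
    `heapUsed=${toMB(heapUsed)} external=${toMB(external)}`
  );
}, 60 * 1000);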

Questions:

1. Is this a native memory leak? Why is rss more than twice the size of heapTotal? That doesn't seem to make sense.
2. Is there actually a leak at all, or is the system simply failing to reclaim memory, eventually invoking the OOM killer?
3. Should I turn off the Linux OOM killer as a workaround if there is no leak?
4. Is there a way to force the system to reclaim unused memory?

Thanks again to anyone pitching in advice, tips, observations or pointers. πŸ™

eladnava commented 6 years ago

@addaleax In case you get a chance would love your expertise on this πŸ‘

shellberg commented 6 years ago

1A) Yes, you clearly do have a memory leak... and by all indications it is on the native memory side. Native memory allocations are not reflected within the v8/JS heap; Buffers are a typical source of memory allocated outside the JS heap. RSS reflects the portion of a process's allocated memory that is in active use (relative to the rest of the workload, as discussed previously in #1484); it's not just one subset of the process's memory, e.g. the JS heap.

2A) The system will not reclaim memory while it is allocated. Thus, if the rate of allocations versus frees is always biased towards allocations, you get inexorable growth in the process size: a single 64-bit process can address a tremendous amount of memory, so in practice you are going to hit the limits of the resources your container has access to first. From JS code you typically are not allocating/freeing memory directly, but GC is a reactive process too; it will not free storage it finds is still in use (through referential structures). Garbage collection is quite efficient at recycling storage for further allocation, but it can be expensive (hence performance tuning can suggest running it relatively infrequently). If your native memory is backing a (long-lived) JS object, then that native memory is only released when the JS object goes out of scope; that will only be detected on a GC cycle (potentially one scanning older generational content), and that will only be prompted if JS allocations need more space! If your JS heap is not actually under pressure, you might actually need to create some more stress in the JS heap so that GC activity there is 'more efficient' (in GC terms) at keeping on top of the unrestrained native memory allocations.

3A) The evidence suggests you do have unrestrained growth of allocations... a leak, if you will. Linux already features an overcommit memory model; if you just try to disable your OOM killer, you are likely to trade the problem(s) in the Node.js process for an even more random, non-deterministic failure should you instead pressure the kernel (it will kill a random running process).

4A) 'Force the system to reclaim unused memory'?! The memory is being used! It's been allocated, but not yet freed/released. It is true that a process that has allocated and freed memory retains first use of those freed resources on its own free list; that is genuinely freed memory that the system could employ elsewhere, although it often doesn't get released to other processes unless the process ends. That is about as much as you could notionally force back to the system. The question really is whether it is unused memory, or memory still in use (allocated). You really need to be tracking the total process size to establish the envelope of allocated/used memory, not just what has recently been actively touched (RSS), which includes pages scanned by GC, not necessarily 'used' by your application.

eladnava commented 6 years ago

@shellberg Thanks so much for the super detailed response!

Now I understand that there must be a native leak, since the growing rss eventually causes OOM.

Any ideas on how I can inspect native memory to find the leaking Buffers in my Node.js code? Would snapshotting and inspecting the v8 heap help in any way, perhaps by showing the Buffer object references in the JS code? Otherwise I am unsure what other tools exist for debugging native memory leaks.
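
For anyone reading later: on newer Node.js versions the JS heap can be snapshotted programmatically via v8.writeHeapSnapshot(); older versions typically used the heapdump npm module instead. A rough sketch, assuming a signal-triggered snapshot (this only covers the JS heap, not native allocations):

// Run the process, then `kill -USR2 <pid>` to trigger a dump.
const v8 = require('v8');

process.on('SIGUSR2', () => {
  const file = v8.writeHeapSnapshot(); // writes a .heapsnapshot file in the cwd and returns its name
  console.log(`Heap snapshot written to ${file}`);
});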

addaleax commented 6 years ago

Buffers should show up in external memory, though, and that seems to be pretty low here.

I’m not an expert on debugging memory leaks, but are you using any native modules in your code?

eladnava commented 6 years ago

@addaleax Thank you so much for that addition. I completely forgot that this is the case (Buffers showing up in external), which debunks the theory that Buffers are being leaked in native memory. You just saved me a bunch of time trying to hunt down rogue Buffer objects.
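
For anyone else hitting this, a quick way to confirm that behaviour is a snippet along these lines (illustrative sketch only), which shows a large Buffer allocation being counted under external rather than heapUsed:

const mb = (b) => (b / 1024 / 1024).toFixed(1);
const before = process.memoryUsage();

const buf = Buffer.alloc(100 * 1024 * 1024); // ~100 MB allocated outside the JS heap

const after = process.memoryUsage();
console.log(`external: ${mb(before.external)} MB -> ${mb(after.external)} MB`);
console.log(`heapUsed: ${mb(before.heapUsed)} MB -> ${mb(after.heapUsed)} MB`);
console.log(buf.length); // keep a live reference so the buffer isn't collected early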

No native modules are being used.

The application is actually an MQTT server that handles around 90k concurrent connections at any given moment. The clients constantly connect/disconnect over time but the number of connections never exceeds 90k, therefore the workload stays the same, but rss keeps rising.

So, if rss is still rising over time, and it's not due to Buffer objects (as those would be counted in external), and my app has no native modules, what else could be taking up more and more memory in rss? Is there any tool to find out?

shellberg commented 6 years ago

@eladnava Looking at some other (random) reports of the actions of the oom-killer on a 4.4.0 kernel, I get the strong sense that it's a bit trigger-happy when it believes that physical RAM is approaching full and it has no recourse to page, i.e. no swap to use. The kernel is the only process truly running in your physical memory; all other user jobs run in an abstracted virtual memory system that gains access to some amount of physical memory for efficiency: hence, the kernel is very protective of physical memory!

Of interest, you appear to have configured no swap resources at all?! That would raise some alarm bells for me, especially given Linux's optimistic assumptions and overcommit memory model... (You might want to consult some swap-sizing guidelines too.) Incidentally, swap space is used for paging purposes in a modern kernel; it's not just about swapping out an entire process (which used to be quite a draconian action).

Before you go trying to find a problem that might not be present, I suggest you give the kernel more memory resources, so it believes it has a few more options to consider! And not just physical RAM!

eladnava commented 6 years ago

@shellberg Thanks for the recommendation. My EC2 servers indeed are not configured to use any swap.

Can we say for sure that we are not talking about a memory leak, though? Is it common practice nowadays to configure swap just to avoid OOM-killer invocation? I'm worried that swap will just be a temporary patch: the Node.js process rss will keep growing into swap until that eventually runs out too, causing OOM.

shellberg commented 6 years ago

@eladnava I can't say for sure (re: leak). I don't have any known-good profile of an MQTT server to go by, especially not of an MQTT server in JS code (is this MQTT.js?). Then there are factors such as how many topics, listeners, queue managers, etc. that define the loading of your server instance. Sorry, I'm not an MQTT expert. But you have flat JS heap profiles, and as @addaleax observed, the external is flat too. Thus we only have an RSS profile that is rising, and as I've discussed before that's much more complicated to reason about; it's not just intrinsic to your application workload.

I'd say it's common for Linux images to need swap, because of their overcommit tendency. That, or you have to configure the kernel to lessen overcommit/confine the resources used by the kernel, or both! To be honest, I did find mention of a kernel bug that plagued 4.4.0 kernels, but it's difficult to find the corresponding '-aws' version numbering/included bug fixes to know whether you have an affected version. There were some potentially actionable configurations that might lessen the OOM presentation (by differing ways of influencing the oom-killer heuristics). If you want to find such actions, the key is the gfp_mask=0x24201ca, essentially the reason code. This is more about administering a Linux-based container workload than Node per se.

Nevertheless, you are being caught between heuristics: those of node sizing its heap (based on reported available memory), and those of a (dodgy?) kernel running with only bare-metal semiconductor storage; you might want to configure and fix the JS heap of node for your MQTT server. Introducing some swap (even via a memory disk?!) may be enough to suggest to the system that it's healthier than it thinks, and gives it more 'rope to hang itself'! It also provides you with some early warning of further issues, based on how that swap device is utilised and which processes become largely paged out.

addaleax commented 6 years ago

@eladnava Is there any chance at all that you could share code in order to debug this? I know that might be a lot to ask for production apps…

addaleax commented 6 years ago

@eladnava Can you check whether this also happens with Node.js 10? Do you pass any particular flags to the Node.js binary (process.execArgv)?
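
For example (just a quick illustration), you can check from inside the running process; NODE_OPTIONS can also inject flags:

console.log('execArgv:', process.execArgv);              // flags passed to the node binary itself
console.log('argv:', process.argv);                      // script path and application arguments
console.log('NODE_OPTIONS:', process.env.NODE_OPTIONS);  // flags injected via the environment, if any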

eladnava commented 6 years ago

@shellberg Thanks for the additional insight. Indeed, the server is configured with the default vm.overcommit_memory value of 0, which means heuristic overcommit; this can trigger the OOM killer if a large allocation is requested when available memory is low.

I've just configured the server with a 1GB swap file and vm.swappiness of 15. Let's see if this prevents the OOM killer from being invoked. This will take a few days to test.

I still don't understand why rss just keeps rising and rising though.

@addaleax Sure, I could share the code; however, the leak is not reproducible without real MQTT devices constantly connecting to and disconnecting from the server. I was not able to reproduce the leak with "fake" MQTT clients, so it only happens in production with real connections.

I've previously tested with Node v10 but this was during a different stage of testing when I was attempting to fix other possible leaks. I will give it a shot if the swap attempt fails.

I have a really strong feeling that this is not a memory leak, as the rss completely stabilizes once system memory becomes low:

[Screenshot (2018-09-25): rss flat once system memory becomes low]

As per Node.js binary args, here they are:

node --nouse-idle-notification --max-old-space-size=3584

I've just realized that --nouse-idle-notification may be preventing Node.js from running GC, which could explain the rising rss. Actually, this flag seems to be deprecated and without effect since Node v4: https://github.com/nodejs/node/issues/5455.

addaleax commented 6 years ago
$ node --v8-options|grep use-idle
  --use-idle-notification (Use idle notification to reduce memory footprint.)

I don’t know how the internals of V8 work here, but I guess there’s a decent chance of that being true, given the option description?

eladnava commented 6 years ago

@addaleax I've just realized that this flag (--nouse-idle-notification) seems to be deprecated and impactless since Node v4: https://github.com/nodejs/node/issues/5455

This flag used to be a way to control when the v8 garbage collector would run. Passing it would completely disable automatic GC and require the Node.js process to manually call global.gc() on its own terms. But it's no longer respected anyway, so it must be unrelated.
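
For context, the manual-GC pattern that flag was typically paired with looks roughly like this (sketch; requires starting node with --expose-gc):

// node --expose-gc app.js
if (typeof global.gc === 'function') {
  setInterval(() => {
    global.gc(); // force a full collection on our own schedule
    const { rss, heapUsed } = process.memoryUsage();
    console.log(`post-GC rss=${rss} heapUsed=${heapUsed}`);
  }, 30 * 1000);
} else {
  console.warn('global.gc is not exposed; start node with --expose-gc');
}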

Thanks for the fast response.

So it is not the cause of the rising rss symptom. Unless it's undocumented but still prevents v8 GC from running.

eladnava commented 6 years ago

For whatever reason, either the OS is not reclaiming, or Node.js is not releasing, its rss memory. Either way the process rss keeps rising up to the point where system memory becomes pretty low (free -m reports this):

              total        used        free      shared  buff/cache   available
Mem:           7881        6566         146          80        1169         556
Swap:             0           0           0

And then rss just magically stops increasing. The server can operate for another few days or so with no rss increase under the exact same client load, but my theory is that eventually some memory allocation request causes the OOM killer to be invoked, as there is not enough available memory for the large allocation.

Possibly the swap workaround will solve this. Crossing my fingers and will let you guys know. Thanks again. πŸ™

free -m with swap now after restarting server:

              total        used        free      shared  buff/cache   available
Mem:           7881        1090        6434           8         356        6420
Swap:          1023           0        1023

In any case I've removed the flag --nouse-idle-notification in case it still prevents v8 GC from running even though it is no longer documented.

Also @shellberg I will consider updating the kernel to rule out the OOM killer bug in 4.4.0 if the issue persists.

shellberg commented 6 years ago

@eladnava The OS is reclaiming physical memory it's decided it needs to use! But it's doing so at the expense of the only active process in your workload that you care about!!! Linux heuristics are designed for a mixed workload (rather than a containerised workload), and hence it has picked on the one process it considers to have most egregiously overcommitted memory consumption, and the means of reclamation is to reap (kill) that targeted process. But your workload only consists of one active process, so by definition it is killing the one process you care about. The most recent free report is instructive: your container was really pushing the 8GB envelope of total physical resources, quite possibly in servicing the many MQTT clients that were (re-)connecting, and their networking stacks? (You might want to review the total workload with ps -ef and see what else other than the kernel is running?)

In terms of the rising RSS profile of your node process, it's related to the way the generational GC works with the pages of memory allocated to the v8/JS heap it is managing (3584MB, or 3.5GB), and also the distinction between an allocation and committing pages of physical backing store to support that allocation as pages are needed to host data, plus some heap fragmentation. (The propensity to accept an allocation without necessarily having actual physical pages to host it is precisely what overcommit is!) Nevertheless, you can see that the RSS of the node process is asymptotically converging on your total process size (code size + total allocations, crudely speaking), which is just under 3.6GB: your configured JS heap (3.5GB) + some native data (think 'external' plus a bit) + the size of the executable/shared libraries + the JIT code cache. Still, the substantial size of your node process is determined by your explicit command-line configuration of a 3.5GB heap: so, do you need that much?! Based on the heap utilisation you've shown so far, you could probably trim that to 2.5 or 3GB? (Is this just a normal MQTT loading, or do you have to maintain capacity for a higher peak load?)
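
A rough way to see that breakdown on a live process (a sketch using v8.getHeapStatistics() and process.memoryUsage(); the exact ceiling also includes the young generation and other overheads):

const v8 = require('v8');

const { heap_size_limit } = v8.getHeapStatistics();      // roughly tracks --max-old-space-size
const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
const mb = (b) => Math.round(b / 1024 / 1024);

console.log(`heap_size_limit   : ${mb(heap_size_limit)} MB (the configured JS heap ceiling)`);
console.log(`heapTotal/heapUsed: ${mb(heapTotal)} / ${mb(heapUsed)} MB`);
console.log(`external          : ${mb(external)} MB`);
console.log(`rss               : ${mb(rss)} MB (what the kernel actually has resident)`);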

eladnava commented 6 years ago

Thank you @shellberg for the detailed explanation. I now understand a little better what is going on behind the scenes.

If the swap attempt fails I will decrease the --max-old-space-size to see if it has a positive effect on this issue as well. πŸ‘

eladnava commented 6 years ago

It appears I was finally able to fix the memory leak issue! πŸŽ‰

Since I tried 4 fixes at once, I don't know which one specifically fixed it, but here are the last 4 things I tried:

It has now been 27 days and none of the Node.js processes have been terminated by OOM (where they would usually be terminated after a week or so)! πŸ˜„

Thanks so much @shellberg and @addaleax for all of your helpful tips and suggestions! πŸ˜„ πŸ‘ πŸŽ– πŸ’―

xr commented 2 years ago

I had the same problem and debugged it for quite a while; in the end, I also found a solution.

I changed the default memory allocator to jemalloc. We are using a Debian distribution, which uses glibc malloc as the default memory allocator. After some research, it turns out jemalloc handles memory fragmentation better than glibc malloc, and I can clearly see the rss decrease after load since switching to jemalloc.

Here is the part I added to our Dockerfile to change the default memory allocator:

RUN apt-get update && apt-get install -y \
    libjemalloc-dev=5.1.0-3

RUN echo "/usr/lib/x86_64-linux-gnu/libjemalloc.so" >> /etc/ld.so.preload

For macOS, I use:

DYLD_INSERT_LIBRARIES=/path/to/libjemalloc.dylib node index.js

for development purposes, to alter the memory allocator only for the node process.

Hope this helps anyone who runs into the same problem in the future.
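
A related self-check from inside the Node process itself (sketch, Linux only; similar to grepping /proc/<pid>/smaps from the shell):

// Prints whether a jemalloc shared object is mapped into the current process.
const fs = require('fs');
const maps = fs.readFileSync('/proc/self/maps', 'utf8');
console.log(maps.includes('jemalloc') ? 'jemalloc is loaded' : 'jemalloc is NOT loaded');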

eladnava commented 2 years ago

@xr Thanks so much for sharing! Can confirm this resolves the issue on my end as well! πŸŽ‰

The rss no longer rises uncontrollably and the memory leak behavior is gone.

On Ubuntu 18.04 LTS, these commands install and configure jemalloc to be used:

sudo apt-get install libjemalloc-dev
sudo -i
echo "/usr/lib/x86_64-linux-gnu/libjemalloc.so" >> /etc/ld.so.preload

Restart the node process for it to start using jemalloc for allocations:

npm run restart

Check the PID of your running node process and plug it into the command below to verify it is using jemalloc:

sudo cat /proc/<PID OF NODEJS>/smaps | grep jemalloc

thundapete commented 2 years ago

@xr thank you for the post, this also worked for the same type of issue I was having.

primary-1 commented 2 years ago

@xr Much appreciated; I also solved this problem thanks to your post. Thanks!

bladexskull commented 2 years ago

Hi @xr @eladnava, I found this helpful. Can you please tell me: if I changed the default memory allocator inside the Docker container, do I need to install jemalloc on the host OS as well? More precisely, since I have installed jemalloc in the Docker container and verified with the grep command that jemalloc is being used, how would installing jemalloc on the host OS help further? I am using Ubuntu 20.04. Thank you.

eladnava commented 2 years ago

Hi @bladexskull, Installing jemalloc and configuring it as the default allocator inside the Docker container is all you need to do; there is no need to install it on the host machine.

jbleyaert commented 1 year ago

We also had this issue, even when using Jemalloc, but we were on Alpine. Changing Docker image from node:18.12.1-alpine3.17 to node:18.15-bullseye fixed our issue.

fengmk2 commented 10 months ago

After switching to jemalloc, the long-standing OOM caused by the continuous linear growth of RSS has finally been solved!!! If you are using Node.js for long-running online services, make sure you use jemalloc!


lwmxiaobei commented 10 months ago

My environment is Docker, and the image is node:14-alpine3.16, but it doesn't work:

RUN apk add --no-cache jemalloc-dev --repository https://mirrors.ustc.edu.cn/alpine/latest-stable/main

RUN echo "/usr/lib/libjemalloc.so.2" >> /etc/ld.so.preload

CMD ["node", "./app.js", "LD_PRELOAD=/usr/lib/libjemalloc.so.2"]

I tried both of the above methods, but I can't see any information about jemalloc in /proc/PID/smaps.


yunnysunny commented 10 months ago

> My environment is Docker, and the image is node:14-alpine3.16, but it doesn't work ... I can't see any information about jemalloc in /proc/PID/smaps

Try running pldd 1 (pldd lists the shared libraries loaded by a live process, and node is usually PID 1 inside the container).

lwmxiaobei commented 10 months ago

> LD_PRELOAD

Setting the environment variable worked, but I noticed that rss is higher after startup. The service has no traffic; it has only just been started. Compared with not using jemalloc, each node process uses about 60 MB more. Is this normal?

joebowbeer commented 10 months ago

@lwmxiaobei this issue is closed, and your Node 14 runtime has been out of maintenance for 8 months now.

If you have an issue with a version of node that is currently maintained, I suggest creating a new issue.