@addaleax In case you get a chance, I would love your expertise on this.
1A) Yes, you clearly do have a memory leak... and, by all indications, it is on the native memory side. Native memory allocations are not reflected within the V8/JS heap; Buffers are a typical source of native memory allocation conducted outside of the JS heap. RSS reflects the portion of a process's allocated memory in active use (relative to the rest of the workload, as discussed previously in #1484); it's not just a subset of memory in the process, e.g. the JS heap.

2A) The system will not reclaim memory while it is allocated. Thus, if the rate of allocations vs. frees is always biased towards allocations, you get inexorable growth in the process size. A single 64-bit process can address a tremendous amount of memory, so you are going to hit the practical resource limits of your container first. From JS code you typically are not allocating/freeing memory directly, but GC is a reactive process too; GC will not free storage it finds is still in use (through referential structures). Garbage collection is quite efficient at recycling storage for further allocation, but it can be expensive (hence performance tuning can suggest executing it relatively infrequently). If your native memory is backing a (long-lived) JS object, then that native memory will only be released when the JS object is out of scope... that will only be detected on a GC cycle (and potentially one scanning older generational content), and that will only be prompted if the JS allocations need more space! If your JS heap is not actually under pressure, you might actually need to create some more stress in the JS heap so that GC activity there is 'more efficient' (in GC terms) and manages to keep on top of the unrestrained native memory allocations.

3A) The evidence suggests you do have unrestrained growth of allocations... a leak, if you will. Linux already features an overcommit memory model; if you just try to disable your OOM killer, you are likely to trade the problem(s) in the Node.js process for an even more random, non-deterministic failure should you instead pressure the kernel (it will kill a random running process).

4A) 'Force the system to reclaim unused memory'?! The memory is being used! It's been allocated, but not yet freed/released. It is true that a process that has allocated and freed memory will retain first use of those freed resources on its own free list; that is genuinely freed memory that the system could employ elsewhere, although it often doesn't get released to other processes unless the process ends... that is about as much as you could notionally force back to the system. The question really is whether it is unused memory, or memory still in use (allocated). You really need to track the total process size to establish the envelope of allocations/used memory, not just what has actively been recently 'used' (RSS), which includes what might have been scanned by GC, not necessarily 'used' by your application.
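For illustration, a minimal sketch of that distinction: `Buffer` allocations raise `rss` and `external`, but barely move `heapUsed`/`heapTotal`.

```js
// Minimal sketch: Buffer allocations land in native memory, so they show up in
// `external` (and rss) but barely move heapUsed/heapTotal.
const before = process.memoryUsage();

const buffers = [];
for (let i = 0; i < 100; i++) {
  buffers.push(Buffer.alloc(1024 * 1024)); // ~100 MB allocated outside the JS heap
}

const after = process.memoryUsage();
for (const key of ['rss', 'heapUsed', 'heapTotal', 'external']) {
  const deltaMb = (after[key] - before[key]) / 1048576;
  console.log(`${key}: +${deltaMb.toFixed(1)} MB`);
}
```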
@shellberg Thanks so much for the super detailed response!
Now I understand that there must be a native leak because of the growing `rss` eventually causing OOM.
Any ideas on how I can inspect native memory to find the leaking `Buffer`s in my Node.js code? Would snapshotting and inspecting the V8 heap help in any way, as maybe it will display the `Buffer` object pointers in the JS code? Otherwise I am unsure of any other tool to debug native memory leaks.
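For reference, a minimal sketch of taking such a snapshot on demand, assuming a Node version that ships `v8.writeHeapSnapshot()` (v11.13+); older versions would need a module such as `heapdump`:

```js
// Minimal sketch: write a V8 heap snapshot when the process receives SIGUSR2,
// assuming Node >= 11.13 (which added v8.writeHeapSnapshot()).
const v8 = require('v8');

process.on('SIGUSR2', () => {
  // Writes Heap-<timestamp>.heapsnapshot into the current working directory;
  // open it in Chrome DevTools' Memory tab to inspect retained objects.
  const file = v8.writeHeapSnapshot();
  console.log(`Heap snapshot written to ${file}`);
});
```

Note that a heap snapshot only covers the JS heap (plus references to external allocations), so it is mainly useful for finding JS objects that retain native memory, not for native leaks themselves.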
Buffers should show up in external memory, though, and that seems to be pretty low here.
I'm not an expert on debugging memory leaks, but are you using any native modules in your code?
@addaleax Thank you so much for that addition. I completely forgot about this being the case (buffers showing up in `external`), which completely debunks the theory that `Buffer`s are being leaked in native memory. Just saved me a bunch of time trying to hunt down rogue `Buffer` objects.
No native modules are being used.
The application is actually an MQTT server that handles around 90k concurrent connections at any given moment. The clients constantly connect/disconnect over time, but the number of connections never exceeds 90k, therefore the workload stays the same, but `rss` keeps rising.
So, if `rss` is still rising over time, and it's not due to `Buffer` objects (as those would be counted in `external`), and my app has no native modules, what else could be taking up more and more memory in `rss`? Is there any tool to find out?
@eladnava Looking at some other (random) reports of the actions of the oom-killer and a 4.4.0 kernel, I get the strong sense that it's a bit trigger-happy when it believes that physical RAM storage is approaching full and it has no recourse to page, i.e. to use any swap. The kernel is the only process that runs truly in your physical memory; all other user jobs run in an abstracted virtual memory system that gains access to some amount of physical memory for efficiency. Hence, the kernel is very protective of physical memory!
Of interest, you appear to have configured no swap resources at all?! That would raise some alarm bells for me, especially given Linux's optimistic assumptions and over-commit memory model. (You might want to consult some swap setting guidelines too.) Incidentally, swap space is used for paging purposes in a modern kernel; it's not just about swapping out a whole process (which used to be quite a draconian action).
Before you go trying to find a problem that might not be present, I suggest you give the kernel the suggestion of more memory resources, so it believes it's got a few more options to consider! And not just physical RAM!
@shellberg Thanks for the recommendation. My EC2 servers indeed are not configured to use any swap.
Can we say for sure that we are not talking about a memory leak, though? Is it common practice nowadays to configure swap just to avoid OOM killer invocation? I'm scared that swap will just serve as a temporary patch and the Node.js process `rss` will eventually start using swap memory until it eventually runs out of swap as well and causes OOM.
@eladnava I can't say for sure (re: leak). I don't have any known good profile of an MQTT server to go by - especially not of an MQTT server in JS code (is this MQTT.js?). Then, there are factors of how many topics, listeners, queue managers, etc. that would define the loading of your server instance. Sorry, I'm not an MQTT expert. But you have flat JS heap profiles, and as @addaleax observed, the external memory is flat too. Thus, we only have an RSS profile that is rising - but as I've discussed before, that's much more complicated to reason about; it's not just intrinsic to your application workload.
I'd say it's common for Linux images to need swap, because of their over-commit tendency. That, or you have to configure the kernel to lessen over-commit / confine resources used by the kernel, or both!
To be honest, I did find mention of a kernel bug that plagued 4.4.0 kernels, but it's difficult to find the corresponding '-aws' version numbering/bug fixes included to know whether you have an affected version. There were some potentially actionable configurations that might lessen the OOM presentation (by differing ways to influence the oom-killer heuristics). If you want to find such actions, then the key is the `gfp_mask=0x24201ca`, essentially the reason code. This is more about administering a Linux-based container workload than Node per se.
Nevertheless, you are being caught between heuristics: that of Node, sizing its heap (based on reported available memory), and that of a (dodgy?) kernel running with only bare-metal semiconductor storage; you might want to configure and fix the JS heap of Node for your MQTT server. Introducing some swap (even via a memory disk?!) may be enough to suggest to the system that it's healthier than it thinks, and gives it more 'rope to hang itself'! And it provides you with some early warning of further issues, based on how much of that swap device gets utilised and which processes become largely paged out.
@eladnava Is there any chance at all that you could share code in order to debug this? I know that might be a lot to ask for production apps…
@eladnava Can you check whether this also happens with Node.js 10? Do you pass any particular flags to the Node.js binary (`process.execArgv`)?
@shellberg Thanks for the additional insight. Indeed the server is configured with the `vm.overcommit_memory` default value of `0`, which means heuristic overcommit; that can trigger the OOM killer if a large allocation is requested when available memory is low.
I've just configured the server with a 1GB swap file and `vm.swappiness` of `15`. Let's see if this prevents the OOM killer from being invoked. This will take a few days to test.
I still don't understand why `rss` just keeps rising and rising, though.
@addaleax Sure, I could share the code, however the leak is not reproducible without having real MQTT devices constantly connecting and disconnecting to/from the server. I was not able to reproduce the leak with "fake" MQTT clients. Therefore it only happens in production with real connections.
I've previously tested with Node v10, but this was during a different stage of testing when I was attempting to fix other possible leaks. I will give it a shot if the swap attempt fails.
I have a really strong feeling about this not being a memory leak, as the `rss` completely stabilizes around the point when system memory becomes low:
As per the Node.js binary args, here they are:

```
node --nouse-idle-notification --max-old-space-size=3584
```
I've just realized that `--nouse-idle-notification` may be preventing Node.js from running GC and could explain the rising `rss`. Actually, this flag seems to be deprecated and impactless since Node v4: https://github.com/nodejs/node/issues/5455
```
$ node --v8-options | grep use-idle
  --use-idle-notification (Use idle notification to reduce memory footprint.)
```
I don't know how the internals of V8 work here, but I guess there's a decent chance of that being true given the option description?
@addaleax I've just realized that this flag (`--nouse-idle-notification`) seems to be deprecated and impactless since Node v4:
https://github.com/nodejs/node/issues/5455
This flag used to be a way to control when the V8 garbage collector would run. Passing it would completely disable automatic GCs and would require the Node.js process to manually call `global.gc()` on its own terms. But it's no longer respected anyway, so it must be unrelated.
Thanks for the fast response.
So it is not the cause of the rising `rss` symptom. Unless it's undocumented but still prevents V8 GC from running.
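One way to sanity-check that would be to force full collections by hand and watch whether `rss` drops; a minimal sketch, assuming the process can be restarted with the (still supported) `--expose-gc` flag:

```js
// Run with: node --expose-gc gc-check.js  (file name is just for illustration)
// Minimal sketch: force a full GC every minute and log memory, purely as a
// diagnostic for whether GC frequency explains the rising rss.
if (typeof global.gc !== 'function') {
  throw new Error('Start node with --expose-gc so that global.gc() is available');
}

setInterval(() => {
  global.gc(); // force a full mark-sweep collection
  const { rss, heapUsed, external } = process.memoryUsage();
  const mb = (n) => (n / 1048576).toFixed(1);
  console.log(`rss=${mb(rss)}MB heapUsed=${mb(heapUsed)}MB external=${mb(external)}MB`);
}, 60000);
```

If `rss` keeps climbing even with forced collections, GC frequency is unlikely to be the culprit.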
For whatever reason the OS is not reclaiming, or Node.js is not releasing, its `rss` memory. Either way the process `rss` keeps rising up until the point where system memory becomes pretty low (`free -m` reports this):

```
              total        used        free      shared  buff/cache   available
Mem:           7881        6566         146          80        1169         556
Swap:             0           0           0
```
And then `rss` just magically stops increasing. The server can operate for another few days or so with no `rss` increase under the exact same load of clients, but my theory is that eventually some memory allocation request causes the OOM killer to be invoked, as there is not enough available memory for the large allocation.
Possibly the swap workaround will solve this. Crossing my fingers and will let you guys know. Thanks again.
`free -m` with swap now, after restarting the server:

```
              total        used        free      shared  buff/cache   available
Mem:           7881        1090        6434           8         356        6420
Swap:          1023           0        1023
```
In any case I've removed the flag `--nouse-idle-notification`, in case it still prevents V8 GC from running even though it is no longer documented.
Also @shellberg, I will consider updating the kernel to rule out the OOM killer bug in 4.4.0 if the issue persists.
@eladnava The OS is reclaiming physical memory it's decided it needs to use! But it's doing so at the expense of the only active process in your workload that you care about!!! Linux heuristics are designed for a mixed workload (rather than a containerised workload), and hence it's picked on the one process it considers to have most egregiously overcommitted memory consumption, and the means of reclamation is to reap (kill) that targeted process. But your workload only consists of one active process, so by definition it is killing the one process you care about. The most recent `free` report is instructive: your container was really pushing the 8GB envelope of total physical resources; quite possibly in servicing the many MQTT clients that were (re-)connecting, and their networking stacks? (You might want to review the total workload with `ps -ef` and see what else other than the kernel is running.)
In terms of the rising RSS profile of your node process, it's related to the way the generational GC works with the pages of memory allocated to the V8/JS heap it is managing (3584MB, or 3.5GB), and also the distinction between an allocation and committing pages of physical backing store to support that allocation as pages are needed to host data, plus some heap fragmentation. (The propensity to accept an allocation without potentially having actual physical pages to host it is precisely what overcommit is!) Nevertheless, you can see that the RSS of the node process is asymptotically converging on your total process size (code size + total allocations, crudely speaking), which is just under 3.6GB: your configured JS heap (3.5GB) + some native data (some: think 'external' plus a bit) + size of executable/shared libraries + JIT code cache. Still, the substantial size of your node process is determined by your explicit command-line configuration of a 3.5GB heap: so, do you need that much?! Based on the heap utilisation you've shown so far, you can probably easily trim that to 2.5 or 3GB. (Is this just a normal MQTT loading, or do you have to maintain capacity for a higher peak load?)
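To gauge how much of that 3.5GB ceiling is actually needed, a minimal sketch using `v8.getHeapStatistics()` to compare the configured limit against real usage before trimming `--max-old-space-size`:

```js
// Minimal sketch: periodically compare the configured heap ceiling
// (heap_size_limit, driven by --max-old-space-size) with actual usage,
// to decide whether 3584 MB is oversized for this workload.
const v8 = require('v8');

setInterval(() => {
  const stats = v8.getHeapStatistics();
  const mb = (n) => (n / 1048576).toFixed(0);
  console.log(
    `heap limit=${mb(stats.heap_size_limit)}MB ` +
    `total=${mb(stats.total_heap_size)}MB ` +
    `used=${mb(stats.used_heap_size)}MB`
  );
}, 60000);
```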
Thank you @shellberg for the detailed explanation. I now understand a little better what is going on behind the scenes.
If the swap attempt fails I will decrease `--max-old-space-size` to see if it has a positive effect on this issue as well.
It appears I was finally able to fix the memory leak issue!
Since I tried 4 fixes at once, I don't know which one specifically fixed it, but here are the last 4 things I tried:
- `tcp_mem` set to `4096 16384 16384`
- `vm.swappiness` set to `60`
- `--max-old-space-size` set to something more realistic
- Kernel updated to `4.4.0-1067-aws x86_64` (may have fixed erroneous OOM killer invocation due to the kernel bug @shellberg mentioned)

It has now been 27 days and none of the Node.js processes have been terminated by OOM (where they would usually be terminated after a week or so)!
Thanks so much @shellberg and @addaleax for all of your helpful tips and suggestions!
I had the same problem and debugged it for quite a while; in the end, I also found one solution.
I changed the default memory allocator to jemalloc. We are using a Debian distribution, which uses glibc malloc as the default memory allocator. After some research, it turns out jemalloc handles memory fragmentation better than glibc malloc, and I can clearly see the rss decrease after load since changing to jemalloc.
Here is the part I added to our Dockerfile to change the default memory allocator:
```dockerfile
RUN apt-get update && apt-get install -y \
    libjemalloc-dev=5.1.0-3
RUN echo "/usr/lib/x86_64-linux-gnu/libjemalloc.so" >> /etc/ld.so.preload
```
For macOS, I use:

```
DYLD_INSERT_LIBRARIES=/path/to/libjemalloc.dylib node index.js
```

for development purposes, to alter the memory allocator only for the node process.
Hope this could help those who might have the same problem in the future.
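As an additional runtime check (a minimal sketch, Linux-only), a Node process can scan its own memory maps to confirm the preload took effect:

```js
// Minimal sketch (Linux-only): scan this process's own memory maps to see
// whether a jemalloc shared library was actually loaded via ld.so.preload
// or LD_PRELOAD.
const fs = require('fs');

const maps = fs.readFileSync('/proc/self/maps', 'utf8');
console.log(
  maps.includes('jemalloc')
    ? 'jemalloc is loaded into this process'
    : 'jemalloc is NOT loaded; check /etc/ld.so.preload or LD_PRELOAD'
);
```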
@xr Thanks so much for sharing! Can confirm this resolves the issue on my end as well!
The `rss` no longer rises uncontrollably and the memory leak behavior is gone.
On Ubuntu 18.04 LTS, these commands install and configure jemalloc to be used:

```
sudo apt-get install libjemalloc-dev
sudo -i
echo "/usr/lib/x86_64-linux-gnu/libjemalloc.so" >> /etc/ld.so.preload
```

Restart the node process for it to start using jemalloc for allocations:

```
npm run restart
```

Check the PID of your running node process and plug it into the command below to verify it is using jemalloc:

```
sudo cat /proc/<PID OF NODEJS>/smaps | grep jemalloc
```
@xr thank you for the post, this also worked for the same type of issue I was having.
@xr Really appreciated, I also solved this problem thanks to your post. Thanks.
Hi @xr @eladnava, I have found this helpful. Can you please tell me: if I change the default memory allocator in the Docker container, do I need to install jemalloc on the host OS as well? More precisely, since I have installed jemalloc in the Docker container and verified with the grep command that jemalloc is being used, how would installing jemalloc on the host OS help further? I am using Ubuntu 20.04. Thank you.
Hi @addaleax,
Installing `jemalloc` and configuring it to be the default allocator inside the Docker instance is all you need to do; there is no need to install it on the host machine.
We also had this issue, even when using jemalloc, but we were on Alpine. Changing the Docker image from `node:18.12.1-alpine3.17` to `node:18.15-bullseye` fixed our issue.
After using jemalloc, the long-standing OOM caused by the continuous linear growth of RSS has finally been solved!!! If you are using Node.js for online services and running them for a long time, make sure you use jemalloc!
My environment is Docker, and the image is node:14-alpine3.16, but it doesn't work:

```dockerfile
RUN apk add --no-cache jemalloc-dev --repository https://mirrors.ustc.edu.cn/alpine/latest-stable/main
RUN echo "/usr/lib/libjemalloc.so.2" >> /etc/ld.so.preload
CMD ["node", "./app.js", "LD_PRELOAD=/usr/lib/libjemalloc.so.2"]
```

I tried both of the above two methods, but I can't see any information about jemalloc in `/proc/PID/smaps`.
Try to run `pldd 1`.
Adding the `LD_PRELOAD` environment variable works normally now, but I found that `rss` became higher after startup. The service has no traffic; it has only just started. Compared with not using jemalloc, each node process has increased by about 60 MB. Is this normal?
@lwmxiaobei this issue is closed and your Node 14 runtime has been unmaintained for 8 months now.
If you have an issue with a version of node that is currently maintained, I suggest creating a new issue.
Hi all! First, thanks so much for reading this and helping me out.
I've been attempting to fix a stubborn memory leak for a while now and thought that I did manage to finally get it (#1484), but it appears to have resurfaced in the form of a slowly increasing `rss` while `heapUsed` and `heapTotal` remain stable. This graph is from a two-day period.
`rss` does appear to kind of stabilize towards the end of the graph, and this appears to be when the system becomes low on memory (`free -m` reports 283 MB free). But my Node.js process does eventually (after 3-4 days) get terminated by the OOM killer. Since Node.js is the only process on the server consuming significant resources, I assumed a growing `rss` was due to no demand for memory allocation by other processes. But since the OOM killer is invoked, this must not be the case. Here's the second half of that graph zoomed in (after system memory becomes low).
The instance in use is an AWS `t3.large` with 8GB of RAM.
Questions:
1) Is this a native memory leak? Why is the `rss` amount more than twice the size of `heapTotal`? This makes no sense, I think.
2) Is there actually any leak at all, or is the system simply failing to reclaim memory, eventually invoking the OOM killer?
3) Should I turn off the Linux OOM killer as a workaround if there is no leak?
4) Is there a way to force the system to reclaim unused memory?
Thanks again to anyone pitching in any advice, tips, observations or pointers.
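For anyone trying to reproduce this kind of measurement, here is a minimal sketch (a hypothetical setup, not the monitoring used above) that samples `process.memoryUsage()` once a minute into a CSV, one way to produce `rss`/`heapTotal`/`heapUsed` series like the graphs described above:

```js
// Minimal sketch: append one memory sample per minute to memory.csv so the
// rss/heapTotal/heapUsed/external series can be graphed over days.
const fs = require('fs');

setInterval(() => {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const mb = (n) => (n / 1048576).toFixed(1);
  const row = [Date.now(), mb(rss), mb(heapTotal), mb(heapUsed), mb(external)].join(',');
  fs.appendFile('memory.csv', row + '\n', (err) => {
    if (err) console.error('failed to write memory sample:', err);
  });
}, 60 * 1000);
```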