jhuckaby / performa

A multi-server monitoring system with a web based UI.

Out of memory #1

Open asyslinux opened 5 years ago

asyslinux commented 5 years ago

Hello, here is the report generated after an out-of-memory crash:

Performa Version: 1.0.11

root@monitoring:/opt/performa# cat report.20190621.000715.4697.0.001.json

{ "header": { "event": "Allocation failed - JavaScript heap out of memory", "trigger": "FatalError", "filename": "report.20190621.000715.4697.0.001.json", "dumpEventTime": "2019-06-21T00:07:15Z", "dumpEventTimeStamp": "1561090035977", "processId": 4697, "cwd": "/opt/performa", "commandLine": [ "/usr/bin/node", "/opt/performa/lib/main.js" ], "nodejsVersion": "v11.15.0", "glibcVersionRuntime": "2.24", "glibcVersionCompiler": "2.12", "wordSize": 64, "arch": "x64", "platform": "linux", "componentVersions": { "node": "11.15.0", "v8": "7.0.276.38-node.19", "uv": "1.27.0", "zlib": "1.2.11", "brotli": "1.0.7", "ares": "1.15.0", "modules": "67", "nghttp2": "1.37.0", "napi": "4", "llhttp": "1.1.1", "http_parser": "2.8.0", "openssl": "1.1.1b", "cldr": "34.0", "icu": "63.1", "tz": "2018e", "unicode": "11.0" }, "release": { "name": "node", "headersUrl": "https://nodejs.org/download/release/v11.15.0/node-v11.15.0-headers.tar.gz", "sourceUrl": "https://nodejs.org/download/release/v11.15.0/node-v11.15.0.tar.gz" }, "osName": "Linux", "osRelease": "4.15.18-16-pve", "osVersion": "#1 SMP PVE 4.15.18-41 (Tue, 18 Jun 2019 07:36:54 +0200)", "osMachine": "x86_64", "host": "monitoring" }, "javascriptStack": { "message": "No stack.", "stack": [ "Unavailable." 
] }, "nativeStack": [ { "pc": "0x0000000000a702fd", "symbol": "report::TriggerNodeReport(v8::Isolate, node::Environment, char const, char const, std::string const&, v8::Local) [Performa Server]" }, { "pc": "0x000000000095ccb3", "symbol": "node::OnFatalError(char const, char const) [Performa Server]" }, { "pc": "0x0000000000b3dbde", "symbol": "v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, bool) [Performa Server]" }, { "pc": "0x0000000000b3de14", "symbol": "v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, bool) [Performa Server]" }, { "pc": "0x0000000000f3ce52", "symbol": " [Performa Server]" }, { "pc": "0x0000000000f3cf58", "symbol": "v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [Performa Server]" }, { "pc": "0x0000000000f49678", "symbol": "v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [Performa Server]" }, { "pc": "0x0000000000f4a18b", "symbol": "v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [Performa Server]" }, { "pc": "0x0000000000f4cec1", "symbol": "v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [Performa Server]" }, { "pc": "0x0000000000f170f4", "symbol": "v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [Performa Server]" }, { "pc": "0x00000000011cd3fe", "symbol": "v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object*, v8::internal::Isolate) [Performa Server]" }, { "pc": "0x000005315f3cfc5d", "symbol": "" } ], "javascriptHeap": { "totalMemory": 1505017856, "totalCommittedMemory": 1502567104, "usedMemory": 1435376056, "availableMemory": 31663760, "memoryLimit": 1526909922, "heapSpaces": { "read_only_space": { "memorySize": 524288, "committedMemory": 42224, "capacity": 515584, "used": 33520, "available": 482064 }, "new_space": { "memorySize": 
16777216, "committedMemory": 16261328, "capacity": 8249344, "used": 632936, "available": 7616408 }, "old_space": { "memorySize": 1480265728, "committedMemory": 1478831968, "capacity": 1431881072, "used": 1430015880, "available": 1865192 }, "code_space": { "memorySize": 2097152, "committedMemory": 2078368, "capacity": 1683584, "used": 1683584, "available": 0 }, "map_space": { "memorySize": 1585152, "committedMemory": 1584896, "capacity": 589840, "used": 589840, "available": 0 }, "large_object_space": { "memorySize": 3768320, "committedMemory": 3768320, "capacity": 24120392, "used": 2420296, "available": 21700096 }, "new_large_object_space": { "memorySize": 0, "committedMemory": 0, "capacity": 0, "used": 0, "available": 0 } } }, "resourceUsage": { "userCpuSeconds": 6817.81, "kernelCpuSeconds": 1117.16, "cpuConsumptionPercent": 9.71482, "maxRss": 21958963200, "pageFaults": { "IORequired": 41, "IONotRequired": 367632972 }, "fsActivity": { "reads": 1750128, "writes": 84390768 } }, "uvthreadResourceUsage": { "userCpuSeconds": 2821.63, "kernelCpuSeconds": 607.251, "cpuConsumptionPercent": 4.198, "fsActivity": { "reads": 5416, "writes": 2163960 } }, "libuv": [ ], "environmentVariables": { "LANGUAGE": "en_US:en", "USER": "root", "HOME": "/root", "OLDPWD": "/", "LOGNAME": "root", "JOURNAL_STREAM": "9:20772", "PATH": "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "INVOCATION_ID": "807b0074e6224f72b1cc2cc12b63c5a2", "LANG": "en_US.UTF-8", "SHELL": "/bin/sh", "PWD": "/opt/performa", "__daemon": "true" }, "userLimits": { "core_file_size_blocks": { "soft": 0, "hard": "unlimited" }, "data_seg_size_kbytes": { "soft": "unlimited", "hard": "unlimited" }, "file_size_blocks": { "soft": "unlimited", "hard": "unlimited" }, "max_locked_memory_bytes": { "soft": 65536, "hard": 65536 }, "max_memory_size_kbytes": { "soft": "unlimited", "hard": "unlimited" }, "open_files": { "soft": 524288, "hard": 524288 }, "stack_size_bytes": { "soft": 8388608, "hard": "unlimited" }, 
"cpu_time_seconds": { "soft": "unlimited", "hard": "unlimited" }, "max_user_processes": { "soft": 512602, "hard": 512602 }, "virtual_memory_kbytes": { "soft": "unlimited", "hard": "unlimited" } }, "sharedObjects": [ "linux-vdso.so.1", "/lib/x86_64-linux-gnu/libdl.so.2", "/lib/x86_64-linux-gnu/librt.so.1", "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "/lib/x86_64-linux-gnu/libm.so.6", "/lib/x86_64-linux-gnu/libgcc_s.so.1", "/lib/x86_64-linux-gnu/libpthread.so.0", "/lib/x86_64-linux-gnu/libc.so.6", "/lib64/ld-linux-x86-64.so.2" ] }

jhuckaby commented 5 years ago

How strange, I've never experienced an out of memory error before. The Performa daemon never uses more than 100 MB on any of my servers, and I've run it 24x7 on 50+ servers for months. Your JSON report above, however, shows the heap grew to about 1.5 GB 😲, which is the default Node.js maximum memory limit.
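For anyone reading a report like this, the relevant numbers are in the "javascriptHeap" section. The following is a minimal sketch of how they could be summarized; `summarizeHeap` is a hypothetical helper (not part of Performa or Node.js), and the sample object only copies a few values from the report above.

```javascript
// Summarize the "javascriptHeap" section of a Node.js diagnostic report.
// Field names (usedMemory, memoryLimit, heapSpaces) come from the report
// posted above; summarizeHeap itself is a hypothetical helper.
function summarizeHeap(report) {
  const heap = report.javascriptHeap;
  const toMiB = (bytes) => Math.round(bytes / (1024 * 1024));
  // Rank heap spaces by how much memory each one uses.
  const spaces = Object.entries(heap.heapSpaces)
    .map(([name, s]) => ({ name, usedMiB: toMiB(s.used) }))
    .sort((a, b) => b.usedMiB - a.usedMiB);
  return {
    usedMiB: toMiB(heap.usedMemory),
    limitMiB: toMiB(heap.memoryLimit),
    percentOfLimit: Math.round((heap.usedMemory / heap.memoryLimit) * 100),
    largestSpace: spaces[0].name,
    spaces,
  };
}

// Values copied from the report above (only some heap spaces are included).
const sample = {
  javascriptHeap: {
    usedMemory: 1435376056,
    memoryLimit: 1526909922,
    heapSpaces: {
      new_space: { used: 632936 },
      old_space: { used: 1430015880 },
      large_object_space: { used: 2420296 },
    },
  },
};

const summary = summarizeHeap(sample);
console.log(`${summary.largestSpace} dominates; heap at ${summary.percentOfLimit}% of limit`);
```

To run it against a full report, the sample object could be replaced with `JSON.parse(fs.readFileSync(path, 'utf8'))`. In this report nearly all of the memory sits in old_space, which is where long-lived objects end up.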

There is a way to increase the limit (see below), but something is very wrong here, because it should really never eat up anywhere close to that amount of memory.

First, can you send me a copy of your /opt/performa/conf/config.json file from your Performa server? Feel free to scrub the secret key and any other sensitive information like SMTP hostname, AWS keys, etc. My e-mail address is jhuckaby at gmail dot com.

Second, can you provide more information about the hardware and OS you are using? For example, which flavor and version of Linux (CentOS? RedHat? Ubuntu?), and how much total RAM the server has.

If all else fails, we can try increasing your Node.js memory limit, but this is a last resort. In your /opt/performa/conf/config.json file, you can add this top-level property:

"inject_cli_args": ["--max_old_space_size=4096"]

Then restart Performa with /opt/performa/bin/control.sh restart. This should set the Node.js maximum memory to 4 GB (up from the default of 1.5 GB). However, I am afraid if we have a leak of some kind this will simply delay the problem, and you'll hit the new 4 GB limit eventually anyway. Also, your server needs at least 4 GB of available memory for this to even work. I recommend we try to find the root cause of the issue before increasing the memory limit.
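One way to check that a heap flag actually took effect is to ask V8 for its current limit. This is a minimal sketch using the built-in `v8` module; `heapLimitMiB` is a hypothetical helper name, and the script only reports the limit of the process it runs in, so it would need to be started with the same flag (for example `node --max_old_space_size=4096 check.js`, where `check.js` is an assumed filename).

```javascript
const v8 = require('v8');

// Report the heap size limit V8 is currently running with, in MiB.
// heap_size_limit covers the whole heap, so expect a value somewhat
// above the old-space size passed on the command line.
function heapLimitMiB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap size limit: ${heapLimitMiB()} MiB`);
```

If the printed value does not change after adding the flag, the argument is probably not reaching the Node.js process.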

Thanks.

jhuckaby commented 5 years ago

Thank you @asyslinux for your e-mail. I believe this is the culprit right here, in your config.json file:

"cache": {
    "enabled": true,
    "maxItems": 32786,
    "maxBytes": 536870912
}

Unfortunately the cache system isn't exact, and it actually uses a LOT more memory than the value you put into maxBytes. This is due to a bug in an external library, which was fixed 4 days ago, but Performa hasn't been updated to use the fix yet. I believe this is what is causing your Node.js process to hit the 1.5 GB maximum and crash. I should also mention that even with the bug fixed, the cache system uses a Node.js hash, which carries a lot of memory overhead, so it will naturally use more than the specified maxBytes.

I highly recommend scaling this way back, to only 10 MB or so, until I can release a new version with more accurate cache memory measurement. I recommend:

"cache": {
    "enabled": true,
    "maxItems": 1000,
    "maxBytes": 10485760
}

Then restart Performa with /opt/performa/bin/control.sh restart.
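To put the two maxBytes values in perspective, here is a small sketch of the arithmetic. `describeCacheBudget` is a hypothetical helper; it only converts the nominal setting and compares it with the "memoryLimit" value from the crash report, and it does not measure what the cache really consumes (which, as noted above, is more than maxBytes).

```javascript
const MIB = 1024 * 1024;

// Hypothetical helper: express a cache's nominal maxBytes in MiB and as a
// percentage of a given heap limit. It does not measure real memory use.
function describeCacheBudget(cache, heapLimitBytes) {
  const percent = (cache.maxBytes / heapLimitBytes) * 100;
  return {
    maxMiB: cache.maxBytes / MIB,
    percentOfHeapLimit: Number(percent.toFixed(1)),
  };
}

const heapLimit = 1526909922; // "memoryLimit" from the crash report

// The original setting and the recommended one, as posted in this thread.
const original = describeCacheBudget({ maxItems: 32786, maxBytes: 536870912 }, heapLimit);
const recommended = describeCacheBudget({ maxItems: 1000, maxBytes: 10485760 }, heapLimit);

console.log(`original: ${original.maxMiB} MiB, recommended: ${recommended.maxMiB} MiB`);
```

The original setting nominally reserves 512 MiB, over a third of the heap limit, before any overhead is counted. The recommended 10 MiB leaves far more headroom.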

asyslinux commented 5 years ago

Thank You :)