louislam / uptime-kuma

A fancy self-hosted monitoring tool
https://uptime.kuma.pet
MIT License
59.01k stars 5.3k forks

[Real Browser] High CPU/Memory Usage #3788

Open Ali-Razmjoo opened 1 year ago

Ali-Razmjoo commented 1 year ago

⚠️ Please verify that this bug has NOT been raised before.

🛡️ Security Policy

Description

Hi.

I have been using the new Real Browser feature to browse a URL. Today I noticed my status page was down; when I tried to connect over SSH I got a timeout, and my monitoring indicates high CPU usage from Chromium inside Docker.

Here is how the service is affected.

image

Current status of processes, sorted by highest size / usage:

image

Full command line (there are many of these) from my current processes:

/usr/lib/chromium/chromium --type=renderer --no-sandbox --disable-dev-shm-usage --disable-background-timer-throttling --disable-breakpad --enable-automation --file-url-path-alias=/gen=/usr/lib/chromium/gen --force-color-profile=srgb --remote-debugging-pipe --allow-pre-commit-input --blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4 --field-trial-handle=10896847204213107046,12334856278103598370,131072 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=AcceptCHFrame,AutoExpandDetailsElement,AvoidUnnecessaryBeforeUnloadCheckSync,CertificateTransparencyComponentUpdater,DestroyProfileOnBrowserClose,DialMediaRouteProvider,GlobalMediaControls,ImprovedCookieControls,LazyFrameLoading,MediaRouter,Translate --disable-databases --disable-gpu-compositing --lang=en-US --headless --export-tagged-pdf --num-raster-threads=1 --renderer-client-id=5747 --shared-files=v8_context_snapshot_data:100

I think the implementation has some issues with killing the previous processes or clearing the cache. I have only one check that uses the webdriver, with a 20-second interval and 3 retries before alerting.
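A minimal way to confirm the build-up, assuming the container is named uptime-kuma (adjust to your own deployment), is to count the Chromium processes inside it across a few check intervals:

    # Count Chromium processes inside the container once per check interval.
    # "uptime-kuma" is an assumed container name; adjust as needed.
    # If pgrep is not in the image, `ps -ef | grep -c '[c]hromium'` works too.
    while true; do
      echo "$(date +%T)  $(docker exec uptime-kuma pgrep -c chromium)"
      sleep 20
    done

If the count keeps climbing between heartbeats instead of returning to a baseline, the old renderer processes are not being cleaned up.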

👟 Reproduction steps

Just set up a Chrome check and wait a couple of days.

👀 Expected behavior

Not crashing / lower CPU usage.

😓 Actual Behavior

Crashing and high CPU usage.

🐻 Uptime-Kuma Version

1.23.1

💻 Operating System and Arch

Ubuntu 22.04.3 LTS

🌐 Browser

Chromium 90.0.4430.212 (in docker)

🐋 Docker Version

Docker version 24.0.6, build ed223bc

🟩 NodeJS Version

No response

📝 Relevant log output

No response

chakflying commented 1 year ago

Do you have a chart for the RAM used? I'm guessing it ran out of memory and started swapping.

Ali-Razmjoo commented 1 year ago

I exported the table in the photo via Wazuh XDR. I assume "Size" is the memory size, and the virtual server has 4 GB of memory.

Here is the hardware information: Hetzner CPX21.

Ali-Razmjoo commented 1 year ago

I submitted a PR that might fix this issue; I had around 40 Chrome processes open, so I think closing them is a big help.

chakflying commented 1 year ago

Running the Docker container version with one Real Browser monitor type. Memory usage does slowly increase.

image

A group of Chromium processes stays running in the background. When the monitor heartbeat runs, another group of Chromium processes spawns, then dies when the heartbeat is finished.

idle:
image

running:
image
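One way to track the slow growth described above is to sample docker stats against the container over time (the container name uptime-kuma below is an assumption):

    # Print one memory/CPU snapshot per minute; the container name is assumed.
    while true; do
      docker stats --no-stream --format '{{.Name}} {{.MemUsage}} {{.CPUPerc}}' uptime-kuma
      sleep 60
    done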
aliuq commented 9 months ago

Same issue. I deployed 3 hours ago with 3 monitors of type HTTP(s) - Browser Engine (Chrome/Chromium) (Beta) and 1 Docker monitor.

/usr/lib/chromium/chromium --type=renderer --no-sandbox --disable-dev-shm-usage --disable-background-timer-throttling --disable-breakpad --enable-automation --file-url-path-alias=/gen=/usr/lib/chromium/gen --force-color-profile=srgb --remote-debugging-pipe --allow-pre-commit-input --blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4 --field-trial-handle=14242269732325328734,17121546340593549101,131072 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=AcceptCHFrame,AutoExpandDetailsElement,AvoidUnnecessaryBeforeUnloadCheckSync,CertificateTransparencyComponentUpdater,DestroyProfileOnBrowserClose,DialMediaRouteProvider,GlobalMediaControls,ImprovedCookieControls,LazyFrameLoading,MediaRouter,Translate --disable-databases --lang=en-US --headless --export-tagged-pdf --num-raster-threads=2 -

image

panaris commented 7 months ago

Same problem on Docker Desktop and on fly.io.

devneok commented 6 months ago

The older version ran smoothly. This new version has very high CPU usage; however many cores there are, it will use all of them.

peterbo commented 6 months ago

Same here: Chromium processes eat up all server resources (RAM/CPU/IO) over time. Around 4 headless tests are set up. The server is mostly just for monitoring, so the specs of 4 vCores and 16 GB RAM should be more than enough.

kuma-chromium

turtur1234 commented 3 months ago

I want to add on to this: I just had a system lock/crash due to what I believe is a Chromium-based issue. If someone can guide me on how to extract logs from the container, I would love to provide them. I'm running an x86 Unraid system with 32 GB of RAM, and Uptime Kuma ate it all in about 2 minutes.

The logs I do have follow; this is all I know how to get for now:

    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:572:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:558:22)
    at async RedBeanNode.getCell (/app/node_modules/redbean-node/dist/redbean-node.js:593:19)
    at async Settings.get (/app/server/settings.js:54:21) {
  sql: 'SELECT `value` FROM setting WHERE `key` = ?  limit ?',
  bindings: [ 'trustProxy', 1 ]
}
    at process.unexpectedErrorHandler (/app/server/server.js:1905:13)
    at process.emit (node:events:517:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at processTicksAndRejections (node:internal/process/task_queues:96:32)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:572:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:558:22)
    at async RedBeanNode.getCell (/app/node_modules/redbean-node/dist/redbean-node.js:593:19)
    at async Settings.get (/app/server/settings.js:54:21) {
  sql: 'SELECT `value` FROM setting WHERE `key` = ?  limit ?',
  bindings: [ 'trustProxy', 1 ]
}
    at process.unexpectedErrorHandler (/app/server/server.js:1905:13)
    at process.emit (node:events:517:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at processTicksAndRejections (node:internal/process/task_queues:96:32)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:572:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:558:22)
    at async RedBeanNode.getCell (/app/node_modules/redbean-node/dist/redbean-node.js:593:19)
    at async Settings.get (/app/server/settings.js:54:21) {
  sql: 'SELECT `value` FROM setting WHERE `key` = ?  limit ?',
  bindings: [ 'trustProxy', 1 ]
}
    at process.unexpectedErrorHandler (/app/server/server.js:1905:13)
    at process.emit (node:events:517:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at processTicksAndRejections (node:internal/process/task_queues:96:32)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:572:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:558:22)
    at async RedBeanNode.getCell (/app/node_modules/redbean-node/dist/redbean-node.js:593:19)
    at async Settings.get (/app/server/settings.js:54:21) {
  sql: 'SELECT `value` FROM setting WHERE `key` = ?  limit ?',
  bindings: [ 'trustProxy', 1 ]
}
    at process.unexpectedErrorHandler (/app/server/server.js:1905:13)
    at process.emit (node:events:517:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at processTicksAndRejections (node:internal/process/task_queues:96:32)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
2024-08-05T13:24:34-05:00 [MONITOR] WARN: Monitor #2 'Uptime Kuma': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 2 | Retry: 1 | Retry Interval: 60 seconds | Type: http

Additionally, here are some screenshots of my Grafana dashboard:

CPU Usage

image

Memory Usage

image

Memory Usage Detail:

image

To my untrained eye, it appears that upon container restart (I restart all my containers daily for backup and to try to avoid memory leaks from containers) it began using a weirdly high amount of CPU (20% of my i5), which it continued to use for ~2.5 hours until RAM usage jumped from 0.5 GB to 17 GB in 2 minutes. This is where my system locked up/crashed. Is there any way to add a hard RAM limit for Chromium? I would not expect even a power user to want to allocate more than 4-8 GB of RAM to Uptime Kuma.
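As far as I know there is no built-in Chromium RAM cap, but as a rough sketch the container itself can be given hard limits so a runaway Chromium hits the container's limit instead of taking down the host (the values, names, and ports below are assumptions, not recommendations):

    # Re-create the container with hard resource caps (example values).
    # --memory sets a hard RAM ceiling, --memory-swap equal to --memory
    # disables extra swap, and --pids-limit bounds how many processes
    # (e.g. stray Chromium renderers) the container can spawn.
    docker run -d --name uptime-kuma \
      --memory=2g --memory-swap=2g \
      --pids-limit=200 \
      -p 3001:3001 \
      -v uptime-kuma:/app/data \
      louislam/uptime-kuma:1

With a cap like this, the kernel kills processes inside the container when the limit is hit, rather than the whole host locking up.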