louislam / uptime-kuma

A fancy self-hosted monitoring tool
https://uptime.kuma.pet
MIT License

Unable to start monitor with KnexTimeoutError: Knex: Timeout acquiring a connection. #4153

Closed lnagel closed 7 months ago

lnagel commented 9 months ago

⚠️ Please verify that this bug has NOT been raised before.

🛡️ Security Policy

Description

Startup crash with Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?

👟 Reproduction steps

Start container with Uptime Kuma, log in and wait for data to load on the dashboard. Check logs for errors.

👀 Expected behavior

The dashboard would load, monitor checks would run, and the log would not be full of errors.

😓 Actual Behavior

Dashboards do not load any data, monitor checks are not being run, and the log is full of errors.

🐻 Uptime-Kuma Version

1.23.7

💻 Operating System and Arch

MikroTik RouterOS 7.11.2

🌐 Browser

Firefox latest

🐋 Docker Version

No response

🟩 NodeJS Version

18.18.2

📝 Relevant log output

Welcome to Uptime Kuma 
Your Node.js version: 18.18.2 
2023-12-02T07:28:50Z [SERVER] INFO: Welcome to Uptime Kuma 
2023-12-02T07:28:50Z [SERVER] INFO: Node Env: production 
2023-12-02T07:28:50Z [SERVER] INFO: Inside Container: true 
2023-12-02T07:28:50Z [SERVER] INFO: Importing Node libraries 
2023-12-02T07:28:50Z [SERVER] INFO: Importing 3rd-party libraries 
2023-12-02T07:29:55Z [SERVER] INFO: Creating express and socket.io instance 
2023-12-02T07:29:55Z [SERVER] INFO: Server Type: HTTP 
2023-12-02T07:29:55Z [SERVER] INFO: Importing this project modules 
2023-12-02T07:29:56Z [NOTIFICATION] INFO: Prepare Notification Providers 
2023-12-02T07:29:56Z [SERVER] INFO: Version: 1.23.7 
2023-12-02T07:29:56Z [DB] INFO: Data Dir: ./data/ 
2023-12-02T07:29:56Z [SERVER] INFO: Connecting to the Database 
2023-12-02T07:29:57Z [DB] INFO: SQLite config: 
[ { journal_mode: 'wal' } ] 
[ { cache_size: -12000 } ] 
2023-12-02T07:29:57Z [DB] INFO: SQLite Version: 3.41.1 
2023-12-02T07:29:57Z [SERVER] INFO: Connected 
2023-12-02T07:29:57Z [DB] INFO: Your database version: 10 
2023-12-02T07:29:57Z [DB] INFO: Latest database version: 10 
2023-12-02T07:29:57Z [DB] INFO: Database patch not needed 
2023-12-02T07:29:57Z [DB] INFO: Database Patch 2.0 Process 
2023-12-02T07:29:57Z [SERVER] INFO: Load JWT secret from database. 
2023-12-02T09:29:57+02:00 [SERVER] INFO: Adding route 
2023-12-02T09:29:57+02:00 [SERVER] INFO: Adding socket handler 
2023-12-02T09:29:57+02:00 [SERVER] INFO: Init the server 
2023-12-02T09:29:57+02:00 [SERVER] INFO: Listening on 3001 
2023-12-02T09:29:57+02:00 [SERVICES] INFO: Starting nscd 
2023-12-02T09:30:07+02:00 [AUTH] INFO: Login by token. IP=10.20.0.20 
2023-12-02T09:30:07+02:00 [AUTH] INFO: Username from JWT: lenno 
2023-12-02T09:30:07+02:00 [AUTH] INFO: Successfully logged in user lenno. IP=10.20.0.20 
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? 
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26) 
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28) 
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19) 
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:572:22) 
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:558:22) 
    at async Monitor.getParent (/app/server/model/monitor.js:1587:16) 
    at async Monitor.isParentActive (/app/server/model/monitor.js:1670:24) 
    at async Monitor.isParentActive (/app/server/model/monitor.js:1676:30) 
    at async Proxy.toJSON (/app/server/model/monitor.js:106:29) 
    at async UptimeKumaServer.getMonitorJSONList (/app/server/uptime-kuma-server.js:150:34) { 
  sql: '\n' + 
    '            SELECT parent.* FROM monitor parent\n' + 
    '    \t\tLEFT JOIN monitor child\n' + 
    '    \t\t\tON child.parent = parent.id\n' + 
    '            WHERE child.id = ?\n' + 
    '        ', 
  bindings: [ 64 ] 
} 
    at process.unexpectedErrorHandler (/app/server/server.js:1895:13) 
    at process.emit (node:events:517:28) 
    at emit (node:internal/process/promises:149:20) 
    at processPromiseRejections (node:internal/process/promises:283:27) 
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32) 
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues 
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? 
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26) 
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28) 
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19) 
    at async RedBeanNode.findOne (/app/node_modules/redbean-node/dist/redbean-node.js:499:19) 
    at async beat (/app/server/model/monitor.js:337:32) 
    at async safeBeat (/app/server/model/monitor.js:1026:17) { 
  sql: undefined, 
  bindings: undefined 
} 
    at safeBeat (/app/server/model/monitor.js:1028:25) 
2023-12-02T09:32:07+02:00 [MONITOR] ERROR: Please report to https://github.com/louislam/uptime-kuma/issues 

///--cut--//

2023-12-02T09:32:37+02:00 [MONITOR] ERROR: Please  SUM all duration, also trim off the beat out of time windowreport to https://github.com/louislam/uptime-kuma/issues 
2023-12-02T09:32:37+02:00 [MONITOR] INFO: Try to restart the monitor 
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? 
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26) 
    at runNextTicks (node:internal/process/task_queues:60:5) 
    at process.processTimers (node:internal/timers:509:9) 
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28) 
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19) 
    at async RedBeanNode.findOne (/app/node_modules/redbean-node/dist/redbean-node.js:499:19) 
    at async beat (/app/server/model/monitor.js:337:32) 
    at async safeBeat (/app/server/model/monitor.js:1026:17) { 
  sql: undefined, 
  bindings: undefined 
} 
    at safeBeat (/app/server/model/monitor.js:1028:25) 
    at runNextTicks (node:internal/process/task_queues:60:5) 
    at process.processTimers (node:internal/timers:509:9) 
2023-12-02T09:32:37+02:00 [MONITOR] ERROR: Please report to https://github.com/louislam/uptime-kuma/issues 
2023-12-02T09:32:37+02:00 [MONITOR] INFO: Try to restart the monitor 
chakflying commented 9 months ago

Considering you are using a router, are you perhaps using USB-attached storage? It's likely that the database has grown so large that the increased I/O latency is more than the application can handle.
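A quick way to gauge this (a sketch; the path depends on your volume mapping, and it assumes the default SQLite backend with the database at ./data/kuma.db and a heartbeat table):

ls -lh ./data/kuma.db
sqlite3 ./data/kuma.db "SELECT COUNT(*) FROM heartbeat;"

A file in the hundreds of megabytes, or a heartbeat count in the millions, would point towards the database-size theory.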

lnagel commented 9 months ago

Considering you are using a router, are you perhaps using USB-attached storage? It's likely that the database has grown so large that the increased I/O latency is more than the application can handle.

It was running on USB-attached storage, but it's a fast drive (DataTraveler Max, USB 3.2 Gen 2), and it wasn't even utilizing the disk very much.

I moved the container with its data folder to an 8th-gen Intel NUC running the latest Docker, with the Docker filesystem on a dedicated NVMe drive. Unfortunately, the issue persists.

Is there any advice on how to reset the app's accumulated history data without losing all 89 configured monitors? I am mostly using HTTP, ping, and MQTT checks.

chakflying commented 9 months ago

That's very strange.

If you have worked with databases before, you can stop the application and open the database file kuma.db in any SQLite-compatible application (DBeaver, Beekeeper Studio, etc.). You can then manually delete old rows in the heartbeat table.
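For example, to keep only recent history instead of wiping everything (a sketch; it assumes the SQLite file is kuma.db in the data directory and that heartbeat.time stores UTC timestamps — back the file up first, with the container stopped):

sqlite3 kuma.db
sqlite> DELETE FROM heartbeat WHERE time < datetime('now', '-7 days');
sqlite> VACUUM;

The final VACUUM rewrites the file so the freed space is actually returned to the filesystem.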

ilogus commented 9 months ago

Hello, I seem to have a similar problem.

This morning, without having received any notification, I saw that all the monitors had a problem during the night. In the Docker logs, the line Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? is present for all monitors.

I'm on a VPS running AlmaLinux 8.9 (Midnight Oncilla) with a 4.18.0-513.5.1.el8_9.x86_64 kernel, Uptime Kuma version 1.23.8, and Docker version 24.0.7, build afdd53b.

Regards,

lnagel commented 9 months ago

That's very strange.

If you have worked with databases before, you can stop the application and open the database file kuma.db in any SQLite-compatible application (DBeaver, Beekeeper Studio, etc.). You can then manually delete old rows in the heartbeat table.

Thanks. I did the following, which reduced the size of the database from 297 MB down to 252 KB.

root@nuc:/var/lib/docker/volumes/uptime-kuma_data/_data# sqlite3 kuma.db 
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
sqlite> delete from heartbeat;
sqlite> vacuum;
sqlite> 

At least Uptime Kuma will start up now, and there are no errors in the logs so far; let's see...

I changed "Keep monitor history data" from 180 days to 7 days. Still, I installed it about 3 weeks ago at most.

gering commented 8 months ago

I see a very similar error, running in Docker on my Synology NAS:

2023-12-30T12:49:44+01:00 [MONITOR] INFO: Try to restart the monitor
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:572:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:558:22)
    at async RedBeanNode.getCell (/app/node_modules/redbean-node/dist/redbean-node.js:593:19)
    at async Settings.get (/app/server/settings.js:54:21)
    at async exports.setting (/app/server/util-server.js:610:12)
    at async /app/server/server.js:199:13 {
  sql: 'SELECT `value` FROM setting WHERE `key` = ?  limit ?',
  bindings: [ 'trustProxy', 1 ]
}
    at process.unexpectedErrorHandler (/app/server/server.js:1899:13)
    at process.emit (node:events:517:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
sethvoltz commented 7 months ago

Piling on: I have also been seeing this issue for over a year when I am moving around the app a lot (looking across monitors, editing the dashboards), but it usually self-resolves after a couple of minutes of letting things catch up. Today it has not resolved, which led me to this issue.

Edit 1: Adding current DB size stats, alongside the existing backups, before clearing out the heartbeat table as suggested by @lnagel:

507M Feb 11 21:55 kuma.db
 60K Oct 17  2021 kuma.db.bak0
6.6M Oct 26  2021 kuma.db.bak20211026165215
 10M Oct 31  2021 kuma.db.bak20211031220900
115M Jan  3  2022 kuma.db.bak20220103141319
211M May 27  2022 kuma.db.bak20220527215144
211M Jul  9  2022 kuma.db.bak20220709110215
211M Sep 11  2022 kuma.db.bak20220911174428
410M Dec 31  2022 kuma.db.bak20221231132150
452M Feb 18  2023 kuma.db.bak20230218182325
484M Mar 22  2023 kuma.db.bak20230322220702

Edit 2: The truncate was taking longer than expected, so I killed it and ran select count(*) from heartbeat;, which showed there were still 3.23M records in the table. I have kicked it off again, but that does seem awfully high for 12 monitors with 2–5 minute checks.

Edit 3: Yep, the truncate worked and everything about the app is snappy again. Post-vacuum, the database file is also a much more reasonable size:

568K Feb 11 22:33 kuma.db
chakflying commented 7 months ago

We implemented incremental_vacuum in 1.23, which should have mitigated this issue. Can you check whether you are running the latest version, and if so, whether there are items in the logs indicating that the incremental_vacuum task has failed?
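If it helps to check manually, the vacuum state can be inspected with standard SQLite pragmas (a sketch; run against kuma.db while the container is stopped):

sqlite3 kuma.db
sqlite> PRAGMA auto_vacuum;        -- 2 means incremental auto-vacuum is enabled
sqlite> PRAGMA freelist_count;     -- free pages waiting to be reclaimed
sqlite> PRAGMA incremental_vacuum; -- reclaim them now (only works in mode 2)

A persistently large freelist_count on an up-to-date install would suggest the periodic vacuum task is not keeping up.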

sethvoltz commented 7 months ago

@chakflying Just confirmed: 1.23.11. I have Docker image update monitors, so I would have updated within days to a week of release.

sethvoltz commented 7 months ago

I think the problem is less about vacuum and more that the heartbeat table was huge at >3 million records for a small number of monitors.
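For a rough sense of scale (assuming a 2-minute interval across all 12 monitors): 12 × (60 / 2) × 24 = 8,640 heartbeats per day, or about 3.15 million per year, while 180 days of retention should cap out at around 1.56 million rows. A table holding 3.23 million rows therefore looks more like retention cleanup falling behind than the monitor count itself being the problem.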

CommanderStorm commented 7 months ago

What is your retention time in the settings set to?

sethvoltz commented 7 months ago

@CommanderStorm It took me a while to find that, as I'd never changed it. It was set to the default (I assume) of 180 days. I set it to 14 days just now to hopefully avoid this issue again.

CommanderStorm commented 7 months ago

Okay, I am assuming your issue is the same, @lnagel. We know that lowering retention is not a good long-term solution, but for 1.23.X that is everything we can offer as a quick "remedy".

A lot of performance improvements (using aggregated vs. non-aggregated tables to store heartbeats, enabling users to choose MariaDB as a database backend, pagination of important events) have been made in v2.0 (our next release), resolving™️ this problem area. => I'm going to close this issue.

You can subscribe to our releases and get notified when a new release (such as v2.0-beta.0) gets made. See https://github.com/louislam/uptime-kuma/pull/4171 for the bugs that need addressing before that can happen.

Meanwhile (the issue is with SQLite not reading data fast enough to keep up):