louislam / uptime-kuma

A fancy self-hosted monitoring tool
https://uptime.kuma.pet
MIT License

Monitors are stopped after `database is locked` error #3130

Closed deluxghost closed 1 year ago

deluxghost commented 1 year ago

⚠️ Please verify that this bug has NOT been raised before.

🛡️ Security Policy

Description

I recently updated Uptime Kuma to the latest version. I only use push monitors, and they were all created in an old version.
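
(For context, a push monitor here just means something on my side periodically requests the monitor's push URL; roughly like the following, with the host and token as placeholders:)

```sh
# Called periodically (e.g. from cron) on each monitored host; host and token are placeholders.
curl -fsS "https://kuma.example.com/api/push/<push-token>?status=up&msg=OK&ping="
```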

One day my push clients were offline for about an hour because of my local firewall, but when I checked the Uptime Kuma dashboard, there was no DOWN record at all.

The heartbeat intervals of the monitors are 60s/30s and the retries are 5/3, so they should normally go down within about 5 minutes without any push request.

The chart of the affected monitors looks like this:

image

I stopped the push client for about 10 minutes, so there were no requests in that period. The chart shows no data for that period either, but it stays green the whole time.

Editing an affected monitor without actually changing any settings and saving it directly fixes the issue for that one monitor (Mumble in the logs below).

A new monitor (test in the logs below) works fine.

👟 Reproduction steps

  1. Update from an old version with push monitors (I can't remember the exact version, sorry)
  2. Stop the push client
  3. Wait until the monitor should go down
  4. It does not go down
  5. Start the push client
  6. The missing data shows up in the chart, but there is no DOWN event

👀 Expected behavior

Push monitors from the old version should work as expected (going down when no push requests arrive) without having to be re-saved.

😓 Actual Behavior

Such monitors never go down.

🐻 Uptime-Kuma Version

1.21.2

💻 Operating System and Arch

Ubuntu 20.04.5 LTS

🌐 Browser

Chrome 112.0.5615.49

πŸ‹ Docker Version

20.10.18

🟩 NodeJS Version

No response

📝 Relevant log output

uptime-kuma  | 2023-05-06T03:14:01+08:00 [PLUGIN] WARN: Warning: In order to enable plugin feature, you need to use the default data directory: ./data/
uptime-kuma  | 2023-05-06T03:14:01+08:00 [DB] INFO: Data Dir: data/
uptime-kuma  | 2023-05-06T03:14:02+08:00 [DB] INFO: SQLite config:
uptime-kuma  | [ { journal_mode: 'wal' } ]
uptime-kuma  | [ { cache_size: -12000 } ]
uptime-kuma  | 2023-05-06T03:14:02+08:00 [DB] INFO: SQLite Version: 3.41.1
uptime-kuma  | {
uptime-kuma  |   name: 'clear-old-data',
uptime-kuma  |   message: 'Clearing Data older than 180 days...'
uptime-kuma  | }
uptime-kuma  | { name: 'clear-old-data', message: 'done' }
uptime-kuma  | 2023-05-06T14:52:29+08:00 [AUTH] INFO: Login by token. IP=#############
uptime-kuma  | 2023-05-06T14:52:29+08:00 [AUTH] INFO: Username from JWT: #############
uptime-kuma  | 2023-05-06T14:52:29+08:00 [AUTH] INFO: Successfully logged in user #############. IP=#############
uptime-kuma  | 2023-05-06T16:05:55+08:00 [AUTH] INFO: Login by token. IP=#############
uptime-kuma  | 2023-05-06T16:05:55+08:00 [AUTH] INFO: Username from JWT: #############
uptime-kuma  | 2023-05-06T16:05:55+08:00 [AUTH] INFO: Successfully logged in user #############. IP=#############
uptime-kuma  | 2023-05-06T16:07:02+08:00 [MONITOR] INFO: Get Monitor: 12 User ID: 1
uptime-kuma  | 2023-05-06T16:08:30+08:00 [AUTH] INFO: Login by token. IP=#############
uptime-kuma  | 2023-05-06T16:08:30+08:00 [AUTH] INFO: Username from JWT: #############
uptime-kuma  | 2023-05-06T16:08:30+08:00 [AUTH] INFO: Successfully logged in user #############. IP=#############
uptime-kuma  | 2023-05-06T16:08:37+08:00 [MONITOR] INFO: Get Monitor: 12 User ID: 1
uptime-kuma  | 2023-05-06T16:09:18+08:00 [MANAGE] INFO: Resume Monitor: 28 User ID: 1
uptime-kuma  | 2023-05-06T16:09:18+08:00 [MONITOR] INFO: Added Monitor: undefined User ID: 1
uptime-kuma  | 2023-05-06T16:09:27+08:00 [MONITOR] INFO: Get Monitor: 28 User ID: 1
uptime-kuma  | 2023-05-06T16:09:38+08:00 [MONITOR] WARN: Monitor #28 'test': Pending: No heartbeat in the time window | Max retries: 1 | Retry: 1 | Retry Interval: 20 seconds | Type: push
uptime-kuma  | 2023-05-06T16:10:06+08:00 [MONITOR] WARN: Monitor #28 'test': Pending: No heartbeat in the time window | Max retries: 1 | Retry: 1 | Retry Interval: 20 seconds | Type: push
uptime-kuma  | 2023-05-06T16:10:08+08:00 [MONITOR] INFO: Get Monitor: 12 User ID: 1
uptime-kuma  | 2023-05-06T16:10:26+08:00 [MONITOR] WARN: Monitor #28 'test': Failing: No heartbeat in the time window | Interval: 20 seconds | Type: push | Down Count: 0 | Resend Interval: 0
uptime-kuma  | 2023-05-06T16:10:34+08:00 [MANAGE] INFO: Delete Monitor: 28 User ID: 1
uptime-kuma  | 2023-05-06T16:10:38+08:00 [MONITOR] INFO: Get Monitor: 12 User ID: 1
uptime-kuma  | 2023-05-06T16:10:43+08:00 [MANAGE] INFO: Resume Monitor: 12 User ID: 1
uptime-kuma  | 2023-05-06T16:10:43+08:00 [MONITOR] INFO: Get Monitor: 12 User ID: 1
uptime-kuma  | 2023-05-06T16:11:13+08:00 [MONITOR] WARN: Monitor #12 'Mumble': Pending: No heartbeat in the time window | Max retries: 3 | Retry: 1 | Retry Interval: 30 seconds | Type: push
uptime-kuma  | 2023-05-06T16:11:43+08:00 [MONITOR] WARN: Monitor #12 'Mumble': Pending: No heartbeat in the time window | Max retries: 3 | Retry: 2 | Retry Interval: 30 seconds | Type: push
uptime-kuma  | 2023-05-06T16:12:07+08:00 [MONITOR] INFO: Get Monitor: 8 User ID: 1
uptime-kuma  | 2023-05-06T16:12:12+08:00 [MONITOR] INFO: Get Monitor: 26 User ID: 1
uptime-kuma  | 2023-05-06T16:12:13+08:00 [MONITOR] WARN: Monitor #12 'Mumble': Pending: No heartbeat in the time window | Max retries: 3 | Retry: 3 | Retry Interval: 30 seconds | Type: push
uptime-kuma  | 2023-05-06T16:12:24+08:00 [MONITOR] INFO: Get Monitor: 14 User ID: 1
uptime-kuma  | 2023-05-06T16:12:44+08:00 [MONITOR] WARN: Monitor #12 'Mumble': Failing: No heartbeat in the time window | Interval: 30 seconds | Type: push | Down Count: 0 | Resend Interval: 0
uptime-kuma  | 2023-05-06T16:13:16+08:00 [MONITOR] INFO: Get Monitor Beats: 11 User ID: 1
uptime-kuma  | 2023-05-06T16:13:33+08:00 [MONITOR] INFO: Get Monitor Beats: 11 User ID: 1
uptime-kuma  | 2023-05-06T16:14:00+08:00 [AUTH] INFO: Login by token. IP=#############
uptime-kuma  | 2023-05-06T16:14:00+08:00 [AUTH] INFO: Username from JWT: #############
uptime-kuma  | 2023-05-06T16:14:00+08:00 [AUTH] INFO: Successfully logged in user #############. IP=#############
uptime-kuma  | 2023-05-06T16:45:11+08:00 [MONITOR] INFO: Get Monitor Beats: 11 User ID: 1
uptime-kuma  | 2023-05-06T16:45:14+08:00 [MONITOR] INFO: Get Monitor Beats: 11 User ID: 1
uptime-kuma  | 2023-05-06T16:56:42+08:00 [MONITOR] INFO: Get Monitor: 11 User ID: 1
chakflying commented 1 year ago

Does the issue reappear if you restart the server?

deluxghost commented 1 year ago

> Does the issue reappear if you restart the server?

I think I restarted the server during the update... but yeah, restarting the server seems to fix this issue.

deluxghost commented 1 year ago

> Does the issue reappear if you restart the server?

Weird, I thought I had already restarted the server, but when I shut down one of my nodes yesterday, the issue was still there. All services on that node should have gone down, but they did not.

image

chakflying commented 1 year ago

Unfortunately a lot of the useful logging is hidden unless you run with NODE_ENV=development. Is this the same monitor that you previously "saved without changing", or is this another one?

I thought about this but there really shouldn't be any database changes that would affect the behavior of the push monitor.

deluxghost commented 1 year ago

Ah, so I need to add that variable to my compose file to get more info. I will do more testing later.
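
Something like this, I assume (a minimal sketch; the service name and host path are just placeholders for my setup):

```yaml
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    environment:
      - NODE_ENV=development   # enables the more detailed logging mentioned above
    volumes:
      - /local/path:/app/data
    ports:
      - "3001:3001"
```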

deluxghost commented 1 year ago

Looks like it might not be about the version. I enabled debug logs and noticed that some of the monitors are gone after these errors:

^^^ There are more of the same errors above ^^^

uptime-kuma  | Trace: [Error: insert into `heartbeat` (`down_count`, `duration`, `important`, `monitor_id`, `msg`, `status`, `time`) values (0, 61, false, 8, 'No heartbeat in the time window', 2, '2023-05-17 19:14:33.005') - SQLITE_BUSY: database is locked] {
uptime-kuma  |   errno: 5,
uptime-kuma  |   code: 'SQLITE_BUSY'
uptime-kuma  | }
uptime-kuma  |     at process.<anonymous> (/app/server/server.js:1804:13)
uptime-kuma  |     at process.emit (node:events:513:28)
uptime-kuma  |     at emit (node:internal/process/promises:140:20)
uptime-kuma  |     at processPromiseRejections (node:internal/process/promises:274:27)
uptime-kuma  |     at processTicksAndRejections (node:internal/process/task_queues:97:32)
uptime-kuma  | If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
uptime-kuma  | Trace: [Error: insert into `heartbeat` (`down_count`, `duration`, `important`, `monitor_id`, `msg`, `status`, `time`) values (0, 62, false, 10, 'No heartbeat in the time window', 2, '2023-05-17 19:14:34.030') - SQLITE_BUSY: database is locked] {
uptime-kuma  |   errno: 5,
uptime-kuma  |   code: 'SQLITE_BUSY'
uptime-kuma  | }
uptime-kuma  |     at process.<anonymous> (/app/server/server.js:1804:13)
uptime-kuma  |     at process.emit (node:events:513:28)
uptime-kuma  |     at emit (node:internal/process/promises:140:20)
uptime-kuma  |     at processPromiseRejections (node:internal/process/promises:274:27)
uptime-kuma  |     at processTicksAndRejections (node:internal/process/task_queues:97:32)
uptime-kuma  | If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
uptime-kuma  | Trace: [Error: insert into `heartbeat` (`down_count`, `duration`, `important`, `monitor_id`, `msg`, `status`, `time`) values (0, 62, false, 1, 'No heartbeat in the time window', 2, '2023-05-17 19:14:34.033') - SQLITE_BUSY: database is locked] {
uptime-kuma  |   errno: 5,
uptime-kuma  |   code: 'SQLITE_BUSY'
uptime-kuma  | }
uptime-kuma  |     at process.<anonymous> (/app/server/server.js:1804:13)
uptime-kuma  |     at process.emit (node:events:513:28)
uptime-kuma  |     at emit (node:internal/process/promises:140:20)
uptime-kuma  |     at processPromiseRejections (node:internal/process/promises:274:27)
uptime-kuma  |     at processTicksAndRejections (node:internal/process/task_queues:97:32)
uptime-kuma  | If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
uptime-kuma  | 2023-05-18T03:14:56+08:00 [UPTIMECACHELIST] DEBUG: clearCache: 10
uptime-kuma  | 2023-05-18T03:14:56+08:00 [MONITOR] DEBUG: No clients in the room, no need to send stats
uptime-kuma  | { name: 'clear-old-data', message: 'done' }
uptime-kuma  | 2023-05-18T03:14:56+08:00 [UPTIMECACHELIST] DEBUG: clearCache: 1
uptime-kuma  | 2023-05-18T03:14:56+08:00 [MONITOR] DEBUG: No clients in the room, no need to send stats
uptime-kuma  | 2023-05-18T03:14:56+08:00 [ROUTER] DEBUG: /api/push/ called at 2023-05-18 03:14:56.428

I'm using a Docker local bind mount (/local/path:/app/data), my DB retention setting is 180 days, and the current DB size is about 1400 MB.

I have seen another `database is locked` error before, but the log messages are different:

uptime-kuma  | Trace: [Error: insert into `heartbeat` (`down_count`, `duration`, `important`, `monitor_id`, `msg`, `status`, `time`) values (0, 61, false, 21, 'No heartbeat in the time window', 0, '2023-05-16 19:14:25.968') - SQLITE_BUSY: database is locked] {
uptime-kuma  |   errno: 5,
uptime-kuma  |   code: 'SQLITE_BUSY'
uptime-kuma  | }
uptime-kuma  |     at Timeout.safeBeat [as _onTimeout] (/app/server/model/monitor.js:811:25)
uptime-kuma  | 2023-05-17T03:14:35+08:00 [MONITOR] INFO: Try to restart the monitor

One of the messages mentions that the monitor is restarted; the other doesn't.

After the batch of errors, I no longer see the `Checking monitor at <date time>` line for most of the monitors.

deluxghost commented 1 year ago

The two error handlers:

https://github.com/louislam/uptime-kuma/blob/9dc02bb8e26c0086d73d6032672747e337d00a79/server/server.js#L1802-L1807

https://github.com/louislam/uptime-kuma/blob/9dc02bb8e26c0086d73d6032672747e337d00a79/server/model/monitor.js#L809-L822
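
Roughly, the difference between them (my own paraphrase of the linked code, not the exact source): the process-level handler in server.js only logs the unhandled rejection, while safeBeat in monitor.js catches the error and schedules the next beat again. If the SQLITE_BUSY rejection ends up in the process-level handler instead of safeBeat's catch, nothing reschedules the monitor, which would explain why it just stops:

```js
// Paraphrased sketch of the two patterns, NOT the actual Uptime Kuma source.

// server.js style: a process-level handler that only logs the rejection;
// whatever scheduled the failed work is never rescheduled.
process.on("unhandledRejection", (error) => {
    console.trace(error);
    console.error("If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues");
});

// monitor.js style: safeBeat wraps each beat in try/catch and, on failure,
// schedules the next beat again ("Try to restart the monitor").
function startMonitor(beat, intervalSeconds) {
    const safeBeat = async () => {
        try {
            await beat();
        } catch (e) {
            console.trace(e);
            console.info("Try to restart the monitor");
            setTimeout(safeBeat, intervalSeconds * 1000); // the monitor keeps running
        }
    };
    safeBeat();
}
```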

deluxghost commented 1 year ago

I have also encountered the KnexTimeoutError issue; maybe waiting for 2.0 with an external DB is a better solution.

chakflying commented 1 year ago

Those errors will show up even without debug logging enabled. If you hadn't seen them before, it may be because printing the debug logs increased the server load too much.

deluxghost commented 1 year ago

The log was too long, so I didn't find them before I started checking the `Checking monitor at ...` logs.

chakflying commented 1 year ago

Thank you for your investigation. This is a separate issue and a fix is now open in #3174.