louislam / uptime-kuma

A fancy self-hosted monitoring tool
https://uptime.kuma.pet
MIT License

No data and no alerts if site goes down #1219

Closed katsil closed 9 months ago

katsil commented 2 years ago

Description

At some point, my monitoring stops checking site availability: it "freezes" and keeps displaying the last recorded data for a period of time. The same thing happens when I use the Prometheus exporter.

👟 Reproduction steps

This also happened on version 1.1 (before I switched to the new one). I even moved the monitoring to a separate server, but the problem still recurs. I have about 15-20 HTTP checks, and none of them work.

👀 Expected behavior

Operational monitoring and display of correct data

😓 Actual Behavior

At some point, monitoring "freezes" and it's not clear to me how to fix it. It stays frozen until I stop and re-enable monitoring or restart the monitoring server. Here are some screenshots:

How it looks in Uptime Kuma:

if 24h:

grafana:

As you can see, it stopped posting status after 00:00 on 24.01.22.

How can I fix this? My monitoring VM has about 100 GB of free NVMe space.

🐻 Uptime-Kuma Version

1.11.3

💻 Operating System and Arch

Ubuntu 18.04

🌐 Browser

Safari

🐋 Docker Version

No response

🟩 NodeJS Version

No response

📝 Relevant log output

No response

katsil commented 2 years ago

I see the error again after a restart.

chakflying commented 2 years ago

Are there any logs in the server output?

katsil commented 2 years ago

Here are the logs from the Docker container. Can you tell me where to find other debug logs?

https://pb0.superhub.xyz/?fdff20a2a4ba2348#KNj9yiwtEjvqlqkkBtTBBJIqUpFOwLQok1xJvYnRCjw=

katsil commented 2 years ago

Also, error.log inside the container:

[2022-01-20 06:49:43] KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:305:26)
    at runMicrotasks (<anonymous>)
    at runNextTicks (internal/process/task_queues.js:60:5)
    at listOnTimeout (internal/timers.js:526:9)
    at processTimers (internal/timers.js:500:7)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:259:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.storeCore (/app/node_modules/redbean-node/dist/redbean-node.js:166:26)
    at async RedBeanNode.store (/app/node_modules/redbean-node/dist/redbean-node.js:126:20)
    at async beat (/app/server/model/monitor.js:417:13) {
  sql: undefined,
  bindings: undefined
}
[2022-01-20 06:49:43] KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:305:26)
    at runMicrotasks (<anonymous>)
    at runNextTicks (internal/process/task_queues.js:60:5)
    at processTimers (internal/timers.js:497:9)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:259:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.storeCore (/app/node_modules/redbean-node/dist/redbean-node.js:166:26)
    at async RedBeanNode.store (/app/node_modules/redbean-node/dist/redbean-node.js:126:20)
    at async beat (/app/server/model/monitor.js:417:13)
    at async Timeout.safeBeat [as _onTimeout] (/app/server/model/monitor.js:443:17) {
  sql: undefined,
  bindings: undefined
}

katsil commented 2 years ago

Hey, guys, any news please?

chakflying commented 2 years ago

Relevant previous discussion in #218. Unfortunately it's a generic database connection error and there isn't much to go on.

katsil commented 2 years ago

> generic database connection error

But I'm using the native SQLite database inside the Docker container. How can there be an error connecting to the database?

louislam commented 2 years ago

It may be caused by a busy database.
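To illustrate how a local, in-process database can still produce "Timeout acquiring a connection", here is a minimal sketch of a connection pool with an acquire timeout. `TinyPool`, `runQueries`, and all the numbers are hypothetical and only mimic the general idea; this is not Knex's implementation. When queries hold the only connection for longer than the acquire timeout, callers waiting in the queue give up with an error, even though nothing is wrong with the "connection" itself.

```javascript
// Hypothetical, simplified pool for illustration only; not Knex's real code.
class TinyPool {
  constructor(size, acquireTimeoutMs) {
    this.free = size;                 // connections currently available
    this.waiters = [];                // callers queued for a connection
    this.acquireTimeoutMs = acquireTimeoutMs;
  }

  acquire() {
    if (this.free > 0) {
      this.free -= 1;
      return Promise.resolve();
    }
    return new Promise((resolve, reject) => {
      const waiter = {};
      const timer = setTimeout(() => {
        // Waited too long: leave the queue and report a timeout.
        this.waiters = this.waiters.filter((w) => w !== waiter);
        reject(new Error("Timeout acquiring a connection"));
      }, this.acquireTimeoutMs);
      waiter.grant = () => {
        clearTimeout(timer);
        resolve();
      };
      this.waiters.push(waiter);
    });
  }

  release() {
    const next = this.waiters.shift();
    if (next) next.grant();           // hand the connection to the next waiter
    else this.free += 1;
  }
}

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// Start `count` queries at once; each holds a connection for `queryMs`.
async function runQueries(pool, count, queryMs) {
  const one = async () => {
    await pool.acquire();
    try {
      await sleep(queryMs);           // stands in for a slow database write
    } finally {
      pool.release();
    }
  };
  const results = await Promise.allSettled(Array.from({ length: count }, one));
  return results.map((r) => r.status);
}
```

With one connection and fast queries, every caller eventually gets its turn. With the same pool and slow queries, the callers at the back of the queue are rejected, which matches the shape of the error in the log above.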

In general, the monitor should be restarted if there is any error. Unfortunately, for reasons I don't know, most Knex errors are not caught by try-catch.
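As a sketch of the "restart on any error" idea, the loop below reschedules the next check in a `finally` block, so a failed heartbeat cannot stop the cycle. `startSafeLoop`, `beat`, and `onError` are hypothetical names used for illustration; this is not the actual `safeBeat` from `server/model/monitor.js`. One common way a rejection slips past try-catch in Node.js is a promise that is not awaited inside the `try` block; whether that is what happens here is not established in this thread.

```javascript
// Hypothetical helper: calls an async `beat` function repeatedly and keeps
// going even if one call throws or rejects.
function startSafeLoop(beat, intervalMs, onError = console.error) {
  let stopped = false;
  let timer = null;

  const tick = async () => {
    try {
      // The `await` matters: without it, a rejected promise returned by
      // beat() would not be seen by this try/catch.
      await beat();
    } catch (err) {
      onError(err);
    } finally {
      // Schedule the next check whether or not this one failed.
      if (!stopped) timer = setTimeout(tick, intervalMs);
    }
  };

  timer = setTimeout(tick, 0);

  // Return a function that stops the loop.
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Usage would look like `const stop = startSafeLoop(checkSite, 60000)`, where `checkSite` is whatever async function performs one check; calling `stop()` ends the loop.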

CommanderStorm commented 9 months ago

v1.23.X included some improvements toward using incremental_vacuum, which improves the situation.

A lot of performance improvements (using aggregated vs. non-aggregated tables to store heartbeats, enabling users to choose MariaDB as a DB backend, pagination of important events) have been made in v2.0 (our next release), resolving™️ this problem area. => I'm going to close this issue

You can subscribe to our releases and get notified when a new release (such as v2.0-beta.0) gets made. See https://github.com/louislam/uptime-kuma/pull/4171 for the bugs that need addressing before that can happen.

Meanwhile (the issue is with SQLite not reading data fast enough to keep up):