louislam / uptime-kuma

A fancy self-hosted monitoring tool
https://uptime.kuma.pet
MIT License
59.76k stars 5.34k forks

Timeout of notification services causes monitor to freeze #4058

Open shalak opened 12 months ago

shalak commented 12 months ago

Description

~getaddrinfo ENOTFOUND errors make the monitors stop working~

It appears that a TCP timeout in the Signal notification service is causing this.

👟 Reproduction steps

When my local DNS service is down, all my monitors turn red and show errors - as expected:

getaddrinfo ENOTFOUND myhost.mydomain.com

At this stage, the Signal REST API is called to send a notification. Due to a bug in the API, the notification call times out, even though the notification itself is received.

At this stage, all the monitors that are configured to notify via Signal are stuck in "down". Even restarting the Uptime Kuma Docker container doesn't help. What helps is either

👀 Expected behavior

Monitor is not frozen after notification alert service times out.

😓 Actual Behavior

Monitor is frozen after notification alert service times out.

🐻 Uptime-Kuma Version

1.23.6

💻 Operating System and Arch

Ubuntu 22.04.3 LTS

🌐 Browser

Firefox 119.0 (64-bit)

🐋 Docker Version

Docker version 24.0.7, build afdd53b

🟩 NodeJS Version

No response

📝 Relevant log output

No response

shalak commented 12 months ago

I just noticed something. I don't know if it's relevant, but I have two alert services:

It appears that only the monitors that use Signal alerting are affected. The monitor that uses email notification (in my case, the one watching the Signal REST API itself) seems to restart itself...

shalak commented 12 months ago

Huh, it may not be related to the ENOTFOUND issue at all, but rather to the fact that the Signal REST API has a timeout bug:

https://github.com/bbernhard/signal-cli-rest-api/issues/453

So: the monitor detects an issue and calls the Signal REST API to send a message. The message is received, but the POST itself times out, which hangs the monitor.
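To illustrate the failure mode described above, here is a minimal sketch of bounding a notification call with a timeout so that a POST whose response never arrives cannot block the caller forever. The helper name `withTimeout` is hypothetical and this is not Uptime Kuma's actual code; it only assumes the notification call returns a promise.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    });
    // Whichever settles first wins; the timer is cleared either way.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a POST whose response never arrives (assumption for the demo).
function hangingPost() {
    return new Promise(() => {});
}
```

A caller could then write `await withTimeout(hangingPost(), 5000)` inside a `try`/`catch` and treat the rejection as a failed notification instead of waiting indefinitely. Note that this only stops the caller from waiting; it does not cancel the underlying request.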

I've edited the original bug report description.

lyup commented 12 months ago

> So - the monitor detects an issue, calls signal REST API to send a message, the message is received, however the POST itself times out, thus hanging the monitor.

Ran into exactly this today. Wasn't sure what was going on until I read this. I debugged a bit and found what was happening is

louislam commented 12 months ago

sendNotification can sometimes time out or never return https://github.com/louislam/uptime-kuma/blob/master/server/model/monitor.js#L952

Good point. I think we should not use await before Monitor.sendNotification(isFirstBeat, this, bean);
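A minimal sketch of the difference, under the assumption that the notification function returns a promise. The function names below (`hangingNotification`, `beatWithAwait`, `beatWithoutAwait`) are hypothetical stand-ins, not the real code in monitor.js.

```javascript
// Hypothetical stand-in for a notification call that never settles.
function hangingNotification() {
    return new Promise(() => {});
}

// Awaiting version: the beat stalls for as long as the notification hangs.
async function beatWithAwait(sendNotification, log) {
    log.push("status updated");
    await sendNotification();
    log.push("next check scheduled"); // not reached while the send hangs
}

// Fire-and-forget version: the beat carries on, and a rejected send is only logged.
async function beatWithoutAwait(sendNotification, log) {
    log.push("status updated");
    sendNotification().catch((err) => {
        log.push(`notification failed: ${err.message}`);
    });
    log.push("next check scheduled");
}
```

With `hangingNotification`, `beatWithoutAwait` records both log entries and returns, while `beatWithAwait` records only the first. The `.catch` matters: without `await`, a rejected notification would otherwise surface as an unhandled promise rejection.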