louislam / uptime-kuma

A fancy self-hosted monitoring tool
https://uptime.kuma.pet
MIT License

Down monitor notification is delayed and not immediately sent #3410

Open bbarman4u opened 1 year ago

bbarman4u commented 1 year ago

Description

Use case: An email notification should be sent immediately after a monitor is detected as down if no retry is set. If a retry interval is set, the notification should be sent immediately after the monitor is down and the retry interval and attempts have passed. The same behavior was also observed for MS Teams notifications.

Example scenario 1 with no retry:

Example scenario 1 with one retry:

👟 Reproduction steps

See above.

👀 Expected behavior

An email notification should be sent immediately after a monitor is detected as down if no retry is set. If a retry interval is set, the notification should be sent immediately after the monitor is down and the retry interval and attempts have passed.
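The expected rule above can be written down as a small decision helper. This is only a sketch of the behavior being requested, not Uptime Kuma's code; `shouldSendDownNotification`, `failedAttempts`, and `maxRetries` are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper (not Uptime Kuma's API): decides whether the DOWN
// notification should go out right after the current failed check.
//
// failedAttempts: consecutive failed checks so far, including this one.
// maxRetries:     configured number of retries; 0 means "no retry".
function shouldSendDownNotification(failedAttempts, maxRetries) {
    // Notify exactly once: on the failure that exhausts the retries.
    return failedAttempts === maxRetries + 1;
}

// No retry configured: the first failure triggers the notification.
console.log(shouldSendDownNotification(1, 0));
// One retry configured: the first failure is held back, the second notifies.
console.log(shouldSendDownNotification(1, 1));
console.log(shouldSendDownNotification(2, 1));
```

Under this rule, the only acceptable waiting time before the notification is the retry interval multiplied by the number of retries, plus however long each check itself takes.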

😓 Actual Behavior

The email notification is delayed by the amount of the heartbeat interval setting. Note: I verified this with MS Teams notifications as well, and the behavior is similar.

🐻 Uptime-Kuma Version

1.22.1

💻 Operating System and Arch

debian docker

🌐 Browser

Google Chrome

🐋 Docker Version

19.0.35

🟩 NodeJS Version

NA

📝 Relevant log output

NA

CommanderStorm commented 1 year ago

Could this be your email provider delaying your message to do spam filtering? (For example, my provider silently drops sent emails and only accepts them on retries to reduce spam.)

bbarman4u commented 1 year ago

Please see some of the details requested.

[!NOTE] The UP messages are coming pretty consistent with when it was found to be up. Example: we received email at 3:37 PM CDT with the following message body.

[******] [✅ Up] 200 - Time (America/Chicago): 2023-07-12 15:36:49

CommanderStorm commented 1 year ago

Possibly, this is a duplicate of https://github.com/louislam/uptime-kuma/issues/3058 which will be fixed by https://github.com/louislam/uptime-kuma/pull/3072 in 2.0 (for further details on said release, read this https://github.com/louislam/uptime-kuma/pull/2720)

The timing is close enough to fit imo, but not close enough to be directly attributable. @chakflying Given that you debugged this previously, do you think this could be another symptom?

@bbarman4u What caused the timeout? Could this be reproduced? (for a testcase)

bbarman4u commented 1 year ago

Yes this issue is reproducible on many of the http(s) alert monitors we have set up.

I assume that for local testing, one can point the monitor at a URL that takes about 1 minute to respond, simulating a downtime that triggers an alert notification. I will try to find a publicly available URL that can be used to simulate this use case and report back.

bbarman4u commented 1 year ago

Here is an example website that seems slow enough to reproduce the use cases outlined above, including the delay in notifications: https://api.instantwebtools.net/v1/airlines

louislam commented 1 year ago

> Actual: Email is sent after 6 minutes of the monitor being down.

There is no delay logic behind this. Uptime Kuma should immediately send out the notification.

And I cannot reproduce:

bbarman4u commented 1 year ago

@louislam I see from your example that the monitor was an SSH port check. Do you think the issue is more around the http(s) or keyword monitors that hit a website or API which takes longer to respond? In my tests, when the failed response comes back immediately, without a long delay, the notification works fine.

bbarman4u commented 1 year ago

Here are screenshots of latest test with the slow website:

louislam commented 1 year ago

URL - https://api.instantwebtools.net/v1/airlines

This website is indeed very slow. It took 59 seconds to get the response. I think that is the problem. Uptime Kuma has to wait for the response before sending out the notification, but the response takes too long.

The timeout is not configurable. By default, the timeout value is (interval * 0.8).
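To make the arithmetic concrete, here is a minimal sketch of the default described above. The function name and the choice of seconds as the unit are assumptions for illustration; this is not a copy of Uptime Kuma's code.

```javascript
// Hypothetical helper: mirrors the "(interval * 0.8)" default mentioned above.
// The name and the unit (seconds) are assumptions, not Uptime Kuma's API.
function defaultTimeoutSeconds(intervalSeconds) {
    return intervalSeconds * 0.8;
}

// Print the longest a single hanging check can block, for a few sample
// heartbeat intervals (the sample values are arbitrary).
for (const interval of [20, 60, 300]) {
    console.log(`interval=${interval}s, timeout=${defaultTimeoutSeconds(interval)}s`);
}
```

A check against an endpoint that never answers cannot be declared down until this timeout has elapsed, so the longer the heartbeat interval, the later the DOWN notification can go out.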

Your use case could be resolved by this pr.

louislam commented 1 year ago

I also checked what exactly the "time" is. It is actually the start time of the check, not the end time. Since the response takes really long, the time gap is also very big in this case.

https://github.com/louislam/uptime-kuma/blob/ac68a35d3a43c9e1abce0790080507040cc14302/server/model/monitor.js#L274
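The effect of recording the start time can be shown with a small simulation. This is a hedged sketch, not the code at the link above: `runCheck`, `probe`, `notify`, and the injectable `now` clock are all hypothetical names.

```javascript
// Hypothetical sketch (not Uptime Kuma's implementation): the timestamp is
// captured before the request starts, so it predates the moment the
// notification is sent by the duration of the request.
async function runCheck(probe, notify, now = Date.now) {
    const startedAt = now();      // reported as the heartbeat "time"
    let status = "up";
    try {
        await probe();            // may block until the response or a timeout
    } catch (err) {
        status = "down";
    }
    const finishedAt = now();     // earliest moment a notification can go out
    if (status === "down") {
        notify({ time: startedAt, sentAt: finishedAt });
    }
    return { status, gapMs: finishedAt - startedAt };
}
```

With a probe that fails only after 59 seconds, the reported `time` is 59 seconds older than `sentAt`, which makes the notification look delayed even though it is sent as soon as the check finishes.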

bbarman4u commented 1 year ago

@louislam Thank you for providing more details on the background and possible options to mitigate this.

CommanderStorm commented 5 months ago

@louislam you added this to the v2.0 milestone, but I don't know what the "bug" here is. => Think this is resolved by https://github.com/louislam/uptime-kuma/pull/2142, or am I missing something?

kaystrobach commented 2 months ago

I'd opt to have the waiting / resend options per notification per sensor.

This would make so much more sense. Sometimes you want to inform the devops team instantly. But the CIO / CEO only, when the downtime is 30+min ...

CommanderStorm commented 2 months ago

@kaystrobach escalation is a completely different issue altogether..

kaystrobach commented 2 months ago

> @kaystrobach escalation is a completely different issue altogether..

Would not call it escalation yet, escalation implies no reaction before, this was not part of my request. Just informing more people if the problem persists.