Open bbarman4u opened 1 year ago
Could this be your email provider delaying your message to do spam filtering? (F.ex. my provider silently drops send emails and only accepts them on retries to reduce spam)
uptime-kuma
,Please see some of the details requested.
Is there something in the logs of uptime-kuma The logs are at info level, and are consistent with the timing we are seeing. If you can point me to any documentation on how to enable more debug logs to troubleshoot, I can probably dig into it. But here is an example of the timing reported in the logs when it supposedly detected the issue - (removed sensitive part of the information here such as hostname, ip, etc.)
2023-07-12T15:31:49-05:00 [MONITOR] WARN: Monitor #17 '*****': Failing: connect ETIMEDOUT *****:**** | Interval: 300 seconds | Type: http | Down Count: 0
when was the email send according to the headers of the email and The email I received at 3:32 PM shows in the message header a time stamp of 3:32 PM CDT which seems to be consistent with the above log message. So it is highly unlikely that the exchange server delayed it. In fact we don't have any delays with other emails from other type of notifications or monitoring services.
when should it have been send? From the email body message it looks like it detected the app was down at 3:29 PM CDT so it should have sent the notification at that time and not 3 mins later.
[****] [🔴 Down] connect ETIMEDOUT ******:*** Time (America/Chicago): 2023-07-12 15:29:35
[!NOTE] The
UP
messages are coming pretty consistent with when it was found to be up. Example: we received email at 3:37 PM CDT with the following message body.[******] [✅ Up] 200 - Time (America/Chicago): 2023-07-12 15:36:49
Possibly, this is a duplicate of https://github.com/louislam/uptime-kuma/issues/3058 which will be fixed by https://github.com/louislam/uptime-kuma/pull/3072 in 2.0 (for further details on said release, read this https://github.com/louislam/uptime-kuma/pull/2720)
Timing is close enough to fit imo, but not as close to be directly attributable. @chakflying Given that you did debug this previously, do you think this could be another symptom?
@bbarman4u What caused the timeout? Could this be reproduced? (for a testcase)
Yes this issue is reproducible on many of the http(s) alert monitors we have set up.
I am assuming for a local testing, one can update the monitored url to a URL that responds slowly in about 1 min and simulate a down time that triggers an alert notification. I will try to find a publicly available url that can be used to simulate this use case and report back.
Here is an example website which seems to be slow enough that can be used for reproducing the use cases outlined above that produces the delay in notifications: https://api.instantwebtools.net/v1/airlines
Actual: Email is sent after 6 minutes of the monitor being down.
There is no delay logic behind this. Uptime Kuma should immediately send out the notification.
And I cannot reproduce:
![](https://github.com/louislam/uptime-kuma/assets/1336778/dda69976-1b1d-4ac1-97fe-c278576bccd7)
![](https://github.com/louislam/uptime-kuma/assets/1336778/792388ec-1a24-406d-9e6c-cc5e161d7288)
![](https://github.com/louislam/uptime-kuma/assets/1336778/1b081545-0bd5-4272-bced-c220ce1031c2)
![](https://github.com/louislam/uptime-kuma/assets/1336778/82283a48-465d-4fb6-bd5a-f21ca48c7061)
@louislam I see from your example, the monitor was a SSH port check. Do you think the issue is more around the http(s) or keyword monitors that hit a website or api which takes longer to respond? I have tested the use cases where if the failed response is immediately received without a long delay for some reason the notification works fine.
Here are screenshots of latest test with the slow website:
![image](https://github.com/louislam/uptime-kuma/assets/20643454/6fbb0d84-37db-4db8-9618-cf242e1986a2)
![image](https://github.com/louislam/uptime-kuma/assets/20643454/a06ead54-e97c-4115-88fa-de2ae040922c)
This website is indeep very slow. It took 59 seconds to get the response. I think that is the problem. Uptime Kuma have to wait for the response before sending out the notification, but the response is too long.
Since the timeout is not configurable. By default, the timeout value is (interval * 0.8).
Your use case could be resolved by this pr.
I also checked what exactly is the "time". It is actually the start time of the check, not the end time. Since the response is really long, the time gap is also very big in this case.
@louislam Thank you for providing more details on the background and possible options to mitigate this.
@louislam you added this to the v2.0 milestone, but I don't know what the "bug" here is. => Think this is resolved by https://github.com/louislam/uptime-kuma/pull/2142, or am I missing something?
I'd opt to have the waiting / resend options per notification per sensor.
This would make so much more sense. Sometimes you want to inform the devops team instantly. But the CIO / CEO only, when the downtime is 30+min ...
@kaystrobach escalation is a completely different issue altogether..
@kaystrobach escalation is a completely different issue altogether..
Would not call it escalation yet, escalation implies no reaction before, this was not part of my request. Just informing more people if the problem persists.
⚠️ Please verify that this bug has NOT been raised before.
🛡️ Security Policy
Description
Use case: Email Notification should be sent immediately after a monitor is detected as being down if there is no retry set. if there is a retry interval set, the notification should be immediately sent after the monitor is down and the retry interval and attempt has passed. The behavior was also observed for MS Teams notifications.
Example scenario 1 with no retry:
Example scenario 1 with one retry:
👟 Reproduction steps
See above.
👀 Expected behavior
Email Notification should be sent immediately after a monitor is detected as being down if there is no retry set. if there is a retry interval set, the notification should be immediately sent after the monitor is down and the retry interval and attemp has passed.
😓 Actual Behavior
Email notification is delayed by the amount of the heart beat interval setting. Note: verified with MS teams notification as well and similar behavior is happening.
🐻 Uptime-Kuma Version
1.22.1
💻 Operating System and Arch
debian docker
🌐 Browser
Google Chrome
🐋 Docker Version
19.0.35
🟩 NodeJS Version
NA
📝 Relevant log output