Oaktribe opened this issue 3 years ago
Honestly, I think Docker Desktop for Windows is buggy sometimes. It's worth trying without Docker:
git clone https://github.com/louislam/uptime-kuma.git
cd uptime-kuma
npm run setup
npm run start-server
You were correct. I have let it run for about 16 hours now normally (not with Docker), and no errors have been reported. Before, there would always be some timeout errors.
So Docker for Windows + WSL2 was the culprit here.
Did some Googling and found this. https://github.com/microsoft/Windows-Containers/issues/145
So I ran that command on my Windows Host machine, restarted and fired up the docker again. So far it seems to have helped, before there would be at least 10 timeouts a day. It's been running for 48 hours now without any timeouts.
-- EDIT: Spoke too soon, timeouts are back. Oh well.
I'm experiencing the same issue, but mine is running in Docker on Ubuntu 20.04.3. Like OP, the performance/response times that Kuma is reporting aren't consistent with the actual response times of my services. I get an alternating graph of timeouts and outlandish response times like 30-40s, while the actual service(s) are consistently responding in <3s outside of Kuma. Wondering if I should open a new issue or if this one needs to be re-opened, since it's not just specific to Docker on Windows. @louislam preferences?
@Oaktribe @agrider Because I cannot reproduce the problem, it is really hard to address it.
However, one of our contributors discovered that it could be an Alpine Docker problem. https://github.com/louislam/uptime-kuma/issues/294#issuecomment-909353979
I will build a Debian/Ubuntu Docker image later for you guys to test.
@agrider If possible, try to run it without Docker: https://github.com/louislam/uptime-kuma/wiki/%F0%9F%94%A7-How-to-Install#-without-docker-recommended-for-x86x64-only
@louislam I don't install basically anything bare metal on my server if I can help it, so I can't help with the non-docker option, but I did clone down the repo to my Gitlab instance and rebuilt the image on Debian Bullseye instead to see if it's an Alpine issue. I'm going to leave it up and running a day or two and see what the result is.
Thank you so much!
Ok, so I don't need to wait a day; I can pretty conclusively say that the issue is not Alpine related, as my times were still atrocious. That doesn't surprise me, since I use numerous other Alpine-based images with no latency issues.

I did resolve my issue though. When I stood up my instance in my compose file, I forgot to apply my default configs anchor, one of which sets DNS directly on the container to point to my local unbound instance. As soon as I set DNS to go directly to the DNS server instead of routing through the Docker gateway for DNS resolution (which still ultimately landed at unbound), my times dropped to the expected values of <100ms, under both the image I made and the latest Alpine-based image.

My guess is that if you have a large number of containers like I do (>80), or a high check rate set, you can overwhelm the DNS resolver in the Docker gateway. Long story short, it's not necessarily a conclusive fix, but I'd try setting the DNS on your Kuma container to your preferred upstream server directly instead of letting the Docker gateway manage it.
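For anyone who wants to try the same workaround, here is a minimal compose sketch (the service layout and the 192.168.1.53 resolver address are placeholders, not the poster's actual configuration):

```yaml
version: "3"

services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - uptime-kuma:/app/data
    # Bypass Docker's embedded DNS proxy (127.0.0.11) and query the upstream
    # resolver directly. The address below is a placeholder for your own DNS
    # server (e.g. a local unbound instance).
    dns:
      - 192.168.1.53

volumes:
  uptime-kuma:
```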
That said, is uptime-kuma set to cache DNS results and respect the TTLs of the DNS records it requests? I have the vast majority of my local DNS entries set to an enormously large TTL, so even with DNS going through the Docker gateway and a high check rate, there should have been a relatively small number of DNS queries. It looks like that wasn't the case; based on the traffic I was seeing, it seemed like Kuma was making a DNS request per check.
Thank you for your finding. It seems it is worth implementing a DNS cache in Uptime Kuma. I always thought the DNS cache was managed by the OS.
It usually is, though Alpine is a pretty sparse distro, so it may not have a caching resolver configured/enabled by default? I'm really not familiar enough with it and haven't encountered DNS issues like this with it before.
It also might, and my conjecture about the root cause could be totally wrong, in which case I'm unsure why switching to a direct DNS config resolved the issue.
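For anyone who wants to experiment before an in-app cache lands, here is a minimal sketch of an application-level DNS cache around an axios request. It assumes the cacheable-lookup npm package and is illustrative only, not the code Uptime Kuma actually ships:

```js
// Sketch of an application-level DNS cache wrapped around an axios check.
import http from "node:http";
import https from "node:https";
import axios from "axios";
import CacheableLookup from "cacheable-lookup";

const cacheable = new CacheableLookup(); // caches lookups based on record TTLs

// Reuse one keep-alive agent per protocol so all checks share the cache.
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true });
cacheable.install(httpAgent);
cacheable.install(httpsAgent);

export async function check(url) {
    const started = Date.now();
    const res = await axios.get(url, { httpAgent, httpsAgent, timeout: 48000 });
    return { status: res.status, ms: Date.now() - started };
}
```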
Same problem on 1.10.2; switching to the Debian-based image fixed the problem.
v1.21.3: unfortunately, this false-positive error still occurs every day.
We have the same issue (multiple times per day) with Kuma version 1.21.3; Kuma is installed without Docker.
So the Docker image can't be the issue. It's running on our infrastructure server, which also hosts a DNS server.
@csakaszamok @UtechtDustin @yasharne Have you activated the DNS cache?
It was disabled. Now I've set it to enabled, thanks for the tip.
It was disabled, let's see if that fixes it.
@CommanderStorm it seems that option didn't fix the issue.
Last night we got a lot of "spam" messages (32 messages within a few minutes), all with the same error: timeout of 48000ms.
I'm not sure why DNS should be the cause of a 48-second timeout.
https://github.com/louislam/uptime-kuma/pull/3472 could help here. To be shipped in 1.23.0.
@csakaszamok @UtechtDustin @yasharne Have you activated the DNS cache?
It's deprecated. Is there any new option being added?
What are you asking?
I think you are asking what we replaced this with. We have replaced it with the Name Service Caching Daemon (nscd) if you are using the Docker container. For a native installation, you will have to install this yourself.
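On a Debian/Ubuntu host, installing it yourself would roughly look like this (package and service names assumed for Debian-based systems; other distros differ):
sudo apt install nscd
sudo systemctl enable --now nscd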
I am still experiencing this issue on Ubuntu Docker. The majority of my sites with a specific hosting company do this ALL day, even though the sites are all working fine.
edit: Never mind - my problem was that my ISP screwed up IPv6 routing. The reason curl worked is that curl tries both the IPv6 and IPv4 addresses. It might be useful if Kuma did the same?
I am suddenly getting this issue today in Linux docker and can't get the monitor to go green again. The site is up and I can run curl from within the container to access the site OK.
I also tried deleting and recreating the container and changing all the DNS cache options.
It has been working fine for months and there have been no system changes/updates today which could have caused it.
@af7567 Happy Eyeballs v2 is a feature that Node implemented in a more recent version. Uptime Kuma v2.0 has switched to said Node version. We need a few things to be able to release that. See #4500 for further context and how to get involved.
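For context, the Node-side option behind this looks roughly like the sketch below (a sketch only, assuming a Node version that supports the autoSelectFamily connect option, around 18.13 or newer; how Uptime Kuma itself wires this up is covered in the linked PR/issue, not here):

```js
// Let Node race IPv6 and IPv4 when connecting (Happy Eyeballs),
// so a broken IPv6 route does not turn into a long timeout.
import https from "node:https";
import axios from "axios";

const httpsAgent = new https.Agent({
    keepAlive: true,
    autoSelectFamily: true, // try both address families, keep the first that connects
});

const res = await axios.get("https://example.com", { httpsAgent, timeout: 48000 });
console.log(res.status);
```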
I am still experiencing this issue on Ubuntu Docker. The majority of my sites with a specific hosting company do this ALL day, even though the sites are all working fine.
You're not alone, I'm seeing it too. The Name Service Caching Daemon is enabled, and I've tried both the public release and the beta. Using Ubuntu with Docker (behind a Cloudflare Tunnel, though). Not sure if that is causing issues or not?
Could it help if you set Retries to 1 or higher? I had one site on a Hetzner server that also hosts other WP sites. This one site got occasional 48s timeouts, but it was the only site in Kuma without Retries.
Is it a duplicate question? I do not think so.
Describe the bug: During the day, numerous alerts of a monitor timing out with the message 'timeout of 48000ms exceeded' are reported. Then it works again, and then another timeout message is reported.
I set up a custom .js script that checks the site the same way Kuma does, outside of Docker, and nothing is reported there during the day. Could it be some issue with Docker Desktop on Windows 10 with WSL2 producing false timeouts?
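A check loop along those lines might look something like this sketch (illustrative only; the reporter's actual script is not included in the issue, and example.com is a placeholder for the monitored site):

```js
// Rough external check loop, similar in spirit to an HTTP monitor:
// request the site every minute with a 48 s timeout and log the outcome.
import axios from "axios";

const URL = "https://example.com"; // placeholder for the monitored site
const INTERVAL_MS = 60 * 1000;

setInterval(async () => {
    const started = Date.now();
    try {
        const res = await axios.get(URL, { timeout: 48000 });
        console.log(`${new Date().toISOString()} OK ${res.status} in ${Date.now() - started} ms`);
    } catch (err) {
        console.error(`${new Date().toISOString()} FAIL after ${Date.now() - started} ms: ${err.message}`);
    }
}, INTERVAL_MS);
```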
To Reproduce: Add a monitor for the site, and wait. That's how I do it.
Expected behavior: Not to throw a timeout, since the site does respond.
Info
Screenshots
Error Log: Nothing is logged to the Docker log window.