meichthys / uptime_kuma

Uptime Kuma HACS integration

added error caching option #65

Closed · PhilippMundhenk closed this 1 year ago

PhilippMundhenk commented 1 year ago

Here is something rather controversial: my (and it seems others') Uptime Kuma has performance issues. This seems to be a common thing, unfortunately. Every once in a while (usually once every 2-5 min) it does not respond immediately, but runs into a rather long timeout, leading to spotty coverage in Home Assistant even though the monitors themselves are being updated and checked:

(screenshot of the spotty coverage in Home Assistant)

Of course, the proper way to deal with this is fixing Uptime Kuma itself. However, it seems there are over 800 open issues and more than 70 pull requests, so I don't foresee this happening any time soon. I would like to work around it here in a simple manner.

My approach: if a connection can't be established, don't immediately throw an exception, but return cached values instead; only once connection errors have persisted for a defined number of attempts is an error thrown. This is the controversial part, as errors that actually occur are hidden for a while.
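
In rough terms, the idea looks something like this (a simplified sketch, not the actual code in this PR; `CachedFetcher` and `max_cached_errors` are illustrative names):

```python
# Simplified sketch of the error-caching idea: serve the last successful
# result for up to `max_cached_errors` consecutive failures, then re-raise.
# Names are illustrative; this is not the integration's actual code.

class CachedFetcher:
    def __init__(self, fetch, max_cached_errors=3):
        self._fetch = fetch              # callable returning fresh data, or raising on failure
        self._max = max_cached_errors    # consecutive failures to tolerate
        self._failures = 0               # current consecutive-failure count
        self._last_good = None           # last successful result

    def get(self):
        try:
            data = self._fetch()
        except Exception:
            self._failures += 1
            # Hide the error and serve cached data while below the threshold
            if self._last_good is not None and self._failures <= self._max:
                return self._last_good
            raise                        # errors persist: surface them after all
        else:
            self._failures = 0           # success resets the counter
            self._last_good = data
            return data


# Example (hypothetical): wrap whatever call fetches the monitor states
# fetcher = CachedFetcher(api.get_monitors, max_cached_errors=3)
# monitors = fetcher.get()
```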

Currently, the value for the cache size is hard-coded to a value that works for my setup. I tried to make this configurable via config flow, but failed. Might need some help here.
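
For reference, exposing such a value in the UI is usually done with an options flow. A rough sketch of the standard Home Assistant pattern follows; the `error_cache_size` key, the default of 3, and the class name are placeholders, not this integration's actual code:

```python
# Rough sketch of a standard Home Assistant options flow (placeholder names,
# not this integration's actual code) to make the cache size user-configurable.
import voluptuous as vol
from homeassistant import config_entries


class UptimeKumaOptionsFlow(config_entries.OptionsFlow):
    def __init__(self, config_entry):
        self._entry = config_entry

    async def async_step_init(self, user_input=None):
        if user_input is not None:
            # Persist the chosen value in the config entry's options
            return self.async_create_entry(title="", data=user_input)
        return self.async_show_form(
            step_id="init",
            data_schema=vol.Schema(
                {
                    vol.Optional(
                        "error_cache_size",
                        default=self._entry.options.get("error_cache_size", 3),
                    ): vol.Coerce(int),
                }
            ),
        )


# The integration's existing ConfigFlow class would then expose it via:
#
#     @staticmethod
#     @callback  # from homeassistant.core
#     def async_get_options_flow(config_entry):
#         return UptimeKumaOptionsFlow(config_entry)
```

The value would then be read from the entry's options wherever the cache size is used, falling back to the current hard-coded default.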

PhilippMundhenk commented 1 year ago

Scratch that, it doesn't help (in the long run); I'm still receiving a lot of gateway timeouts from Uptime Kuma, even when using it to watch itself...

meichthys commented 1 year ago

IMO, the better way to fix this is on the Uptime Kuma side, by increasing the number of failed connections required before the service is considered down:

(screenshot of the corresponding retries setting in Uptime Kuma)

PhilippMundhenk commented 1 year ago

Thanks for the hint! For other services that is the way to go. But in my case the connection between HA and UK itself is breaking up. It might be a Traefik issue and not UK-related at all; investigating... Not sure why all the other services are fine, though...

meichthys commented 1 year ago

I see. I haven't noticed that, so it's probably an isolated issue with your environment:

(screenshot)

PhilippMundhenk commented 1 year ago

Finally fixed it! Turns out I had a rogue container that was restarting constantly, causing Traefik to re-read the Docker config for labels and re-create all routers, middleware, etc. each time. It's stable now.

meichthys commented 1 year ago

Thanks for following up 👍