louislam / uptime-kuma

A fancy self-hosted monitoring tool
https://uptime.kuma.pet
MIT License
59.01k stars 5.3k forks

Distributed mode #1259

Closed mabed-fr closed 1 year ago

mabed-fr commented 2 years ago

⚠️ Please verify that this feature request has NOT been suggested before.

🏷️ Feature Request Type

New Monitor

πŸ”– Feature description

Is it possible to have several instances of uptime-kuma controlled by a central point? In distributed mode? Connected by wirguard?

Regards,

βœ”οΈ Solution

Is it possible to have several instances of uptime-kuma controlled by a central point? In distributed mode? Connected by wirguard?

❓ Alternatives

No response

πŸ“ Additional Context

Congratulations for this project that I will support if one of my skills can help you.

mamiu commented 2 years ago

I like the idea of a "distributed mode" or HA mode (high availability mode), multi instance mode, multi hosts mode, fail safe mode, etc. (a few keywords so that this ticket can be found easily).

But what is "wirguard"? If you mean WireGuard: you don't need a VPN tunnel to achieve something like that. Additional instances could be added via a private token (similar to how nodes are added to a Kubernetes cluster).

adyanth commented 2 years ago

A distributed install definitely makes sense for something that monitors uptime for other software. Would not want it to go down along with the other apps.

jptechnical commented 2 years ago

I would love this as a feature, if I could have a small instance running on a site and relaying to a master instance somewhere.

For example, say you are an MSP, and you have a few line-of-business applications you want to monitor inside the network of each customer without exposing the endpoints directly or using VPNs. The local instance then relays or reports the stats to a central instance. Each client site may have an internal status page, but the MSP could have those status pages published centrally for all sites and customers.

onedr0p commented 2 years ago

It's kind of strange that an application meant to monitor other applications wouldn't support running in high availability, but maybe that's not part of the scope of this project. Uptime Kuma would need to support an external DB for data and something like Redis for session cache. Also, I'm not aware whether Uptime Kuma writes anything else to disk, but if so, that would need to be changed as well to run HA.

mabed-fr commented 2 years ago

The project is brand new compared to what is on the market, it takes time to develop.

The main idea on my part was to have satellites in several countries, but HA is also possible.

If you want this functionality do not hesitate to comment.

officiallymarky commented 2 years ago

Yes!

snth commented 1 year ago

We just started using uptime-kuma and it's awesome! Thank you so much for creating this and making it available!

Like many others in this thread, the thought naturally arose of "who will watch the watchers"? A distributed/high-availability configuration would be the Bee's Knees.

Until then we're thinking about having uptime-kuma monitored by BetterUptime or healthchecks.io, which given that it's a single service should fit in the free tier.

MaxamServices commented 1 year ago

This would be awesome! And it would be great if the nodes could agree that a certain instance is down before sending the notification.

cheuklam commented 1 year ago

It would be great! And if possible, better to add a config that allows the notification to be sent only "if 2/3 of the deployments detect downtime".

I just had a case yesterday where Kuma would not stop sending notifications (a timeout every minute), but when I accessed the application (which is hosted on AWS and covered by CloudWatch), it was completely fine. I guess there was some routing issue in between. Only 2 of the 50 applications monitored by Kuma had this issue... but it still kept me awake from 5am in the morning.

wokawoka commented 1 year ago

I agree, it would be great

Computroniks commented 1 year ago

Just going to link #84 here as it looks similar

simcmoi commented 1 year ago

It would be great. I have 1 server and 1 NAS. If I could install 2 Uptime Kuma instances in HA, it would be awesome!

cheuklam commented 1 year ago

Being distributed across availability zones may be a difficult task, but I think we can do it in a simple way. My request for distributed mode has 3 reasons:

1) If the current Uptime Kuma (UK) node goes down, it thinks all my monitored sites went down and came back up when the UK service returns, which doesn't look good; it is not the sites that went down but the UK service itself.

2) We use UK because we want to ensure every service is up and running, and we have emergency plans for such cases. When the monitoring service itself is down, our alerting is gone. We can do HA / multi-AZ for the websites but not for the monitoring service, which is a bit weird, I would say.

3) Network issues can make a site unreachable from only part of the world. Sometimes, due to a CDN or a network operator, a site may be available in Europe but stop working in the US. Personally, I run some lowest-tier cloud VMs in different regions (using free tiers) to check for such cases.

We can fix the above issues by running multiple instances on different servers, but then the data is not unified. That's why I am thinking of the following suggestion, which should be very simple to implement and fixes all the above issues:

P.S. HA mode sounds fancy, but it is hard to do HA across multiple AZs without a lot of virtual IPs and SD-WAN, which involves a lot of infrastructure work. I think the method I mentioned above minimizes the dependency on network infrastructure yet fixes the issues I listed. An HA setup only means keeping the service up and running; I don't think we need to make things too complicated, as a DB cluster plus a heartbeat service together would already be more complicated than the whole project. I like UK for its simplicity while still achieving the purpose.

officiallymarky commented 1 year ago

It's really not difficult, all the commercial services do this. You have multiple agents that report back, and only when x number agents fail do you report a failure.
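The "x of n agents" rule could be sketched roughly like this (a hypothetical illustration, not Uptime Kuma's actual API; the function name and report shape are assumptions):

```javascript
// Each agent reports whether it can reach the target; a failure is
// declared only once at least `threshold` agents see it as down.
function shouldAlert(reports, threshold) {
  // reports: array of { agent: string, up: boolean }
  const down = reports.filter((r) => !r.up).length;
  return down >= threshold;
}

// With 3 agents and a threshold of 2, one agent with a local routing
// problem can no longer page anyone on its own:
shouldAlert(
  [
    { agent: "eu", up: false },
    { agent: "us", up: true },
    { agent: "ap", up: false },
  ],
  2
); // two of three agents report down, so this alerts
```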

cheuklam commented 1 year ago

It's really not difficult, all the commercial services do this. You have multiple agents that report back, and only when x number agents fail do you report a failure.

This is not difficult, but it is also not HA: once the main service is down, all clients have nowhere to report. But as I mentioned in point 2 of my solution, it does solve some other issues.

snth commented 1 year ago

I would also really like this feature because I just had my node with Uptime on it go down the other day and while most of my things don't require HA, it would be good to have that in a monitoring solution.

I don't know much about high-availability setups or Uptime Kuma's internal architecture, but can't you push the difficult distributed-consensus problem into some other component? For example, whatever the underlying storage layer is, for things like Redis, Postgres, SQLite, ... there are usually already high-availability solutions available, so can't you perhaps leverage those?

snth commented 12 months ago

I thought about this again and I think it might really not be that difficult, at least a basic High Availability mode that would be sufficient for my purposes.

Since uptime-kuma already comes with a docker-compose.yml file, my HA setup would be:

Since GlusterFS says it's fully POSIX compliant that should work fine. If a node goes down, Docker Swarm should redeploy uptime on another node and the data backend should be available there thanks to GlusterFS.

WDYT?


It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage, this will have to do.
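The Swarm + GlusterFS setup described above could look roughly like this (a minimal sketch, assuming a GlusterFS volume is already mounted at /mnt/gluster on every Swarm node; the paths, port, and service name are illustrative, not an official recommendation):

```yaml
# Deploy with: docker stack deploy -c docker-compose.yml uptime
version: "3.8"
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    ports:
      - "3001:3001"
    volumes:
      # Shared GlusterFS mount so the data follows the container
      # whichever node Swarm schedules it on.
      - /mnt/gluster/uptime-kuma:/app/data
    deploy:
      replicas: 1          # one live instance at a time
      restart_policy:
        condition: any     # Swarm reschedules it if the node goes down
```

Note this is failover, not active-active: there is still only one instance probing at any moment, and SQLite on a networked filesystem can be fragile under locking.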

CommanderStorm commented 12 months ago

It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage

Actually, v2 does support (external+internal) MariaDB next to SQLite, and therefore also more complex setups like mariadb-galera. See the progress here: https://github.com/louislam/uptime-kuma/milestone/24

For Postgres as a data backend see https://github.com/louislam/uptime-kuma/issues/959

snth commented 12 months ago

Thanks @CommanderStorm . That's great to hear.

Where can I read more about the SQLite setup? Is the connection string for that configurable? Because then I could probably just use Dqlite for the backend. That would be great, because I would really like to avoid the GlusterFS route if possible.

CommanderStorm commented 12 months ago

I don't know what you need. The SQLite database is stored at db/kuma.db. SQLite does not really have a connection string that I know of... you just point at the file and go.

We have never looked into whether Dqlite is a possibility or whether this should be something we support (currently, I would argue that MariaDB is enough, but I am not a maintainer) => currently not officially supported => we won't consider changes to this part of the system breaking

Here is our contribution guide https://github.com/louislam/uptime-kuma/blob/5b6522a54edad9737fccf195f9eaa25c6fb9d0f6/CONTRIBUTING.md

officiallymarky commented 12 months ago

I thought about this again and I think it might really not be that difficult, at least a basic High Availability mode that would be sufficient for my purposes.

Since uptime-kuma already comes with a docker-compose.yml file, my HA setup would be:

Since GlusterFS says it's fully POSIX compliant that should work fine. If a node goes down, Docker Swarm should redeploy uptime on another node and the data backend should be available there thanks to GlusterFS.

WDYT?

It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage, this will have to do.

Unless it is located on a geographically different Internet connection, it really doesn't improve the situation much.

babytof commented 7 months ago

Hello,

Any news on that?

CommanderStorm commented 7 months ago

There has not been any news in the last four months. We are still working out the kinks of V2.0.

JaneX8 commented 6 months ago

I would love to see this feature. It would be great if multiple nodes of Uptime Kuma could be linked, and if for each check you add there were an option to select which nodes that check should run on, and also to use it as a fail condition, as in "report if all fail" or "report if N fail". Syncing of tasks would be better, because that way each node can keep running in standalone mode if another is down. That makes it a distributed network of individual instances that can work standalone as well as cooperate, rather than, for example, workers that still depend on a master being online.

This way I would add Uptime Kuma on many of my geographically separated servers and simply make sure my checks work on all of them, without having to configure many different individual instances.

CommanderStorm commented 6 months ago

@JaneX8 You can subscribe to https://github.com/louislam/uptime-kuma/issues/84 for updates. Currently, our priorities are on different items such as #4500 and refactoring the monitoring items for better maintainability.

pareis commented 1 month ago

I think linking 2 nodes might not be sufficient; typically, 3 nodes are required so that the 2 remaining nodes can figure out which node is disconnected and which are still "live". That's just how distributed systems work if you want them reliable.

I've been thinking that maybe we don't need this distributed mode in uptime-kuma itself; the same can be achieved by, say, running 2 Kumas in different regions with the same checks, both alerting via webhooks or similar into an alerting tool that can combine the different states using an OR or an AND operation. For example: source A says down, source B says up => up (depending on what you want). A sustained "down" from source B alone could still trigger a slower alert. This fits the context of an on-call system where such a tool is in use. In the hobbyist space, where we use Kuma to send alerts via email for example, this wouldn't be easily possible.
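The combining step described above could be sketched like this (purely illustrative: the payload shape, source names, and function names are assumptions, not Kuma's actual webhook format):

```javascript
// State received from each independent Uptime Kuma instance via webhook.
const lastStatus = new Map(); // key: `${source}:${monitor}` -> "up" | "down"

// Called from whatever HTTP handler receives the webhook POSTs.
function recordReport(source, monitor, status) {
  lastStatus.set(`${source}:${monitor}`, status);
}

// AND: treat the target as down only if every source reports it down
// (fewer false alarms). OR: down if any source reports it down
// (faster, noisier alerts).
function isDown(monitor, sources, mode = "AND") {
  const downs = sources.filter(
    (s) => lastStatus.get(`${s}:${monitor}`) === "down"
  ).length;
  return mode === "AND" ? downs === sources.length : downs > 0;
}

// e.g. the EU instance sees a routing problem, the US one does not:
recordReport("kuma-eu", "api", "down");
recordReport("kuma-us", "api", "up");
isDown("api", ["kuma-eu", "kuma-us"], "AND"); // false: no page yet
isDown("api", ["kuma-eu", "kuma-us"], "OR");  // true: slower/softer alert
```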

officiallymarky commented 1 month ago

Ideally it would allow n nodes, but it's clear from the comments that this isn't a feature that will be added.