Closed: duylong closed this issue 7 years ago
Hi,
I have independent MySQL databases at multiple sites. Instead of having every worker send its data to a single database, is it possible for the interface to query multiple MySQL databases? What do you think is the most functional approach?
I don't plan to implement something like this, for the following reasons:
The more interesting question is: why are you looking for a solution like this?
With the node_name, all Statusengine Workers are able to write to the same database. The interface can send external commands to the corresponding cluster node via the node_name identifier.
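As a sketch, the identifier lives in each worker's configuration, something like this (key names and values are illustrative, not copied from the docs):

```yaml
# etc/config.yml of one Statusengine Worker (sketch, values assumed)
node_name: naemon-berlin      # unique per cluster node; used for command routing
mysql:
  host: db.monitoring.lan     # all workers can point at the same database
  port: 3306
  database: statusengine_data
```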
If you need to scale out the database, I recommend using CrateDB.
I guess you are asking because you have two locations with a bad connection to each other. If so, most of the time it's better to split up the systems, so that everything stays alive even across network outages. For example, set up two Statusengine Interface instances like berlin.monitoring.lan and ny.monitoring.lan.
Or, a bit hacky: you can create a script that fetches the data of two or more databases and dumps it into one big database. Then you have two independent systems with one view, which may lie to you if you have connection issues.
I will test CrateDB and see. Thank you for your explanation :)
How many Hosts and Services do you plan to monitor? :)
Currently I am testing on my personal servers (20 hosts: Docker, Raspberry Pi...), but ideally it should hold up with more than 500 hosts O:)
As I have several sites, I would like to keep the tools (MySQL...) independent between them. If I lose a database on one site, the other sites continue to live. I could set up a failover, but that adds complexity.
After thinking about it, I do not believe CrateDB or Elasticsearch are intended for multi-site use (https://www.elastic.co/blog/clustering_across_multiple_data_centers). I think the ability to integrate several organizations, and therefore several different databases, should be preferred. I tried to play with Docker, but it is very complex (heterogeneous servers, firewalls, stunnel, node communication...).
You do not need to merge/sort the data; a display per organization/datacenter is sufficient.
Building a database or disk cluster across multiple locations is most of the time a bad idea. Can you please provide more information about what exactly you plan to do?
You can also build a setup like this: set up each location with its own database, own interface, own monitoring, own cleanup cronjob and so on. In the next step you add a second gearman job server (localhost:4730 and localhost:4731) and a second Statusengine Worker, and you load the Statusengine Event Broker twice on every site. Keep the gearman servers local to reduce latency and to avoid blocking the monitoring core.
The first Statusengine Worker will save the data to the local database, and the local interface will only show local information.
The second Statusengine Worker will fetch the data from the second gearman server and save it into the remote database over a VPN connection. Maybe you should increase the max_bulk_delay value in the worker config to compensate for the higher latency (see the sketch below).
This way you can build up a remote database that shows the data of all locations, while each location only displays its own data. With this setup, the external command routing will work as well.
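As a rough sketch of that double setup (the broker_module option syntax, module path and config keys here are assumptions; check the Statusengine docs for the real ones):

```
# naemon.cfg on each site: load the event broker twice,
# once per local gearman job server (option syntax assumed)
broker_module=/opt/statusengine/bin/statusengine.o 127.0.0.1:4730
broker_module=/opt/statusengine/bin/statusengine.o 127.0.0.1:4731
```

```yaml
# Worker #2 (sketch): reads from the second local gearman server
# and writes to the remote database over the VPN
node_name: naemon-berlin
gearman:
  address: 127.0.0.1
  port: 4731
mysql:
  host: db.ny.monitoring.lan   # remote database across the VPN
max_bulk_delay: 30             # flush less often to compensate for WAN latency
```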
For my test, I have two dedicated servers outside and servers at home. Each datacenter has its own internal architecture with its own Naemon. The servers are not homogeneous, and the databases are not sized the same way. The ideal for a monitoring interface is to have a single unified view.
The Sensu interface (Uchiwa) allows this. Sensu's architecture is based on RabbitMQ for exchanging messages. A worker populates its own Redis store, and Uchiwa retrieves the data from this store via an API.
Centralizing the exchanges requires flexibility, queue management, and a single entry point. A project exists that allows Nagios to communicate over AMQP (https://github.com/capensis/canopsis-nagios). So why not have the worker also consume data from an AMQP queue?
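Just to illustrate the idea (this is not something Statusengine ships): a minimal consumer sketch using the php-amqplib library, with a hypothetical naemon_events queue carrying JSON check results:

```php
<?php
// Sketch only: consume Naemon events from a hypothetical AMQP queue.
// Requires php-amqplib (composer require php-amqplib/php-amqplib).
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;

$connection = new AMQPStreamConnection('rabbitmq.monitoring.lan', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Queue name and message format are assumptions for this example.
$channel->queue_declare('naemon_events', false, true, false, false);

$channel->basic_consume('naemon_events', '', false, true, false, false, function ($message) {
    $event = json_decode($message->getBody(), true);
    // A real worker would bulk-insert the event into its database here.
    printf("%s/%s -> state %d\n", $event['hostname'], $event['service'], $event['state']);
});

while ($channel->is_consuming()) {
    $channel->wait();
}
```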
I do not know if it's a good idea to connect datacenters running Docker over a VPN. I can centralize the data, but my Docker setup does not make much sense if I have to allow every application to access my services individually.
I am also playing with the idea of replacing the gearman queue with something newer, maybe some solution from Apache, but I don't know when or if this will happen. However, the reason why I wrote my own event broker is simple: I want full control over my data and how the data gets exported.
I'm not familiar with the Sensu/Uchiwa project, but after reading the docs for 5 minutes or so, I'll try to make a long story short.
What I thought you would like to have: you select which databases should be used; you can select all databases, or just one or two. The PHP backend of the Statusengine Interface then needs to query, for example, localhost:3306 and remotedatacenter2:3306, fetch data from both databases, merge the results, sort them, and return the data to the interface (a sketch of this follows below).
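As a sketch of what that first variant would mean for the PHP backend (plain PDO, with hypothetical DSNs and table/column names, not actual Statusengine code):

```php
<?php
// Sketch: query several per-site databases, then merge and sort in PHP.
// DSNs, credentials and the table/column names are assumptions.
$sources = [
    new PDO('mysql:host=localhost;port=3306;dbname=statusengine', 'user', 'pass'),
    new PDO('mysql:host=remotedatacenter2;port=3306;dbname=statusengine', 'user', 'pass'),
];

$rows = [];
foreach ($sources as $db) {
    $stmt = $db->query('SELECT hostname, current_state, last_check FROM statusengine_hoststatus');
    $rows = array_merge($rows, $stmt->fetchAll(PDO::FETCH_ASSOC));
}

// One global sort across all datacenters.
usort($rows, fn ($a, $b) => strcmp($a['hostname'], $b['hostname']));
```

Note that if one of the sources is unreachable, the PDO constructor throws and the whole merged view fails, which is exactly the "may lie to you" problem mentioned above.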
What I now think you would like to have: you can select which API endpoint the interface should use to get its data. The Statusengine UI communicates with the Statusengine UI PHP backend over an HTTP API, so you could only select one data center as the source and only see the data of that data center. Your browser would then send the HTTP API requests directly to the remote web server, or maybe the backend would be used as a proxy.
Is this what you are looking for?
PS: Even the second method is still a lot of work, maybe more than the first one ;) (but it would be a clean solution)
PPS: If I really implement something like this, I know there will some day be a request to make it possible to select multiple data centers.
I just want to mention that you can achieve the same by letting the remote Statusengine Workers push all results to one centralized database. Or not?
//Edit: Changing the API endpoint is more or less the same as browsing to berlin.monitoring.lan and ny.monitoring.lan, so it's not worth the effort to implement this.
For queue management, RabbitMQ and Kafka are very popular. Kafka has more dependencies and complexity; I use it for logs, but for monitoring I would favor RabbitMQ.
It is true that we can always centralize the data, but it is better to keep a separate logical view per datacenter. In addition, if a datacenter becomes inaccessible, the others can continue to live normally and the Statusengine UI keeps playing its role. It also greatly simplifies security and the management of containers behind NAT.
Implementing the first solution is indeed difficult because the data must be merged. The second solution is visually simpler and suits me for now (merging and sorting could be improved later if necessary). In addition, there is no need to merge data between accessible and inaccessible datacenters, which avoids display problems when fetching the data.
Thanks again for your consideration of the problem ;)
Here's the deal: at the moment I don't have time for any bigger architecture changes. The main focus of the Statusengine project is to store Nagios and Naemon events in a database, and to provide a way to scale this across multiple nodes: split up a Naemon with 200k services into two Naemon boxes with 100k services each, scale out the database, etc.
Implementing a method to change the API endpoint of the UI would be the same as adding a bookmark to your browser.
If this feature is super important for you, and you don't want to handle it with a bookmark or your own landing page, you should take a close look at some of the other interfaces available.
https://www.thruk.org/, for example, is able to handle multiple Livestatus backends in the same interface.
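For reference, multiple backends in Thruk are configured roughly like this (a sketch of thruk_local.conf from memory; double-check the Thruk documentation for the exact syntax):

```
<Component Thruk::Backend>
    <peer>
        name = Berlin
        type = livestatus
        <options>
            peer = berlin.monitoring.lan:6557
        </options>
    </peer>
    <peer>
        name = NewYork
        type = livestatus
        <options>
            peer = ny.monitoring.lan:6557
        </options>
    </peer>
</Component>
```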
I'll mark this as advertisement because it's my employer's project, but https://github.com/it-novum/openITCOCKPIT also has its own solution for distributed monitoring.
Multi-site has always been complicated to implement; having it would only have been a bonus. Your approach is not bad either, and it is enough for my case :)
I already use Thruk for BI generation, but it's less sexy as an interface for a dashboard :p I will stay with your solution, and I will continue to offer you my ideas, good or not :-)
You are welcome :)
Hi,
Just a question: how do I use the "_commands" in the Statusengine UI when everything writes to the same database (multi-site)? What are the URLs to redirect to? Sorry, I did not have time to dig through the queries O:-)
What? :D
Sorry, I described my situation badly.
I am trying to centralize everything behind a single entry point. Now, how do I use commands from the UI, for example an ACK? I am behind NAT at home, for example.
When you trigger an external command via the Statusengine UI, the interface calls one of these API endpoints: https://github.com/statusengine/interface/blob/master/public/api/index.php#L338-L404 (which one depends on the command and on whether it requires arguments or not).
The UI backend will create a new record in the statusengine_tasks table in the database. The routing to the corresponding cluster node is done via the node_name field.
The Statusengine Worker will fetch all external commands out of this table, filtered by node_name: https://github.com/statusengine/worker/blob/1c0339109f73cb7eb84428b13b61fd66c03c09cf/bin/StatusengineWorker.php#L215-L226
It processes each command and deletes it.
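To make this concrete, here is roughly what the routing boils down to in SQL (the statusengine_tasks table is real, but the column names and payload here are assumptions for illustration):

```sql
-- UI backend: queue an acknowledgement for the node "naemon-berlin"
INSERT INTO statusengine_tasks (node_name, type, payload, entry_time)
VALUES ('naemon-berlin', 'acknowledge_host_problem',
        '{"hostname": "web01", "comment": "ack via UI"}', UNIX_TIMESTAMP());

-- Worker on naemon-berlin: fetch only its own commands ...
SELECT id, type, payload FROM statusengine_tasks WHERE node_name = 'naemon-berlin';

-- ... pass them to the monitoring core, then remove them
DELETE FROM statusengine_tasks WHERE id = 42;
```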
Okay, then there is nothing for me to do. I thought the UI was contacting the broker directly.
Thanks for the explanation.