kadalu / gluster-metrics-exporter

Lightweight and efficient Prometheus exporter for Gluster metrics
GNU General Public License v3.0

Communication between exporters in cluster #34

Open frenkye opened 2 years ago

frenkye commented 2 years ago

Hi,

Why do exporters need to see each other in the cluster? When we had a firewall in place that allowed only our Prometheus instance, the exporter returned no metrics and hung, but once I allowed LAN traffic for peer connections, the exporters started responding.

aravindavk commented 2 years ago

To collect local metrics (CPU, memory, uptime and others) that are not available via Gluster CLI commands. Open the firewall only for communication within the cluster.

I think we can enhance the exporter by allowing local metrics collection to be disabled. Let me check the possibility and update here.
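One way to keep an exporter from hanging when peers are firewalled off is to bound how long it waits for them. The Python sketch below assumes a hypothetical `fetch` callable that returns one node's local metrics (CPU, memory, uptime); it is not the exporter's actual code, and `include_peers` is a hypothetical stand-in for the option to disable peer collection discussed above.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Optional

# Hypothetical shape: fetch(host) returns that node's local metrics,
# e.g. {"cpu_percentage": 1.2, "memory_percentage": 0.4}.
Metrics = Dict[str, float]
FetchFn = Callable[[str], Metrics]


def collect_local_metrics(
    this_node: str,
    peers: List[str],
    fetch: FetchFn,
    timeout_s: float = 2.0,
    include_peers: bool = True,
) -> Dict[str, Optional[Metrics]]:
    """Collect local metrics from this node and, optionally, its peers.

    A node that raises or does not answer within timeout_s is reported
    as None instead of blocking the whole scrape.
    """
    targets = [this_node] + (list(peers) if include_peers else [])
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(fetch, host): host for host in targets}
    # One shared deadline for all nodes, not one timeout per node.
    done, _ = wait(futures, timeout=timeout_s)
    # Return without waiting for stuck fetches; their threads finish later.
    pool.shutdown(wait=False)

    results: Dict[str, Optional[Metrics]] = {}
    for fut, host in futures.items():
        if fut in done and fut.exception() is None:
            results[host] = fut.result()
        else:
            results[host] = None
    return results
```

A firewalled peer then shows up as `None` instead of stalling the response for every other metric. The real network call inside `fetch` should still set its own socket timeout so that stuck threads do not pile up.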

frenkye commented 2 years ago

I see now what you mean: the glusterd_* metrics. I don't think these should be collected from other nodes at all, because that causes unwanted duplication.

These are metrics of the glusterd process on a server, so the exporter should return them only for that server. The metrics for the other servers are scraped from their own exporters.

If I have a 3-node cluster:

# HELP glusterd_memory_percentage Glusterd Memory Percentage
# TYPE glusterd_memory_percentage gauge
glusterd_memory_percentage{hostname="node1"} 0.0
glusterd_memory_percentage{hostname="node2"} 0.0
glusterd_memory_percentage{hostname="node3"} 0.0

In this setup, that makes 3 x 3 = 9 time series in Prometheus in our case. Any alert on this metric would fire from all nodes, and if one exporter is down, this metric will also disappear from the others. To be honest, I have never seen this behavior in any other exporter.

This should also be mentioned in the README or in the help message to ease debugging for people with strict networking policies in place.
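The 3 x 3 count follows from how Prometheus labels scraped data: it attaches an `instance` label identifying the scrape target, so the same `hostname` value reported by three exporters becomes three distinct series. A small Python sketch of that arithmetic, using the node names from the example above (the function name is hypothetical):

```python
from typing import List, Set, Tuple


def stored_series(nodes: List[str], export_peers: bool) -> Set[Tuple[str, str]]:
    """Return the (instance, hostname) label pairs Prometheus would store
    for a single glusterd_* metric when every node is scraped."""
    series = set()
    for instance in nodes:
        # With peer export, each exporter reports every node's glusterd
        # metric; otherwise it reports only its own.
        hostnames = nodes if export_peers else [instance]
        for hostname in hostnames:
            series.add((instance, hostname))
    return series


nodes = ["node1", "node2", "node3"]
print(len(stored_series(nodes, export_peers=True)))   # 9
print(len(stored_series(nodes, export_peers=False)))  # 3
```

Scraping only one exporter that does export peer metrics also yields three series, one per `hostname`.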

aravindavk commented 2 years ago

Communication between the nodes was added to support exporting all Gluster metrics from a single node of the cluster. Gluster CLI commands do not behave well when run from all nodes, because of locking and other issues. To export cluster-level metrics (metrics coming from the Gluster CLI) without exporting them from every node, leader selection would be required. That was the issue I faced when I worked on the https://github.com/gluster/gluster-prometheus project.
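For comparison, the leader selection mentioned above can be as simple as a deterministic rule that every node evaluates locally. This Python sketch assumes each node knows a stable ID for itself and for its healthy peers (for example, the peer UUIDs Gluster assigns). The "smallest ID wins" rule is a hypothetical choice, not taken from the exporter or from gluster-prometheus, and it can elect two leaders if nodes disagree about which peers are healthy.

```python
from typing import Iterable


def is_leader(this_node_id: str, healthy_peer_ids: Iterable[str]) -> bool:
    """Hypothetical rule: the healthy node with the smallest ID is the leader.

    Only the leader would run cluster-wide Gluster CLI commands; every
    other node would export just its local metrics.
    """
    candidates = set(healthy_peer_ids) | {this_node_id}
    return this_node_id == min(candidates)


ids = ["uuid-b", "uuid-a", "uuid-c"]
print([n for n in ids if is_leader(n, ids)])  # ['uuid-a']

# If uuid-a goes down, the remaining nodes agree on a new leader.
survivors = ["uuid-b", "uuid-c"]
print([n for n in survivors if is_leader(n, survivors)])  # ['uuid-b']
```

As long as the nodes see the same peer list, exactly one of them issues the CLI commands, which avoids concurrent CLI calls from multiple exporters.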

Configure any one node as a target in the Prometheus server to get all the metrics. To keep getting metrics when that node goes down, you can use a virtual IP or an nginx-based load balancer setup.

I will add the details to the README.
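A virtual IP or an nginx load balancer gives Prometheus a single address that keeps answering as long as any one exporter is up. The same failover idea can be sketched client-side in Python; this is only an illustration, not a feature of the exporter or of Prometheus, and the function names are hypothetical.

```python
import urllib.request
from typing import Callable, List, Tuple


def fetch_metrics(url: str, timeout_s: float = 5.0) -> str:
    """Fetch a Prometheus text-format metrics page."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


def scrape_first_available(
    urls: List[str], fetch: Callable[[str], str] = fetch_metrics
) -> Tuple[str, str]:
    """Try each exporter in order and return (url, body) from the first
    one that answers, like a failover load balancer in front of them."""
    for url in urls:
        try:
            return url, fetch(url)
        except OSError:
            # urllib's URLError/HTTPError, refused connections and
            # timeouts are all OSError subclasses: try the next node.
            continue
    raise RuntimeError("no exporter reachable")
```

For example, with hypothetical targets `["http://node1:<port>/metrics", "http://node2:<port>/metrics"]`, the function returns node2's metrics while node1 is down.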

frenkye commented 2 years ago

Ah, I misunderstood at first. The locking argument and reducing the number of CLI calls make sense. Once the docs are updated, this is ready to close.

Thank you.