sassoftware / viya4-monitoring-kubernetes

Provides simple scripts and customization options to deploy monitoring, alerts, and log aggregation for Viya 4 running on Kubernetes
Apache License 2.0

sas-opendistro vs. sas-opensearch ServiceMonitor #550

Closed tynsh closed 11 months ago

tynsh commented 1 year ago

Hi,

We can't seem to get the sas-opensearch serviceMonitor to collect metrics. There is also a sas-opendistro serviceMonitor available, which scrapes an OpenSearch exporter. Can you clarify what the difference between these two is?

Best

Tobias Wackenhut

gsmith-sas commented 1 year ago

A change was made in the 2023.08 stable version of SAS Viya to how the OpenSearch instance within SAS Viya surfaces metrics. Prior to this change, metrics for this instance of OpenSearch were collected via an additional component, the Elasticsearch Exporter. The sas-opendistro serviceMonitor allows Prometheus to collect metrics from this component (and, thus, from the OpenSearch instance within SAS Viya). With this change, metrics from this instance of OpenSearch are made available from an (internal) plug-in to OpenSearch rather than via the (external) Elasticsearch Exporter component. The new sas-opensearch serviceMonitor allows Prometheus to collect metrics via the new plug-in. Our project added a new dashboard, called "OpenSearch", to Grafana to visualize the OpenSearch metrics available via this new mechanism.

Our project continues to use the Elasticsearch Exporter to make metrics available for the instance of OpenSearch used by our project. And, the existing "Elasticsearch" dashboard in Grafana continues to be the one to use for visualizing metrics related to our project's instance of OpenSearch.

I hope that clarifies things for you. If you are working with SAS Viya 2023.08, please check the new Grafana dashboard for metrics about the OpenSearch instance within SAS Viya. Please let us know if that resolves things for you.

Regards, Greg

tynsh commented 1 year ago

I see. Thanks for the quick reply. I'll test this on Tuesday and will get back to you or close this issue. I'm working with the last Viya LTS release, which must be older than 2023.08, so I suspect that the scraping issues I was facing with the sas-opensearch serviceMonitor were due to this version difference.

gsmith-sas commented 1 year ago

If you're working with the latest LTS release (i.e. SAS Viya 2023.03 LTS release), I would expect you to see:

tynsh commented 1 year ago

Hi, I've checked and as far as I understand this, the situation is as follows:

This leads to alerts based on the up metric.

gsmith-sas commented 1 year ago

Ah 💡 I think I understand your concern now. Since you are running SAS Viya 2023.03 LTS, the OpenSearch instance within SAS Viya does NOT have the metrics plug-in deployed. But, since the service has the opendistro.sas.com/service-name=sas-opendistro label, it matches the sas-opensearch serviceMonitor...and, since there is no plugin, Prometheus cannot collect metrics and reports that end-point as "unhealthy".

The "good" news is that you do have metrics available for OpenSearch via the other serviceMonitor and Grafana dashboard. So, this is mostly just a case of misleading clutter showing up in Prometheus.

We will have to consider the best way to handle this moving forward. I'll discuss it with the team supporting that OpenSearch instance and see how they want to handle things. Sorry it took a while for me to fully understand the issue.

gsmith-sas commented 11 months ago

The OpenSearch instance deployed within SAS Viya was updated to include a new Kubernetes label (opendistro.sas.com/metrics-plugin) to indicate that it uses the plugin for metrics. This change was part of the 2023.10 Stable release of SAS Viya which became available last week. We have updated the sas-opensearch serviceMonitor deployed as part of this project to use the new label. This should eliminate cases where the serviceMonitor was finding an incorrect match in older versions of SAS Viya and Prometheus was reporting an "unhealthy" end-point. The updated serviceMonitor is available now on our 'main' branch and will be available on our 'stable' branch as part of our next release (due in mid-November). Thank you for reporting the issue. Fixed in #566.
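
For readers following along, the false match and the fix can be illustrated with a small model of label-selector matching. This is only a sketch under stated assumptions: the `matches` helper is hypothetical, the selectors are simplified stand-ins rather than the actual serviceMonitor definitions from this project, and the value of the opendistro.sas.com/metrics-plugin label is assumed (the thread names the label but not its value).

```python
# Simplified model of Kubernetes label-selector matching.
# Supports equality requirements and "key must exist" requirements only.
# NOTE: selectors and label values below are illustrative assumptions,
# not copied from the real SAS Viya or viya4-monitoring-kubernetes manifests.

def matches(selector, labels):
    """Return True if every requirement in the selector is met by the labels."""
    for key, wanted in selector.items():
        if key not in labels:
            return False
        # wanted=None means "the key must exist, any value is fine"
        if wanted is not None and labels[key] != wanted:
            return False
    return True

SERVICE_NAME = "opendistro.sas.com/service-name"
METRICS_PLUGIN = "opendistro.sas.com/metrics-plugin"

# Selector keyed only on the service-name label (the behavior described above).
old_selector = {SERVICE_NAME: "sas-opendistro"}
# Selector keyed on the new label; the exact form is an assumption.
new_selector = {METRICS_PLUGIN: None}

# Service labels on an older release (no metrics plug-in deployed).
service_2023_03 = {SERVICE_NAME: "sas-opendistro"}
# Service labels on 2023.10 or later; the "true" value is assumed.
service_2023_10 = {SERVICE_NAME: "sas-opendistro", METRICS_PLUGIN: "true"}

print(matches(old_selector, service_2023_03))  # old selector matches a service with no plug-in
print(matches(new_selector, service_2023_03))  # new selector skips the older service
print(matches(new_selector, service_2023_10))  # new selector matches once the label is present
```

With the old selector, the older service is selected even though nothing answers on the metrics end-point, which is what produced the "unhealthy" target and the alerts on the up metric. Requiring the new label means only services that advertise the plug-in are selected.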