frit0-rb opened 1 year ago
I am experiencing the same bug/issue on this environment:
Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
I suspect the node whose metrics are collected is the one Prometheus is currently running on (single pod), but I am not 100% sure.
Edit:
The longhorn-backend service has 3 endpoints, one for each node. I see now, by looking at the logs of the manager pods which act as endpoints, that it depends on whichever one the request from Prometheus gets sent to.
This makes me think there is some sort of communication issue, but I would not know why. The Prometheus scrape endpoint is set to longhorn-backend.<namespace>:9500. I have confirmed the metrics are available by using wget to fetch them from another pod in the same namespace Prometheus is running in.
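For reference, the check was roughly this (a sketch; metrics-check is just a throwaway pod name, run from the namespace Prometheus is in):
# Sketch: fetch the first metrics lines from the longhorn-backend service
kubectl run metrics-check -i --rm --restart=Never --image alpine --command -- \
  wget -qO- http://longhorn-backend.<namespace>:9500/metrics | head -n 20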
The metrics controller currently collects metrics for only the current node, as seen in the code linked here (https://github.com/longhorn/longhorn-manager/blob/master/metrics_collector/node_collector.go#L201). @PhanLe1010, do you know why it collects metrics for only the current node?
@tomwiggers @frit0-rb Can you check the doc https://longhorn.io/docs/1.4.1/monitoring/integrating-with-rancher-monitoring/ rather than getting the metrics from http://longhorn-backend.longhorn-system:9500/metrics? Thank you.
@derekbit I don't use Rancher to manage the cluster and don't use Rancher monitoring, so I don't have the monitoring.coreos.com/v1 API.
It seems that the ServiceMonitor used in the example targets each manager separately. If that works, then we need to do that instead of using the service, which load-balances requests over the manager pods. Would this not require a change in the k8s manifests and Helm charts to not create a service for this (as we cannot use it for metrics anyway)?
I'm in the same boat here as @tomwiggers - I'd like to see all metrics exposed without having to install Rancher, thanks.
Hi! Is there any news? I need to collect all metrics without using rancher. Thanks
I'm facing the same issue: I only get metrics for one node although I have three. They all look fine in the Longhorn dashboard, but the metrics only show one node.
UPDATE: Fixed the issue by not scraping the service from Prometheus, and instead adding these annotations to the Helm chart values so that Prometheus can scrape the pods themselves:
annotations:
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9500"
  prometheus.io/scrape: "true"
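For this to work, the Prometheus config needs a pod scrape job that honors these annotations. A rough sketch of the common convention (the job name and namespace filter are illustrative, not from this thread):
# Sketch: annotation-driven pod discovery for the longhorn-manager pods
scrape_configs:
  - job_name: longhorn-manager-pods
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [longhorn-system]
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # use prometheus.io/path as the metrics path if set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # rewrite the target address to use the port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__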
Hello everyone, have you tried creating a ServiceMonitor like this https://longhorn.io/docs/1.6.0/monitoring/integrating-with-rancher-monitoring/#add-longhorn-metrics-to-the-rancher-monitoring-system ?
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: longhorn-system
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
      - longhorn-system
  endpoints:
    - port: manager
Then set up the Prometheus instance to scrape the ServiceMonitor. This will make sure Prometheus collects the metrics from ALL Longhorn nodes.
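For example, with the Prometheus operator, a Prometheus resource along these lines would pick up the ServiceMonitor above (a rough sketch; the name, namespace, and service account are illustrative, and suitable RBAC is assumed to exist):
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus          # illustrative name and namespace
  namespace: monitoring
spec:
  serviceAccountName: prometheus   # assumes RBAC for this account already exists
  # Look for ServiceMonitors in all namespaces (longhorn-system included)
  serviceMonitorNamespaceSelector: {}
  # Match the label set on the ServiceMonitor above
  serviceMonitorSelector:
    matchLabels:
      name: longhorn-prometheus-servicemonitor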
Another approach is what @ahmedhassanahmedwasfy mentioned above: telling Prometheus to scrape the longhorn-manager pods directly by adding the annotations:
annotations:
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9500"
  prometheus.io/scrape: "true"
to the longhorn-manager daemonset by setting this value: https://github.com/longhorn/longhorn/blob/21a538d10198746515de9e0c0f87ccf660738393/chart/values.yaml#L487-L488
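i.e. roughly like this in the chart values (a sketch; verify the exact key against the values.yaml linked above for your chart version):
# Sketch of Helm values: annotations applied to the longhorn-manager pods
# (top-level "annotations" key assumed from the linked values.yaml)
annotations:
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9500"
  prometheus.io/scrape: "true"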
Describe the bug
When targeting the metrics URL, in this case http://longhorn-backend.longhorn-system:9500/metrics, the response contains metrics only from one of the nodes. However, when targeting the backing pods, as shown by the service endpoints, each of them returns information for a different single node.
Steps to reproduce
To see metrics from the service:
kubectl run log-metrics --image alpine --command -- wget -O - http://longhorn-backend.longhorn-system:9500/metrics
kubectl logs log-metrics
To see the metrics of each pod, first list the endpoint IPs:
kubectl get endpoints longhorn-backend -n longhorn-system -o json | jq '.subsets[] | .addresses[] | .ip'
Then, for each ip:
kubectl run log-metrics --image alpine --command -- wget -O - http://$ip:9500/metrics
kubectl logs log-metrics
Delete the log-metrics pod if necessary.
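Optionally, the per-pod check can be done in one loop (a sketch, assuming jq is available locally and no NetworkPolicy blocks pod-to-pod traffic on port 9500):
# Sketch: dump longhorn_node_status from every longhorn-backend endpoint in turn
for ip in $(kubectl get endpoints longhorn-backend -n longhorn-system -o json \
              | jq -r '.subsets[].addresses[].ip'); do
  echo "== $ip =="
  kubectl run metrics-check -i --rm --restart=Never --image alpine --command -- \
    wget -qO- "http://$ip:9500/metrics" | grep '^longhorn_node_status'
done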
Expected behavior
The output should include information from all nodes. At the very least, there would be 4x the number of nodes of longhorn_node_status entries, with the node attribute visible for each node.
Actual behavior
If you filter metrics by longhorn_node_status, metrics for only one node are shown. longhorn_node_storage_capacity_bytes also shows results for only one node, as does longhorn_volume_state, etc.
Environment
• Longhorn version: v1.4.0
• Installation method: kubectl
• Kubernetes distro: Rancher-managed RKE2/k3s
  o Number of management nodes in the cluster: 1
  o Number of worker nodes in the cluster: 2
• Node config
  o OS type and version: Rocky Linux 9 (also seen on RHEL 8.7)
  o CPU per node: 4
  o Memory per node: 16Gi
  o Disk type: VMWare
  o Network bandwidth between the nodes: n/a
• Underlying Infrastructure: VMWare/ESXi (also seen on HyperV)
• Number of Longhorn volumes in the cluster: 12 (also seen with 26)
Additional context