lightninglabs / lndmon

🔎lndmon: A drop-in monitoring solution for your lnd node using Prometheus+Grafana
MIT License

Kubernetes compatible dashboards (alt) #60

Closed mrfelton closed 3 years ago

mrfelton commented 3 years ago

Alternate version of https://github.com/lightninglabs/lndmon/pull/42 that allows filtering by node and namespace.

Roasbeef commented 3 years ago

The set of changes looks sound, but we'll need to test this both in the normal config and in our own k8s deployment to ensure nothing breaks.

lispmeister commented 3 years ago

Looking at this PR, I'm wondering why this needs to be done by lndmon itself. Usually you want the data collected by Prometheus to be as uniform as possible, and then use labels to distinguish multiple instances of the same data from different contexts.

Example from one of our Helm charts that exposes a scraping service to Prometheus:

apiVersion: v1
kind: Service
metadata:
  name: {{ template "nautilus.fullname" . }}-prometheus
  labels:
    app: {{ template "nautilus.name" . }}
    chart: {{ template "nautilus.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
    lnd_instance: {{ .Values.nautilus.prometheus.lndInstance }}
    cluster_name: {{ .Values.clusterName }}
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "{{ .Values.nautilus.prometheus.port }}"
spec:
  ports:
    - name: prometheus
      protocol: TCP
      port: {{ .Values.nautilus.prometheus.port }}
      targetPort: {{ .Values.nautilus.prometheus.port }}
  selector:
    app: {{ template "nautilus.name" . }}
    release: {{ .Release.Name }}

Notice that we label the data collected from nautilus by lnd_instance and cluster_name. The annotations tell Prometheus how to scrape the data.
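For readers unfamiliar with the prometheus.io/* annotations: they are a convention, not something Kubernetes itself acts on. One common way they are consumed is through relabeling rules in a Prometheus scrape job. The following is a minimal sketch, assuming the cluster uses Prometheus' Kubernetes endpoints discovery; the job name and the exact set of rules are assumptions and may differ from any real deployment discussed here.

```yaml
scrape_configs:
  - job_name: kubernetes-service-endpoints   # assumed job name
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only targets whose Service carries prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Use the prometheus.io/path annotation as the metrics path.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        regex: '(.+)'
        target_label: __metrics_path__
      # Rewrite the scrape address to use the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
      # Copy Service labels (such as lnd_instance and cluster_name) onto the target.
      - action: labelmap
        regex: '__meta_kubernetes_service_label_(.+)'
      # Expose the namespace as a regular label.
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
```

With rules like these, every series scraped from the Service above would carry lnd_instance, cluster_name, and kubernetes_namespace labels without the exporter having to know anything about the cluster.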

mrfelton commented 3 years ago

@lispmeister this PR doesn't change how lndmon collects the data, it alters how grafana makes the data available.

guggero commented 3 years ago

I agree that some doc would be nice. Here's how we relabel for those dashboards in prometheus.yml:

scrape_configs:
  - job_name: kubernetes-service-endpoints
    metric_relabel_configs:
    - source_labels:
      - kubernetes_namespace
      target_label: namespace
    - source_labels:
      - lnd_instance
      target_label: pod

Feel free to add that as an example.
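To make the effect of that snippet more concrete, here is the same pair of rules written with Prometheus' relabeling defaults spelled out, plus a before/after illustration in comments. This is only a sketch: the metric name some_lnd_metric and the label values are hypothetical, chosen purely for illustration.

```yaml
scrape_configs:
  - job_name: kubernetes-service-endpoints
    metric_relabel_configs:
      # Copy the value of kubernetes_namespace into a new label "namespace".
      - source_labels: [kubernetes_namespace]
        action: replace        # default action
        regex: '(.*)'          # default regex
        replacement: '$1'      # default replacement
        target_label: namespace
      # Copy the value of lnd_instance into the label "pod".
      # Note: an existing "pod" label on the series would be overwritten.
      - source_labels: [lnd_instance]
        action: replace
        regex: '(.*)'
        replacement: '$1'
        target_label: pod

# Hypothetical series before relabeling:
#   some_lnd_metric{kubernetes_namespace="lnd", lnd_instance="alice"}
# The same series after relabeling (source labels are kept):
#   some_lnd_metric{kubernetes_namespace="lnd", lnd_instance="alice",
#                   namespace="lnd", pod="alice"}
```

Because metric_relabel_configs runs on scraped samples just before ingestion, the dashboards can then filter on namespace and pod regardless of how the original labels were named.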

mrfelton commented 3 years ago

I've rebased this PR again. It would be good to see it merged. I don't fully understand the point about relabelling, or how or where it should be documented. Maybe someone who understands it better can add that documentation in a follow-up PR.

carlaKC commented 3 years ago

Thanks for sticking with this one @mrfelton! Nice improvement 🎉