nats-io / prometheus-nats-exporter

A Prometheus exporter for NATS metrics
Apache License 2.0

gnatsd Prometheus alerts #165

Open sommerit opened 2 years ago

sommerit commented 2 years ago

Hello Guys,

I integrated the NATS exporter into my K8s / Prometheus stack and everything works like a charm. Thanks to the community for that.

Now I am looking for some monitoring rules, because I do not have much experience with NATS.

For other services I like to use https://awesome-prometheus-alerts.grep.to/rules.

Does anyone have experience with this and could share some rules?

I will of course keep researching, and if I find something I will post it here.

Thanks

Greetings

manuelottlik commented 2 years ago

Hey, I was also looking for some Prometheus alerts for JetStream but have not found anything yet. I am really inexperienced when it comes to PromQL and alerts, but this is what I came up with:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: i3t-nats
spec:
  groups:
    - name: nats.rules
      rules:
        - alert: NatsConsumerPendingMessagesTooHigh
          expr: nats_consumer_num_pending > {{ .Values.alerting.rules.natsMessagesPendingThreshold }}
          for: 3m
          labels:
            severity: critical
          annotations:
            description: {{` Consumer "{{$labels.consumer_name}}" has {{ $value }} pending messages. `}}
            summary: {{` The amount of pending messages is too high for 3 minutes. `}}
        - alert: NatsConsumerPendingMessagesIncreasing
          expr: deriv(nats_consumer_num_pending[1m]) > 0
          for: 3m
          labels:
            severity: critical
          annotations:
            description: {{` Consumer "{{$labels.consumer_name}}" is receiving more messages than it can process. `}}
            summary: {{` The amount of pending messages has increased for more than 3 minutes. `}}
        - alert: NatsConsumerRedeliveredMessagePercentageTooHigh
          expr: rate(nats_consumer_num_redelivered[1m]) / rate(nats_consumer_delivered_stream_seq[1m]) > {{ .Values.alerting.rules.natsMessagesRedeliveredPercentageThreshold }}
          for: 1m
          labels:
            severity: critical
          annotations:
            description: {{` Consumer "{{$labels.consumer_name}}" gets {{ $value }} of its messages redelivered. `}}
            summary: {{` The percentage of redelivered messages is too high. `}}

It's written to be processed by Helm, so if you use it directly you will probably want to remove the {{` ... `}} wrappers and replace the .Values... references with concrete threshold values.
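
As a rough sketch of what that could look like with the Helm templating removed, here are the same three rules with the backtick wrappers dropped and the thresholds written as literal numbers. The thresholds (1000 pending messages, a redelivery ratio of 0.1) are made-up placeholders, not recommended values, and should be tuned per workload. The metric and label names are taken unchanged from the rules above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: i3t-nats
spec:
  groups:
    - name: nats.rules
      rules:
        - alert: NatsConsumerPendingMessagesTooHigh
          # 1000 is a placeholder threshold (assumption); pick a value that fits your consumers.
          expr: nats_consumer_num_pending > 1000
          for: 3m
          labels:
            severity: critical
          annotations:
            # Single quotes keep the Prometheus template braces and inner double quotes intact.
            description: 'Consumer "{{ $labels.consumer_name }}" has {{ $value }} pending messages.'
            summary: 'The amount of pending messages is too high for 3 minutes.'
        - alert: NatsConsumerPendingMessagesIncreasing
          expr: deriv(nats_consumer_num_pending[1m]) > 0
          for: 3m
          labels:
            severity: critical
          annotations:
            description: 'Consumer "{{ $labels.consumer_name }}" is receiving more messages than it can process.'
            summary: 'The amount of pending messages has increased for more than 3 minutes.'
        - alert: NatsConsumerRedeliveredMessagePercentageTooHigh
          # 0.1 (10% as a ratio) is a placeholder threshold (assumption).
          expr: rate(nats_consumer_num_redelivered[1m]) / rate(nats_consumer_delivered_stream_seq[1m]) > 0.1
          for: 1m
          labels:
            severity: critical
          annotations:
            description: 'Consumer "{{ $labels.consumer_name }}" gets {{ $value }} of its messages redelivered.'
            summary: 'The percentage of redelivered messages is too high.'
```

If you are not running the Prometheus Operator, the `groups:` list under `spec` should be usable on its own as a regular Prometheus rules file, which can be checked with `promtool check rules <file>`.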

If anyone has more experience or other ideas for Prometheus rules, I would love to see them!