pennsignals / legacy-system-services

Targets missing in Prometheus #12

Closed darrylmendillo closed 4 years ago

darrylmendillo commented 4 years ago

Event:

Evidence:

Prometheus 15APR2020 Targets missing.pdf

Resolution:

darrylmendillo commented 4 years ago

Possibly due to using this hack to bypass Consul SD (service discovery):

{{ range services }}{{if in .Tags "monitoring"}}
  - job_name: {{ .Name }}
    scrape_interval: 5s
    static_configs:
      - targets: [{{range $index, $service := service .Name }}{{if ne $index 0}},{{end}}'{{$service.Address}}:{{$service.Port}}'{{end}}]
        labels:
          group: 'monitoring'
{{end}}{{ end }}

{{ range services }}{{if in .Tags "metrics"}}
  - job_name: {{ .Name }}
    scrape_interval: 5s
    {{ range service .Name }}static_configs:
    - targets: ['{{.Address }}:{{.Port}}']
      labels:
        group: 'application'
        app: '{{ .Name}}'{{end}}
{{end}}{{ end }}

Working SD block used for nomad_metrics:

  - job_name: 'nomad_metrics'

    scheme: https
    tls_config:
      insecure_skip_verify: true

    consul_sd_configs:
    - server: {{ env "CONSUL_ADDR" }}
      scheme: "https"
      tls_config:
        insecure_skip_verify: true
      services: ['nomad-client', 'nomad']

    relabel_configs:
    - source_labels: [__meta_consul_tags]
      separator: ;
      regex: (.*)http(.*)
      replacement: $1
      action: keep

    - source_labels: [__meta_consul_address]
      separator: ;
      regex: (.*)
      target_label: __meta_consul_service_address
      replacement: $1
      action: replace

    scrape_interval: 5s
    metrics_path: /v1/metrics
    params:
      format: ['prometheus']
darrylmendillo commented 4 years ago

The hack iterates through all services once, when Prometheus starts, and ONLY then. It cannot discover Nomad services that are added or dropped afterwards.

Replacing the hack with Consul service discovery gives a more robust and dynamic solution, and should eliminate the lost Prometheus targets.
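
A rough sketch of what the replacement for the "metrics" block could look like, modeled on the working nomad_metrics block. This is not the config that was deployed. The job name consul_metrics is made up, and reusing CONSUL_ADDR with the same https/insecure_skip_verify settings is an assumption. It keeps only targets whose Consul tags include "metrics" and rebuilds the labels the hack set by hand:

```yaml
  # Sketch only: discover services tagged "metrics" through Consul SD
  # instead of rendering static_configs once at startup.
  - job_name: 'consul_metrics'   # hypothetical name, not from the original config
    scrape_interval: 5s

    consul_sd_configs:
    - server: {{ env "CONSUL_ADDR" }}   # assumed: same Consul endpoint as nomad_metrics
      scheme: "https"
      tls_config:
        insecure_skip_verify: true

    relabel_configs:
    # __meta_consul_tags is the tag list joined and surrounded by the
    # tag separator (default ","), so this matches the exact tag.
    - source_labels: [__meta_consul_tags]
      regex: .*,metrics,.*
      action: keep

    # Use the Consul service name as the job and app labels,
    # matching what the template set per service.
    - source_labels: [__meta_consul_service]
      target_label: job
    - source_labels: [__meta_consul_service]
      target_label: app

    # Static label, as in the hack.
    - target_label: group
      replacement: application
```

The "monitoring" block could presumably be handled the same way, with the regex matching the monitoring tag and group set to monitoring. Since Prometheus keeps watching Consul, services that Nomad adds or drops should show up or disappear without restarting Prometheus.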