Closed by darrylmendillo 4 years ago
Possibly caused by this hack, which bypasses Consul service discovery (consul sd):
{{ range services }}{{if in .Tags "monitoring"}}
- job_name: {{ .Name }}
  scrape_interval: 5s
  static_configs:
    - targets: [{{range $index, $service := service .Name }}{{if ne $index 0}},{{end}}'{{$service.Address}}:{{$service.Port}}'{{end}}]
      labels:
        group: 'monitoring'
{{end}}{{ end }}
{{ range services }}{{if in .Tags "metrics"}}
- job_name: {{ .Name }}
  scrape_interval: 5s
  {{ range service .Name }}static_configs:
    - targets: ['{{.Address }}:{{.Port}}']
      labels:
        group: 'application'
        app: '{{ .Name}}'{{end}}
{{end}}{{ end }}
Working service-discovery block used for nomad_metrics:
- job_name: 'nomad_metrics'
  scheme: https
  tls_config:
    insecure_skip_verify: true
  consul_sd_configs:
    - server: {{ env "CONSUL_ADDR" }}
      scheme: "https"
      tls_config:
        insecure_skip_verify: true
      services: ['nomad-client', 'nomad']
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      separator: ;
      regex: (.*)http(.*)
      replacement: $1
      action: keep
    - source_labels: [__meta_consul_address]
      separator: ;
      regex: (.*)
      target_label: __meta_consul_service_address
      replacement: $1
      action: replace
  scrape_interval: 5s
  metrics_path: /v1/metrics
  params:
    format: ['prometheus']
The hack iterates through all services when Prometheus starts, and ONLY then, so it cannot discover Nomad services that are added or dropped afterwards.
Replacing it with Consul service discovery gives a more robust and dynamic solution, and should eliminate the missing Prometheus targets.
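A rough sketch of what the replacement could look like for the "monitoring" services, modelled on the nomad_metrics block above. The job name `consul_monitoring` is made up for illustration, and the TLS settings are simply copied from nomad_metrics. This assumes the `tags` filter of `consul_sd_configs` and the `__meta_consul_service` meta label behave as described in the Prometheus docs. It has not been tested against this cluster.

```yaml
# Sketch only: let Prometheus discover tagged services through Consul
# instead of rendering static_configs once from the template.
- job_name: 'consul_monitoring'          # hypothetical job name
  scrape_interval: 5s
  consul_sd_configs:
    - server: {{ env "CONSUL_ADDR" }}    # same variable as in nomad_metrics
      scheme: "https"
      tls_config:
        insecure_skip_verify: true       # copied from nomad_metrics
      tags: ['monitoring']               # only services carrying this tag
  relabel_configs:
    # Use the Consul service name as the job label, as the template did
    # with "job_name: {{ .Name }}".
    - source_labels: [__meta_consul_service]
      target_label: job
    # Attach the static label the template added to every target.
    - target_label: group
      replacement: 'monitoring'
```

The "metrics" services could presumably be handled by a second job of the same shape with `tags: ['metrics']`, `group` set to 'application', and an extra relabel rule copying `__meta_consul_service` into `app`. Because Prometheus keeps watching Consul, targets should then follow services as they are added or dropped, without a restart.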
Event:
Evidence:
Prometheus 15APR2020 Targets missing.pdf
Resolution: