kubernetes-sigs / prometheus-adapter

An implementation of the custom.metrics.k8s.io API using Prometheus
Apache License 2.0

How to debug a missing external metric? #605

Open pablokbs opened 1 year ago

pablokbs commented 1 year ago

Hello, I've been trying to expose a new (external) metric for a few days now, and I can't figure out why it's missing:

This is my current externalRules section in my configmap:

externalRules:
    - metricsQuery: label_replace(label_replace(node_nf_conntrack_entries, "internal_ip",
        "$1", "instance", "([^:]+)(:[0-9]+)?"),"node", "ip-$1-$2-$3-$4.ec2.internal",
        "internal_ip", "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})") * on (node)
        group_left(label_node_kubernetes_io_node_type) kube_node_labels{label_node_kubernetes_io_node_type="nginx"}
      name:
        as: node_nf_conntrack_entries_nginx
        matches: ""
      resources:
        template: <<.Resource>>
      seriesQuery: '{__name__=~"node_nf_conntrack_entries"}'
    - metricsQuery: label_replace(aws_networkelb_tcp_client_reset_count_sum, "load_balancer_name",
        "${2}_${3}", "load_balancer", "(.*)/(.*)/(.*)")
      name:
        as: aws_networkelb_tcp_client_reset_count_underscore_sum
        matches: ""
      resources:
        template: <<.Resource>>
      seriesQuery: '{__name__=~"aws_networkelb_tcp_client_reset_count_sum"}'
    - metricsQuery: '${1}'
      seriesQuery: '{__name__=~"confluent_kafka_server_consumer_lag_offsets"}'
      resources:
        overrides:
          pod:
            resource: pod
      name:
        matches: ""
        as: "kafka_consumer_lag"
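For reference, the label rewriting done by the two nested `label_replace` calls in the first rule can be sketched outside PromQL. This is only an illustration of the regexes with a made-up sample `instance` value, not how the adapter evaluates the rule:

```shell
# Assumed sample value for the "instance" label:
instance="10.0.1.2:9100"

# label_replace(..., "internal_ip", "$1", "instance", "([^:]+)(:[0-9]+)?")
# strips the port, leaving the bare IP:
internal_ip=$(printf '%s' "$instance" | sed -E 's/^([^:]+)(:[0-9]+)?$/\1/')

# label_replace(..., "node", "ip-$1-$2-$3-$4.ec2.internal", "internal_ip", ...)
# rewrites the IP into the EC2 internal node name used for the join:
node=$(printf '%s' "$internal_ip" \
  | sed -E 's/^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$/ip-\1-\2-\3-\4.ec2.internal/')

echo "$node"  # ip-10-0-1-2.ec2.internal
```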

In this example, I have 3 rules, but only 2 of the metrics are showing up when I query the list of metrics:

 ➜  ~ curl -k http://127.0.0.1:8001/apis/external.metrics.k8s.io/v1beta1/ | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   536  100   536    0     0    811      0 --:--:-- --:--:-- --:--:--   822
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "node_nf_conntrack_entries_nginx",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "aws_networkelb_tcp_client_reset_count_underscore_sum",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

I enabled the debug logs in prometheus-adapter, and I can see prometheus-adapter querying Prometheus for the metric and getting a 200 back:

I0830 14:38:08.269232       1 api.go:74] GET http://prometheus-operated.monitoring.svc:9090/api/v1/series?match%5B%5D=%7B__name__%3D~%22confluent_kafka_server_consumer_lag_offsets%22%7D&start=1693406228.268 200 OK

Besides that, I don't see anything else in the logs (for example, when grepping for confluent) that explains why the metric is missing from the external metrics list.

How can I debug this?

dgrisonnet commented 1 year ago

/kind support
/assign

dgrisonnet commented 1 year ago

/triage accepted

houms-sony commented 9 months ago

@pablokbs make sure to turn your logLevel up to 10 (-v=10). We were having the same issue and were able to determine that empty data was being returned, which is why no metric is listed in the external metrics endpoint.
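To expand on that, a minimal sketch of the two debugging steps: raising the verbosity flag on the adapter and then hitting the metric endpoint directly. The Deployment name and namespace here are assumptions (adjust to your install); the metric name and query namespace are taken from the config above:

```shell
# Assumption: the adapter runs as deploy/prometheus-adapter in the
# monitoring namespace; adjust names to your install.

# Append --v=10 to the adapter's args so per-query results are logged:
kubectl -n monitoring patch deployment prometheus-adapter --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--v=10"}]'

# Query the missing metric directly ("default" namespace chosen arbitrarily).
# An empty "items" list, or a NotFound error, narrows down whether the rule
# is returning no data or being dropped entirely:
kubectl get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/kafka_consumer_lag" | jq .
```

These commands need a live cluster with the adapter installed, so treat them as a template rather than something to paste verbatim.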