kubernetes-sigs / prometheus-adapter

An implementation of the custom.metrics.k8s.io API using Prometheus

Metrics labelSelector not filtering external metrics #255

Closed GaruGaru closed 3 years ago

GaruGaru commented 4 years ago

Given the following call result

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/sqs_queue_messages"

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/%2A/sqs_queue_messages"
  },
  "items": [
    {
      "metricName": "sqs_queue_messages",
      "metricLabels": {
        "__name__": "sqs_queue_messages",
        ...
        "queue_name": "temp-queue"
      },
      "timestamp": "2019-11-07T21:14:44Z",
      "value": "0"
    },
    {
      "metricName": "sqs_queue_messages",
      "metricLabels": {
        "__name__": "sqs_queue_messages",
        ...
        "queue_name": "random-queue"
      },
      "timestamp": "2019-11-07T21:14:44Z",
      "value": "0"
    }
  ]
}

I expect that calling the same API endpoint with the labelSelector query param, kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/sqs_queue_messages?labelSelector=queue_name%3Dtemp-queue", should return only the series with the specified queue_name metric label, but all of the series are returned.
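
For reference, the expected behavior would be a response containing only the matching series, roughly like this trimmed, hypothetical output:

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "items": [
    {
      "metricName": "sqs_queue_messages",
      "metricLabels": {
        "__name__": "sqs_queue_messages",
        "queue_name": "temp-queue"
      },
      "timestamp": "2019-11-07T21:14:44Z",
      "value": "0"
    }
  ]
}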

mpeamma commented 4 years ago

I am having this issue as well. Just to add a little more context, my metric is being exposed via this rule:

- seriesQuery: '{__name__=~"kafka_minion_topic_partition_message_count"}'
  seriesFilters: []
  resources:
    template: <<.Resource>>
  name:
    matches: ""
    as: "kafka_custom_metric"
  metricsQuery: 'kafka_minion_topic_partition_message_count{}'

It can be queried like this:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/kafka_custom_metric?labelSelector=topic%3Dtopic1" | jq 

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/kafka_custom_metric"
  },
  "items": [
    {
      "metricName": "kafka_custom_metric",
      "metricLabels": {
        "__name__": "kafka_minion_topic_partition_message_count",
        "endpoint": "metrics",
        "instance": "10.1.0.66:8080",
        "job": "kafka-minion",
        "namespace": "default",
        "partition": "0",
        "pod": "kafka-minion-67d99688cd-m2rpd",
        "service": "kafka-minion",
        "topic": "topic1"
      },
      "timestamp": "2019-11-20T18:41:06Z",
      "value": "7"
    },
    {
      "metricName": "kafka_custom_metric",
      "metricLabels": {
        "__name__": "kafka_minion_topic_partition_message_count",
        "endpoint": "metrics",
        "instance": "10.1.0.66:8080",
        "job": "kafka-minion",
        "namespace": "default",
        "partition": "0",
        "pod": "kafka-minion-67d99688cd-m2rpd",
        "service": "kafka-minion",
        "topic": "topic2"
      },
      "timestamp": "2019-11-20T18:41:06Z",
      "value": "4"
    }
  ]
}

However, it is not being filtered.

In addition, when creating the following HPA, all the items are summed together for a value of 11 in this example:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-kafka-hpa
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: my-kafka-consumer
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metric:
        name: kafka_custom_metric
        selector:
          matchLabels:
            topic: "topic1"
      target:
        type: AverageValue
        averageValue: 300

caarlos0 commented 4 years ago

your metricsQuery needs to be:

kafka_minion_topic_partition_message_count{<<.LabelMatchers>>}
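
For context, a hedged sketch of how the full rule from the earlier comment might look with that change (adapted from mpeamma's config above; treat it as a starting point rather than a verified configuration):

- seriesQuery: '{__name__=~"kafka_minion_topic_partition_message_count"}'
  seriesFilters: []
  resources:
    template: <<.Resource>>
  name:
    matches: ""
    as: "kafka_custom_metric"
  # <<.LabelMatchers>> expands to the label selector from the API request,
  # so the query only returns the series the HPA asked for.
  metricsQuery: 'kafka_minion_topic_partition_message_count{<<.LabelMatchers>>}'
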
yciabaud commented 4 years ago

I am facing the very same issue, and adding {<<.LabelMatchers>>} to the metricsQuery made the existing metric items empty. The PromQL query containing the label selector works on the Prometheus side.

Do you have any suggestion on how to solve this?

fistan684 commented 4 years ago

I was having the same issue as well: all of the metrics were getting aggregated together. I also added {<<.LabelMatchers>>}, which caused all of the metrics to drop out. After bumping the log level and examining the queries, it looks like the root issue is that the namespace gets passed in as part of the LabelMatchers on the external endpoint. Considering the external endpoint is supposed to represent metrics that are NOT associated with any type of k8s object, is this a bug? If it is not a bug, is there a way to stop the namespace from being included in the LabelMatchers?

In our case we have metrics being scraped that have no namespace (or equivalent) label, and we would like to use them for our HPA scaling. Any suggestions would be great, thanks.

UPDATE: If anyone else runs into this issue: I was eventually able to work around it by changing the metricsQuery to iterate over LabelValuesByName and filter out the passed-in namespace argument. I still feel like it's a bug that it gets included, but maybe there's some reason for it. 🤷‍♂ Here's an example to save folks some googling on Go templates:

max(<<.Series>>{<< range $key, $value := .LabelValuesByName >><< if ne $key "namespace" >><< $key >>="<< $value >>",<< end >><< end >>} * 100) by (group, topic)
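
For anyone wiring this into the adapter config, here is a hedged sketch of how that metricsQuery could sit inside an external rule (names reuse the Kafka example from earlier in this thread; the surrounding rule is illustrative, not a verified configuration):

externalRules:
  - seriesQuery: '{__name__=~"kafka_minion_topic_partition_message_count"}'
    resources:
      template: <<.Resource>>
    name:
      matches: ""
      as: "kafka_custom_metric"
    # Re-apply every label matcher from the request except "namespace",
    # which the adapter injects on the external metrics endpoint.
    metricsQuery: 'max(<<.Series>>{<< range $key, $value := .LabelValuesByName >><< if ne $key "namespace" >><< $key >>="<< $value >>",<< end >><< end >>} * 100) by (group, topic)'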

rbadillo commented 4 years ago

@fistan684 thanks a lot for sharing your solution, it helped us as well. How can we connect so we can help each other in the future? Thanks again!

domderen commented 4 years ago

@fistan684 Would you mind explaining how you increased the log level to see that the namespace label was being added? I'm running into the same issue as you, but your fix doesn't work in my case unfortunately, so I wanted to dig deeper into what might be the problem.

lli-hiya commented 4 years ago

This is indeed an issue. https://github.com/DirectXMan12/k8s-prometheus-adapter/issues/282

Downgrading to 0.5.0 resolves it.

john-delivuk commented 4 years ago

@domderen when you use the debugger you can trace it. #282 has a link to the line of code that's jamming it in. Like others have stated, the easiest workaround is to get namespace=default onto your metric.
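
If you control the scrape configuration, one way to stamp that label on at scrape time is a metric relabel rule on the Prometheus side (a minimal sketch; the job name and target are made up, and it assumes you can edit the relevant scrape_config):

scrape_configs:
  - job_name: kafka-minion              # hypothetical job name
    static_configs:
      - targets: ["kafka-minion:8080"]  # hypothetical target
    metric_relabel_configs:
      # Add namespace="default" to every series from this job so the
      # namespace matcher injected by the adapter still finds them.
      - action: replace
        target_label: namespace
        replacement: default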

domderen commented 4 years ago

@john-delivuk Thanks for your help! I ended up doing that; it's just not a very clean solution, so I was hoping for something cleaner, but I decided to just move on to other things :)

ryan-dyer-sp commented 4 years ago

Any update on this? I resolved our issue by downgrading to 0.5.0 and ensuring <<.LabelMatchers>> was part of the query.

Seems odd to me that this has been broken for two releases now. I would think this is a pretty normal use case that should be tested as part of the release.

Ping-Lin-108 commented 4 years ago

My labels disappear if I put <<.LabelMatchers>> in the metrics query, but @fistan684's LabelValuesByName workaround above works for me!!!

GabMgt commented 3 years ago

Same problem here. This issue has been open for a year :(

But the workaround of downgrading to adapter 0.5.0 and using queries like 'kafka_minion_topic_partition_message_count{<<.LabelMatchers>>}' works for me.

s-urbaniak commented 3 years ago

cc @dgrisonnet maybe you can have a peek at this one? thank you!

dgrisonnet commented 3 years ago

If I understand this issue correctly, it only happens with external metrics that are not namespaced (don't have a namespace label) and it is caused by the namespace enforcement for external metrics.

Even though the problem surfaces with the use of labelSelectors, the issue seems to be related to the fact that prometheus-adapter doesn't handle non-namespaced external metrics properly.

Recently, there has been some activity around this topic in this feature request: #324. The solution that was chosen was to allow users to specify whether an external metric is namespaced or not (https://github.com/DirectXMan12/k8s-prometheus-adapter/issues/324#issuecomment-720353976). For non-namespaced external metrics, the namespace enforcement would be dropped.

I am confident that adding the feature mentioned above will solve the problem we are seeing here.
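
To illustrate the idea only: the proposal in #324 is a per-rule toggle so that a non-namespaced external rule skips the injected namespace matcher. The `namespaced` field below is hypothetical, drawn from that discussion rather than from documented configuration at the time of this comment:

externalRules:
  - seriesQuery: 'sqs_queue_messages'
    resources:
      # Hypothetical field: mark the metric as cluster-scoped so the adapter
      # does not add a namespace="..." matcher to the generated query.
      namespaced: false
    metricsQuery: 'sqs_queue_messages{<<.LabelMatchers>>}'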

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

fejta-bot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

k8s-ci-robot commented 3 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/prometheus-adapter/issues/255#issuecomment-820708178):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

Borvik commented 4 months ago

I know this is old, but I'm adding this here in case someone else stumbles across it.

We were having this issue too on v0.11.2, and apparently the query string parameter is either different from the one on the custom metrics side or it changed since the last documentation we found for querying metrics.

We were using something like: ?metricLabelSelector=queue%3Dtest-reports

When we should have been using: ?labelSelector=queue%3Dtest-reports
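
In other words, the full call looks something like this (the metric name here is a placeholder, since ours isn't shown above):

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/<your_external_metric>?labelSelector=queue%3Dtest-reports"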

Hope that helps anyone else who stumbles across this.