Resource-scope metric endpoints

mrueg commented 2 years ago

What would you like to be added:
Currently kube-state-metrics offers a single endpoint that serves all metrics. Ideally, there would be a way to offer multiple endpoints, or a filter on the endpoint, to limit the metrics returned.

Why is this needed:
As a user, I want to be able to scrape specific metrics at different intervals to reduce the resource usage and amount of metrics generated per scrape.

Describe the solution you'd like:
Either multiple endpoints could be introduced (e.g. host:port/ingress/metrics), or a filter could be added to the existing endpoint (e.g. host:port/metrics?filter=ingress). The first option might be a bit easier to configure in Prometheus; the second is more flexible if we ever want to allow advanced filtering (e.g. "only resources with this label").
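To make the two options concrete, here is a minimal Go sketch that serves both styles from one net/http mux: a path per resource (/ingress/metrics) and a single /metrics endpoint that accepts a repeatable filter query parameter. The metric data, the helper names (writeMetrics, newMux, get) and the choice to allow several filter values are hypothetical and only illustrate the routing; this is not how kube-state-metrics is actually wired.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sort"
)

// Hypothetical sample data standing in for per-resource metric output.
var metricsByResource = map[string][]string{
	"ingress": {`kube_ingress_info{namespace="default",ingress="web"} 1`},
	"pod":     {`kube_pod_info{namespace="default",pod="web-0"} 1`},
}

// writeMetrics writes the metrics for the named resources, or for every
// resource (in sorted order) when names is empty. Unknown names are skipped.
func writeMetrics(w io.Writer, names []string) {
	if len(names) == 0 {
		for name := range metricsByResource {
			names = append(names, name)
		}
		sort.Strings(names)
	}
	for _, name := range names {
		for _, line := range metricsByResource[name] {
			fmt.Fprintln(w, line)
		}
	}
}

// newMux registers both endpoint styles on one mux.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Option 2: one endpoint plus a filter, e.g. /metrics?filter=ingress.
	// Repeating the parameter (?filter=ingress&filter=pod) selects several resources.
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		writeMetrics(w, r.URL.Query()["filter"])
	})
	// Option 1: one endpoint per resource, e.g. /ingress/metrics.
	for name := range metricsByResource {
		name := name // capture the loop variable for the closure
		mux.HandleFunc("/"+name+"/metrics", func(w http.ResponseWriter, r *http.Request) {
			writeMetrics(w, []string{name})
		})
	}
	return mux
}

// get performs an in-process request against the handler and returns the body.
func get(h http.Handler, target string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Body.String()
}

func main() {
	mux := newMux()
	fmt.Print(get(mux, "/ingress/metrics"))
	fmt.Print(get(mux, "/metrics?filter=ingress"))
	fmt.Print(get(mux, "/metrics"))
}
```

In Prometheus terms, option 1 maps to a separate scrape job per metrics_path, while option 2 uses one path and varies the query via the job's params setting; both allow a different scrape_interval per job.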

This way users can define different scrape intervals per resource and possibly apply other user-specific changes per resource.

Additional context:
This creates more opportunities to reduce the cost of running kube-state-metrics, since users could scrape only a subset of metrics instead of ingesting all of them. https://github.com/kubernetes/kube-state-metrics#a-note-on-costing

liggitt commented 2 years ago

/remove-label api-review
/kind api-change

liggitt commented 2 years ago

(relabeling, api-review indicates a design or PR is ready for API review)

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

fpetkovski commented 2 years ago

/remove-lifecycle stale

Serializator commented 2 years ago

I dug into this and found that KSM uses reflectors to populate and re-sync the metric store from the API server's contents. That was an interesting discovery, by the way, and it helped me understand KSM much better!

On "reduce the resource usage": adding per-request filtering wouldn't reduce KSM's resource usage significantly, since the metric store is kept up to date either way. I think it would mainly affect the response size?

On "amount of metrics generated per scrape": Prometheus can drop metrics at scrape time, so end users can define several scrape configurations, each with its own scrape interval and its own subset of metrics. This would most likely benefit high-cardinality metrics, which you might want to scrape less often than others. All of this can be achieved without per-request filtering.

//cc @mrueg @fpetkovski: what do you think? I might be missing something, so please tell me if that's the case!

fpetkovski commented 2 years ago

The way I understand it, KSM can generate very large responses which have to be fetched and parsed by Prometheus (or some other compatible scrape client). In large clusters, response sizes can be in the hundreds of megabytes.

Letting the scrape client request only the metrics it needs would improve scraping performance when only a subset of metrics has to be scraped.

Serializator commented 2 years ago

Got it. I approached it from KSM's perspective and not from Prometheus's.

rexagod commented 2 years ago

/assign

rexagod commented 2 years ago

(@Serializator feel free to assign this to yourself if you're currently working on this)

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rexagod commented 1 year ago

/remove-lifecycle stale
/label lifecycle-frozen

k8s-ci-robot commented 1 year ago

@rexagod: The label(s) /label lifecycle-frozen cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to [this](https://github.com/kubernetes/kube-state-metrics/issues/1690#issuecomment-1421994602):

>/remove-lifecycle stale
>/label lifecycle-frozen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

rexagod commented 1 year ago

/lifecycle frozen

rexagod commented 1 year ago

@mrueg I'm trying to settle on the format in which the parameters will be passed. By host:port/metrics?filter=ingress, do you mean host:port/metrics?filter=kind?

Would host:port/metrics?group=foo&kind=baz&filter=[metric_name] be better? The version is intentionally left out so that all versions are covered, and metric_name is there to filter specific metrics within the same GVK, in case of G** resolution.
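A minimal Go sketch of how the proposed group/kind/filter parameters might be parsed and matched, assuming that an absent parameter matches anything and that version is not a parameter at all (so every version of the group/kind matches). The gvkFilter type, parseFilter and matches are hypothetical names, and the group, kind and metric names are example values.

```go
package main

import (
	"fmt"
	"net/url"
)

// gvkFilter is a hypothetical representation of the proposed query parameters.
// Version is deliberately absent, so every version of a group/kind matches.
type gvkFilter struct {
	Group       string          // empty means any group
	Kind        string          // empty means any kind
	MetricNames map[string]bool // empty means all metrics
}

// parseFilter reads group, kind and (repeatable) filter from a raw query string.
func parseFilter(rawQuery string) (gvkFilter, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return gvkFilter{}, err
	}
	f := gvkFilter{Group: q.Get("group"), Kind: q.Get("kind"), MetricNames: map[string]bool{}}
	for _, name := range q["filter"] {
		f.MetricNames[name] = true
	}
	return f, nil
}

// matches reports whether a metric of the given group/kind passes the filter.
func (f gvkFilter) matches(group, kind, metric string) bool {
	if f.Group != "" && f.Group != group {
		return false
	}
	if f.Kind != "" && f.Kind != kind {
		return false
	}
	return len(f.MetricNames) == 0 || f.MetricNames[metric]
}

func main() {
	f, err := parseFilter("group=networking.k8s.io&kind=Ingress&filter=kube_ingress_info")
	if err != nil {
		panic(err)
	}
	fmt.Println(f.matches("networking.k8s.io", "Ingress", "kube_ingress_info"))
	fmt.Println(f.matches("networking.k8s.io", "Ingress", "kube_ingress_labels"))
	fmt.Println(f.matches("apps", "Deployment", "kube_deployment_labels"))
}
```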