kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Remove old ingress-rules metrics for prometheus scraping #11047

Open SilentEntity opened 5 months ago

SilentEntity commented 5 months ago

What happened:

After an Ingress rule is updated, the Ingress controller keeps exposing metrics for the old rules alongside the new ones. This increases cardinality and produces useless data (for the removed rules) every time Prometheus scrapes the pod.

What you expected to happen:

Once rules are updated or removed, the metrics for the old rules should be removed as well. This reduces cardinality and avoids exposing useless data for removed or updated rules.
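To make the expected behavior concrete, here is a minimal sketch of a labelled counter that drops its series when a rule goes away. It is a toy stand-in and not the ingress-nginx collector: the class `LabeledCounter`, the method `delete_matching`, and the label values are all hypothetical. Real Prometheus client libraries provide their own ways to delete labelled series (for example `DeleteLabelValues` on metric vectors in the Go client), so this only illustrates the idea.

```python
from collections import defaultdict


class LabeledCounter:
    """Toy labelled counter (hypothetical, not the ingress-nginx implementation)."""

    def __init__(self, label_names):
        self.label_names = tuple(label_names)
        # Maps a tuple of label values to the current counter value.
        self.series = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        key = tuple(labels[name] for name in self.label_names)
        self.series[key] += amount

    def delete_matching(self, **labels):
        """Drop every series whose labels include all given name/value pairs."""
        wanted = {self.label_names.index(n): v for n, v in labels.items()}
        stale = [k for k in self.series
                 if all(k[i] == v for i, v in wanted.items())]
        for key in stale:
            del self.series[key]
        return len(stale)


requests = LabeledCounter(["ingress", "host", "path"])
requests.inc(ingress="shop", host="shop.example.com", path="/old")
requests.inc(ingress="shop", host="shop.example.com", path="/new")
requests.inc(ingress="blog", host="blog.example.com", path="/")

# Hypothetical hook: the rule for path "/old" was removed from the "shop" Ingress,
# so its series is deleted instead of being exposed forever.
removed = requests.delete_matching(ingress="shop", path="/old")
print(removed, sorted(requests.series))
```

With a cleanup step like this tied to rule updates, a scrape would only contain series for rules that still exist.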

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

Kubernetes version (use kubectl version): Not relevant

Environment:

How to reproduce this issue:

Add 100 rules, then update a rule or reduce the number of rules to 10. The Ingress controller will keep exposing metrics for both the old and the new rules.

Increase in cardinality:

cat metrics | grep -v "#" |cut -d "{" -f1  | sort | uniq -c | sort -rn | head -n40
3048 nginx_ingress_controller_request_duration_seconds_bucket
2988 nginx_ingress_controller_response_duration_seconds_bucket
2988 nginx_ingress_controller_connect_duration_seconds_bucket
2820 nginx_ingress_controller_header_duration_seconds_bucket
2794 nginx_ingress_controller_response_size_bucket
2794 nginx_ingress_controller_request_size_bucket
2032 nginx_ingress_controller_bytes_sent_bucket
 254 nginx_ingress_controller_response_size_sum
 254 nginx_ingress_controller_response_size_count
 254 nginx_ingress_controller_requests
 254 nginx_ingress_controller_request_size_sum
 254 nginx_ingress_controller_request_size_count
 254 nginx_ingress_controller_request_duration_seconds_sum
 254 nginx_ingress_controller_request_duration_seconds_count
 254 nginx_ingress_controller_bytes_sent_sum
 254 nginx_ingress_controller_bytes_sent_count
 249 nginx_ingress_controller_response_duration_seconds_sum
 249 nginx_ingress_controller_response_duration_seconds_count
 249 nginx_ingress_controller_connect_duration_seconds_sum
 249 nginx_ingress_controller_connect_duration_seconds_count
 235 nginx_ingress_controller_header_duration_seconds_sum
 235 nginx_ingress_controller_header_duration_seconds_count

After you restart the pod:

cat metrics | grep -v "#" |cut -d "{" -f1  | sort | uniq -c | sort -rn | head -n40
 288 nginx_ingress_controller_response_duration_seconds_bucket
 288 nginx_ingress_controller_request_duration_seconds_bucket
 288 nginx_ingress_controller_header_duration_seconds_bucket
 288 nginx_ingress_controller_connect_duration_seconds_bucket
 264 nginx_ingress_controller_response_size_bucket
 264 nginx_ingress_controller_request_size_bucket
 192 nginx_ingress_controller_bytes_sent_bucket
  24 nginx_ingress_controller_response_size_sum
  24 nginx_ingress_controller_response_size_count
  24 nginx_ingress_controller_response_duration_seconds_sum
  24 nginx_ingress_controller_response_duration_seconds_count
  24 nginx_ingress_controller_requests
  24 nginx_ingress_controller_request_size_sum
  24 nginx_ingress_controller_request_size_count
  24 nginx_ingress_controller_request_duration_seconds_sum
  24 nginx_ingress_controller_request_duration_seconds_count
  24 nginx_ingress_controller_header_duration_seconds_sum
  24 nginx_ingress_controller_header_duration_seconds_count
  24 nginx_ingress_controller_connect_duration_seconds_sum
  24 nginx_ingress_controller_connect_duration_seconds_count
  24 nginx_ingress_controller_bytes_sent_sum
  24 nginx_ingress_controller_bytes_sent_count
  21 nginx_ingress_controller_ingress_upstream_latency_seconds
  19 nginx_ingress_controller_orphan_ingress
   7 nginx_ingress_controller_ingress_upstream_latency_seconds_sum
   7 nginx_ingress_controller_ingress_upstream_latency_seconds_count
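For anyone who wants to reproduce these counts without the shell pipeline, a rough Python equivalent could look like the sketch below. The function name `count_series` and the sample scrape text are assumptions for illustration, not data from this issue. Unlike `grep -v "#"`, it only skips lines that start with `#`, and it also strips the value from metrics that have no labels.

```python
from collections import Counter


def count_series(exposition_text: str) -> Counter:
    """Count exposed series per metric name in Prometheus text-format output."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or at whitespace (no labels).
        name = line.split("{", 1)[0].split()[0]
        counts[name] += 1
    return counts


# Hypothetical sample scrape, not taken from the issue.
sample = """\
# HELP nginx_ingress_controller_requests example help text
# TYPE nginx_ingress_controller_requests counter
nginx_ingress_controller_requests{host="a.example.com",path="/"} 5
nginx_ingress_controller_requests{host="b.example.com",path="/"} 7
nginx_ingress_controller_orphan_ingress{ingress="demo"} 0
"""

# Print counts in descending order, like `sort | uniq -c | sort -rn`.
for name, n in count_series(sample).most_common():
    print(f"{n:5d} {name}")
```

Running it against a saved scrape before and after a pod restart gives the same kind of comparison as the two listings above.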

Anything else we need to know:

k8s-ci-robot commented 5 months ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 5 months ago

/help

@SilentEntity thanks for reporting this.

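- Yes, you are right, and this has been going on for a long time.
- Another typical example is that an expired cert will continue showing up even after the related Ingress is deleted.
- But personally I am waiting for clarity from someone on the aspect of the data being a time series. The context is that the old rule metrics and the metrics from a deleted Ingress's cert are time-series data that a user may continue to view in Grafana (or get from raw Prometheus) in the future.

So I don't think this is a bug unless we can discuss and triage it to be a bug. So let's wait for expert comments and opinions.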

/assign

k8s-ci-robot commented 5 months ago

@longwuyuan: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/11047):

>/help
>
>@SilentEntity thanks for reporting this.
>
>- Yes, you are right and this has been going on for a long time
>- Another typical example is expired cert will continue showing up, even after the related ingress is deleted
>- But personally I am waiting for clarity from someone on the aspect of the data being a timeseries. The context being, the old rule metrics being present and the metrics from a deleted ingress's cert being present are timeseries data that a user may continue to view in grafana (or get from raw prometheus), in future
>
>So I don't think this is a bug unless we can discuss and triage it to be a bug. So lets wait for expert comments and opinions
>
>/assign

longwuyuan commented 5 months ago

/remove-kind bug

github-actions[bot] commented 4 months ago

This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or want to request prioritization, please reach out in #ingress-nginx-dev on Kubernetes Slack.

SilentEntity commented 3 months ago

In any case, old or expired metrics data won't be present in a new pod (created while scaling) or in a restarted pod, which creates discrepancies in the metrics and in Grafana dashboards.
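One way to see this discrepancy is to diff the series exposed by a long-running pod against those of a freshly restarted one. The sketch below is only illustrative: the helper `series_keys` and both scrape samples are hypothetical, and it assumes each sample line ends with a value and no timestamp.

```python
def series_keys(exposition_text: str) -> set:
    """Return the set of 'name{labels}' identifiers found in a scrape."""
    keys = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Drop the trailing sample value; assumes no optional timestamp follows it.
        keys.add(line.rsplit(" ", 1)[0])
    return keys


# Hypothetical scrapes from two replicas of the same controller.
long_running_pod = """\
nginx_ingress_controller_requests{host="a.example.com",path="/old"} 12
nginx_ingress_controller_requests{host="a.example.com",path="/new"} 3
"""
restarted_pod = """\
nginx_ingress_controller_requests{host="a.example.com",path="/new"} 1
"""

# Series that only the long-running pod still exposes (the stale ones).
only_in_old = series_keys(long_running_pod) - series_keys(restarted_pod)
for key in sorted(only_in_old):
    print(key)
```

Any series printed here exists on one replica but not the other, which is the kind of mismatch a dashboard aggregating across pods would pick up.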

jakuboskera commented 3 months ago

+1