Closed: Yelijah closed this issue 2 years ago.
/kind bug /triage accepted
@Yelijah, can you prove/show that the specific metric was available earlier?
I wonder if the label has changed.
@ekovacs, hi, do you have any thoughts on this? A simple search showed #8225, so asking.
It never succeeded...
My prometheus and ingress-nginx are in different namespaces; is that a problem? But when I use curl to get the metrics, e.g. 'curl ingress-nginx-controller-metrics:10254/metrics', 'nginx_ingress_controller_requests' is missing there as well.
@Yelijah, you would not see any metrics at all if reaching the /metrics endpoint were a problem.
The question here is whether that label/metric was ever working in the first place. Maybe we need to try installing an older version of the controller, from before #8225. That is why I asked whether you were seeing the metric before.
@longwuyuan this metric, nginx_ingress_controller_requests, is used by the grafana dashboard (https://github.com/kubernetes/ingress-nginx/blob/main/deploy/grafana/dashboards/nginx.json)
That I know.
I asked hoping to establish whether #8225 changed anything related to that metric, or whether it was not available under that label even before #8225. Maybe we should try to install a version of the controller from before #8225 and check.
@ekovacs please comment if/when possible.
Hi, without #8225 the logs were flooded with "inconsistent label cardinality" errors, besides the fact that the metric was not initialized/available.
My PR was basically fixing an incomplete earlier PR (#8201) that introduced that bug (it expected 6 label values, if I recall correctly, but was not updated to provide all 6, only the original 4 or so; there was no test catching it at that time).
While creating #8225, besides the above issues, I also found some other metric-code related issues while writing tests, incorporated fixes for those, and I think I commented on them in the PR.
I think something other than #8225 must be at play here, as it was tested thoroughly, and while providing a fix I also made sure to cover it with a regression test.
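For context, "inconsistent label cardinality" is the error prometheus/client_golang returns when a metric vector declared with N labels is updated with a different number of label values. A minimal, self-contained sketch (the metric name and label set here are illustrative, not the controller's actual declaration):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// A counter vector declared with six labels (names are illustrative).
	requests := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "demo_requests_total", Help: "demo counter"},
		[]string{"host", "status", "method", "path", "service", "canary"},
	)

	// Updating it with only four label values does not create a child metric;
	// client_golang returns an "inconsistent label cardinality" error instead.
	_, err := requests.GetMetricWithLabelValues("example.com", "200", "GET", "/")
	fmt.Println(err)
}
```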
I changed my chart to 4.0.16 and the ingress-nginx controller to v1.1.1, but I still lose some metrics, as the title says. Here are my chart values:
commonLabels: {}
controller:
  name: controller
  image:
    registry: docker.io
    image: yelijah/ingress-nginx-controller
    tag: "v1.1.1"
    digest:
    pullPolicy: IfNotPresent
    runAsUser: 101
    allowPrivilegeEscalation: true
  existingPsp: ""
  containerName: controller
  containerPort:
    http: 80
    https: 443
  config: {}
  configAnnotations: {}
  proxySetHeaders: {}
  addHeaders: {}
  dnsConfig: {}
  hostname: {}
  dnsPolicy: ClusterFirst
  reportNodeInternalIp: false
  watchIngressWithoutClass: false
  ingressClassByName: false
  allowSnippetAnnotations: true
  hostNetwork: false
  hostPort:
    enabled: false
    ports:
      http: 80
      https: 443
  electionID: ingress-controller-leader
  ingressClassResource:
    name: nginx
    enabled: true
    default: false
    controllerValue: "k8s.io/ingress-nginx"
    parameters: {}
  podLabels: {}
  podSecurityContext: {}
  sysctls: {}
  publishService:
    enabled: true
    pathOverride: ""
  scope:
    enabled: false
    namespace: ""
    namespaceSelector: ""
  configMapNamespace: ""
  tcp:
    configMapNamespace: ""
    annotations: {}
  udp:
    configMapNamespace: ""
    annotations: {}
  maxmindLicenseKey: ""
  extraArgs: {}
  extraEnvs: []
  kind: Deployment
  annotations: {}
  labels: {}
  updateStrategy: {}
  minReadySeconds: 0
  tolerations: []
  affinity: {}
  topologySpreadConstraints: []
  terminationGracePeriodSeconds: 300
  nodeSelector:
    kubernetes.io/os: linux
  livenessProbe:
    httpGet:
      path: "/healthz"
      port: 10254
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    successThreshold: 1
    failureThreshold: 5
  readinessProbe:
    httpGet:
      path: "/healthz"
      port: 10254
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    successThreshold: 1
    failureThreshold: 3
  healthCheckPath: "/healthz"
  healthCheckHost: ""
  podAnnotations: {}
  replicaCount: 1
  minAvailable: 1
  resources:
    requests:
      cpu: 100m
      memory: 90Mi
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 11
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
    behavior: {}
  autoscalingTemplate: []
  keda:
    apiVersion: "keda.sh/v1alpha1"
    enabled: false
    minReplicas: 1
    maxReplicas: 11
    pollingInterval: 30
    cooldownPeriod: 300
    restoreToOriginalReplicaCount: false
    scaledObject:
      annotations: {}
    triggers: []
    behavior: {}
  enableMimalloc: true
  customTemplate:
    configMapName: ""
    configMapKey: ""
  service:
    enabled: true
    appProtocol: true
    annotations: {}
    labels: {}
    externalIPs: []
    loadBalancerSourceRanges: []
    enableHttp: true
    enableHttps: true
    ipFamilyPolicy: "SingleStack"
    ipFamilies:
      - IPv4
    ports:
      http: 80
      https: 443
    targetPorts:
      http: http
      https: https
    type: LoadBalancer
    nodePorts:
      http: "80"
      https: "443"
      tcp: {}
      udp: {}
    external:
      enabled: true
    internal:
      enabled: false
      annotations: {}
      loadBalancerSourceRanges: []
  extraContainers: []
  extraVolumeMounts: []
  extraVolumes: []
  extraInitContainers: []
  extraModules: []
  admissionWebhooks:
    annotations: {}
    enabled: false
    failurePolicy: Fail
    port: 8443
    certificate: "/usr/local/certificates/cert"
    key: "/usr/local/certificates/key"
    namespaceSelector: {}
    objectSelector: {}
    labels: {}
    existingPsp: ""
    service:
      annotations: {}
      externalIPs: []
      loadBalancerSourceRanges: []
      servicePort: 443
      type: ClusterIP
    createSecretJob:
      resources: {}
    patchWebhookJob:
      resources: {}
    patch:
      enabled: true
      image:
        registry: k8s.gcr.io
        image: ingress-nginx/kube-webhook-certgen
        tag: v1.1.1
        digest: sha256:64d8c73dca984af206adf9d6d7e46aa550362b1d7a01f3a0a91b20cc67868660
        pullPolicy: IfNotPresent
      priorityClassName: ""
      podAnnotations: {}
      nodeSelector:
        kubernetes.io/os: linux
      tolerations: []
      labels: {}
      runAsUser: 2000
  metrics:
    port: 10254
    enabled: true
    service:
      annotations: {}
      externalIPs: []
      loadBalancerSourceRanges: []
      servicePort: 10254
      type: ClusterIP
    serviceMonitor:
      enabled: false
      additionalLabels: {}
      namespace: ""
      namespaceSelector: {}
      scrapeInterval: 30s
      targetLabels: []
      relabelings: []
      metricRelabelings: []
    prometheusRule:
      enabled: false
      additionalLabels: {}
      rules: []
  lifecycle:
    preStop:
      exec:
        command:
          - /wait-shutdown
  priorityClassName: ""
revisionHistoryLimit: 10
defaultBackend:
  enabled: false
  name: defaultbackend
  image:
    registry: k8s.gcr.io
    image: defaultbackend-amd64
    tag: "1.5"
    pullPolicy: IfNotPresent
    runAsUser: 65534
    runAsNonRoot: true
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
  existingPsp: ""
  extraArgs: {}
  serviceAccount:
    create: true
    name: ""
    automountServiceAccountToken: true
  extraEnvs: []
  port: 8080
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 30
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 5
  readinessProbe:
    failureThreshold: 6
    initialDelaySeconds: 0
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 5
  tolerations: []
  affinity: {}
  podSecurityContext: {}
  containerSecurityContext: {}
  podLabels: {}
  nodeSelector:
    kubernetes.io/os: linux
  podAnnotations: {}
  replicaCount: 1
  minAvailable: 1
  resources: {}
  extraVolumeMounts: []
  extraVolumes: []
  autoscaling:
    annotations: {}
    enabled: false
    minReplicas: 1
    maxReplicas: 2
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
  service:
    annotations: {}
    externalIPs: []
    loadBalancerSourceRanges: []
    servicePort: 80
    type: ClusterIP
  priorityClassName: ""
  labels: {}
rbac:
  create: true
  scope: false
podSecurityPolicy:
  enabled: false
serviceAccount:
  create: true
  name: ""
  automountServiceAccountToken: true
imagePullSecrets: []
tcp: {}
udp: {}
dhParam:
Actually my chart values are mostly defaults. Does anyone have any ideas?
My image is forked from k8s.gcr.io; I just re-tagged it without any change to the content.
I will wait for comments from @ekovacs
@longwuyuan let me take a deeper look at the current codebase. i'll report back ASAP.
@longwuyuan @ekovacs I'm afraid my chart values have some problem, because when I changed my chart to 4.0.16 and the ingress-nginx controller to v1.1, the metric was still missing. Can you check out my chart values above? Thank you very much! By the way, my k8s version is 2.4.2, the latest one.
@Yelijah, @longwuyuan my findings so far:
Collect does happen, but it seems that the metrics themselves are not present. When I add _, _ = sc.requests.GetMetricWithLabelValues("", "", "", "", "", "", "", "") to the code, then the metric appears when I curl the metrics endpoint. So I think the sending of the metrics from send (https://github.com/kubernetes/ingress-nginx/blob/f85c3866d8135d698fe6a2753b1ed17d89a9efa0/rootfs/etc/nginx/lua/monitor.lua#L28) somehow does not make it to handleMessage (https://github.com/kubernetes/ingress-nginx/blob/f85c3866d8135d698fe6a2753b1ed17d89a9efa0/internal/ingress/metric/collectors/socket.go#L251),
which in turn never initialises the metrics, and thus they never appear in the /metrics endpoint.
One thing I verified is that I have the monitor-related lua (https://github.com/kubernetes/ingress-nginx/blob/2852e2998cbfb8c89f1b3d61de8ed03e0a1d0134/rootfs/etc/nginx/template/nginx.tmpl#L101-L106) in my nginx.conf.
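As an aside on why that force-initialisation trick changes what /metrics shows: a prometheus metric vector with no children exposes no samples at all, so a counter like nginx_ingress_controller_requests only becomes visible once at least one child has been created. A minimal, self-contained sketch of that client_golang behaviour (the metric name and labels below are made up for illustration; this is not the controller's code):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
)

func main() {
	requests := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "demo_requests_total", Help: "demo counter"},
		[]string{"host", "status"},
	)

	// No children yet: the vector contributes no samples to the exposition,
	// so the metric name would not appear at /metrics at all.
	fmt.Println(testutil.CollectAndCount(requests)) // prints 0

	// Creating a child, even with empty label values, makes it visible.
	_, _ = requests.GetMetricWithLabelValues("", "")
	fmt.Println(testutil.CollectAndCount(requests)) // prints 1
}
```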
@ekovacs Thanks for your time. It seems to be a bug? Which version can I roll back to in order to avoid this problem? I tried ingress-nginx controller v1.1.1 and it also failed. Is it related to the k8s version?
Yes, it's a bug. Now we need a developer to work on it, but there is an acute shortage of developer time.
Let's wait and see.
/priority important-longterm
/area stabilization
@strongjz this needs Project Stabilisation tag
@longwuyuan I'm on holiday now so i have some time to invest here :). I'll try to come up with a solution/fix.
@ekovacs wow, that will be so helpful. Thanks. Look forward to it. I wonder where this broke.
@longwuyuan I managed to spend some time with this. The good news is that it is not broken / there is no bug:
- monitor.lua is loaded in the config, and the timer that is set up to flush data is called periodically 👍
- monitor.call() is called, and the metrics are tracked in its metrics table 👍
- flush() & send() are called 👍
- socket.go's handleMessage is called 👍
- the metrics appear at :10254/metrics 👍
I think the culprit may be this for @Yelijah (and for me, when I tried to verify things on a local kind cluster): https://github.com/kubernetes/ingress-nginx/blob/f85c3866d8135d698fe6a2753b1ed17d89a9efa0/internal/ingress/metric/collectors/socket.go#L263-L264
this makes sure that metrics are not tracked for hosts that are not explicitly mentioned in the Ingress objects.
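Roughly, the behaviour described above can be pictured as follows; a minimal, self-contained sketch (not the controller's actual code) that reuses the JSON field names visible in the controller log further down, with an assumed set of served hosts:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// socketData mirrors a few of the JSON fields the lua side sends over the socket
// (see the "Metric" log message quoted below in this thread).
type socketData struct {
	Host    string `json:"host"`
	Ingress string `json:"ingress"`
	Method  string `json:"method"`
	Status  string `json:"status"`
}

func main() {
	// A batch like the one logged by the controller at log level 5.
	msg := []byte(`[{"host":"localhost","ingress":"","method":"GET","status":"404"}]`)

	metricsPerHost := true                                    // default behaviour
	servedHosts := map[string]bool{"myapp.example.com": true} // hosts named in Ingress rules (assumed)

	var batch []socketData
	if err := json.Unmarshal(msg, &batch); err != nil {
		panic(err)
	}

	for _, stats := range batch {
		// With per-host metrics enabled, requests for hosts that are not served
		// by any Ingress are skipped, so nginx_ingress_controller_requests never
		// gets a child metric for them.
		if metricsPerHost && !servedHosts[stats.Host] {
			fmt.Printf("Skipping metric for host not being served: %q\n", stats.Host)
			continue
		}
		fmt.Printf("would record request metric for host %q (status %s)\n", stats.Host, stats.Status)
	}
}
```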
when the host: localhost is set in the ingress, e.g.:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: localhost
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-server
                port:
                  number: 9090
then the host localhost will be among the hosts that metrics are tracked for, and there will not be any missing metrics.
BUT, without the host field present on the ingress, I got this in the ingress-controller's log:
I0707 13:47:20.544710 10 socket.go:258] "Metric" message="[{\"host\":\"localhost\",\"ingress\":\"\",\"method\":\"GET\",\"canary\":\"\",\"requestLength\":717,\"namespace\":\"\",\"status\":\"404\",\"upstreamResponseTime\":0.014,\"responseLength\":681,\"requestTime\":0.014,\"upstreamLatency\":0.014,\"upstreamHeaderTime\":0.014,\"service\":\"\",\"path\":\"\",\"upstreamResponseLength\":548}]"
I0707 13:47:20.546645 10 socket.go:270] "Skipping metric for host not being served" host="localhost"
@Yelijah can you verify that you see this log message in your log? (Note that the line with "Metric" message= is at log level 5, and the Skipping metric for line is at log level 3.)
To change the logging level for your ingress-controller, please adjust the args parameters in the deployment, e.g.:
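A minimal sketch of such an args change, assuming the controller's klog-style verbosity flag --v (the surrounding flags are just illustrative defaults):

```yaml
# controller container in the Deployment (other flags kept as-is)
- args:
  - /nginx-ingress-controller
  - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
  - --ingress-class=nginx
  - --v=5   # raise klog verbosity; the "Metric" messages are logged at level 5
```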
OK, that is very, very helpful info, @ekovacs. But when I tested, I did have an ingress with the host field set to a fqdn value, and yet at least the metric nginx_ingress_controller_requests was missing. I will test again as per your latest update. But yeah, we need to get to the bottom of this.
@ekovacs you are right: when I have an ingress with the host field, I get all metrics, including nginx_ingress_controller_requests. But how can I skip this limit, i.e. how can I get this metric for all hosts and IPs? Because I can't restrict the host...
@ekovacs @longwuyuan Thank you for your time. I added the arg --metrics-per-host=false, and that fixed it!
Can you copy/paste that flag and also your curl? I still cannot see it:
% k -n ingress-nginx get po ingress-nginx-controller-7d94447c49-78sn9 -o yaml| grep -i metric -B10
- args:
- /nginx-ingress-controller
- --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
- --election-id=ingress-controller-leader
- --controller-class=k8s.io/ingress-nginx
- --ingress-class=nginx
- --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
- --validating-webhook=:8443
- --validating-webhook-certificate=/usr/local/certificates/cert
- --validating-webhook-key=/usr/local/certificates/key
- --metrics-per-host=false
% k -n ingress-nginx exec -ti ingress-nginx-controller-7d94447c49-78sn9 -- curl localhost:10254/metrics | grep -i requests
# HELP nginx_ingress_controller_nginx_process_requests_total total number of client requests
# TYPE nginx_ingress_controller_nginx_process_requests_total counter
nginx_ingress_controller_nginx_process_requests_total{controller_class="k8s.io/ingress-nginx",controller_namespace="ingress-nginx",controller_pod="ingress-nginx-controller-7d94447c49-78sn9"} 74
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 10
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
Here are my args, but they seem no different from yours.
- args:
- /nginx-ingress-controller
- --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
- --election-id=ingress-controller-leader
- --controller-class=k8s.io/ingress-nginx
- --ingress-class=nginx
- --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
- --metrics-per-host=false
my curl result is:
[root@k8s ~]# kubectl exec -it -n dev alphine-extra sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # curl ingress-nginx-controller-metrics:10254/metrics|grep -i requests
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0# HELP nginx_ingress_controller_nginx_process_requests_total total number of client requests
# TYPE nginx_ingress_controller_nginx_process_requests_total counter
nginx_ingress_controller_nginx_process_requests_total{controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6"} 8296
# HELP nginx_ingress_controller_requests The total number of client requests.
# TYPE nginx_ingress_controller_requests counter
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="",method="GET",namespace="",path="",service="",status="404"} 15
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="101"} 7
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="200"} 290
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="302"} 1
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="304"} 20
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="499"} 3
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="500"} 1
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="POST",namespace="dev",path="/grafana",service="grafana",status="200"} 1908
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="POST",namespace="dev",path="/grafana",service="grafana",status="400"} 20
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="POST",namespace="dev",path="/grafana",service="grafana",status="499"} 13
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="POST",namespace="dev",path="/grafana",service="grafana",status="500"} 5
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="PUT",namespace="dev",path="/grafana",service="grafana",status="200"} 2
100 237k 0 237k 0 0 7424k 0 --:--:-- --:--:-- --:--:-- 7424k
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 412
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
By the way, I had no nginx_ingress_controller_requests at first, but when I access any of my ingresses by curl or browser, these metrics appear.
ok, I see it now.
I think we should document this because it will be hard for others to find. We should add this in the monitoring docs.
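For reference, one way to set that flag through the Helm chart is the controller.extraArgs map, which, as far as I know, the chart renders as additional --key=value flags on the controller container. A minimal values sketch (the values shown here are only an illustration, not a recommended configuration):

```yaml
controller:
  metrics:
    enabled: true
  extraArgs:
    # rendered by the chart as --metrics-per-host=false on the controller
    metrics-per-host: "false"
```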
Thank you for your time again!
i have the same problem.
$ kubectl get pod ingress-nginx-controller-5d95b8fd78-ffztb -o yaml | grep -i metric -B10
timeoutSeconds: 1
name: controller
ports:
$ kubectl exec -ti ingress-nginx-controller-5d95b8fd78-ffztb -- curl localhost:10254/metrics | grep -i requests
nginx_ingress_controller_nginx_process_requests_total{controller_class="k8s.io/ingress-nginx",controller_namespace="default",controller_pod="ingress-nginx-controller-5d95b8fd78-ffztb"} 99
promhttp_metric_handler_requests_in_flight 1
promhttp_metric_handler_requests_total{code="200"} 1
promhttp_metric_handler_requests_total{code="500"} 0
My ingress-nginx metrics are missing some metrics, for example nginx_ingress_controller_requests. Can anyone help me?
My helm chart version is 4.1.4, and the ingress-nginx controller version is 1.2.1.
Here are my metrics: