Closed: b2cc closed this issue 2 months ago.
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/help
@longwuyuan: This request has been marked as needing help from a contributor.
Please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
@longwuyuan: thanks for your explanation.
Let's forget the NodePort and external monitoring for a second and just focus on the Prometheus service. Even when deployed as per your documentation, the issue persists as soon as the ingress-nginx deployment is scaled up. This seems to be a valid scenario, since even your helm chart implements an autoscaling mechanism. Surely I'm not the first one to run more than one ingress pod on a Kubernetes cluster?
Or the other way around: what is the supported scenario? Must the ingress-nginx deployment be run with only one replica? Is this defined somewhere?
@b2cc you are absolutely right. The multi-pod scenario is a grey area, hence I tagged this as "help wanted" from an expert on integration.
There are multiple instances where we have hinted at a scaled-out, multi-pod/multi-replica setup being used with the Prometheus+Grafana combo. But it's just that, a hint at having multiple replicas, with no deep dive into the deployment and config of prom-grafana.
The next step is that someone needs to at least reproduce this and post detailed data on what is missing in a multi-replica environment. Although I think that most metrics will be available off the leader.
@longwuyuan
I have a follow-up question: how do I see the metrics for TCP services by port? I can see metrics for HTTP traffic but not for TCP traffic. Any idea how to view that in Prometheus?
@b2cc, @longwuyuan, why would having metrics per instance/replica be an issue?
Prometheus works fine with multiple instances of ingress-nginx. Each replica holds its own metrics, and you create a target for each replica: for 2 replicas you will have 2 targets, and for n replicas you will have n Prometheus targets.
Once the above conditions are met, Prometheus will start scraping metrics per instance, and you will have metrics per instance (the labels differ, so you will have n series per metric name).
For example, for 2 replicas you will have 2 series for nginx_ingress_controller_nginx_process_requests_total:
nginx_ingress_controller_nginx_process_requests_total{container="controller", controller_class="k8s.io/ingress-nginx", controller_namespace="ingress-controller", controller_pod="nginx-external-ingress-nginx-controller-8f8dbf497-47znb", endpoint="http-metrics", instance="10.6.15.250:10254", job="nginx-external-ingress-nginx-controller-metrics", namespace="ingress-controller", pod="nginx-external-ingress-nginx-controller-8f8dbf497-47znb", service="nginx-external-ingress-nginx-controller-metrics"}
nginx_ingress_controller_nginx_process_requests_total{container="controller", controller_class="k8s.io/ingress-nginx", controller_namespace="ingress-controller", controller_pod="nginx-external-ingress-nginx-controller-8f8dbf497-lppc9", endpoint="http-metrics", instance="10.6.33.183:10254", job="nginx-external-ingress-nginx-controller-metrics", namespace="ingress-controller", pod="nginx-external-ingress-nginx-controller-8f8dbf497-lppc9", service="nginx-external-ingress-nginx-controller-metrics"}
Which is correct and allows you to monitor each instance, or you can simply monitor the cluster by doing a sum over the above metrics:
sum(irate(nginx_ingress_controller_requests{ controller_class=~"$controller_class", ingress=~"$ingress", namespace=~"$namespace", controller_pod=~"$pod"}[$__interval])) by (ingress)
This is how Prometheus works for any ServiceMonitor; going deeper would be a question better suited for the Prometheus community, since that knowledge is outside the nginx scope.
Hopefully, this addresses your question.
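For concreteness, the per-replica target behavior described above can be sketched with a ServiceMonitor. This is a hedged example: the namespace, label selector, and port name are assumptions based on the default helm chart labels, so adjust them to your release:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/component: controller
  endpoints:
    - port: metrics      # name of the metrics port on the controller's metrics Service
      interval: 30s
```

Because the Prometheus Operator resolves the Service's endpoints rather than scraping the Service VIP, each controller pod becomes its own scrape target, which yields one labeled series per pod as in the examples above.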
The grey area in this issue is Prometheus running outside the cluster. I have not had the time to run Prometheus outside the cluster and have the scraping done on the controller replicas running inside the cluster.
@longwuyuan, as long as you create the right targets, it doesn't matter where you run Prometheus, although running it outside the cluster seems like overkill because you then have additional concerns to address.
Why don't you just run Prometheus inside the cluster with remote-write to an external Prometheus cluster for centralized metrics?
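The remote-write setup suggested here is a small addition to the in-cluster Prometheus configuration; a minimal sketch, assuming a placeholder URL for the central server:

```yaml
# prometheus.yml of the in-cluster instance: scrape locally, forward all
# samples to the external/central Prometheus. The URL is a placeholder.
remote_write:
  - url: https://prometheus-central.example.com/api/v1/write
    # add basic_auth / tls_config here if the central endpoint requires it
```

The in-cluster instance keeps doing pod-level discovery (so no round-robin problem), while the central server only receives already-labeled series.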
Note: this is as deep as we should go since this is an Nginx repo and not Prometheus talk.
@b2cc, if @longwuyuan and my comments helped, please feel free to close the issue.
@marinflorin: thanks for adding to this issue!
I understand to a certain degree what you mean, but my issue is exactly in what you mention when you write:
as long as you create the right targets, it doesn't matter where you run Prometheus
Currently I created a service, but this doesn't seem to be the correct way to do it, because it just round-robins over the ingress pods and I'm missing metrics. Which kind or type of target are you referring to in this case?
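For later readers hitting the same round-robin problem: the "right targets" are one per pod, not one Service VIP. If the external Prometheus can reach the pod network and the Kubernetes API, a sketch using pod-level service discovery could look like the following (namespace, pod label, and metrics port 10254 are assumptions from the default chart):

```yaml
scrape_configs:
  - job_name: ingress-nginx-pods
    kubernetes_sd_configs:
      - role: pod               # one target per pod, no Service in between
        namespaces:
          names: [ingress-nginx]
    relabel_configs:
      # keep only the controller pods
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: ingress-nginx
        action: keep
      # point the scrape at the controller's metrics port
      - source_labels: [__address__]
        regex: ([^:]+)(?::\d+)?
        replacement: ${1}:10254
        target_label: __address__
```

With one target per pod, the idle replicas simply report near-zero counters while the leader reports the real traffic, and a sum() over the series gives the cluster-wide view.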
Hi,
Reading all the info here after a year brings up a need for an update.
There are "agent"-like configs for Prometheus, demonstrated in the SaaS service offered by Grafana Labs. This is a technique to push metrics to an external Prometheus server; I think you should explore that. The reason is that this work is out of scope for the core Ingress-API specs, and the project is not able to support/maintain features and use-cases that are too far from the core Ingress-API specs and functionality. There is a lack of resources like developer time.
For cross-namespace use of Prometheus there are docs for ServiceMonitor, but for out-of-cluster Prometheus it would be helpful to get docs PR contributions. There are no resources to test and document that use-case, and Prometheus's native documentation is far superior to any effort this project can make.
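For the in-cluster ServiceMonitor route the docs cover, the chart exposes the relevant switches directly; a sketch of the helm values (key names per the ingress-nginx chart, but verify against your chart version):

```yaml
# values.yaml for the ingress-nginx helm chart
controller:
  metrics:
    enabled: true            # expose the metrics Service on port 10254
    serviceMonitor:
      enabled: true          # requires the Prometheus Operator CRDs
      additionalLabels:
        release: prometheus  # assumed label your Prometheus instance selects on
```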
Since there is no action item this issue tracks now, I will close it, because the project needs to limit work on features far from the Ingress-API while releasing a secure-by-default controller and implementing the Gateway-API.
/close
@longwuyuan: Closing this issue.
FYI: Since this is only a question, I tried to follow the "Support" link, but I couldn't sign up to Slack, so I'm creating this issue.
We have a basic/default setup of the ingress controller deployed via helm chart on OKD 4.12. Load balancing is supplied via metal-lb and everything works: apps are accessible and ingresses work as expected. For redundancy we currently run the deployment with 4 replicas, one of which receives all the traffic while the other three just idle for fail-over purposes.
We're now in the process of setting up monitoring/metering with Prometheus as per the documentation (https://kubernetes.github.io/ingress-nginx/user-guide/monitoring/). Since Prometheus is also used for monitoring other components, we run it outside the cluster on a dedicated server. Therefore the service that exposes the Prometheus endpoint is of type NodePort, so Prometheus is able to scrape it from outside the cluster. This works, and we can confirm by running curl against the /metrics endpoint on the NodePorts that we can get some metrics. However, most of the time we seem to hit an idle replica pod, and most of the metrics are empty due to the way the service load balancing works (round-robin?). Only sometimes do we hit the pod that is actually handling all the traffic and get the correct values. We scoured the documentation, but we couldn't find a way around this issue.
… ingress-nginx namespace for this to work?
NGINX Ingress controller version:
Kubernetes version (use kubectl version):
Environment:
Openshift/OKD 4.12 cluster x86_64
How was the ingress-nginx-controller installed: installed via helm, no user-supplied values (default)
Current State of the controller:
oc get svc
NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   172.31.13.57     10.1.200.199   80:32439/TCP,443:30691/TCP   30d
ingress-nginx-controller-admission   ClusterIP      172.31.244.148                  443/TCP                      30d
nodeport-ingress-nginx-prometheus    NodePort       172.31.29.28                    10254:30254/TCP              102m
oc describe ingressclasses.networking.k8s.io
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.5.1
              helm.sh/chart=ingress-nginx-4.4.2
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:

Name:         openshift-default
Labels:
Annotations:
Controller:   openshift.io/ingress-to-route
Parameters:
  APIGroup:  operator.openshift.io
  Kind:      IngressController
  Name:      default
oc describe pod ingress-nginx-controller-779798ff78-lhdcf
Name:             ingress-nginx-controller-779798ff78-lhdcf
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             compute01/10.1.200.203
Start Time:       Sun, 26 Feb 2023 19:26:09 +0100
Labels:           app=ingress-nginx
                  app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/name=ingress-nginx
                  pod-template-hash=779798ff78
Annotations:      k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.132.0.214" ], "default": true, "dns": {} }]
                  k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.132.0.214" ], "default": true, "dns": {} }]
                  kubectl.kubernetes.io/restartedAt: 2023-02-26T19:24:52+01:00
                  openshift.io/scc: privileged
Status:           Running
IP:               10.132.0.214
IPs:
  IP:  10.132.0.214
Controlled By:  ReplicaSet/ingress-nginx-controller-779798ff78
Containers:
  controller:
    Container ID:  cri-o://275eb7160f8b1c8c9b9b591be148dabd225dadd94fdbe43db88cd77eb1cf3f1c
    Image:         registry.k8s.io/ingress-nginx/controller:v1.5.1@sha256:4ba73c697770664c1e00e9f968de14e08f606ff961c76e5d7033a4a9c593c629
    Image ID:      registry.k8s.io/ingress-nginx/controller@sha256:2f7551977e8553a50cd88e8175b1411acbef319f7040357b58be95e9b99c07e5
    Ports:         80/TCP, 443/TCP, 8443/TCP, 10254/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Sun, 26 Feb 2023 19:26:12 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-779798ff78-lhdcf (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
      TZ:             Europe/Vienna
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-r5djr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-r5djr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
                 node-role.kubernetes.io/worker=
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s