Closed: Yelijah closed this issue 2 years ago.
/kind bug /triage accepted
@Yelijah, can you prove/show that the specific metric was available earlier?
I wonder if the label has changed.
@ekovacs, hi, do you have any thoughts on this? A simple search showed #8225, so asking.
It never succeeded...
My prometheus and ingress-nginx are in different namespaces; is that a problem? But when I use curl to get the metrics, e.g. 'curl ingress-nginx-controller-metrics:10254/metrics', 'nginx_ingress_controller_requests' is missing there as well.
@Yelijah, you would not see any metrics at all if reaching the /metrics endpoint were a problem.
The question here is whether that label/metric was ever working in the first place. Maybe we need to try installing an older version of the controller, from before #8225. That is why I asked whether you were seeing the metric before.
@longwuyuan this metric, nginx_ingress_controller_requests, is used by the grafana dashboard (https://github.com/kubernetes/ingress-nginx/blob/main/deploy/grafana/dashboards/nginx.json)
That I know.
I asked hoping to establish whether #8225 changed anything related to that metric, or whether it was not available under that label even before #8225. Maybe we should try to install a version of the controller from before #8225 and check.
@ekovacs please comment if/when possible.
Hi, without #8225 the logs were flooded with "inconsistent label cardinality" errors, besides the fact that the metric was not initialized/available.
My PR was basically fixing an incomplete earlier PR (#8201) that introduced that bug (it expected 6 label values, if I recall correctly, but was not updated to provide all 6, only the original 4 or so; there was no test catching it at that time).
While creating #8225, besides the above issues, I also found some other metric-code related issues while writing tests, incorporated fixes for those, and I think I commented on them in the PR.
I think something other than #8225 must be at play here, as it was tested thoroughly, and while providing a fix I also made sure to cover it with a regression test.
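For context, "inconsistent label cardinality" is the error prometheus/client_golang returns when a metric vector declared with N labels is updated with a different number of label values. A minimal, self-contained sketch (the metric name and label set here are illustrative, not the controller's actual declaration):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// A counter vector declared with six labels (names are illustrative).
	requests := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "demo_requests_total", Help: "demo counter"},
		[]string{"host", "status", "method", "path", "service", "canary"},
	)

	// Updating it with only four label values does not create a child metric;
	// client_golang returns an "inconsistent label cardinality" error instead.
	_, err := requests.GetMetricWithLabelValues("example.com", "200", "GET", "/")
	fmt.Println(err)
}
```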
I changed my chart to 4.0.16 and the ingress-nginx controller to v1.1.1, but I still lose some metrics, as the title says. Here are my chart values:
commonLabels: {}
controller:
  name: controller
  image:
    registry: docker.io
    image: yelijah/ingress-nginx-controller
    tag: "v1.1.1"
    digest:
    pullPolicy: IfNotPresent
    runAsUser: 101
    allowPrivilegeEscalation: true
  existingPsp: ""
  containerName: controller
  containerPort:
    http: 80
    https: 443
  config: {}
  configAnnotations: {}
  proxySetHeaders: {}
  addHeaders: {}
  dnsConfig: {}
  hostname: {}
  dnsPolicy: ClusterFirst
  reportNodeInternalIp: false
  watchIngressWithoutClass: false
  ingressClassByName: false
  allowSnippetAnnotations: true
  hostNetwork: false
  hostPort:
    enabled: false
    ports:
      http: 80
      https: 443
  electionID: ingress-controller-leader
  ingressClassResource:
    name: nginx
    enabled: true
    default: false
    controllerValue: "k8s.io/ingress-nginx"
    parameters: {}
  podLabels: {}
  podSecurityContext: {}
  sysctls: {}
  publishService:
    enabled: true
    pathOverride: ""
  scope:
    enabled: false
    namespace: ""
    namespaceSelector: ""
  configMapNamespace: ""
  tcp:
    configMapNamespace: ""
    annotations: {}
  udp:
    configMapNamespace: ""
    annotations: {}
  maxmindLicenseKey: ""
  extraArgs: {}
  extraEnvs: []
  kind: Deployment
  annotations: {}
  labels: {}
  updateStrategy: {}
  minReadySeconds: 0
  tolerations: []
  affinity: {}
  topologySpreadConstraints: []
  terminationGracePeriodSeconds: 300
  nodeSelector:
    kubernetes.io/os: linux
  livenessProbe:
    httpGet:
      path: "/healthz"
      port: 10254
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    successThreshold: 1
    failureThreshold: 5
  readinessProbe:
    httpGet:
      path: "/healthz"
      port: 10254
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    successThreshold: 1
    failureThreshold: 3
  healthCheckPath: "/healthz"
  healthCheckHost: ""
  podAnnotations: {}
  replicaCount: 1
  minAvailable: 1
  resources:
    requests:
      cpu: 100m
      memory: 90Mi
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 11
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
    behavior: {}
  autoscalingTemplate: []
  keda:
    apiVersion: "keda.sh/v1alpha1"
    enabled: false
    minReplicas: 1
    maxReplicas: 11
    pollingInterval: 30
    cooldownPeriod: 300
    restoreToOriginalReplicaCount: false
    scaledObject:
      annotations: {}
    triggers: []
    behavior: {}
  enableMimalloc: true
  customTemplate:
    configMapName: ""
    configMapKey: ""
  service:
    enabled: true
    appProtocol: true
    annotations: {}
    labels: {}
    externalIPs: []
    loadBalancerSourceRanges: []
    enableHttp: true
    enableHttps: true
    ipFamilyPolicy: "SingleStack"
    ipFamilies:
      - IPv4
    ports:
      http: 80
      https: 443
    targetPorts:
      http: http
      https: https
    type: LoadBalancer
    nodePorts:
      http: "80"
      https: "443"
      tcp: {}
      udp: {}
    external:
      enabled: true
    internal:
      enabled: false
      annotations: {}
      loadBalancerSourceRanges: []
  extraContainers: []
  extraVolumeMounts: []
  extraVolumes: []
  extraInitContainers: []
  extraModules: []
  admissionWebhooks:
    annotations: {}
    enabled: false
    failurePolicy: Fail
    port: 8443
    certificate: "/usr/local/certificates/cert"
    key: "/usr/local/certificates/key"
    namespaceSelector: {}
    objectSelector: {}
    labels: {}
    existingPsp: ""
    service:
      annotations: {}
      externalIPs: []
      loadBalancerSourceRanges: []
      servicePort: 443
      type: ClusterIP
    createSecretJob:
      resources: {}
    patchWebhookJob:
      resources: {}
    patch:
      enabled: true
      image:
        registry: k8s.gcr.io
        image: ingress-nginx/kube-webhook-certgen
        tag: v1.1.1
        digest: sha256:64d8c73dca984af206adf9d6d7e46aa550362b1d7a01f3a0a91b20cc67868660
        pullPolicy: IfNotPresent
      priorityClassName: ""
      podAnnotations: {}
      nodeSelector:
        kubernetes.io/os: linux
      tolerations: []
      labels: {}
      runAsUser: 2000
  metrics:
    port: 10254
    enabled: true
    service:
      annotations: {}
      externalIPs: []
      loadBalancerSourceRanges: []
      servicePort: 10254
      type: ClusterIP
    serviceMonitor:
      enabled: false
      additionalLabels: {}
      namespace: ""
      namespaceSelector: {}
      scrapeInterval: 30s
      targetLabels: []
      relabelings: []
      metricRelabelings: []
    prometheusRule:
      enabled: false
      additionalLabels: {}
      rules: []
  lifecycle:
    preStop:
      exec:
        command:
          - /wait-shutdown
  priorityClassName: ""
revisionHistoryLimit: 10
defaultBackend:
  enabled: false
  name: defaultbackend
  image:
    registry: k8s.gcr.io
    image: defaultbackend-amd64
    tag: "1.5"
    pullPolicy: IfNotPresent
    runAsUser: 65534
    runAsNonRoot: true
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
  existingPsp: ""
  extraArgs: {}
  serviceAccount:
    create: true
    name: ""
    automountServiceAccountToken: true
  extraEnvs: []
  port: 8080
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 30
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 5
  readinessProbe:
    failureThreshold: 6
    initialDelaySeconds: 0
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 5
  tolerations: []
  affinity: {}
  podSecurityContext: {}
  containerSecurityContext: {}
  podLabels: {}
  nodeSelector:
    kubernetes.io/os: linux
  podAnnotations: {}
  replicaCount: 1
  minAvailable: 1
  resources: {}
  extraVolumeMounts: []
  extraVolumes: []
  autoscaling:
    annotations: {}
    enabled: false
    minReplicas: 1
    maxReplicas: 2
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
  service:
    annotations: {}
    externalIPs: []
    loadBalancerSourceRanges: []
    servicePort: 80
    type: ClusterIP
  priorityClassName: ""
  labels: {}
rbac:
  create: true
  scope: false
podSecurityPolicy:
  enabled: false
serviceAccount:
  create: true
  name: ""
  automountServiceAccountToken: true
imagePullSecrets: []
tcp: {}
udp: {}
dhParam:
Actually my chart values are mostly defaults. Does anyone have any ideas?
My image is forked from k8s.gcr.io; I just re-tagged it without any change to the content.
I will wait for comments from @ekovacs
@longwuyuan let me take a deeper look at the current codebase. i'll report back ASAP.
@longwuyuan @ekovacs I'm afraid my chart values have some problem, because when I changed my chart to 4.0.16 and the ingress-nginx controller to v1.1, the metric was still missing. Can you check out my chart values above? Thank you very much! By the way, my k8s version is 2.4.2, the latest one.
@Yelijah, @longwuyuan my findings so far:
Collect does happen, but it seems that the metrics themselves are not present. When I add _, _ = sc.requests.GetMetricWithLabelValues("", "", "", "", "", "", "", "") to the code, then the metric appears when I curl the metrics endpoint. So I think the sending of the metrics from send (https://github.com/kubernetes/ingress-nginx/blob/f85c3866d8135d698fe6a2753b1ed17d89a9efa0/rootfs/etc/nginx/lua/monitor.lua#L28) somehow does not make it to handleMessage (https://github.com/kubernetes/ingress-nginx/blob/f85c3866d8135d698fe6a2753b1ed17d89a9efa0/internal/ingress/metric/collectors/socket.go#L251),
which in turn never initialises the metrics, and thus they never appear in the /metrics endpoint.
One thing I verified is that I have the monitor-related lua (https://github.com/kubernetes/ingress-nginx/blob/2852e2998cbfb8c89f1b3d61de8ed03e0a1d0134/rootfs/etc/nginx/template/nginx.tmpl#L101-L106) in my nginx.conf.
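As an aside on why that force-initialisation trick changes what /metrics shows: a prometheus metric vector with no children exposes no samples at all, so a counter like nginx_ingress_controller_requests only becomes visible once at least one child has been created. A minimal, self-contained sketch of that client_golang behaviour (the metric name and labels below are made up for illustration; this is not the controller's code):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
)

func main() {
	requests := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "demo_requests_total", Help: "demo counter"},
		[]string{"host", "status"},
	)

	// No children yet: the vector contributes no samples to the exposition,
	// so the metric name would not appear at /metrics at all.
	fmt.Println(testutil.CollectAndCount(requests)) // prints 0

	// Creating a child, even with empty label values, makes it visible.
	_, _ = requests.GetMetricWithLabelValues("", "")
	fmt.Println(testutil.CollectAndCount(requests)) // prints 1
}
```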
@ekovacs Thanks for your time. It seems to be a bug? Which version can I roll back to in order to avoid this problem? I tried ingress-nginx controller v1.1.1 and it also failed. Is it related to the k8s version?
Yes, it's a bug. Now we need a developer to work on it, but there is an acute shortage of developer time.
Let's wait and see.
/priority important-longterm
/area stabilization
@strongjz this needs Project Stabilisation tag
@longwuyuan I'm on holiday now so i have some time to invest here :). I'll try to come up with a solution/fix.
@ekovacs wow, that will be so helpful. Thanks. Look forward to it. I wonder where this broke.
@longwuyuan I managed to spend some time with this. The good news is that it is not broken / there is no bug:
- monitor.lua is loaded in the config, and the timer that is set up to flush data is called periodically 👍
- monitor.call() is called, and the metrics are tracked in its metrics table 👍
- flush() & send() are called 👍
- socket.go's handleMessage is called 👍
- the metrics appear at :10254/metrics 👍
I think the culprit may be this for @Yelijah (and for me, when I tried to verify things on a local kind cluster): https://github.com/kubernetes/ingress-nginx/blob/f85c3866d8135d698fe6a2753b1ed17d89a9efa0/internal/ingress/metric/collectors/socket.go#L263-L264
this makes sure that metrics are not tracked for hosts that are not explicitly mentioned in the Ingress objects.
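Roughly, the behaviour described above can be pictured as follows; a minimal, self-contained sketch (not the controller's actual code) that reuses the JSON field names visible in the controller log further down, with an assumed set of served hosts:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// socketData mirrors a few of the JSON fields the lua side sends over the socket
// (see the "Metric" log message quoted below in this thread).
type socketData struct {
	Host    string `json:"host"`
	Ingress string `json:"ingress"`
	Method  string `json:"method"`
	Status  string `json:"status"`
}

func main() {
	// A batch like the one logged by the controller at log level 5.
	msg := []byte(`[{"host":"localhost","ingress":"","method":"GET","status":"404"}]`)

	metricsPerHost := true                                    // default behaviour
	servedHosts := map[string]bool{"myapp.example.com": true} // hosts named in Ingress rules (assumed)

	var batch []socketData
	if err := json.Unmarshal(msg, &batch); err != nil {
		panic(err)
	}

	for _, stats := range batch {
		// With per-host metrics enabled, requests for hosts that are not served
		// by any Ingress are skipped, so nginx_ingress_controller_requests never
		// gets a child metric for them.
		if metricsPerHost && !servedHosts[stats.Host] {
			fmt.Printf("Skipping metric for host not being served: %q\n", stats.Host)
			continue
		}
		fmt.Printf("would record request metric for host %q (status %s)\n", stats.Host, stats.Status)
	}
}
```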
when the host: localhost is set in the ingress, e.g.:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: localhost
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-server
                port:
                  number: 9090
then the host localhost will be among the hosts that metrics are tracked for, and there will not be any missing metrics.
BUT, without the host field present on the ingress, I got this in the ingress-controller's log:
I0707 13:47:20.544710 10 socket.go:258] "Metric" message="[{\"host\":\"localhost\",\"ingress\":\"\",\"method\":\"GET\",\"canary\":\"\",\"requestLength\":717,\"namespace\":\"\",\"status\":\"404\",\"upstreamResponseTime\":0.014,\"responseLength\":681,\"requestTime\":0.014,\"upstreamLatency\":0.014,\"upstreamHeaderTime\":0.014,\"service\":\"\",\"path\":\"\",\"upstreamResponseLength\":548}]"
I0707 13:47:20.546645 10 socket.go:270] "Skipping metric for host not being served" host="localhost"
@Yelijah can you verify that you see this log message in your log? (Note that the line with "Metric" message= is at log level 5, and the Skipping metric for line is at log level 3.)
To change the logging level for your ingress-controller, please adjust the args parameters in the deployment, e.g.:
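A minimal sketch of such an args change, assuming the controller's klog-style verbosity flag --v (the surrounding flags are just illustrative defaults):

```yaml
# controller container in the Deployment (other flags kept as-is)
- args:
  - /nginx-ingress-controller
  - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
  - --ingress-class=nginx
  - --v=5   # raise klog verbosity; the "Metric" messages are logged at level 5
```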
OK, that is very, very helpful info, @ekovacs. But when I tested, I did have an ingress with the host field set to a fqdn value, and yet at least the metric nginx_ingress_controller_requests was missing. I will test again as per your latest update. But yeah, we need to get to the bottom of this.
@ekovacs you are right: when I have an ingress with the host field, I get all metrics, including nginx_ingress_controller_requests. But how can I skip this limit, i.e. how can I get this metric for all hosts and IPs? Because I can't restrict the host...
@ekovacs @longwuyuan Thank you for your time. I added the arg --metrics-per-host=false, and that fixed it!
Can you copy/paste that flag and also your curl? I still cannot see it:
% k -n ingress-nginx get po ingress-nginx-controller-7d94447c49-78sn9 -o yaml| grep -i metric -B10
- args:
- /nginx-ingress-controller
- --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
- --election-id=ingress-controller-leader
- --controller-class=k8s.io/ingress-nginx
- --ingress-class=nginx
- --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
- --validating-webhook=:8443
- --validating-webhook-certificate=/usr/local/certificates/cert
- --validating-webhook-key=/usr/local/certificates/key
- --metrics-per-host=false
% k -n ingress-nginx exec -ti ingress-nginx-controller-7d94447c49-78sn9 -- curl localhost:10254/metrics | grep -i requests
# HELP nginx_ingress_controller_nginx_process_requests_total total number of client requests
# TYPE nginx_ingress_controller_nginx_process_requests_total counter
nginx_ingress_controller_nginx_process_requests_total{controller_class="k8s.io/ingress-nginx",controller_namespace="ingress-nginx",controller_pod="ingress-nginx-controller-7d94447c49-78sn9"} 74
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 10
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
Here are my args, but they seem no different from yours.
- args:
- /nginx-ingress-controller
- --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
- --election-id=ingress-controller-leader
- --controller-class=k8s.io/ingress-nginx
- --ingress-class=nginx
- --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
- --metrics-per-host=false
my curl result is:
[root@k8s ~]# kubectl exec -it -n dev alphine-extra sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # curl ingress-nginx-controller-metrics:10254/metrics|grep -i requests
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0# HELP nginx_ingress_controller_nginx_process_requests_total total number of client requests
# TYPE nginx_ingress_controller_nginx_process_requests_total counter
nginx_ingress_controller_nginx_process_requests_total{controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6"} 8296
# HELP nginx_ingress_controller_requests The total number of client requests.
# TYPE nginx_ingress_controller_requests counter
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="",method="GET",namespace="",path="",service="",status="404"} 15
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="101"} 7
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="200"} 290
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="302"} 1
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="304"} 20
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="499"} 3
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="GET",namespace="dev",path="/grafana",service="grafana",status="500"} 1
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="POST",namespace="dev",path="/grafana",service="grafana",status="200"} 1908
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="POST",namespace="dev",path="/grafana",service="grafana",status="400"} 20
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="POST",namespace="dev",path="/grafana",service="grafana",status="499"} 13
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="POST",namespace="dev",path="/grafana",service="grafana",status="500"} 5
nginx_ingress_controller_requests{canary="",controller_class="k8s.io/ingress-nginx",controller_namespace="dev",controller_pod="ingress-nginx-controller-6cd7fc5f98-vk4f6",ingress="grafana",method="PUT",namespace="dev",path="/grafana",service="grafana",status="200"} 2
100 237k 0 237k 0 0 7424k 0 --:--:-- --:--:-- --:--:-- 7424k
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 412
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
By the way, I had no nginx_ingress_controller_requests at first, but when I access any of my ingresses by curl or browser, these metrics appear.
ok, I see it now.
I think we should document this because it will be hard for others to find. We should add this in the monitoring docs.
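For reference, one way to set that flag through the Helm chart is the controller.extraArgs map, which, as far as I know, the chart renders as additional --key=value flags on the controller container. A minimal values sketch (the values shown here are only an illustration, not a recommended configuration):

```yaml
controller:
  metrics:
    enabled: true
  extraArgs:
    # rendered by the chart as --metrics-per-host=false on the controller
    metrics-per-host: "false"
```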
Thank you for your time again!
i have the same problem.
$ kubectl get pod ingress-nginx-controller-5d95b8fd78-ffztb -o yaml | grep -i metric -B10
timeoutSeconds: 1
name: controller
ports:
$ kubectl exec -ti ingress-nginx-controller-5d95b8fd78-ffztb -- curl localhost:10254/metrics | grep -i requests
nginx_ingress_controller_nginx_process_requests_total{controller_class="k8s.io/ingress-nginx",controller_namespace="default",controller_pod="ingress-nginx-controller-5d95b8fd78-ffztb"} 99
promhttp_metric_handler_requests_in_flight 1
promhttp_metric_handler_requests_total{code="200"} 1
promhttp_metric_handler_requests_total{code="500"} 0
My ingress-nginx metrics are missing some metrics, for example nginx_ingress_controller_requests. Can anyone help me?
My helm chart version is 4.1.4, and the ingress-nginx controller version is 1.2.1.
Here are my metrics: