kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

How to find out why the Ingress controller restarts during a heavy load test #9148

Closed silandrew closed 2 months ago

silandrew commented 2 years ago

How can I find out why the Ingress controller restarted under heavy load during a stress test?

Environment: AKS version 1.24.3, ingress-nginx Helm chart 4.2.3, app version 1.3.0.

In the Ingress controller logs I could find the following errors:

[error] 39#39: *68832 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.1x.x.3x, server: ~^(?[\w-]+)\xxxxxx.com$, request: "GET /?spDomainId=98BF8CC6%2D3D7E%2D4A4C%2D82ED253DD586129D&spInstId=E0E07930%2D3849%2D4E10%2DB098CCB2FFB3F05E&s=5BEFF5FC%2DD61E%2D4349%2D84B6DBB6B587D93B&action=main&ts=202210101300&t=531F744A8AF2&spCode=Home HTTP/1.1", upstream: "http://10.xxx0.0.22:80/?spDomainId=98BF8CC6%2D3D7E%2D4A4C%2D82ED253DD586129D&spInstId=E0E07930%2D3849%2D4E10%2DB098CCB2FFB3F05E&s=5BEFF5FC%2DD61E%2D4349%2D84B6DBB6B587D93B&action=main&ts=202210101300&t=531F744A8AF2&spCode=Home", host: "xxxxxxxz.com"

IMPORTANT!!!

10.x.0.153 - - [10/Oct/2022:12:35:01 +0000] "GET /js/xxx/uleErrors.js?2_0_19_1 HTTP/1.1" 200 19396 "https://x2.x.com/index.cfm?spDomainId=7AB0BE7A%2D1EC9%2DB3EF%2DC4A72F48BE642A17&spInstId=6ACF1F5B%2DA3C8%2D9B00%2D11D0085A8F892C87&s=5B3F8EB5%2D4566%2D4A8C%2D994BBC7CBEC019A8&action=lpEditProp&ts=202210101334&t=729A30C19201&lpId=F925949C-603C-4B1D-A3D34B761D0A5AF1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" 1731 0.002 [xxxxx-console-5001] [] 10.140.0.55:80 19396 0.000 200 3bcae911dd1252a0f4b1ab448da0c93c

2022/10/10 14:42:54 [crit] 40#40: *251456 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 10.x.0.33, server: 0.0.0.0:443

10.xx.0.33 - - [10/Oct/2022:14:42:55 +0000] "POST /xxxxApi/login/processLoginJSON/?nc=0.586474373 HTTP/1.1" 200 508 "-" "Artillery (https://artillery.io)" 777 0.158 [poxxxrtal-appl-5000] [] 10.xx.0.19:80 508 0.156 200

10.x0.33 - - [10/Oct/2022:12:53:39 +0000] "GET /ixxx/xxxl/installations/testdelazure/xxx_client.css?2_0_19 HTTP/1.1" 404 3742 "https://x.com/index.cfm?spDomainId=F0444649%2D16EC%2D441C%2DA7BB5EACB7508659&spInstId=E0E07930%2D3849%2D4E10%2DB098CCB2FFB3F05E&s=133799F0%2DF24D%2D4F75%2D8A5985E4C8B1F5B6&action=mcFile&ts=202210101353&t=42E88121E12A&uploadError=false" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/106.0.1370.37" 1889 0.057 [xxxxxxxxapp-5001] [] 10.x.0.20:80 3742 0.064 404 9ae434ab0a016cca770f758ea01d11f0

2022/10/10 12:08:47 [error] 39#39: *68832 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.140.0.35, server: ~^(?[\w-]+)\xxx\.com$, request: "GET /?spDomainId=98BF8CC6%2D3D7E%2D4A4C%2D82ED253DD586129D&spInstId=E0E07930%2D3849%2D4E10%2DB098CCB2FFB3F05E&s=5BEFF5FC%2DD61E%2D4349%2D84B6DBB6B587D93B&action=main&ts=202210101300&t=531F744A8AF2&spCode=Home HTTP/1.1", upstream: "http://10.xena.0.22:80/?spDomainId=98BF8CC6%2D3D7E%2D4A4C%2D82ED253DD586129D&spInstId=E0E07930%2D3849%2D4E10%2DB098CCB2FFB3F05E&s=5BEFF5FC%2DD61E%2D4349%2D84B6DBB6B587D93B&action=main&ts=202210101300&t=531F744A8AF2&spCode=Home", host: "xx..xx.com"

2022/10/10 14:42:54 [crit] 40#40: *251456 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 10.1xx.0.33, server: 0.0.0.0:443

10.xx.0.33 - - [10/Oct/2022:14:42:55 +0000] "POST /xxxtApi/login/processLoginJSON/?nc=0.586474373 HTTP/1.1" 200 508 "-" "Artillery (https:/xxy.io)" 777 0.158 [xx-xxxl-5000] [] 10.xxx.0.19:80 508 0.156 200 37a7cf3fc4a73fb34ac21a9e1d03c42a

10.xxx.0.33 - - [10/Oct/2022:12:53:39 +0000] "GET /xxxx/xxxx/installations/txxxxxxa_client.css?2_0_19 HTTP/1.1" 404 3742 "https://xxx2.e-xxxxxx.com/index.cfm?spDomainId=F0444649%2D16EC%2D441C%2DA7BB5EACB7508659&spInstId=E0E07930%2D3849%2D4E10%2DB098CCB2FFB3F05E&s=133799F0%2DF24D%2D4F75%2D8A5985E4C8B1F5B6&action=mcFile&ts=202210101353&t=42E88121E12A&uploadError=false" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/106.0.1370.37" 1889 0.057 [xxx-app-console-5001] [] 10.xxxx.0.20:80 3742 0.064 404 9ae434ab0a016cca770f758ea01d11f0
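With access-log and error-log entries mixed together like this, it can help to pull out only the error-log entries and count them by severity and message. The following is a minimal sketch, not part of the original report; the function name `summarize_errors` is hypothetical, and the regular expression assumes the standard NGINX error-log shape (`[level] pid#tid: *connection message, key: value, ...`) seen in the excerpt above.

```python
import re
from collections import Counter

# Matches NGINX error-log entries such as
#   2022/10/10 12:08:47 [error] 39#39: *68832 upstream timed out (...) while ...
# The leading timestamp is optional because only the part from "[level]" on is matched.
ERROR_LINE = re.compile(
    r"\[(?P<level>debug|info|notice|warn|error|crit|alert|emerg)\]\s+"
    r"\d+#\d+:\s+(?:\*\d+\s+)?(?P<message>[^,\n]+)"
)


def summarize_errors(log_text: str) -> Counter:
    """Count (level, short message) pairs found anywhere in log_text."""
    counts = Counter()
    for match in ERROR_LINE.finditer(log_text):
        # Group on the text before " while ..." so that entries differing
        # only in what NGINX was doing at the time are counted together.
        short = match.group("message").split(" while ")[0].strip()
        counts[(match.group("level"), short)] += 1
    return counts
```

Feeding it the controller log, for example the saved output of `kubectl logs`, gives a quick picture of whether upstream timeouts or TLS handshake failures dominate. Neither of those messages by itself says why the controller container was restarted.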

Error reported by monitoring, with the ingress pod as source:

Source: Kubernetes

Description: Liveness probe failed: Get "http://10.140.0.43:10254/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Scope:

  • agent_id = 100500991
  • agent_tag_cluster = xxxxxx-01
  • agent_tag_sysdig_secure_enabled = true
  • host_domain = compute.internal
  • host_hostname = aks-xxxxx-13026865-vmss000005
  • host_ip_private = 10.xx.0.153
  • host_mac = 60:45:bd:12:cf:a1
  • kube_cluster_name = aksxxx-01
  • kube_deployment_name = ingress-nginx-xxx-controller
  • kube_namespace_name = portal
  • kube_node_name = xx-13026865-vmss000002
  • kube_pod_name = ingress-nginx-1661945723-controller-5844c7b7bd-ngfcv
  • kube_replicaset_name = ingress-nginx-1661945723-controller-5844c7b7bd
  • kube_service_name = ingress-nginx-1661945723-controller-admission
  • kube_workload_name = ingress-nginx-1661945723-controller
  • kube_workload_type = deployment
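A failed liveness probe matters here because the kubelet restarts a container once its liveness probe has failed `failureThreshold` times in a row, and a probe counts as failed when `/healthz` does not answer within `timeoutSeconds`. The sketch below only models that rule so the effect of slow health checks under load is easier to reason about. The function name is hypothetical, and the defaults of 1 second and 5 failures are assumptions; the real values are whatever the controller Deployment specifies.

```python
from typing import Iterable, Optional


def first_restart_index(
    latencies_s: Iterable[Optional[float]],
    timeout_s: float = 1.0,       # assumed probe timeoutSeconds
    failure_threshold: int = 5,   # assumed probe failureThreshold
) -> Optional[int]:
    """Return the index of the probe that would trigger a restart, or None.

    Each item is the /healthz response time in seconds for one probe;
    None means the endpoint never answered. A probe fails when its latency
    is missing or exceeds timeout_s, and a success resets the failure count.
    """
    consecutive = 0
    for i, latency in enumerate(latencies_s):
        failed = latency is None or latency > timeout_s
        consecutive = consecutive + 1 if failed else 0
        if consecutive >= failure_threshold:
            return i
    return None
```

If the health endpoint slows down while the controller is saturated, a short timeout combined with a low threshold can be enough to cause restarts even though NGINX itself never crashed.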

VolumeAttributes: secretProviderClass=ingress-tls

Conditions:

  • Type: Progressing, Status: True, Reason: NewReplicaSetAvailable
  • Type: Available, Status: True, Reason: MinimumReplicasAvailable

OldReplicaSets:

NewReplicaSet: ingress-nginx-1661945723-controller-5844c7b7bd (3/3 replicas created)

Events:

kubectl describe svc ingress-nginx-1661945723-controller -n xxxx

> Name: ingress-nginx-1661945723-controller
> Namespace: xxxl
> Labels: app.kubernetes.io/component=controller
>   app.kubernetes.io/instance=ingress-nginx-1661945723
>   app.kubernetes.io/managed-by=Helm
>   app.kubernetes.io/name=ingress-nginx
>   app.kubernetes.io/part-of=ingress-nginx
>   app.kubernetes.io/version=1.3.0
>   helm.sh/chart=ingress-nginx-4.2.3
> Annotations: meta.helm.sh/release-name: ingress-nginx-1661945723
>   meta.helm.sh/release-namespace: xxxxl
>   service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
> Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-1661945723,app.kubernetes.io/name=ingress-nginx
> Type: LoadBalancer
> IP Family Policy: SingleStack
> IP Families: IPv4
> IP: 192xx.0.217
> IPs: 192.xxx.0.217
> LoadBalancer Ingress: 20.254.82.53
> Port: http 80/TCP
> TargetPort: http/TCP
> NodePort: http 31030/TCP
> Endpoints: 10.xxx.0.24:80,10.1xx.0.26:80,10.xx.0.43:80
> Port: https 443/TCP
> TargetPort: https/TCP
> NodePort: https 32353/TCP
> Endpoints: 10.xxx.0.24:443,10.xx.0.26:443,10.xx.0.43:443
> Session Affinity: None
> External Traffic Policy: Cluster
> Events:
>
> **kubectl describe configmap ingress-controller-leader -n xxxx**
>
> Name: ingress-controller-leader
> Namespace: xxxx
> Labels:
> Annotations: control-plane.alpha.kubernetes.io/leader: {"holderIdentity":"ingress-nginx-1661945723-controller-5844c7b7bd-fq48g","leaseDurationSeconds":30,"acquireTime":"2022-10-10T17:16:04Z","r...
>
> **Indication of the restarts**
>
> ingress-nginx-1661945723-controller-5844c7b7bd-fq48g 1/1 Running 3 (2d15h ago) 42d
> ingress-nginx-1661945723-controller-5844c7b7bd-kmsqc 1/1 Running 3 (2d15h ago) 42d
> ingress-nginx-1661945723-controller-5844c7b7bd-ngfcv 1/1 Running 4 (2d15h ago) 42d

k8s-ci-robot commented 2 years ago

@silandrew: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 2 years ago

/remove-kind bug

silandrew commented 2 years ago

> /remove-kind bug
>
>   • Your message post is not formatted so please format it to make it legible
>   • Please try to answer the questions asked in the new issue template

I've updated the formatting.

bmv126 commented 2 years ago

@silandrew I do not see error logs indicating a restart; I see timeouts. Also, please format your description and provide the kubectl outputs for the pod and svc. And what test are you running?
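Since the NGINX logs only show timeouts, the restart reason usually has to come from the Pod object itself: each container status keeps a `lastState.terminated` record with the reason and exit code of the previous run. Below is a minimal sketch for reading that record from the JSON printed by `kubectl get pod <name> -n <namespace> -o json`. The function name `last_terminations` is hypothetical and nothing here comes from the reporter's cluster.

```python
from typing import Any, Dict, List


def last_terminations(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract the previous termination record of each container in a Pod.

    `pod` is the parsed JSON of a single Pod object. Containers that have
    no recorded previous termination are skipped.
    """
    records = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated is None:
            continue  # no previous run recorded for this container
        records.append({
            "container": status.get("name"),
            "restarts": status.get("restartCount", 0),
            "reason": terminated.get("reason"),        # e.g. "OOMKilled" or "Error"
            "exit_code": terminated.get("exitCode"),
            "finished_at": terminated.get("finishedAt"),
        })
    return records
```

A reason of `OOMKilled` points at memory limits, while a plain `Error` next to liveness-probe events in `kubectl describe pod` points at the probe. `kubectl logs <pod> --previous` shows the output of the run that was terminated.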

silandrew commented 2 years ago

> @silandrew I do not see error logs indicating restart, I see timeouts. Also, format your description and provide kubectl outputs of the pod, svc. Also what test you are trying

@ I've formatted the description. The problem I had: users were able to log in; when I checked the ingress pods, the status showed that all ingress pods had restarted, and the logs from monitoring showed a problem with the readiness probe.

ingress-nginx-1661945723-controller-5844c7b7bd-fq48g 1/1 Running 3 (2d15h ago) 42d
ingress-nginx-1661945723-controller-5844c7b7bd-kmsqc 1/1 Running 3 (2d15h ago) 42d
ingress-nginx-1661945723-controller-5844c7b7bd-ngfcv 1/1 Running 4 (2d15h ago) 42d
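The restart counts in a pod listing like the one above can be extracted programmatically, for instance to compare them before and after a load test. This is a small sketch assuming rows shaped like `kubectl get pods` output (`NAME READY STATUS RESTARTS AGE`); the names `PodRow` and `parse_pod_rows` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class PodRow(NamedTuple):
    name: str
    ready: str
    status: str
    restarts: int
    last_restart: Optional[str]  # e.g. "2d15h ago"; None if not shown
    age: str


# One row: NAME READY STATUS RESTARTS [(last restart)] AGE
ROW = re.compile(
    r"(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*)\))?\s+(?P<age>\S+)"
)


def parse_pod_rows(text: str) -> List[PodRow]:
    """Parse pod rows, even when several rows were pasted onto one line."""
    return [
        PodRow(m["name"], m["ready"], m["status"],
               int(m["restarts"]), m["last"], m["age"])
        for m in ROW.finditer(text)
    ]
```

All three replicas showing a last restart of `2d15h ago` suggests the restarts happened together, which fits a shared cause such as the load test rather than an isolated pod failure.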

tao12345666333 commented 2 years ago

It might be more convenient if you have a monitoring panel.

longwuyuan commented 2 months ago

/kind support

Closing this as this was a support question and seems to have been answered but issue kept open for years without tracking any action item.

/close

k8s-ci-robot commented 2 months ago

@longwuyuan: Closing this issue.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/9148#issuecomment-2328984387):

> /kind support
>
> Closing this as this was a support question and seems to have been answered but issue kept open for years without tracking any action item.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.