kubernetes-retired / contrib

[EOL] This is a place for various components in the Kubernetes ecosystem that aren't part of the Kubernetes core.
Apache License 2.0

ingress controller reloading backend regularly (between 30s-3m) #2923

Closed · ironslob closed this issue 5 years ago

ironslob commented 6 years ago

I've run the ingress controller with --v=2 to see what's happening, and I get the following output, which always follows the same pattern:

104.156.229.24 - [104.156.229.24] - - [25/Jun/2018:15:26:49 +0000] "GET /pro/ HTTP/1.1" 200 4180 "https://www.twigdoo.com/" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/98 Safari/537.4 (StatusCake)" 522 1.683 [apps-twigdoo-web] 172.20.84.227:5000 16631 1.684 200 f5981fc9e19566a250f13d8fa7cbce60
104.238.159.87 - [104.238.159.87] - - [25/Jun/2018:15:26:54 +0000] "HEAD / HTTP/1.1" 307 0 "-" "updown.io daemon 2.2" 216 0.003 [apps-twigdoo-web] 172.20.127.53:5000 0 0.004 307 00d6edaaba8425444128740683ab5e51
104.238.159.87 - [104.238.159.87] - - [25/Jun/2018:15:26:54 +0000] "HEAD /pro/ HTTP/1.1" 200 0 "-" "updown.io daemon 2.2" 220 0.024 [apps-twigdoo-web] 172.20.84.227:5000 0 0.024 200 fe2bfb80727df5e8c8133d9b1842df81
138.68.24.60 - [138.68.24.60] - - [25/Jun/2018:15:26:56 +0000] "GET /services/weddings/ HTTP/1.1" 200 3919 "-" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/98 Safari/537.4 (StatusCake)" 499 0.020 [apps-twigdoo-web] 172.20.127.53:5000 19261 0.020 200 b8a8335581c3f4938587857c06f9cc01
I0625 15:27:01.584519       6 controller.go:169] Configuration changes detected, backend reload required.
I0625 15:27:01.584543       6 util.go:68] rlimit.max=1048576
I0625 15:27:01.584568       6 nginx.go:522] Maximum number of open file descriptors: 523264
I0625 15:27:01.656091       6 nginx.go:629] NGINX configuration diff:
--- /etc/nginx/nginx.conf       2018-06-25 15:26:12.608700061 +0000
+++ /tmp/new-nginx-cfg220593655 2018-06-25 15:27:01.652822169 +0000
@@ -213,6 +213,7 @@

                server 172.20.84.227:5000 max_fails=0 fail_timeout=0;
                server 172.20.127.53:5000 max_fails=0 fail_timeout=0;
+               server 172.20.116.105:5000 max_fails=0 fail_timeout=0;

        }

I0625 15:27:01.699733       6 controller.go:179] Backend successfully reloaded.
84.201.133.36 - [84.201.133.36] - - [25/Jun/2018:15:27:04 +0000] "GET /sitemap/england/south-west/cornwall/brunnion/ HTTP/1.1" 200 2936 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 345 0.103 [apps-twigdoo-web] 172.20.84.227:5000 9041 0.104 200 8156bb867d872abc6ac37897a3ce3b85
I0625 15:27:04.917910       6 controller.go:169] Configuration changes detected, backend reload required.
I0625 15:27:04.917931       6 util.go:68] rlimit.max=1048576
I0625 15:27:04.917937       6 nginx.go:522] Maximum number of open file descriptors: 523264
I0625 15:27:04.965493       6 nginx.go:629] NGINX configuration diff:
--- /etc/nginx/nginx.conf       2018-06-25 15:27:01.652822169 +0000
+++ /tmp/new-nginx-cfg538364737 2018-06-25 15:27:04.960830406 +0000
@@ -211,9 +211,8 @@

                keepalive 32;

-               server 172.20.84.227:5000 max_fails=0 fail_timeout=0;
-               server 172.20.127.53:5000 max_fails=0 fail_timeout=0;
                server 172.20.116.105:5000 max_fails=0 fail_timeout=0;
+               server 172.20.84.227:5000 max_fails=0 fail_timeout=0;

        }

I0625 15:27:05.007378       6 controller.go:179] Backend successfully reloaded.

Any help on resolving this would be great, as I'm seeing regular 502 responses.
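
From the diffs above it looks like the backend list itself keeps changing: one reload adds 172.20.116.105, the next removes 172.20.127.53 and reorders the remaining servers. So the pods behind the apps-twigdoo-web upstream may be getting recreated, rescheduled, or flapping on their readiness probes, and every endpoint change forces an NGINX reload, which would line up with the 502s.

For reference, here is roughly how the verbosity flag is set on my controller Deployment (the image tag and names are from my cluster and may differ):

    spec:
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.15.0
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
        - --v=2   # log the config diff and the reason for each reload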

ironslob commented 6 years ago

I found the upstream annotations and thought I'd try these:

+    nginx.ingress.kubernetes.io/upstream-fail-timeout: "30"
+    nginx.ingress.kubernetes.io/upstream-max-fails: "5"

but no joy; the same thing is happening:

54.172.193.230 - [54.172.193.230] - - [25/Jun/2018:15:44:09 +0000] "GET /sitemap/wales/dyfed/ceredigion-sir-ceredigion/llywernog/ HTTP/1.1" 200 2949 "-" "MauiBot (crawler.feedback+wc@gmail.com)" 324 1.872 [apps-twigdoo-web] 172.20.127.53:5000 9112 1.868 200 45f49e1b9bef9e5540bcdaf1df577264
I0625 15:44:22.533945       6 controller.go:169] Configuration changes detected, backend reload required.
I0625 15:44:22.548688       6 util.go:68] rlimit.max=1048576
I0625 15:44:22.548768       6 nginx.go:522] Maximum number of open file descriptors: 523264
I0625 15:44:22.688089       6 nginx.go:629] NGINX configuration diff:
--- /etc/nginx/nginx.conf       2018-06-25 15:43:35.579306845 +0000
+++ /tmp/new-nginx-cfg201428551 2018-06-25 15:44:22.683427020 +0000
@@ -211,9 +211,8 @@

                keepalive 32;

-               server 172.20.84.227:5000 max_fails=5 fail_timeout=30;
-               server 172.20.116.105:5000 max_fails=5 fail_timeout=30;
                server 172.20.127.53:5000 max_fails=5 fail_timeout=30;
+               server 172.20.84.227:5000 max_fails=5 fail_timeout=30;

        }

I0625 15:44:22.742475       6 controller.go:179] Backend successfully reloaded.
I0625 15:44:31.582142       6 controller.go:169] Configuration changes detected, backend reload required.
I0625 15:44:31.582167       6 util.go:68] rlimit.max=1048576
I0625 15:44:31.582173       6 nginx.go:522] Maximum number of open file descriptors: 523264
I0625 15:44:31.639561       6 nginx.go:629] NGINX configuration diff:
--- /etc/nginx/nginx.conf       2018-06-25 15:44:22.683427020 +0000
+++ /tmp/new-nginx-cfg243681809 2018-06-25 15:44:31.635449683 +0000
@@ -211,6 +211,7 @@

                keepalive 32;

+               server 172.20.116.105:5000 max_fails=5 fail_timeout=30;
                server 172.20.127.53:5000 max_fails=5 fail_timeout=30;
                server 172.20.84.227:5000 max_fails=5 fail_timeout=30;

I0625 15:44:31.678958       6 controller.go:179] Backend successfully reloaded.
54.172.193.230 - [54.172.193.230] - - [25/Jun/2018:15:44:39 +0000] "GET /sitemap/wales/dyfed/ceredigion-sir-ceredigion/maen-y-groes/ HTTP/1.1" 200 2953 "-" "MauiBot (crawler.feedback+wc@gmail.com)" 327 0.005 [apps-twigdoo-web] 172.20.127.53:5000 9123 0.004 200 daf90a3fae7caca6efaffbbaf6ed0755
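
For context, those annotations sit in the Ingress metadata; a minimal sketch of what I mean (the resource names and host are illustrative, pieced together from the logs, not my exact manifest):

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: twigdoo-web                    # illustrative name
      annotations:
        nginx.ingress.kubernetes.io/upstream-fail-timeout: "30"
        nginx.ingress.kubernetes.io/upstream-max-fails: "5"
    spec:
      rules:
      - host: www.twigdoo.com
        http:
          paths:
          - backend:
              serviceName: twigdoo-web     # illustrative; the upstream in the logs is apps-twigdoo-web
              servicePort: 5000

The diff confirms the annotations do reach the generated config (max_fails=5 fail_timeout=30 on each server line), but as far as I can tell they only control when NGINX marks an upstream server as failed; they don't stop the controller from reloading whenever the endpoint list changes, which would explain why the behaviour is unchanged.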
fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with `/remove-lifecycle stale`. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now, please do so with `/close`.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with `/remove-lifecycle rotten`. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now, please do so with `/close`.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with `/reopen`. Mark the issue as fresh with `/remove-lifecycle rotten`.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/contrib/issues/2923#issuecomment-441092287):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.