Closed tlvenn closed 4 years ago
@tlvenn I've noticed something similar. In my case it's usually the first request after some idle time, or every now and then a single request while actively interacting with a webapp, that produces a 500.
But in my case there is interesting log output from the respective nginx-ingress instance:
2019/03/26 08:12:43 [error] 905#905: *919209 nginx-ldap-auth-login-coreadmin.infra.svc.<domain>.kube could not be resolved (3: Host not found), client: 10.88.0.5, server: coresql.<domain>.intra, request: "GET / HTTP/2.0", subrequest: "/_external-auth-Lw", host: "coresql.<domain>.intra"
2019/03/26 08:12:43 [error] 905#905: *919209 auth request unexpected status: 502 while sending to client, client: 10.88.0.5, server: coresql.<domain>.intra, request: "GET / HTTP/2.0", host: "coresql.<domain>.intra"
10.88.0.5 - [10.88.0.5] - m0se [26/Mar/2019:08:12:43 +0000] "GET / HTTP/2.0" 502 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36" 0 0.024 [app-coresql-phpmyadmin-80] - - - - cda818e7a62d5b646e2dfe9b1bb1299d
10.88.0.5 - [10.88.0.5] - m0se [26/Mar/2019:08:12:43 +0000] "GET / HTTP/2.0" 500 579 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36" 98 0.024 [app-coresql-phpmyadmin-80] - - - - cda818e7a62d5b646e2dfe9b1bb1299d
I was initially thinking that my cluster was somehow misconfigured, since I only recently built it and am still in the process of getting things set up right, but maybe this is related.
After hitting refresh once or twice the 500 disappears and everything works until the next random request fails:
10.88.0.5 - [10.88.0.5] - m0se [26/Mar/2019:08:13:02 +0000] "GET / HTTP/2.0" 200 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36" 0 0.003 [app-coresql-phpmyadmin-80] 10.233.23.69:5555 0 0.002 200 0b1d059988ddd773b6ebaf880eeba0d7
10.88.0.5 - [10.88.0.5] - m0se [26/Mar/2019:08:13:02 +0000] "GET / HTTP/2.0" 200 2431 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36" 21 0.031 [app-coresql-phpmyadmin-80] 10.233.67.27:80 2431 0.028 200 0b1d059988ddd773b6ebaf880eeba0d7
10.88.0.5 - [10.88.0.5] - m0se [26/Mar/2019:08:13:02 +0000] "GET /js/vendor/codemirror/lib/codemirror.css?v=4.8.5 HTTP/2.0" 200 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36" 0 0.001 [app-coresql-phpmyadmin-80] 10.233.23.69:5555 0 0.002 200 d5e96d3d1616dc554f717129cfe0d3d4
I'm also using nginx-ingress 0.21.0 (quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0).
In my case I'm pretty sure it is a DNS-related issue: when I switch auth-url from the service name (DNS) to the ClusterIP of the nginx-ldap-auth service, I'm not able to reproduce the 500. Maybe you can try this to check whether it is a DNS issue in your case as well?
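For reference, the test is a one-line change to the annotation, roughly like this (the ClusterIP and port below are placeholders, take the real values from kubectl get svc for your nginx-ldap-auth service; the annotation prefix assumes the default nginx.ingress.kubernetes.io):
metadata:
  annotations:
    # Placeholder ClusterIP and port of the nginx-ldap-auth service;
    # replace with the values shown by `kubectl get svc -n <namespace>`.
    nginx.ingress.kubernetes.io/auth-url: "http://10.233.0.42:5555/"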
As for debugging the DNS issue, I have not been very successful yet: when I exec into the running ingress container, I'm always able to resolve the service name, so I'm a bit puzzled why nginx sometimes is not able to.
Good catch again @m0se, it is indeed a DNS issue. If I use the service ClusterIP it works just fine, and just like you, I can resolve the DNS name of the service from the ingress container as well...
I might have a clue as to why this is happening. I dumped the nginx.conf yesterday and was surprised to see that an explicit nameserver (Google's) is actually set:
http {
    init_by_lua_block {
        require("resty.core")
        collectgarbage("collect")
        local lua_resty_waf = require("resty.waf")
        lua_resty_waf.init()
        -- init modules
        local ok, res
        ok, res = pcall(require, "configuration")
        if not ok then
            error("require failed: " .. tostring(res))
        else
            configuration = res
            configuration.nameservers = { "8.8.8.8" }
        end
        ok, res = pcall(require, "balancer")
        if not ok then
            error("require failed: " .. tostring(res))
        else
            balancer = res
        end
        ok, res = pcall(require, "monitor")
        if not ok then
            error("require failed: " .. tostring(res))
        else
            monitor = res
        end
    }
    ...
    resolver 8.8.8.8 valid=30s;
}
So I believe nginx is talking directly to Google's DNS server to resolve that auth URL instead of using kube-dns or CoreDNS. At least it seems that way.
I can confirm it also looks like a DNS issue on my side.
To fix it, just append svc.cluster.local after the <service name>.<namespace> part of the URL defined in the auth-url annotation, e.g. nginx-ldap-auth.my-namespace.svc.cluster.local.
I believe the '.local' will force internal DNS resolution instead of the 8.8.8.8 pointed out above.
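A minimal sketch of what that looks like on the Ingress (my-namespace and port 5555 are placeholders, and cluster.local assumes the default cluster domain):
metadata:
  annotations:
    # Fully qualified in-cluster name of the auth service; the namespace and
    # port are placeholders, and "cluster.local" assumes the default cluster domain.
    nginx.ingress.kubernetes.io/auth-url: "http://nginx-ldap-auth.my-namespace.svc.cluster.local:5555/"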
I had no issue with 0.26.2, which is the version I'm using. If the problem persists, please check whether it is related to nginx or to your k8s cluster configuration.
Hi,
I am trying to set up nginx-ldap-auth with nginx-ingress 0.21.0, but as soon as I add the annotation to one ingress, nginx returns a 500. The nginx log does not show any error, and the pod log does not show anything either besides:
I can reach the nginx-ldap-auth service from any pod within the cluster, so the URL given to nginx is correct. Any idea what the issue could be?
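For reference, the annotation I'm adding looks roughly like this (the namespace and port are placeholders for my actual values):
metadata:
  annotations:
    # auth-url pointing at the in-cluster nginx-ldap-auth service;
    # namespace and port are placeholders for my actual setup.
    nginx.ingress.kubernetes.io/auth-url: "http://nginx-ldap-auth.my-namespace:5555/"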
Thanks in advance.