grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

loki-simple-scalable loki-gateway Nginx startup failed #7287

Open jwping opened 1 year ago

jwping commented 1 year ago

After installing loki-simple-scalable with Helm, the gateway log reports the following error:

kubectl logs -n loki loki-gateway-7f78b889f9-9tb75
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2022/09/29 09:55:35 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local" in /etc/nginx/nginx.conf:27
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local" in /etc/nginx/nginx.conf:27

But kube-dns does exist in my cluster:

root@master-01:# kubectl get -n kube-system svc
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   27d
root@master-01:# kubectl get -n kube-system pods
NAME                                READY   STATUS    RESTARTS       AGE
coredns-c676cc86f-dxv8m             1/1     Running   0              150m
coredns-c676cc86f-l6njk             1/1     Running   0              151m
etcd-master-01                      1/1     Running   65             27d
kube-apiserver-master-01            1/1     Running   0              27d
kube-controller-manager-master-01   1/1     Running   2 (24h ago)    27d
kube-proxy-c9746                    1/1     Running   0              27d
kube-proxy-d8qbr                    1/1     Running   0              24d
kube-proxy-jhl2s                    1/1     Running   0              24d
kube-proxy-rw5tr                    1/1     Running   0              24d
kube-scheduler-master-01            1/1     Running   71 (24d ago)   27d

darox commented 1 year ago

I have the same issue and can see the following in the logs:

[INFO] 10.0.0.72:34141 - 11273 "A IN kube-dns.kube-system.svc.cluster.local.monitoring.svc.cluster.local. udp 85 false 512" NXDOMAIN qr,aa,rd 178 0.000441375s
[INFO] 10.0.0.72:34141 - 11776 "AAAA IN kube-dns.kube-system.svc.cluster.local.monitoring.svc.cluster.local. udp 85 false 512" NXDOMAIN qr,aa,rd 178 0.000412507s
[INFO] 10.0.0.72:51078 - 27467 "AAAA IN kube-dns.kube-system.svc.cluster.local.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000208542s
[INFO] 10.0.0.72:51078 - 26859 "A IN kube-dns.kube-system.svc.cluster.local.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000176965s
[INFO] 10.0.1.120:60184 - 63770 "A IN loki.monitoring.svc.cluster.local.monitoring.svc.cluster.local. udp 80 false 512" NXDOMAIN qr,aa,rd 173 0.000274916s
[INFO] 10.0.1.120:50301 - 19685 "AAAA IN loki.monitoring.svc.cluster.local.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd 162 0.000092976s
[INFO] 10.0.1.120:59617 - 17088 "A IN loki.monitoring.svc.cluster.local.cluster.local. udp 65 false 512" NXDOMAIN qr,aa,rd 158 0.000166104s
[INFO] 10.0.1.120:60339 - 24553 "AAAA IN loki.monitoring.svc.cluster.local.damn.li. udp 59 false 512" NOERROR qr,rd,ra 143 0.000685193s
[INFO] 10.0.1.120:56636 - 10429 "AAAA IN loki.monitoring.svc.cluster.local. udp 51 false 512" NXDOMAIN qr,aa,rd 144 0.000104261s

For some reason the gateway is requesting a domain name that is far too long.
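The over-long names in that log look like ordinary resolv.conf search-list expansion: a name with fewer dots than `ndots` gets each search domain appended before it is tried as-is, and a name with a trailing dot is treated as absolute. A minimal sketch of that ordering follows. The function name `expand_candidates` is hypothetical, and the `ndots` value of 5 and the three search domains are assumptions inferred from the log lines above, not taken from anyone's actual /etc/resolv.conf.

```shell
# Simplified sketch of resolv.conf search-list expansion; not the real resolver.
# Usage: expand_candidates NAME NDOTS SEARCH_DOMAIN...
# The body runs in a subshell (parentheses) so its variables stay local.
expand_candidates() (
  name=$1
  ndots=$2
  shift 2

  # A trailing dot marks the name as absolute: no search domain is appended.
  case $name in
    *.) printf '%s\n' "$name"; exit 0 ;;
  esac

  # Number of dots = number of dot-separated fields minus one.
  dots=$(printf '%s\n' "$name" | awk -F. '{ print NF - 1 }')

  # With at least ndots dots, the name is tried as-is before the search list.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi

  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done

  # With fewer dots, the name is tried as-is only after the search list.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
)

# Assumed values: ndots 5 and search domains for a pod in "monitoring".
expand_candidates kube-dns.kube-system.svc.cluster.local 5 \
  monitoring.svc.cluster.local svc.cluster.local cluster.local
```

The resolver name has only four dots, so under these assumptions every search domain is tried first, which would account for queries such as `kube-dns.kube-system.svc.cluster.local.monitoring.svc.cluster.local.`. Passing the same name with a trailing dot yields a single candidate.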

seb-835 commented 1 year ago

@darox @jwping You have to check that Loki is configured with the right DNS setting.

Query the name of your cluster's DNS service:

kubectl get svc --namespace=kube-system -l k8s-app=kube-dns -o jsonpath='{.items..metadata.name}'

Then adjust your Helm values with the result. In my case the DNS service is not kube-dns but "rke2-coredns-rke2-coredns", so I use

global:
   dnsService: "rke2-coredns-rke2-coredns"

and it works fine: the pod starts and no longer complains.

trevorwhitney commented 1 year ago

Could you try this again? I normally develop against a k3d cluster, but in testing against a kind cluster to debug some CI failures (since that's what our CI uses), I noticed some differences in the ndots value present in the /etc/resolv.conf in the containers on the kind cluster. As a result I needed to add an extra dot to the end of the resolver DNS record. That change should be in 3.3.0. Can you please try that version and let me know if this is still an issue?

darox commented 1 year ago

In my case it's: kube-dns

willzhang commented 1 year ago

Same error:

root@node52:~# kubectl -n loki logs -f loki-gateway-774ff559b9-2w4dq  
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2023/01/05 08:41:13 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:27
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:27

The DNS service name:

root@node52:~# kubectl get svc --namespace=kube-system -l k8s-app=kube-dns  -o jsonpath='{.items..metadata.name}'
coredns

Resolved by:

global:
  dnsService: "coredns"

zalegrala commented 1 year ago

I suspect this is related to the ndots configuration in the /etc/resolv.conf. May we see the resolver configuration please?
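For anyone answering that request, the two lines that matter are `search` and `options ndots:`. Below is a small sketch that pulls just those out of a resolv.conf-style file. The helper name `show_resolver_settings` is hypothetical, and the fallback of 1 when no `ndots` option is present is the documented resolver default, not something observed in this thread.

```shell
# Print the search domains and ndots value from a resolv.conf-style file.
# Defaults to /etc/resolv.conf; pass another path to inspect a saved copy.
show_resolver_settings() {
  awk '
    # Keep everything after the "search" keyword.
    $1 == "search" { $1 = ""; sub(/^ +/, ""); search = $0 }
    # Scan the options line for an ndots:N entry.
    $1 == "options" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ndots:/) { ndots = substr($i, 7) }
    }
    END {
      print "search: " search
      print "ndots: " (ndots == "" ? "1 (resolver default)" : ndots)
    }
  ' "${1:-/etc/resolv.conf}"
}

# Usage inside a container: show_resolver_settings
# Usage on a saved copy:    show_resolver_settings ./resolv.conf.copy
```

Since the gateway pod itself is crashing, the file would probably have to come from another running pod in the same namespace, on the assumption that it uses the same DNS policy.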

rezaebrahimi1 commented 1 year ago

The solution from seb-835 https://github.com/grafana/loki/issues/7287#issuecomment-1282339134 works for me.

murand78 commented 8 months ago

In my case the cluster DNS was not resolving the cluster.local domain at all; the solution was to also set clusterDomain. The installation was a k3s cluster provisioned via Rancher 2.7.6 with the cluster domain explicitly set.

Kubernetes Version: v1.25.13 +k3s1

Helm Chart:


global:
   dnsService: "kube-dns"
   dnsNamespace: "kube-system"
   clusterDomain: "mysubdomain.mydomain.it"

Would it be possible to add an option in the Helm chart to set the IP of the DNS service instead of the FQDN?
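Judging by the error messages in this thread, the three values above seem to combine into the resolver name as `<dnsService>.<dnsNamespace>.svc.<clusterDomain>.`. That mapping is an inference from the logs, not a statement about the chart's template, and the helper name `resolver_fqdn` is hypothetical. A quick sketch for previewing the name a given set of values would produce:

```shell
# Compose the resolver name from the three global values.
# $1 = global.dnsService, $2 = global.dnsNamespace, $3 = global.clusterDomain
# Assumption: the pattern is inferred from the nginx error messages above.
resolver_fqdn() {
  printf '%s.%s.svc.%s.\n' "$1" "$2" "$3"
}

# Defaults seen in the error messages.
resolver_fqdn kube-dns kube-system cluster.local

# Values from the configuration above.
resolver_fqdn kube-dns kube-system mysubdomain.mydomain.it
```

Looking up the printed name from a pod in the cluster should show whether nginx will be able to resolve it at startup.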

batazor commented 6 months ago

kubectl get svc --namespace=kube-system -l k8s-app=kube-dns  -o jsonpath='{.items..metadata.name}'
kube-dns
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2023/12/21 22:00:31 [emerg] 1#1: host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33

I encountered the same error when switching to Talos.

From a random container:

ping kube-dns.kube-system.svc.cluster.local.
PING kube-dns.kube-system.svc.cluster.local. (10.96.0.10): 56 data bytes

camaeel commented 3 months ago

@batazor I got the same error when I ran Loki with the gateway on a Talos cluster. Have you found a solution?

camaeel commented 3 months ago

IMHO this may be related to https://github.com/grafana/loki/issues/11650

artem-zherdiev-ingio commented 1 month ago

Same issue here. We have two GKE clusters: one uses kube-dns as its DNS provider (Loki works without any adjustments), and the other uses Cloud DNS (VPC scope) with a specific domain suffix.

As mentioned above, we changed global.clusterDomain to that domain suffix, and it works.

acar-ctpe commented 19 hours ago

Getting the same error.

benjaminhuo commented 9 hours ago

The following config solved my problem:

loki:
  global:
    dnsService: coredns