kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services
Apache License 2.0
7.65k stars 2.56k forks

nil pointer dereference in extractHeadlessEndpoints #2029

Closed: allenporter closed this issue 3 years ago

allenporter commented 3 years ago

What happened:

I recently started observing this crash:

...
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service external-dns/external-dns"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service podinfo/podinfo"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service home-assistant/home-assistant-codeserver"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service monitoring/kube-prometheus-stack-grafana"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service kube-system/kube-dns"
time="2021-03-26T21:33:22Z" level=debug msg="Generating matching endpoint alertmanager-operated.mrv.thebends.org with EndpointAddress IP 192.168.173.8"
time="2021-03-26T21:33:22Z" level=debug msg="Generating matching endpoint alertmanager-kube-prometheus-stack-alertmanager-0.alertmanager-operated.mrv.thebends.org with EndpointAddress IP 192.168.173.8"
time="2021-03-26T21:33:22Z" level=debug msg="Endpoints generated from service: monitoring/alertmanager-operated: [alertmanager-kube-prometheus-stack-alertmanager-0.alertmanager-operated.mrv.thebends.org 0 IN A  192.168.173.8 [] alertmanager-operated.mrv.thebends.org 0 IN A  192.168.173.8 []]"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service monitoring/kube-prometheus-stack-alertmanager"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service monitoring/speedtest-prometheus"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service home-assistant/home-assistant-postgresql"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service kubernetes-dashboard/kubernetes-dashboard"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service pihole/pihole-web"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service cert-manager/cert-manager"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service monitoring/kube-prometheus-stack-prometheus-node-exporter"
time="2021-03-26T21:33:22Z" level=debug msg="No endpoints could be generated from service monitoring/kube-prometheus-stack-kube-state-metrics"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x19d4062]
goroutine 1 [running]:
sigs.k8s.io/external-dns/source.(*serviceSource).extractHeadlessEndpoints(0xc0006aed20, 0xc00083a630, 0xc000b53a80, 0x30, 0x0, 0xc000804b70, 0xc0006dd178, 0xc0006dd0e0)
    /bitnami/blacksmith-sandox/external-dns-0.7.6/src/github.com/kubernetes-incubator/external-dns/source/service.go:287 +0x582
sigs.k8s.io/external-dns/source.(*serviceSource).generateEndpoints(0xc0006aed20, 0xc00083a630, 0xc000b53a80, 0x31, 0x3cd7730, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /bitnami/blacksmith-sandox/external-dns-0.7.6/src/github.com/kubernetes-incubator/external-dns/source/service.go:481 +0xb93
sigs.k8s.io/external-dns/source.(*serviceSource).endpointsFromTemplate(0xc0006aed20, 0xc00083a630, 0x0, 0x0, 0x0, 0x0, 0x2)
    /bitnami/blacksmith-sandox/external-dns-0.7.6/src/github.com/kubernetes-incubator/external-dns/source/service.go:365 +0x2b9
sigs.k8s.io/external-dns/source.(*serviceSource).Endpoints(0xc0006aed20, 0x2b13760, 0xc000b4db00, 0x4, 0x4, 0x33, 0x5c, 0xc00014e2a0)
    /bitnami/blacksmith-sandox/external-dns-0.7.6/src/github.com/kubernetes-incubator/external-dns/source/service.go:195 +0x7e5
sigs.k8s.io/external-dns/source.(*multiSource).Endpoints(0xc000b29ec0, 0x2b13760, 0xc000b4db00, 0x8, 0x3cd9320, 0x1, 0x203000, 0xc00065f130)
    /bitnami/blacksmith-sandox/external-dns-0.7.6/src/github.com/kubernetes-incubator/external-dns/source/multi_source.go:35 +0xc5
sigs.k8s.io/external-dns/source.(*dedupSource).Endpoints(0xc0003a0f30, 0x2b13760, 0xc000b4db00, 0x3c78f70, 0x222b1c0, 0xc000b48c80, 0x2b13760, 0xc000b4db00)
    /bitnami/blacksmith-sandox/external-dns-0.7.6/src/github.com/kubernetes-incubator/external-dns/source/dedup_source.go:42 +0xca
sigs.k8s.io/external-dns/controller.(*Controller).RunOnce(0xc0006aef00, 0x2b13760, 0xc000b4db00, 0x3ca5b20, 0x25e7801)
    /bitnami/blacksmith-sandox/external-dns-0.7.6/src/github.com/kubernetes-incubator/external-dns/controller/controller.go:136 +0x16e
sigs.k8s.io/external-dns/controller.(*Controller).Run(0xc0006aef00, 0x2b136a0, 0xc000463400)
    /bitnami/blacksmith-sandox/external-dns-0.7.6/src/github.com/kubernetes-incubator/external-dns/controller/controller.go:194 +0x20f
main.main()
    /bitnami/blacksmith-sandox/external-dns-0.7.6/src/github.com/kubernetes-incubator/external-dns/main.go:354 +0xe50

What you expected to happen:

I would expect an error message for an invalid cluster state, rather than a crash.

How to reproduce it (as minimally and precisely as possible):

I have not yet been able to determine which service is causing the problem.

Anything else we need to know?:

Environment:

allenporter commented 3 years ago

I toggled some charts off and on, and adding back this set of services seemed to trigger the crash:

$ kubectl describe services -n monitoring

Name:              alertmanager-operated
Namespace:         monitoring
Labels:            operated-alertmanager=true
Annotations:       <none>
Selector:          app=alertmanager
Type:              ClusterIP
IP Families:       <none>
IP:                None
IPs:               None
Port:              web  9093/TCP
TargetPort:        web/TCP
Endpoints:         192.168.124.29:9093
Port:              tcp-mesh  9094/TCP
TargetPort:        9094/TCP
Endpoints:         192.168.124.29:9094
Port:              udp-mesh  9094/UDP
TargetPort:        9094/UDP
Endpoints:         192.168.124.29:9094
Session Affinity:  None
Events:            <none>

Name:              kube-prometheus-stack-alertmanager
Namespace:         monitoring
Labels:            app=kube-prometheus-stack-alertmanager
                   app.kubernetes.io/managed-by=Helm
                   chart=kube-prometheus-stack-14.3.0
                   heritage=Helm
                   release=kube-prometheus-stack
                   self-monitor=true
Annotations:       meta.helm.sh/release-name: kube-prometheus-stack
                   meta.helm.sh/release-namespace: monitoring
Selector:          alertmanager=kube-prometheus-stack-alertmanager,app=alertmanager
Type:              ClusterIP
IP Families:       <none>
IP:                10.97.167.186
IPs:               10.97.167.186
Port:              web  9093/TCP
TargetPort:        9093/TCP
Endpoints:         192.168.124.29:9093
Session Affinity:  None
Events:            <none>

Name:              kube-prometheus-stack-grafana
Namespace:         monitoring
Labels:            app.kubernetes.io/instance=kube-prometheus-stack
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=grafana
                   app.kubernetes.io/version=7.4.3
                   helm.sh/chart=grafana-6.6.3
Annotations:       meta.helm.sh/release-name: kube-prometheus-stack
                   meta.helm.sh/release-namespace: monitoring
Selector:          app.kubernetes.io/instance=kube-prometheus-stack,app.kubernetes.io/name=grafana
Type:              ClusterIP
IP Families:       <none>
IP:                10.111.138.142
IPs:               10.111.138.142
Port:              service  80/TCP
TargetPort:        3000/TCP
Endpoints:         192.168.173.17:3000
Session Affinity:  None
Events:            <none>

Name:              kube-prometheus-stack-kube-state-metrics
Namespace:         monitoring
Labels:            app.kubernetes.io/instance=kube-prometheus-stack
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=kube-state-metrics
                   helm.sh/chart=kube-state-metrics-2.13.0
Annotations:       meta.helm.sh/release-name: kube-prometheus-stack
                   meta.helm.sh/release-namespace: monitoring
                   prometheus.io/scrape: true
Selector:          app.kubernetes.io/instance=kube-prometheus-stack,app.kubernetes.io/name=kube-state-metrics
Type:              ClusterIP
IP Families:       <none>
IP:                10.96.228.173
IPs:               10.96.228.173
Port:              http  8080/TCP
TargetPort:        8080/TCP
Endpoints:         192.168.124.57:8080
Session Affinity:  None
Events:            <none>

Name:              kube-prometheus-stack-operator
Namespace:         monitoring
Labels:            app=kube-prometheus-stack-operator
                   app.kubernetes.io/managed-by=Helm
                   chart=kube-prometheus-stack-14.3.0
                   heritage=Helm
                   release=kube-prometheus-stack
Annotations:       meta.helm.sh/release-name: kube-prometheus-stack
                   meta.helm.sh/release-namespace: monitoring
Selector:          app=kube-prometheus-stack-operator,release=kube-prometheus-stack
Type:              ClusterIP
IP Families:       <none>
IP:                10.106.240.203
IPs:               10.106.240.203
Port:              https  443/TCP
TargetPort:        https/TCP
Endpoints:         192.168.124.26:10250
Session Affinity:  None
Events:            <none>

Name:              kube-prometheus-stack-prometheus
Namespace:         monitoring
Labels:            app=kube-prometheus-stack-prometheus
                   app.kubernetes.io/managed-by=Helm
                   chart=kube-prometheus-stack-14.3.0
                   heritage=Helm
                   release=kube-prometheus-stack
                   self-monitor=true
Annotations:       meta.helm.sh/release-name: kube-prometheus-stack
                   meta.helm.sh/release-namespace: monitoring
Selector:          app=prometheus,prometheus=kube-prometheus-stack-prometheus
Type:              ClusterIP
IP Families:       <none>
IP:                10.97.99.103
IPs:               10.97.99.103
Port:              web  9090/TCP
TargetPort:        9090/TCP
Endpoints:         192.168.124.25:9090
Session Affinity:  None
Events:            <none>

Name:              kube-prometheus-stack-prometheus-node-exporter
Namespace:         monitoring
Labels:            app=prometheus-node-exporter
                   app.kubernetes.io/managed-by=Helm
                   chart=prometheus-node-exporter-1.16.2
                   heritage=Helm
                   jobLabel=node-exporter
                   release=kube-prometheus-stack
Annotations:       meta.helm.sh/release-name: kube-prometheus-stack
                   meta.helm.sh/release-namespace: monitoring
                   prometheus.io/scrape: true
Selector:          app=prometheus-node-exporter,release=kube-prometheus-stack
Type:              ClusterIP
IP Families:       <none>
IP:                10.108.8.117
IPs:               10.108.8.117
Port:              metrics  9100/TCP
TargetPort:        9100/TCP
Endpoints:         10.10.22.11:9100,10.10.22.12:9100,10.10.22.14:9100
Session Affinity:  None
Events:            <none>

Name:              prometheus-operated
Namespace:         monitoring
Labels:            operated-prometheus=true
Annotations:       <none>
Selector:          app=prometheus
Type:              ClusterIP
IP Families:       <none>
IP:                None
IPs:               None
Port:              web  9090/TCP
TargetPort:        web/TCP
Endpoints:         192.168.124.25:9090
Session Affinity:  None
Events:            <none>

allenporter commented 3 years ago

I built a local copy of external-dns and added some logging:

time="2021-03-27T22:24:56Z" level=error msg="Skipping address for service[kube-prometheus-stack-kube-etcd] missing TargetRef"

It looks like that corresponds to this Service and its manually created Endpoints object:

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kube-prometheus-stack-kube-etcd
    chart: kube-prometheus-stack-14.4.0
    helm.toolkit.fluxcd.io/name: kube-prometheus-stack
    helm.toolkit.fluxcd.io/namespace: monitoring
    heritage: Helm
    jobLabel: kube-etcd
    release: kube-prometheus-stack
  name: kube-prometheus-stack-kube-etcd
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 2379
    protocol: TCP
    targetPort: 2379
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app: kube-prometheus-stack-kube-etcd
    chart: kube-prometheus-stack-14.4.0
    helm.toolkit.fluxcd.io/name: kube-prometheus-stack
    helm.toolkit.fluxcd.io/namespace: monitoring
    heritage: Helm
    k8s-app: etcd-server
    release: kube-prometheus-stack
  name: kube-prometheus-stack-kube-etcd
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.10.22.11
  ports:
  - name: http-metrics
    port: 2379
    protocol: TCP
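
The manifest above has an address with an `ip` but no `targetRef`, which matches the panic address offset in `extractHeadlessEndpoints`. A minimal sketch of the kind of guard the code appears to be missing, using hypothetical stand-in types rather than the real `corev1` structs:

```go
package main

import "fmt"

// Hypothetical stand-ins for the corev1 types involved; field names mirror
// the real API but this is an illustration, not the external-dns code.
type ObjectReference struct{ Name string }

type EndpointAddress struct {
	IP        string
	TargetRef *ObjectReference // nil when an Endpoints object is created by hand without a pod reference
}

// headlessTargets shows the guard: skip addresses whose TargetRef is nil
// instead of dereferencing it, which is presumably where the SIGSEGV occurs.
func headlessTargets(addresses []EndpointAddress) []string {
	var names []string
	for _, addr := range addresses {
		if addr.TargetRef == nil {
			// Log and continue rather than crash.
			fmt.Printf("skipping address %s: missing targetRef\n", addr.IP)
			continue
		}
		names = append(names, addr.TargetRef.Name)
	}
	return names
}

func main() {
	addrs := []EndpointAddress{
		{IP: "10.10.22.11"}, // no targetRef, as in the kube-etcd Endpoints above
		{IP: "192.168.124.29", TargetRef: &ObjectReference{Name: "alertmanager-0"}},
	}
	fmt.Println(headlessTargets(addrs))
}
```

With this guard the manually created etcd Endpoints would produce a skipped-address log line (like the one added locally above) instead of a panic.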