kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services

`DNSEndpoint` resources cause Pi-hole to continually reboot (`SIGTERM`) #3463

Closed · eXodus1440 closed this issue 2 months ago

eXodus1440 commented 1 year ago

What happened: Declaring `DNSEndpoint` resources while using `--provider=pihole` causes Pi-hole to continually reboot (`SIGTERM`).

pi-hole logs from kubectl logs pihole-8669bb59d7-g4t6z --follow:

Stopping pihole-FTL
Terminated
Stopping pihole-FTL
...
Stopping pihole-FTL

pi-hole gdb debug with kubectl exec -it pihole-8669bb59d7-g4t6z -- /bin/sh:

apt update -y && apt install gdb -y
echo "handle SIGHUP nostop SIGPIPE nostop SIGTERM nostop SIG32 nostop SIG34 nostop SIG35 nostop" | sudo tee /root/.gdbinit
gdb -p $(pidof pihole-FTL)
continue

...
Thread 1 "pihole-FTL" received signal SIGTERM, Terminated.
0x00007fd1cbb07d2f in __GI___poll (fds=0x55e9da832650, nfds=6, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29  in ../sysdeps/unix/sysv/linux/poll.c
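If gdb actually stops on the signal (the `.gdbinit` above sets `nostop`, so that needs flipping first), `$_siginfo` on Linux records which PID sent the SIGTERM. A sketch of that follow-up, not part of the original report:

```
(gdb) handle SIGTERM stop print
(gdb) continue
# after "received signal SIGTERM" appears:
(gdb) print $_siginfo._sifields._kill.si_pid
(gdb) shell ps -o pid,ppid,args -p <printed pid>   # ps may need procps installed in the container
```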

external-dns logs from kubectl logs external-dns-5fdf764c95-bnlfh --follow:

time="2023-03-08T23:17:18Z" level=debug msg="Listing A records from http://pihole.default.svc.cluster.local/admin/scripts/pi-hole/php/customdns.php"
time="2023-03-08T23:17:18Z" level=debug msg="Listing CNAME records from http://pihole.default.svc.cluster.local/admin/scripts/pi-hole/php/customcname.php"
time="2023-03-08T23:17:18Z" level=debug msg="No endpoints could be generated from service metallb-system/webhook-service"
time="2023-03-08T23:17:18Z" level=debug msg="No endpoints could be generated from service cert-manager/cert-manager"
time="2023-03-08T23:17:18Z" level=debug msg="No endpoints could be generated from service default/kubernetes"
time="2023-03-08T23:17:18Z" level=debug msg="No endpoints could be generated from service kube-system/metrics-server"
time="2023-03-08T23:17:18Z" level=debug msg="No endpoints could be generated from service cert-manager/cert-manager-webhook"
time="2023-03-08T23:17:18Z" level=debug msg="No endpoints could be generated from service kube-system/kube-dns"
time="2023-03-08T23:17:18Z" level=debug msg="Endpoints generated from service: traefik-system/traefik: [traefik.example.io 0 IN A  172.16.0.15 []]"
time="2023-03-08T23:17:18Z" level=debug msg="Endpoints generated from service: default/pihole: [pihole.example.io 0 IN A  172.16.0.18 []]"
time="2023-03-08T23:17:18Z" level=info msg="delete a.example.io IN A -> 172.16.0.25"
time="2023-03-08T23:17:25Z" level=info msg="delete cname.example.io IN CNAME -> a.example.io"
time="2023-03-08T23:17:25Z" level=info msg="add a.example.io IN A -> 172.16.0.25"
time="2023-03-08T23:17:32Z" level=info msg="add cname.example.io IN CNAME -> a.example.io"

When following logs from both the external-dns and Pi-hole containers in two separate windows, `Stopping pihole-FTL` appears once for every external-dns delete or add event. The full cycle repeats every 60 seconds, which is in line with external-dns' default `--interval=1m0s`.
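Until the root cause is identified, one mitigation (not a fix, and the `10m` value is only an example) is to stretch the sync loop so the delete/re-add churn, and with it the pihole-FTL restarts, happens less often. A sketch against the Deployment args shown in the external-dns config further down; the remaining args stay unchanged:

```yaml
        args:
        - --provider=pihole
        - --pihole-server=http://pihole.default.svc.cluster.local
        # example value only: fewer sync loops means fewer delete/add cycles and
        # therefore fewer pihole-FTL restarts; the underlying churn remains
        - --interval=10m
```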

What you expected to happen: A/CNAME record creation without continually triggering a reboot (SIGTERM) of the pihole-FTL process.

How to reproduce it (as minimally and precisely as possible):

pi-hole config:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: pihole-configmap
data:
  TZ: "Europe/London"
  ADMIN_EMAIL: "admin@example.io"
  PIHOLE_DNS_: "8.8.8.8;8.8.4.4"
  VIRTUAL_HOST: pihole.example.io
  DNSMASQ_LISTENING: all
---
apiVersion: v1
kind: Secret
metadata:
  name: pihole-secret
type: Opaque
data:
  WEBPASSWORD: U3VwZXJTZWNyZXRQYXNzd29yZA== #SuperSecretPassword
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pihole
  labels:
    app: pihole
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pihole
  template:
    metadata:
      labels:
        app: pihole
    spec:
      containers:
      - name: pihole
        image: docker.io/pihole/pihole:2023.02.2
        securityContext:
          capabilities:
            add:
            - SYS_PTRACE # Allowing gdb to debug the pihole-FTL process
        envFrom:
        - configMapRef:
            name: pihole-configmap
        - secretRef:
            name: pihole-secret
        ports:
        - name: pihole-dns-udp
          containerPort: 53
          protocol: UDP
        - name: pihole-dns-tcp
          containerPort: 53
          protocol: TCP
        - name: pihole-web
          containerPort: 80
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: pihole
  annotations:
    external-dns.alpha.kubernetes.io/hostname: pihole.example.io
spec:
  selector:
    app: pihole
  ports:
  - name: pihole-web
    port: 80
    targetPort: 80
    protocol: TCP
  - name: pihole-dns-tcp
    port: 53
    targetPort: 53
    protocol: TCP
  - name: pihole-dns-udp
    port: 53
    targetPort: 53
    protocol: UDP
  externalTrafficPolicy: Local
  type: LoadBalancer
  loadBalancerIP: 172.16.0.18
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: pihole.example.io
spec:
  secretName: pihole.example.io
  dnsNames:
  - pihole.example.io
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: pihole-ingress-https
spec:
  entryPoints:
  - websecure
  routes:
  - match: Host(`pihole.example.io`)
    kind: Rule
    services:
    - name: pihole
      port: 80
  tls:
    secretName: pihole.example.io
---
```
external-dns config:

```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
- apiGroups: [""]
  resources: ["services","endpoints","pods"]
  verbs: ["get","watch","list"]
- apiGroups: ["extensions","networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get","watch","list"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list","watch"]
- apiGroups: ["externaldns.k8s.io"]
  resources: ["dnsendpoints"]
  verbs: ["get","watch","list"]
- apiGroups: ["externaldns.k8s.io"]
  resources: ["dnsendpoints/status"]
  verbs: ["get","update","patch","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
      - name: external-dns
        image: registry.k8s.io/external-dns/external-dns:v0.13.2
        env:
        - name: EXTERNAL_DNS_PIHOLE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pihole-secret
              key: WEBPASSWORD
        args:
        - --source=service
        - --source=ingress
        - --source=crd
        - --registry=noop
        - --policy=upsert-only
        - --provider=pihole
        - --pihole-server=http://pihole.default.svc.cluster.local
        - --log-level=debug
      securityContext:
        fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes token files
```

external-dns crd-manifest

dnsendpoint resource:

```yaml
---
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: example-a-record
spec:
  endpoints:
  - dnsName: a.example.io
    recordTTL: 180
    recordType: A
    targets:
    - 172.16.0.15
---
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: example-cname-record
spec:
  endpoints:
  - dnsName: cname.example.io
    recordTTL: 180
    recordType: CNAME
    targets:
    - a.example.io
```
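As a sanity check (a sketch; the object and Deployment names match the manifests above), confirm that the CRD objects exist and that external-dns keeps emitting the same delete/add pairs for them on every interval:

```
kubectl get dnsendpoints.externaldns.k8s.io
kubectl logs deploy/external-dns --since=5m | grep -E "a\.example\.io|cname\.example\.io"
```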

Anything else we need to know?:

Environment:

Also using MetalLB v0.13.9 as the LoadBalancer provider

eXodus1440 commented 1 year ago

On further troubleshooting, Pi-hole triggers a restart (SIGTERM) when adding or deleting records via the web interface as well - once per add/delete event.

The issue now seems to be more that external-dns cannot track the records it creates in Pi-hole from DNSEndpoint resources, so it deletes and re-adds those entries every sync interval (1m0s by default), once per DNSEndpoint resource.
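One way to confirm the delete/re-add cycle from the Pi-hole side is to watch the files where Pi-hole persists custom records; the paths below are the Pi-hole v5 defaults and are an assumption for other versions:

```
# custom A records live in custom.list, custom CNAMEs in the dnsmasq drop-in (Pi-hole v5 paths)
kubectl exec -it pihole-8669bb59d7-g4t6z -- sh -c \
  'while true; do date; cat /etc/pihole/custom.list /etc/dnsmasq.d/05-pihole-custom-cname.conf; sleep 5; done'
```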

I'm currently using a phantom Ingress resource as a workaround instead of a DNSEndpoint; example below.

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          protocol: TCP
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: nginx-ingress-http
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`nginx-test.example.io`)
      kind: Rule
      services:
        - name: nginx
          port: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: nginx-test.example.io
```

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

szuecs commented 1 year ago

Do you use the registry? Can you share how you start external-dns? Can you check whether kubelet tries to terminate external-dns? Someone is sending a SIGTERM, and from what you've shared I don't see that this is an external-dns issue.
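For reference, a quick way to check the kubelet angle (assuming the default namespace used elsewhere in this issue): look at restart counts, the containers' last terminated state, and recent events for probe kills or OOM:

```
kubectl -n default get pods -o wide                                    # RESTARTS column
kubectl -n default describe pod pihole-8669bb59d7-g4t6z | grep -A7 'Last State'
kubectl -n default get events --sort-by=.lastTimestamp | grep -iE 'kill|probe|oom'
```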

BoKKeR commented 9 months ago

I was experiencing the same problem, but I assumed it was because Pi-hole got overwhelmed and the FTL service crashed. How can I provide more information?

This is how I added it:

https://github.com/BoKKeR/flux-cluster/commit/1fc1c863f6c2d7df1c4b94c75c5f6ce47d37ae40

BoKKeR commented 7 months ago

I have posted a workaround in this thread.

https://discourse.pi-hole.net/t/ftl-crashes-repeatedly-when-updating-dns-records-through-external-dns/66867

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 2 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/external-dns/issues/3463#issuecomment-2198476581):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.