kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services
Apache License 2.0

External-dns crashes when trying to add a hostname with more than 63 characters #4176

Open jcralbino opened 9 months ago

jcralbino commented 9 months ago

What happened: ExternalDNS tried to create a DNS record containing a label longer than 63 characters and crashed. In our case the label is 64 characters long: orb-postgresql-policies-headless_orb_mg-t-cn-network-k8s-cluster

time="2024-01-12T23:34:25+01:00" level=debug msg="Generating matching endpoint orb-postgresql-policies-0.orb-postgresql-policies-headless_orb_mg-t-cn-network-k8s-cluster.lab.tkgi.mgmtdom.intra with EndpointAddress IP 10.196.10.250"
time="2024-01-12T23:34:25+01:00" level=error msg="label orb-postgresql-policies-headless_orb_mg-t-cn-network-k8s-cluster in orb-postgresql-policies-0.orb-postgresql-policies-headless_orb_mg-t-cn-network-k8s-cluster.lab.tkgi.mgmtdom.intra is longer than 63 characters. Cannot create endpoint"
time="2024-01-12T23:34:25+01:00" level=error msg="label orb-postgresql-policies-headless_orb_mg-t-cn-network-k8s-cluster in orb-postgresql-policies-headless_orb_mg-t-cn-network-k8s-cluster.lab.tkgi.mgmtdom.intra is longer than 63 characters. Cannot create endpoint"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x60 pc=0x1ab0201]

goroutine 1 [running]:
sigs.k8s.io/external-dns/source.(*serviceSource).generateEndpoints(0xc0004c62c0, 0xc000e48b58, {0xc000f6ef60, 0x57?}, {0x5d1f5e0, 0x0, 0x0}, {0x0, 0x0}, 0x0)
    /bitnami/blacksmith-sandox/external-dns-0.14.0/src/github.com/kubernetes-incubator/external-dns/source/service.go:543 +0xca1
sigs.k8s.io/external-dns/source.(*serviceSource).endpointsFromTemplate(0xc0004c62c0, 0xc000e48b58)
    /bitnami/blacksmith-sandox/external-dns-0.14.0/src/github.com/kubernetes-incubator/external-dns/source/service.go:383 +0x1a7
sigs.k8s.io/external-dns/source.(*serviceSource).Endpoints(0xc0004c62c0, {0x1?, 0x80cafd3f00000000?})
    /bitnami/blacksmith-sandox/external-dns-0.14.0/src/github.com/kubernetes-incubator/external-dns/source/service.go:185 +0x35c
sigs.k8s.io/external-dns/source.(*multiSource).Endpoints(0xc000cc3e30, {0x3dd1428, 0xc000f65320})
    /bitnami/blacksmith-sandox/external-dns-0.14.0/src/github.com/kubernetes-incubator/external-dns/source/multisource.go:36 +0xcc
sigs.k8s.io/external-dns/source.(*dedupSource).Endpoints(0xc000eccac0, {0x3dd1428, 0xc000f65320})
    /bitnami/blacksmith-sandox/external-dns-0.14.0/src/github.com/kubernetes-incubator/external-dns/source/dedupsource.go:42 +0xbc
sigs.k8s.io/external-dns/source.(*targetFilterSource).Endpoints(0xc0007ff7a0, {0x3dd1428?, 0xc000f65320?})
    /bitnami/blacksmith-sandox/external-dns-0.14.0/src/github.com/kubernetes-incubator/external-dns/source/targetfiltersource.go:43 +0x42
sigs.k8s.io/external-dns/controller.(*Controller).RunOnce(0xc00117d930, {0x3dd1460, 0xc000757090})
    /bitnami/blacksmith-sandox/external-dns-0.14.0/src/github.com/kubernetes-incubator/external-dns/controller/controller.go:218 +0x1f1
sigs.k8s.io/external-dns/controller.(*Controller).Run(0xc00117d930?, {0x3dd1460, 0xc000757090})
    /bitnami/blacksmith-sandox/external-dns-0.14.0/src/github.com/kubernetes-incubator/external-dns/controller/controller.go:333 +0xaf
main.main()
    /bitnami/blacksmith-sandox/external-dns-0.14.0/src/github.com/kubernetes-incubator/external-dns/main.go:475 +0x4a4f

What you expected to happen:

ExternalDNS should skip the invalid entry, log an error, and continue with the next entry.

Kubernetes Service names and namespaces are each limited to 63 characters. In the worst case, the FQDN template {{.Name}}-{{.Namespace}} therefore renders a single label of 127 characters (63 + 1 for the hyphen + 63).

Because each DNS label is limited to 63 characters, a longer label should be rejected, but it should not crash ExternalDNS.

This bug adds operational risk and reduces the resilience and availability of ExternalDNS.

How to reproduce it (as minimally and precisely as possible): Create a DNS hostname with a label longer than 63 characters.

Anything else we need to know?: The flag used is --fqdn-template="{{.Name}}_{{.Namespace}}_26hostname-suffix-character.22charact.in.subdomain"

theloneexplorerquest commented 9 months ago

This could be resolved by inserting an if statement at https://github.com/kubernetes-sigs/external-dns/blob/master/source/service.go#L744

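    // Skip endpoints that could not be created (nil), for example
    // when a label is longer than 63 characters.
    if ep != nil {
        endpoints = append(endpoints, ep)
    }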

However, since it is logged at ERROR level, is the crash intentional? Happy to create a PR if that is not the case. :blush:

ivankatliarchuk commented 9 months ago

For SREs, dealing with an issue like this is a headache, especially when the service is stuck in CrashLoopBackOff because a user supplied a poison pill.

The SIGSEGV error means the code mishandled a pointer, and the Go runtime printed a stack trace. Put simply, the code accessed memory it should not have. The signal line helps with debugging: addr=0x60 is the faulting address, and because it is close to zero it indicates a nil pointer dereference (0x1ab0201 is the program counter, pc, not the faulting address).

Looks like a nasty bug. The suggested fix might not address the root cause; it is likely a case of incorrect error handling, which can lead to a memory leak and/or a service crash.
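To illustrate the panic in the report, this minimal Go sketch dereferences a nil pointer and uses recover to capture the resulting runtime.Error. The `endpoint` type and `dnsNameOf` function are hypothetical. recover is used here only to make the panic observable; it is not proposed as the fix, which is to avoid producing nil entries in the first place.

```go
package main

import (
	"fmt"
	"runtime"
)

// endpoint is a hypothetical struct used only for this demonstration.
type endpoint struct {
	DNSName string
}

// dnsNameOf reads ep.DNSName; if ep is nil, the runtime panics and the
// deferred recover converts that panic into an ordinary error.
func dnsNameOf(ep *endpoint) (name string, err error) {
	defer func() {
		if v := recover(); v != nil {
			re, ok := v.(runtime.Error)
			if !ok {
				panic(v) // not a runtime error: re-panic
			}
			err = re
		}
	}()
	return ep.DNSName, nil
}

func main() {
	eps := []*endpoint{{DNSName: "ok.example.com"}, nil}
	for _, ep := range eps {
		name, err := dnsNameOf(ep)
		fmt.Println(name, err)
	}
}
```

The nil entry yields the same message seen in the crash, "runtime error: invalid memory address or nil pointer dereference", while the valid entry returns its name with a nil error.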

theloneexplorerquest commented 9 months ago

IMO, the code does not handle the case where an endpoint is invalid: we should not append nil to the endpoints list. Can you suggest the proper way to fix the issue? Thanks

ivankatliarchuk commented 9 months ago

I was wrong; you are right, that should fix the current problem.

ivankatliarchuk commented 7 months ago

This looks like it should be fixed; the fix is in https://github.com/kubernetes-sigs/external-dns/pull/4293

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 months ago

/lifecycle rotten

mat285 commented 3 months ago

/remove-lifecycle rotten

mat285 commented 3 months ago

I'm still experiencing this issue on version 0.14.2. It seems it hasn't been fixed?

k8s-triage-robot commented 2 weeks ago

/lifecycle stale