kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services
Apache License 2.0

RFC2136 - Different services binds to the same DNS name. #3330

Open szimonek opened 1 year ago

szimonek commented 1 year ago
  1. Create the first LoadBalancer Service with an external-dns hostname annotation in the default Namespace:
apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: myapp-other.example.com
    external-dns.alpha.kubernetes.io/ttl: "60"
  labels:
    app: myapp
  name: myapp-loadbalancer
  namespace: default
spec:
  ports:
  - name: http
    protocol: TCP
    port: 80
    nodePort: 30032
    targetPort: 8080
  selector:
    app: myapp
  type: LoadBalancer
  2. The Service is created, gets an EXTERNAL-IP from MetalLB, and external-dns populates the DNS entry.
  3. Create a second Service with the same DNS name, but in a different Namespace (malicious) and with a different selector:
apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: myapp-other.example.com
    external-dns.alpha.kubernetes.io/ttl: "60"
  labels:
    app: maliciousapp
  name: maliciousapp-loadbalancer
  namespace: malicious
spec:
  ports:
  - name: http
    protocol: TCP
    port: 80
    nodePort: 30062
    targetPort: 8888
  selector:
    app: maliciousapp
  type: LoadBalancer

What happened:

Both services were created, both got an EXTERNAL-IP from MetalLB, and both populated a DNS entry.

$ kubectl get service/myapp-loadbalancer
NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
myapp-loadbalancer   LoadBalancer   10.43.214.127   10.2.110.224   80:30032/TCP   12h
$

$ kubectl get service/myapp-loadbalancer -o custom-columns=NAME:.metadata.name,ANNOTATIONS:.metadata.annotations
NAME                 ANNOTATIONS
myapp-loadbalancer   map[external-dns.alpha.kubernetes.io/hostname:myapp-other.example.com external-dns.alpha.kubernetes.io/ttl:60 field.cattle.io/publicEndpoints:[{"addresses":["10.2.110.224"],"port":80,"protocol":"TCP","serviceName":"default:myapp-loadbalancer","allNodes":false}]]
$

$ kubectl get endpoints/myapp-loadbalancer
NAME                 ENDPOINTS           AGE
myapp-loadbalancer   10.42.56.186:8080   83m
$
$ kubectl --namespace=malicious get service/maliciousapp-loadbalancer
NAME                        TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
maliciousapp-loadbalancer   LoadBalancer   10.43.95.142   10.2.110.225   80:30062/TCP   2m
$

$ kubectl --namespace=malicious get service/maliciousapp-loadbalancer -o custom-columns=NAME:.metadata.name,ANNOTATIONS:.metadata.annotations
NAME                        ANNOTATIONS
maliciousapp-loadbalancer   map[external-dns.alpha.kubernetes.io/hostname:myapp-other.example.com external-dns.alpha.kubernetes.io/ttl:60 field.cattle.io/publicEndpoints:[{"addresses":["10.2.110.225"],"port":80,"protocol":"TCP","serviceName":"malicious:maliciousapp-loadbalancer","allNodes":false}]]

$ kubectl --namespace=malicious get endpoints/maliciousapp-loadbalancer
NAME                        ENDPOINTS            AGE
maliciousapp-loadbalancer   10.42.71.136:8888   103s
$

So from the DNS side, the name myapp-other.example.com is now round-robin balanced between the 10.2.110.224 (good) and 10.2.110.225 (malicious) addresses:

$ nslookup myapp-other.example.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:   myapp-other.example.com
Address: 10.2.110.224
Name:   myapp-other.example.com
Address: 10.2.110.225
$

What you expected to happen:

I now have one DNS name, myapp-other.example.com, with two A records (good and malicious). I should have only the good one, and creating the second (malicious) record should return an error, because the TXT-based registry shouldn't allow such a DNS configuration.

Different namespace + different service name + different pod selector = definitely different applications, so they shouldn't be able to share the same DNS name. Unless I'm missing something, this currently behaves like a classic DNS-poisoning attack.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Logs from external-dns:

time="2023-01-17T12:58:31Z" level=debug msg="Endpoints generated from service: default/myapp-loadbalancer: [myapp-other.example.com 60 IN A  10.2.110.224 []]"
time="2023-01-17T12:58:31Z" level=debug msg="Endpoints generated from service: malicious/maliciousapp-loadbalancer: [myapp-other.example.com 60 IN A  10.2.110.225 []]"
time="2023-01-17T12:58:31Z" level=debug msg="RemoveRecord.ep=myapp-other.example.com 120 IN A  10.2.110.224;10.2.110.225 []"
time="2023-01-17T12:58:31Z" level=info msg="Removing RR: myapp-other.example.com 120 A 10.2.110.224"
time="2023-01-17T12:58:31Z" level=info msg="Removing RR: myapp-other.example.com 120 A 10.2.110.225"
time="2023-01-17T12:58:31Z" level=debug msg="AddRecord.ep=myapp-other.example.com 60 IN A  10.2.110.224;10.2.110.225 []"
time="2023-01-17T12:58:31Z" level=info msg="Adding RR: myapp-other.example.com 120 A 10.2.110.224"
time="2023-01-17T12:58:31Z" level=info msg="Adding RR: myapp-other.example.com 120 A 10.2.110.225"
time="2023-01-17T12:58:31Z" level=debug msg="RemoveRecord.ep=myapp-other--external-dns.example.com 0 IN TXT  \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=service/default/myapp-loadbalancer\" []"
time="2023-01-17T12:58:31Z" level=info msg="Removing RR: myapp-other--external-dns.example.com 0 TXT \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=service/default/myapp-loadbalancer\""
time="2023-01-17T12:58:31Z" level=debug msg="AddRecord.ep=myapp-other--external-dns.example.com 0 IN TXT  \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=service/default/myapp-loadbalancer\" []"
time="2023-01-17T12:58:31Z" level=info msg="Adding RR: myapp-other--external-dns.example.com 120 TXT \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=service/default/myapp-loadbalancer\""
time="2023-01-17T12:58:31Z" level=debug msg="RemoveRecord.ep=myapp-other-a-external-dns.example.com 0 IN TXT  \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=service/default/myapp-loadbalancer\" []"
time="2023-01-17T12:58:31Z" level=info msg="Removing RR: myapp-other-a-external-dns.example.com 0 TXT \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=service/default/myapp-loadbalancer\""
time="2023-01-17T12:58:31Z" level=debug msg="AddRecord.ep=myapp-other-a-external-dns.example.com 0 IN TXT  \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=service/default/myapp-loadbalancer\" []"
time="2023-01-17T12:58:31Z" level=info msg="Adding RR: myapp-other-a-external-dns.example.com 120 TXT \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=service/default/myapp-loadbalancer\""

Environment:

TomyLobo commented 1 year ago

So you created 2 services with the external-dns.alpha.kubernetes.io/hostname annotation with the same value on the same cluster and you're wondering why they're both showing up? That's pretty much the designed behaviour.

On the solution side of things: you can tell external-dns not to create RRs for Service resources and then use an Ingress resource instead. Ingress resources (at least in my playground cluster) can only be defined once per hostname/path combination:

Error from server (BadRequest): error when creating "double-ingress.yaml": admission webhook "validate.nginx.ingress.kubernetes.io" denied the request: host "myapp-other.example.com" and path "/" is already defined in ingress default/test
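For reference, a minimal Ingress for the hostname used in this thread might look like the following sketch (the ingress class and backend service are assumptions, not part of the original report); applying a second Ingress for the same host and path is what triggers the admission error above:

# Sketch: a single Ingress owning myapp-other.example.com; a second Ingress for
# the same host/path should be rejected by the NGINX admission webhook.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: default
spec:
  ingressClassName: nginx
  rules:
  - host: myapp-other.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-loadbalancer
            port:
              number: 80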
szimonek commented 1 year ago

Hi,

Thank you for your answer. If this is the designed behaviour, then it is defective by design. Take a production k8s cluster with external-dns configured, and deploy a mission-critical 24/7 app on that cluster with the external-dns hostname criticalsystem.example.com (e.g. the internal system of a hospital). So far so good. At some point somebody, by mistake or by deliberate malicious action, deploys another service with the same DNS name. Once external-dns configures it, the DNS load-balancing magic happens, and half of the requests that should go to criticalsystem.example.com now go to some other application. When such a thing breaks 24/7 "five-nines" availability, everybody starts asking "why?!?", and "designed behaviour" is not a good answer.

In TXT records external-dns populates owner and resource stanzas. The current default behaviour should only be possible in a "relaxed security mode" triggered by a new external-dns startup parameter. By default, both stanzas should be strictly checked:

  1. DNS name exists and has no TXT records -> name not owned; do not add the new record.
  2. DNS name exists and has TXT records, but the owner stanza does not match -> name not owned by this external-dns instance; do not add the new record.
  3. DNS name exists, has TXT records, the owner stanza matches but the resource stanza does not -> name owned by this external-dns instance but by another service/ingress; add the name only when all endpoints of the new service/ingress match all endpoints of the current service (mirror service), or when running in relaxed security mode.

I believe that points 1 and 2 already work, but number 3 is not implemented. That is why I filed this bug report.

Regarding the proposed solution: I have applications that must use LoadBalancer services, so I need external-dns to create records for both Services and Ingresses.

TomyLobo commented 1 year ago

Ownership is bound to an external-dns instance, based on the txtOwnerId parameter, not to individual resources.

TomyLobo commented 1 year ago

So if you can't use Ingresses, you can either run one external-dns instance per namespace, each with its own txtOwnerId value, or implement your own mutual exclusion between services with the same hostname via an admission webhook or something at the CI level.
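As a rough sketch of that per-namespace setup (flag names as documented by external-dns; the namespace and owner-id values are placeholders), each instance would be started with something like:

# Sketch: one external-dns instance per tenant namespace, each with its own TXT owner id.
args:
- --source=service
- --source=ingress
- --namespace=tenant-a          # watch only this tenant's namespace (placeholder name)
- --registry=txt
- --txt-owner-id=tenant-a       # distinct owner id per instance
- --provider=rfc2136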

szimonek commented 1 year ago

Hi,

We have nearly 50 different projects on the production K8s cluster, and that number is still growing. Each project owns one or more Namespaces, where ownership implies full read-write access to the Namespace. I can't run external-dns inside these Namespaces, because in such a configuration users could (and would) see the DNS security keys used by external-dns to authorize dynamic updates. I could create an additional administrative Namespace for each app Namespace, just to host external-dns with a separate configuration per instance, but with so many projects the Ops overhead would be massive. With one main corporate DNS zone and different configurations for production and non-production clusters, I would need to spin up a hundred external-dns instances, each inside its own Namespace. Massive :-(

I am considering an admission webhook for this case, but it is still an additional component to develop for something that could (and for the sake of security should) be implemented inside external-dns.

badnetmask commented 1 year ago

I'm here to agree with @szimonek. While the current behavior may be acceptable for the majority of cases, it's certainly not acceptable for a multi-tenant cluster where each namespace belongs to a different organization, team or person. One could argue the app has not been designed for such cases, but I think there is value in exploring them.

With all that said, I would like to add some of my own observations:

My gut feeling (without going too far into the weeds/code) is that the problem could be "fixed" (or at least worked around) with an extra check on resource ownership that compares the entire string, which includes the namespace. Because, like @szimonek mentioned, the app does respect records created by different clusters and by different CRs (providers), but it completely ignores the namespace the resources are coming from.

I'm willing to help with testing to find a solution here (I would probably contribute code if I had time, sorry).

TomyLobo commented 1 year ago

Sorry @szimonek I missed your reply 2 weeks ago.

As far as I can tell, external-dns is not multi-tenant-capable on its own. There is no internal security. Even the solution I proposed with multiple external-dns instances would not be secure if they are all configured the same way except for txtOwnerId. The security issue you run into is that if one tenant unregisters a hostname, another could simply register it. You could solve this by using admission hooks to verify that each namespace only registers a specific set of hostnames. I hear Kyverno is nice for making admission hooks.
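A rough, untested sketch of such a namespace-scoped rule with Kyverno (the "one sub-domain per namespace" convention, the policy name, and the example.com zone are assumptions of this sketch, not anything external-dns or this thread prescribes):

# Untested sketch: in the "default" namespace, only allow external-dns hostnames
# under a suffix reserved for that tenant, so other namespaces cannot claim them.
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: restrict-external-dns-hostnames
  namespace: default
spec:
  validationFailureAction: Enforce
  rules:
  - name: hostname-must-use-tenant-suffix
    match:
      any:
      - resources:
          kinds:
          - Service
          - Ingress
    validate:
      message: "external-dns hostnames in this namespace must end with .default.example.com"
      pattern:
        metadata:
          =(annotations):
            =(external-dns.alpha.kubernetes.io/hostname): "*.default.example.com"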

I guess the existence of (the much more flexible) admission hooks is why the external-dns people never bothered to implement these kinds of rules on their side.

@badnetmask sorry, I'm a bit tired right now and only skimmed your post, so apologies if I missed anything. You mention extra checks for resource ownership - that still doesn't prevent another namespace from stealing your hostname if you briefly unregister it. The solution is, again, admission hooks.

szimonek commented 1 year ago

Hi,

@TomyLobo I'm not asking for major functional changes in external-dns, i.e. adding a layer of internal security or multi-tenancy. I'm aware that working securely in a multi-tenant environment is a big challenge. As far as I understand the external-dns code, it currently performs these checks when asked to add/modify a DNS record:

  1. If such an A/CNAME record exists and has no related TXT records, the change is rejected. This prevents any external-dns instance from stealing/redirecting statically added DNS records.
  2. If such an A/CNAME record exists and has TXT records, but with an external-dns/owner value not equal to the txtOwnerId of the requesting external-dns instance, the change is rejected. This prevents any external-dns instance from stealing/redirecting DNS records that are managed by another external-dns instance.
  3. In all other cases the change is accepted.

My request is to add one more rule to the above ruleset, which can be switched on/off by a startup parameter: a rule that verifies that both external-dns/owner and external-dns/resource are equal to the current values stored in the TXT record. This would prevent a single external-dns instance from linking multiple Kubernetes resources to a single DNS record, and thereby prevent stealing/redirecting of DNS records within one external-dns instance.
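A minimal sketch of how such a check could look (illustrative Go, not the actual external-dns registry/plan code; the label names mirror the TXT stanzas shown in the logs above):

// Illustrative sketch only: compares the ownership labels stored in the TXT
// registry record against the resource requesting the change.
// Rules 1 and 2 (no TXT record / foreign owner) are assumed to be checked first.
package main

import "fmt"

type labels struct {
	owner    string // external-dns/owner
	resource string // external-dns/resource, e.g. "service/default/myapp-loadbalancer"
}

// allowUpdate implements the proposed rule 3: with strict checking enabled,
// a record may only be modified by the same owner AND the same Kubernetes resource.
func allowUpdate(existing, candidate labels, strict bool) bool {
	if existing.owner != candidate.owner {
		return false // owned by another external-dns instance
	}
	if strict && existing.resource != candidate.resource {
		return false // owned by a different service/ingress of the same instance
	}
	return true
}

func main() {
	existing := labels{owner: "k8s", resource: "service/default/myapp-loadbalancer"}
	candidate := labels{owner: "k8s", resource: "service/malicious/maliciousapp-loadbalancer"}
	fmt.Println(allowUpdate(existing, candidate, true)) // false: the second service is rejected
}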

This of course does not address the possibility of stealing DNS records that the owning external-dns instance briefly unregisters, only to register them again a while later, but that is a different problem needing another solution. I have an idea how this could be achieved in a "DNS way", but as I mentioned before it is not in the scope of this request/issue.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

chrpinedo commented 1 year ago

Hi,

I am facing the same issue and I also think the key is the external-dns/resource field, which is handled weakly. Two services with the same annotation are able to create two A entries:

nginx-1.k8spru.local.  10      IN      A       10.0.60.20
nginx-1.k8spru.local.  10      IN      A       10.0.60.18

However, the external-dns/resource field only points to one of the two services, the one that created the A record first:

external-dns-nginx-1.k8spru.local. 0 IN TXT    "heritage=external-dns,external-dns/owner=k8spru,external-dns/resource=service/tmp-prueba-dyndns/nginx-1"

If this were the desired behaviour (allowing this kind of duplicate DNS entry), external-dns/resource should point to both services that created the two A RRs: service/tmp-prueba-dyndns/nginx-1 and service/tmp-prueba-dyndns/nginx-2. But I agree with @szimonek and @badnetmask that the expected behaviour should be to prevent this kind of duplicate DNS entry, and the external-dns/resource field should be the third thing checked before external-dns creates the RR.

szuecs commented 1 year ago

/remove-lifecycle rotten

szuecs commented 1 year ago

@chrpinedo I don't really know what we should expect from overlapping services. The idea of tracking both and adding both makes sense to me, but I also think DNS RR is not a great idea in general. Of course it is sometimes useful, so I would review and merge a PR that changes the current behavior into something useful for people.

chrpinedo commented 1 year ago

Perhaps I explained it badly. My personal opinion is that overlapping services are an error and should be prevented, as proposed by @szimonek and @badnetmask, by checking the external-dns/resource field before creating/deleting RRs. But if this were the desired behavior, the external-dns/resource field should at least point to all the services, and that does not happen now (it currently allows two services to overlap while external-dns/resource only points to one of them).

szuecs commented 1 year ago

@chrpinedo that's also fine for me and more in sync with my own reasoning.

badnetmask commented 1 year ago
nginx-1.k8spru.local.  10      IN      A       10.0.60.20
nginx-1.k8spru.local.  10      IN      A       10.0.60.18

This is a good example of when allowing duplicate records can be bad: suppose .20 is cluster A while .18 is cluster B. If the resource on cluster A came first, there's a chance someone spun up an app on cluster B, maliciously or not, that gives users a different response depending on which address they hit (based on DNS round-robin).

As I said previously: there could be value in allowing these duplicate records, in case you want the same app to live in two different clusters (for disaster recovery or geo-dispersion, for example). At the same time, one could argue that DNS round-robin is not a good solution for either of those cases, and an appropriate load balancer (or DNS routing mechanism) should be put in front of the two clusters to select the right destination.

In any case, going back to the theoretical cluster A and B situation, the fact that the operator only tracks one of the records is also bad, since it will eventually leave cruft behind.

szimonek commented 1 year ago

Hi,

My proposal for addressing this issue is to add a new configuration/startup parameter to external-dns. Let's call it --rr-uniqueness; it could have three (or more) states:

  * none - uniqueness is not verified (the current behaviour of external-dns, i.e. what is already implemented)
  * samens - two A/CNAME resource records may have the same DNS name and point to different services only when those services are defined in the same Namespace, as usually each Namespace has only one owner who may need such behaviour
  * strict - no two A/CNAME resource records may share the same DNS name

I think this solution would address most possible scenarios, and it can be extended to cover those it does not.
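If such a flag existed (it does not today; the name and values below are only this thread's proposal), usage might look like:

# Hypothetical flag from this proposal -- NOT an existing external-dns option.
args:
- --registry=txt
- --txt-owner-id=k8s
- --rr-uniqueness=strict   # proposed: reject a second resource claiming an already-owned name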

szuecs commented 1 year ago

You can't have two CNAMEs with the same name. You can of course have multiple A or AAAA (or most other types of) records.

chrpinedo commented 1 year ago

I agree with the proposal of @szimonek. It allows the behavior to be customized.

Regarding @badnetmask's comment that this could be "malicious or not" behavior: my first impression was that multiple services with the same FQDN would be denied, and only when I tested did I realise it was allowed. Being denied seems more intuitive to me, just as it is denied to duplicate an FQDN that was not generated by external-dns (for example, a static A RR). In my experience round-robin DNS is used on very few occasions (it is not the best method to balance services), and duplicate A RRs are more likely to be an error than a feature. In any case, nowadays, with two different clusters and two external-dns services with different --txt-owner values, you cannot duplicate the A RR across both clusters; you would have to use the same --txt-owner option in both external-dns services. And if the @szimonek proposal is implemented, this would still be allowed with --rr-uniqueness set to none (the default behavior).

TomyLobo commented 1 year ago

If one cluster holds the name and then unregisters it, the name becomes non-existent and can be registered by a different cluster. This is not the same situation as a static A record. An equivalent situation would be you removing that static A record and then wondering why one of your clusters was able to register it.

jcralbino commented 8 months ago

Hello, I am running a multi-tenant k8s environment, and I am currently using the --fqdn-template flag, which builds the entry from the Name and Namespace variables obtained from Kubernetes.

This at least avoids the collision issue.
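For reference, a sketch of that approach (--fqdn-template takes a Go template; the zone suffix here is a placeholder). Because the namespace becomes part of the name, two namespaces can no longer produce the same FQDN:

# Sketch: derive hostnames from the resource name and namespace instead of the
# hostname annotation, so two namespaces cannot end up with the same FQDN.
args:
- --source=service
- --fqdn-template={{.Name}}.{{.Namespace}}.example.com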

mgruener commented 6 months ago

We also have this issue. What I do not understand about "this is expected behavior" and "external-dns is not multi-tenant aware" is that the owner TXT record not only contains the per-instance owner ID, it also contains the namespace and resource name of the resource that triggered the creation of the actual DNS record. Why are these fields ignored (and even overwritten) in this case? What is their purpose if not to prevent exactly this situation?

Regarding the suggested --rr-uniqueness parameter, I think something like --enforce-record-owner would better match the intended effect.

justsomecorporateuser commented 3 months ago

Same issue here. Reviews have prevented bigger problems, but eventually a copy-pasted duplicate of the service annotation went through and caused a "mysterious" problem where traffic occasionally went to the wrong service. It would be really nice to have the TXT-record external-dns/resource check available behind some flag. Writing a validation webhook feels like duplicate work, and it is also a bit problematic to tackle this with Kyverno etc., unless they can actually access the DNS.

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

szimonek commented 3 weeks ago

Still valid and needs to be addressed.

/remove-lifecycle stale