kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services
Apache License 2.0
7.42k stars 2.52k forks source link

Finer grained control over `endpoint.DNSEndpoint` custom resource #4575

Open pier-oliviert opened 1 week ago

pier-oliviert commented 1 week ago

What would you like to be added: Finer grained control over the custom resource endpoint.DNSEndpoint to allow all types of record to be created by that resource. One way of creating an x509 Certificate with cert-manager is to use the DNS01 Challenge which requires the creation of a TXT record. It would be a significant improvement to the external-dns<->cert-manager interoperability if we could create TXT record with this custom resource.

Essentially, I'm proposing three changes to endpoint.DNSEndpoint:

  1. An optional field in the spec to specify whether the record should store the status in the status sub-resource (for backward compatibility)
  2. Adding a Finalizer to the Resource to allow the control loop to delete records that are marked to be deleted
  3. Add a fields to the sub resource status to keep track of the records as they should be in the provider
type DNSEndpointSpec struct {
    Endpoints []*Endpoint `json:"endpoints,omitempty"`

    // +optional
    StoreStateInResource *bool `json:"storeStateInResource,omitempty"`
}

type DNSEndpointStatus struct {
    // The generation observed by the external-dns controller.
    // +optional
    ObservedGeneration int64 `json:"observedGeneration,omitempty"`

    Records []DNSEndpointRecord `json:"records,omitempty"`
}

This way, the control loop could work essentially the same way as it does now, but instead of fetching records from TXT records or DynamoDB, it could look at its status' subresource to validate the state of the record on the provider's side. This would allow the DNSEndpoint custom resource to support TXT records the same way it does everything else.

The deletion of a DNSEndpoint custom resource would be guarded by a finalizer. The control loop would check for the customRecord.ObjectMeta.DeletionTimestamp.IsZero() and if the value is not zero, it would attempt the deletion on the provider's side, then remove the finalizer to let Kubernetes gracefully remove the custom resource.

One thing to note is that the StoreStateInResource is an optional field, which means that if a custom resource doesn't have this value set to true, the custom resource would not store the state in the sub resource and would still use the normal route (TXT,dynamodb,noop). As a result, TXT records would only be supported for endpoint.Endpoint that have that value set to true. This was only added by me to add some level of backward compatibility for the current behavior.

Why is this needed: All types of Records could be supported, which means it would allow folks like me to create TXT record to use external-dns with cert-manager to create x509 certificate through DNS01 Challenge. Those challenge requires to create a TXT record with a special field that proves the zone is owned by the one requesting the certificate.

I believe this change alone would allow me to use external-dns for creating DNS01 challenge. I am also interested in creating the PR to get this implemented, but I believe starting with a conversation might be the best option, right now.

I'm sure there are things I'm overseeing, but I believe this has enough information to at least get a conversation started.

mloiseleur commented 6 days ago

Many thanks for this very interesting proposal :+1: !

One thing to note is that the StoreStateInResource is an optional field, which means that if a custom resource doesn't have this value set to true, the custom resource would not store the state in the sub resource and would still use the normal route (TXT,dynamodb,noop). As a result, TXT records would only be supported for endpoint.Endpoint that have that value set to true. This was only added by me to add some level of backward compatibility for the current behavior.

Registry is a global parameter on external-dns level. Wdyt of keeping it global for this use case ? I mean enabling this behavior by opt-in with something like --registry=crd.

pier-oliviert commented 6 days ago

@mloiseleur I think a global opt-in parameter could definitively work. I think I would want to have that global parameter be clear about the intent as much as possible. I think --registry=crd could possibly lead to some confusion for a user where they could potentially expect the registry to work for all sources.

Like, if someone sets this up, it would only affect DNS entries created by DNSEnpdoint. Maybe --crd-self-registry as a boolean flag? I usually don't like boolean flags that much, but I have a hard time coming up with a better solution.

mloiseleur commented 6 days ago

Mmmmh :thinking: , if it's not working like a registry, it means DNSEndPoint would have their status in the CRD and other Ingress/IngressRoute/HTTPRoute resource would need to have a separate registry ? and DNSEndpoint would have then status in both CR status field and in the registry.

I've mixed feelings about this approach. It would add yet another issue on registry and there are already many known limitations on current dns-based registry.

Wdyt of using another (dedicated) CRD, let's say DNSStatus, that would be created and used by both DNSEndpoint and other sources as a registry ? It would follow a common pattern between many resources in Kubernetes API.

mcharriere commented 4 days ago

Wdyt of using another (dedicated) CRD, let's say DNSStatus, that would be created and used by both DNSEndpoint and other sources as a registry ? It would follow a common pattern between many resources in Kubernetes API.

I also think this is a better approach, in one hand because using the status field to store some data could be consider semantically incorrect. e.g the ownerId fits better in the spec of a registry CR than in the status. Also, having a dedicated CRD would make the implementation less problematic, as we wouldn't need to deal with the existing CRD logic nor the sources.

I'd really like to see a CRD based registry.

pier-oliviert commented 4 days ago

The Status's CRD didn't click for me until I read your comment @mcharriere and I have to say that I now agree with both of you. A new CRD based registry really makes sense specially when considering the implementation, like you said.

pier-oliviert commented 2 days ago

Took some time over the weekend to think about a new CRD for the registry. I used DNSRegistry here instead of DNSStatus just because a DNSStatus has a subresource named status which felt unwieldy to me. The name isn't that important right now and we can decide to name this new CRD any way we want.

type DNSRegistrySpec struct {
        OwnerID `json:"ownerId"`
    Endpoints []*Endpoint `json:"endpoints,omitempty"`
}

type DNSRegistryStatus struct {
    // The generation observed by the external-dns controller.
    // +optional
    ObservedGeneration int64 `json:"observedGeneration,omitempty"`
}

This new registry would act similarly to the TXTRegistry that already exists with external-dns. ApplyChange's, which I believe is the core of that interface would manage this new custom resource the same way the TXTRegistry would it's TXT record.

The OwnerID value would be stored as a spec field as I think that's something we'd want to track. I could also see us using custom resource labels on this new custom resource to allow filtering capabilities.

Here's an example of what I mean:

&DNSRegistry{
  ObjectMeta: metav1.ObjectMeta{
     GenerateName: fmt.sprintf("%s-", dnsName),
     Labels: map[string]string{
       "externaldns.k8s.io/dns-name": dnsName,
       "externaldns.k8s.io/record-type": "A",
       "externaldns.k8s.io/owner-id": ownerID,
     }
  },
  Spec: DNSRegistrySpec{}
}

I'm not sure if it would be useful to us to do that, but it's definitively an option if there's a need to fetch registry entries per owner-id, for example.

I also think that a lot of the encryption/special logic in TXTRegistry wouldn't need to be ported to this new CRD registry as the data would live inside the cluster so there wouldn't be any need to encrypt the data here, but I'll defer it to you.

Last note, my original issue here talks about changing the behavior of endpoint.DNSEndpoint but this discussion is moving in a different direction (creating a new registry). I don't know if I should leave this issue as is and we continue here, or if I should create a new issue to be more clear that the conversation is about a new Registry CRD. Let me know what you all think.