zachomedia / cert-manager-webhook-pdns

A PowerDNS webhook for cert-manager
MIT License

404 communicating with PowerDNS #9

Closed hacknisty closed 10 months ago

hacknisty commented 2 years ago

Hi,

I'm trying to set up cert-manager-webhook-pdns using the Helm chart.

The issuer is defined this way:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: me@me.com
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - dns01:
          webhook:
            groupName: acme.zacharyseguin.ca
            solverName: pdns
            config:
              host: http://pdns:8081
              apiKeySecretRef:
                name: pdns-api-key
                key: key

              # Optional config, shown with default values
              #   all times in seconds
              ttl: 120
              timeout: 30
              propagationTimeout: 120
              pollingInterval: 2

The pdns-api-key secret has been set up with the corresponding PDNS API key.

I keep getting this error message in the cert-manager container, and no certificates are issued:

E0120 16:29:59.530988 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="pdns: unexpected HTTP status code 404 when fetching 'http://pdns:8081/api/v1/'" "key"="default/k8s-example-cert-qwkrh-2391234416-114646716"

By the way, http://pdns:8081/api/v1/ always returns 404 Not Found in my setup, but querying a server or a zone works properly.

How can I properly set up this plugin?

zachomedia commented 2 years ago

@vdnclodio Would it be possible to share more about your setup so I can attempt to reproduce the issue?

  1. Can you share your PowerDNS deployment + configuration (with any sensitive values removed)?
  2. Can you share your Certificate manifest?
  3. Does the cert-manager-powerdns-webhook pod have any errors in its log?

hacknisty commented 2 years ago

@zachomedia

  1. PowerDNS version 4.1.6 is deployed on Debian Buster (outside the k8s cluster).

Configuration (I'm only pasting the uncommented options; every other option is set to its default value):

#################################
# default-soa-name      name to insert in the SOA record if none set in the backend
#
# default-soa-name=a.misconfigured.powerdns.server
default-soa-name=letsencrypt.mydomain

#################################
# include-dir   Include *.conf files from this directory
#
# include-dir=
include-dir=/etc/powerdns/pdns.d

#################################
# launch        Which backends to launch and order to query them in
#
# launch=
launch=

#################################
# local-address Local IP addresses to which we bind
#
# local-address=0.0.0.0
local-address=w.x.y.z

#################################
# webserver     Start a webserver for monitoring (api=yes also enables the HTTP listener)
#
# webserver=no
webserver=yes
api=yes
api-key=myapikey

#################################
# webserver-address     IP Address of webserver/API to listen on
#
# webserver-address=127.0.0.1
webserver-address=w.x.y.z

#################################
# webserver-allow-from  Webserver/API access is only allowed from these subnets
#
# webserver-allow-from=127.0.0.1,::1
webserver-allow-from=0.0.0.0/0

#################################
# webserver-password    Password required for accessing the webserver
#
webserver-password=changeme

  2. I want to use it to generate a certificate for an Ingress:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: k8s-example
      namespace: default
      annotations:
        ingress.kubernetes.io/rewrite-target: /
        cert-manager.io/cluster-issuer: letsencrypt-staging
    spec:
      rules:
      - host: k8s-example.mydomain
        http:
          paths:
            - path: /apple
              pathType: Prefix
              backend:
                service:
                  name: apple-service
                  port:
                    number: 5678
      tls:
      - hosts:
        - k8s-example.mydomain
        secretName: k8s-example-cert
  3. No errors in the powerdns-webhook pod, only this:
    I0120 16:28:18.629577 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
    I0120 16:28:18.629542 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
    I0120 16:28:18.629605 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
    I0120 16:28:18.629786 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
    I0120 16:28:18.629788 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
    I0120 16:28:18.630121 1 secure_serving.go:266] Serving securely on [::]:443
    I0120 16:28:18.630223 1 dynamic_serving_content.go:129] "Starting controller" name="serving-cert::/tls/tls.crt::/tls/tls.key"
    I0120 16:28:18.629789 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
    I0120 16:28:18.630392 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
    I0120 16:28:18.630893 1 apf_controller.go:299] Starting API Priority and Fairness config controller
    I0120 16:28:18.730130 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
    I0120 16:28:18.730175 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
    I0120 16:28:18.730589 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
    I0120 16:28:18.731532 1 apf_controller.go:304] Running API Priority and Fairness config worker

Also:

curl -s -H 'X-API-Key: myapikey' http://w.x.y.z:8081/api | jq .
[
  {
    "url": "/api/v1",
    "version": 1
  }
]
curl -s -H 'X-API-Key: myapikey' http://w.x.y.z:8081/api/v1
Not Found
curl -s -H 'X-API-Key: myapikey' http://w.x.y.z:8081/api/v1/
Not Found
curl -s -H 'X-API-Key: myapikey' http://w.x.y.z:8081/api/v1/servers | jq .
[
  {
    "config_url": "/api/v1/servers/localhost/config{/config_setting}",
    "daemon_type": "authoritative",
    "id": "localhost",
    "type": "Server",
    "url": "/api/v1/servers/localhost",
    "version": "4.1.6",
    "zones_url": "/api/v1/servers/localhost/zones{/zone}"
  }
]
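
The same checks can be scripted. Below is a minimal sketch in Python using only the standard library. `build_request` and `probe` are hypothetical helper names, not part of the webhook or of PowerDNS, and the host and key in the usage note are the placeholders from the commands above.

```python
import urllib.error
import urllib.request


def build_request(host, api_key, path):
    """Build a GET request for a PowerDNS API path, sending the key as X-API-Key."""
    url = host.rstrip("/") + path
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def probe(host, api_key, paths, timeout=5):
    """Return {path: HTTP status} for each path.

    4xx/5xx responses are recorded rather than raised; connection errors
    (urllib.error.URLError) still propagate to the caller.
    """
    statuses = {}
    for path in paths:
        request = build_request(host, api_key, path)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                statuses[path] = response.status
        except urllib.error.HTTPError as err:
            statuses[path] = err.code
    return statuses
```

Calling `probe("http://w.x.y.z:8081", "myapikey", ["/api", "/api/v1/", "/api/v1/servers"])` against the setup above should mirror the curl results: a 404 for `/api/v1/` and a successful status for `/api/v1/servers`.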

hacknisty commented 2 years ago

@zachomedia

    k8s-02.my.me.16658 > letsencrypt.8081: Flags [P.], cksum 0x7f3d (correct), seq 0:185, ack 1, win 507, options [nop,nop,TS val 3975310894 ecr 1822938651], length 185
E...].@.?..._..R_...A.......Cn.d.....=.....
..n.l...GET /api/v1/servers/localhost/zones HTTP/1.1
Host: w.x.y.z:8081
User-Agent: Go-http-client/1.1
X-Api-Key: myapikey
Accept-Encoding: gzip
10:53:34.698422 IP (tos 0x0, ttl 63, id 31692, offset 0, flags [DF], proto TCP (6), length 214)

...

    k8s-02.my.me.44755 > letsencrypt.8081: Flags [P.], cksum 0xac1d (correct), seq 0:162, ack 1, win 507, options [nop,nop,TS val 3974830689 ecr 1822458464], length 162
E...{.@.?..._..R_..........F..l............
...al..`GET /api/v1/ HTTP/1.1
Host: w.x.y.z:8081
User-Agent: Go-http-client/1.1
X-Api-Key: myapikey
Accept-Encoding: gzip

Any clues on this? (My guess is that no zone matching my request was found, or something like that.)

hacknisty commented 2 years ago

@zachomedia OK, so a bit more information about my setup: I use a delegated zone for the ACME challenge:

_acme-challenge IN NS letsencrypt.mydomain.com.

This setup already works for hundreds of domains using certbot; I just want to enable the same feature in my k8s cluster.

Same as this:

https://cert-manager.io/docs/configuration/acme/dns01/#delegated-domains-for-dns01

So if I run the test suite with mydomain.com, I end up with the 404 error on /api/v1/, but if I run the test suite with _acme-challenge.mydomain.com I get this:

--- FAIL: TestRunsSuite (49.39s)
    --- FAIL: TestRunsSuite/Conformance (43.11s)
        --- FAIL: TestRunsSuite/Conformance/Basic (14.03s)
            --- FAIL: TestRunsSuite/Conformance/Basic/PresentRecord (14.03s)
                util.go:59: skipping file "testdata/pdns/README.md" with unrecognised extension
                util.go:68: created fixture "basic-present-record"
                suite.go:37: Calling Present with ChallengeRequest: &v1alpha1.ChallengeRequest{UID:"", Action:"", Type:"", DNSName:"example.com", Key:"123d==", ResourceNamespace:"basic-present-record", ResolvedFQDN:"cert-manager-dns01-tests._acme-challenge.mydomain.com.", ResolvedZone:"_acme-challenge.mydomain.com.", AllowAmbientCredentials:false, Config:(*v1.JSON)(0xc00000e048)}
                suite.go:49: error waiting for DNS record propagation: Could not determine authoritative nameservers for "cert-manager-dns01-tests._acme-challenge.mydomain.com."
        --- FAIL: TestRunsSuite/Conformance/Extended (15.39s)
            --- FAIL: TestRunsSuite/Conformance/Extended/DeletingOneRecordRetainsOthers (15.39s)
                util.go:59: skipping file "testdata/pdns/README.md" with unrecognised extension
                util.go:68: created fixture "extended-supports-multiple-same-domain"
                suite.go:103: error waiting for DNS record propagation: Could not determine authoritative nameservers for "cert-manager-dns01-tests._acme-challenge.mydomain.com."
FAIL
FAIL    github.com/zachomedia/cert-manager-webhook-pdns 49.421s
FAIL

Is this scenario supported by this webhook?

zachomedia commented 2 years ago

Ah! That might be why, I have not tested this with the delegated domains feature. I'll test it locally and confirm I can reproduce the issue and then figure out how to fix it.

Thanks!

zachomedia commented 2 years ago

@vdnclodio Can you try changing your issuer like this:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
# ...
    solvers:
      - dns01:
          cnameStrategy: Follow
          webhook:
            groupName: acme.zacharyseguin.ca
# ...

According to the docs, cnameStrategy: Follow seems to be what's needed to get cert-manager to update the correct zone, and in my limited testing so far it seems to fix the issue.

hacknisty commented 2 years ago

@zachomedia Unfortunately, no luck: I'm still getting a 404 Not Found on /api/v1/.

But I can see the request reaching my delegated DNS server, so the CNAME-following part is working. Using tcpdump, I can see this:

And I suppose the UnFqdn function for k8s-example.mydomain.com returns mydomain.com and not _acme-challenge.mydomain.com (this is just a guess; maybe you have a way for me to use a debug webhook?)

zachomedia commented 2 years ago

Can you confirm what your DNS configuration is? I'm thinking that it's different from what cert-manager is expecting.

If I read https://cert-manager.io/docs/configuration/acme/dns01/#delegated-domains-for-dns01, I understand it as this:

_acme-challenge.example.com should be a CNAME record to some other domain, say _acme-challenge.challenges.example.com. Your pdns server should then be authoritative for challenges.example.com, so that it can insert the correct record there. (So _acme-challenge IN CNAME _acme-challenge.challenges.example.com in your example.com zone.)

Following this logic, I would expect that you have this entry in your example.com zone: challenges IN NS letsencrypt.example.com. That server then has a zone defined for challenges.example.com.
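
To make that layout concrete, here is a minimal sketch in Python that uses an in-memory record table instead of real DNS queries. The record table and the helpers `follow_cname` and `pick_zone` are hypothetical names for illustration only; they are not cert-manager or webhook functions, and the real resolution logic may differ.

```python
# Hypothetical records; a real setup would query DNS instead.
CNAMES = {
    "_acme-challenge.example.com.": "_acme-challenge.challenges.example.com.",
}

# Zones the PowerDNS server is assumed to be authoritative for.
PDNS_ZONES = ["challenges.example.com."]


def follow_cname(name, cnames, max_hops=8):
    """Follow CNAME records from name until a name without a CNAME is reached."""
    for _ in range(max_hops):
        target = cnames.get(name)
        if target is None:
            return name
        name = target
    raise RuntimeError(f"CNAME chain did not end within {max_hops} hops")


def pick_zone(fqdn, zones):
    """Return the longest zone that fqdn falls under, or None if there is none."""
    matches = [z for z in zones if fqdn == z or fqdn.endswith("." + z)]
    return max(matches, key=len, default=None)


target = follow_cname("_acme-challenge.example.com.", CNAMES)
zone = pick_zone(target, PDNS_ZONES)
print(target, zone)
```

In this sketch, the name only lands in a zone hosted on the PowerDNS server after the CNAME has been followed; without that step, `pick_zone` finds no match for `_acme-challenge.example.com.`.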

hacknisty commented 2 years ago

It can also be an NS record (this already works with certbot, and both follow ACME DNS-01).

So I want a cert for k8s-example.mydomain.com.

My main zone, mydomain.com, contains a record like this:

_acme-challenge IN NS letsencrypt.mydomain.com.

I have a PowerDNS server at letsencrypt.mydomain.com hosting the _acme-challenge.mydomain.com zone. In this zone (the subdelegated zone) I have the TXT record to be updated by this webhook, which would validate the DNS-01 request.

zachomedia commented 2 years ago

Unfortunately it appears that this is not supported by cert-manager: https://github.com/jetstack/cert-manager/issues/3453#issuecomment-725548578

And the issue was closed with:

Since cnameStrategy: Follow seems like a good way of working around the lack of "NS follow" support, I will close this issue. Feel free to re-open if you would like to expand.

Edit, to add: this webhook (and I'm sure all others) uses the same functions to resolve the zone.

hacknisty commented 2 years ago

OK, I will reopen this issue. Thanks a lot. As a temporary fix, I will make a custom build of your project, replacing the searched zone with the correct one, and I suppose it will just work (as I said, I already have this setup working, and I can't change all of them to CNAME just like that). I will wait for cert-manager to fix the issue.

zachomedia commented 2 years ago

Digging through the code I have found a few bugs in my implementation (likely unrelated to the issue), including the source of the 404 (which happens when it can't resolve a zone). It should return an error there.

I'll see if there is something I can implement as a setting to support this.
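
For illustration, here is a sketch in Python (not the webhook's actual Go code) of failing with an explicit error when no zone matches, instead of continuing with an empty zone path. It assumes, without confirmation from the code, that an empty zone path resolved against the API base is what produces the request to /api/v1/. `ZoneNotFoundError`, `zone_url`, and the zone listing are hypothetical.

```python
from urllib.parse import urljoin


class ZoneNotFoundError(LookupError):
    """Raised when no hosted zone covers the challenge name."""


def zone_url(api_base, zones, fqdn):
    """Return the API URL of the longest hosted zone that covers fqdn.

    zones maps zone names (with trailing dot) to their API paths.
    """
    matches = [z for z in zones if fqdn == z or fqdn.endswith("." + z)]
    if not matches:
        # Fail here rather than requesting api_base with an empty zone path.
        raise ZoneNotFoundError(f"no hosted zone covers {fqdn!r}")
    return urljoin(api_base, zones[max(matches, key=len)])


API_BASE = "http://pdns:8081/api/v1/"
# Hypothetical zone listing; the path format is an assumption.
ZONES = {
    "_acme-challenge.mydomain.com.":
        "/api/v1/servers/localhost/zones/_acme-challenge.mydomain.com.",
}

print(zone_url(API_BASE, ZONES,
               "cert-manager-dns01-tests._acme-challenge.mydomain.com."))
print(urljoin(API_BASE, ""))  # an empty path resolves back to the API base
```

With a check like this, a name outside every hosted zone surfaces as a clear "zone not found" error in the challenge log instead of an HTTP 404 on the base URL.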

hacknisty commented 2 years ago

Sounds great, I'm available for testing if needed.

hacknisty commented 2 years ago

@zachomedia So I made it work on the webhook side (my TXT record is properly updated now).

But it still fails at the propagation check. However, if I run dig _acme-challenge.mydomain.com txt, I get the right challenge in the TXT record.

Do you, by any chance, know what is responsible for the propagation check?

Maybe I should force a DNS server for the check?

hacknisty commented 2 years ago

Well, never mind, I found it: cert-manager is responsible for it. And even though I succeed in updating the challenge, the DNS propagation check does not follow NS records at the cert-manager level.

zachomedia commented 2 years ago

@vdnclodio I wonder if setting --dns01-recursive-nameservers-only on your cert-manager deployment (from https://cert-manager.io/docs/configuration/acme/dns01/#setting-nameservers-for-dns01-self-check) would help with that, since rather than trying to locate the authoritative server it will look it up via the recursive server. Might cause a bit of a delay but it should get you past that part since I think it will skip that logic in cert-manager.

hacknisty commented 2 years ago

Unfortunately, it fails while querying the wrong SOA record. I guess I'm stuck until the issue is fixed upstream.

zachomedia commented 10 months ago

As nothing has been heard on this in a while, and since this is an upstream issue with cert-manager, I am going to close this ticket. Please don't hesitate to reach out if you have any further issues.