traefik / traefik

The Cloud Native Application Proxy
https://traefik.io
MIT License

ACME Challenge Failure Against NS1 with image tag v2.11.0 #10450

Closed JerboaGobi closed 7 months ago

JerboaGobi commented 8 months ago

Welcome!

What did you do?

A few days ago I updated to the latest release, v2.11.0. Yesterday, after revoking the certificate due to key compromise, I cleared the acme.json file to force Traefik to create a new private key and issue new certificates.

What did you see instead?

The logs showed that the PUT requests to NS1 for the _acme-challenge TXT records failed with HTTP 400 responses. I rolled back to image v2.10.7 without changing any other configuration. The PUT requests succeed on v2.10.7 and certificates are issued as expected. I also tested v3.0, and the issue is present there as well.

What version of Traefik are you using?

Version: 2.11.0
Codename: cheddar
Go version: go1.22.0
Built: 2024-02-12T15:26:45Z
OS/Arch: linux/amd64

What is your environment & configuration?

traefik:
    image: traefik:v2.11.0
    command:
    - --global.checknewversion=false
    - --global.sendanonymoususage=false
    - --log=true
    - --log.level=debug

    - --accesslog=true
    - --accesslog.filepath=/etc/traefik/logs/access.log
    - --accesslog.filters.statuscodes=100-199,200-203,205-299,300-399,400-499,500-599
    - --accesslog.filters.retryattempts
    - --accesslog.filters.minduration=10ms

    - --entrypoints.http.address=:80
    - --entrypoints.https.address=:443

    - --entrypoints.http.http.redirections.entryPoint.to=https

    - --entryPoints.http.transport.lifeCycle.requestAcceptGraceTimeout=30
    - --entryPoints.https.transport.lifeCycle.requestAcceptGraceTimeout=30

    - --providers.docker.endpoint=tcp://172.129.30.6:2375
    - --providers.docker.exposedbydefault=false
    - --providers.docker.watch=true
    - --providers.docker.constraints=Label(`traefik-internal.instance.enable`,`true`)
    - --providers.file.directory=/etc/traefik/rules
    - --providers.file.watch=true
    - --api=true

    - --certificatesresolvers.letsencrypt.acme.email=${CF_ACME_EMAIL}
    - --certificatesresolvers.letsencrypt.acme.storage=/etc/traefik/acme/acme.json
    - --certificatesresolvers.letsencrypt.acme.dnschallenge=true
    - --certificatesresolvers.letsencrypt.acme.dnschallenge.provider=ns1
    - --certificatesresolvers.letsencrypt.acme.dnschallenge.delaybeforecheck=60
    - --certificatesresolvers.letsencrypt.acme.dnschallenge.resolvers=172.64.36.1:53,172.64.36.2:53
    labels:
    - traefik-internal.instance.enable=true
    - traefik.enable=true

    - traefik.http.routers.traefik-internal.entrypoints=https
    - traefik.http.routers.traefik-internal.rule=Host(`${HOST_NAME}`)
    - traefik.http.routers.traefik-internal.tls=true
    - traefik.http.routers.traefik-internal.service=api@internal

    - traefik.http.routers.traefik-internal.tls.certresolver=letsencrypt
    - traefik.http.routers.traefik-internal.tls.domains[0].main=internal.redacted.com
    - traefik.http.routers.traefik-internal.tls.domains[0].sans=internal.redacted.com, *.internal.redacted.com

    - traefik.tls.stores.default.defaultgeneratedcert.resolver=letsencrypt
    - traefik.tls.stores.default.defaultgeneratedcert.domain.main=internal.redacted.com
    - traefik.tls.stores.default.defaultgeneratedcert.domain.sans=internal.redacted.com, *.internal.redacted.com

    - traefik.http.services.traefik-internal.loadbalancer.server.port=1337

If applicable, please paste the log output in DEBUG level

time="2024-02-16T03:44:38Z" level=error msg="Unable to obtain ACME certificate for domain \"*.internal.redacted.com,internal.redacted.com,internal.redacted.com\"" error="unable to generate a certificate for the domains [*.internal.redacted.com internal.redacted.com internal.redacted.com]: error: one or more domains had a problem:\n[*.internal.redacted.com] [*.internal.redacted.com] acme: error presenting token: ns1: failed to create record [zone: \"internal.redacted.com\", fqdn: \"_acme-challenge.internal.redacted.com.\"]: PUT https://api.nsone.net/v1/zones/internal.redacted.com/_acme-challenge.internal.redacted.com/TXT: 400 Input validation failed (Value None for field '<obj>.tags' is not of type object)\n[internal.redacted.com] [internal.redacted.com] acme: error presenting token: ns1: failed to create record [zone: \"internal.redacted.com\", fqdn: \"_acme-challenge.internal.redacted.com.\"]: PUT https://api.nsone.net/v1/zones/internal.redacted.com/_acme-challenge.internal.redacted.com/TXT: 400 Input validation failed (Value None for field '<obj>.tags' is not of type object)\n" ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory" tlsStoreName=default providerName=letsencrypt.acme

ldez commented 8 months ago

Hello,

This is related to a breaking change introduced by NS1: https://github.com/ns1/ns1-go/pull/220. The change shipped in a bugfix release of their API client, which is not semver compliant, and without any documentation of the change.

I will fix the problem in lego and then update lego in Traefik.
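
For readers wondering how a client-side change can produce `Value None for field '<obj>.tags' is not of type object`: the message suggests the request body carried `"tags": null`. One common way this happens in Go is marshaling a nil map. The sketch below only illustrates that general behavior; the `record` struct and `encodeRecord` helper are hypothetical and are not the ns1-go types, and this is not necessarily the exact change made in the linked PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record is a hypothetical stand-in for a DNS record payload.
// It is NOT the actual ns1-go type.
type record struct {
	Type string            `json:"type"`
	Tags map[string]string `json:"tags"`
}

// encodeRecord marshals a TXT record with the given tags and
// returns the JSON body as a string.
func encodeRecord(tags map[string]string) string {
	body, err := json.Marshal(record{Type: "TXT", Tags: tags})
	if err != nil {
		panic(err)
	}
	return string(body)
}

func main() {
	// A nil map is encoded as JSON null, which a strict API may reject.
	fmt.Println(encodeRecord(nil)) // {"type":"TXT","tags":null}

	// An initialized, empty map is encoded as an empty JSON object.
	fmt.Println(encodeRecord(map[string]string{})) // {"type":"TXT","tags":{}}
}
```

Under that assumption, a caller that never initializes the map would start sending `null` as soon as the field is always serialized, which would match the 400 response in the log above.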

sooslaca commented 8 months ago

I found this issue because I also had ACME problems with 2.11.0: it does not start renewing expired certificates, and there are no logs on either Traefik or the ACME server (StepCA). I thought I would share it here. After I reverted to 2.10.7 and restarted the container, renewal started immediately.

traefiker commented 7 months ago

Closed by #10508.

stickeraugust commented 7 months ago

We are still having this issue with version 2.11.0. Anyone else?

ldez commented 7 months ago

The fix was merged after v2.11.0; it will be available in v2.11.1.