opnsense / plugins

OPNsense plugin collection
https://opnsense.org/
BSD 2-Clause "Simplified" License

os-acme-client | Cloudflare - domain validation failed (dns01) #3897

Open keithpl opened 5 months ago

keithpl commented 5 months ago

Describe the bug After upgrading to OPNsense 24.1.5_3, the ACME client is no longer able to create TXT records using the Cloudflare DNS-01 challenge type.

To Reproduce Steps to reproduce the behavior:

  1. Go to Services
  2. Click on ACME Client > Certificates
  3. Switch to Certificates
  4. Last ACME Status > validation failed

Expected behavior validation ok

Relevant log files ACME Log: interestingly, the acme log is empty and all outputs were recorded to the system log.

System Log:

2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="6"] AcmeClient: certificate must be issued/renewed: <redacted_domain>
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="7"] AcmeClient: issue certificate: <redacted_domain>
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="8"] AcmeClient: using CA: letsencrypt
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="9"] AcmeClient: account is registered: <redacted_letsencrypt_account>
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="10"] AcmeClient: using challenge type: <redacted_dns_challenge_name>
2024-04-07T15:28:46-04:00 opnsense 81230 - [meta sequenceId="11"] AcmeClient: running acme.sh command: /usr/local/sbin/acme.sh --issue --syslog 7 --debug --server 'letsencrypt' --dns 'dns_cf' --home '/var/etc/acme-client/home' --cert-home '/var/etc/acme-client/cert-home/<redacted_cert_path>' --certpath '/var/etc/acme-client/certs/<redacted_cert_path>/cert.pem' --keypath '/var/etc/acme-client/keys/<redacted_cert_path>/private.key' --capath '/var/etc/acme-client/certs/<redacted_cert_path>/chain.pem' --fullchainpath '/var/etc/acme-client/certs/<redacted_cert_path>/fullchain.pem' --domain '<redacted_domain>' --domain '<redacted_domain>' --days '1'   --keylength '4096' --accountconf '/var/etc/acme-client/accounts/<redacted_account>_prod/account.conf'
2024-04-07T15:28:52-04:00 acme.sh 35127 - [meta sequenceId="12"] [Sun Apr  7 15:28:52 EDT 2024] Add txt record error.
2024-04-07T15:28:52-04:00 acme.sh 37611 - [meta sequenceId="13"] [Sun Apr  7 15:28:52 EDT 2024] Error add txt for domain:_acme-challenge.<redacted_domain>
2024-04-07T15:28:52-04:00 acme.sh 41770 - [meta sequenceId="14"] [Sun Apr  7 15:28:52 EDT 2024] Please add '--debug' or '--log' to check more details.
2024-04-07T15:28:52-04:00 acme.sh 44795 - [meta sequenceId="15"] [Sun Apr  7 15:28:52 EDT 2024] See: https://github.com/acmesh-official/acme.sh/wiki/How-to-debug-acme.sh
2024-04-07T15:28:54-04:00 opnsense 81230 - [meta sequenceId="16"] /usr/local/opnsense/scripts/OPNsense/AcmeClient/lecert.php: AcmeClient: The shell command returned exit code '1': '/usr/local/sbin/acme.sh --issue --syslog 7 --debug --server 'letsencrypt' --dns 'dns_cf' --home '/var/etc/acme-client/home' --cert-home '/var/etc/acme-client/cert-home/<redacted_cert_path>' --certpath '/var/etc/acme-client/certs/<redacted_cert_path>/cert.pem' --keypath '/var/etc/acme-client/keys/<redacted_cert_path>/private.key' --capath '/var/etc/acme-client/certs/<redacted_cert_path>/chain.pem' --fullchainpath '/var/etc/acme-client/certs/<redacted_cert_path>/fullchain.pem' --domain '<redacted_domain>' --domain '<redacted_domain>' --days '1'   --keylength '4096' --accountconf '/var/etc/acme-client/accounts/<redacted_account>_prod/account.conf''
2024-04-07T15:28:54-04:00 opnsense 81230 - [meta sequenceId="17"] AcmeClient: domain validation failed (dns01)
2024-04-07T15:28:54-04:00 opnsense 81230 - [meta sequenceId="18"] AcmeClient: validation for certificate failed: <redacted_domain>

Additional context

#3871 describes the same problem. Manually creating the TXT record is not a solution, as it defeats the point of ACME. The issue seems to be that the ACME client fails to create the TXT record for validation.

If I run the command directly, I get additional output stating the dns_cf hook cannot be found:

[Sun Apr  7 15:40:51 EDT 2024] Can not find dns api hook for dns_cf

Environment OPNsense 24.1.5_3-amd64 FreeBSD 13.2-RELEASE-p11

jwaes commented 5 months ago

Same issue here ... It just stopped working.

Makss39 commented 5 months ago

same issue with OVH

jwaes commented 5 months ago

OK, I figured out how to fix it, so I want to share it with you.

This worked without it in January, but now that the time for renewal is here, something has changed.

The key error in the logs was:

2024-04-13T07:31:11 acme.sh [Sat Apr 13 07:31:11 UTC 2024] Invalid status, router.MYDOMAIN.XXX:Verify error detail:DNS problem: SERVFAIL looking up CAA for router.MYDOMAIN.XXX - the domain's nameservers may be malfunctioning

So i read into CAA

https://developers.cloudflare.com/ssl/edge-certificates/caa-records/

and adding this

CAA router 0 issue letsencrypt.org

in cloudflare solved the issue upon the next forced re-issue of my certificate.
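
For readers who manage the zone as a file rather than through the Cloudflare dashboard, the record above can be written in standard zone-file form. The following is a minimal sketch only: the helper name caa_issue_record and the example hostname are hypothetical, and it simply formats a CAA record with flags 0 and the "issue" tag.

```shell
#!/bin/sh
# Hypothetical helper: print a zone-file style CAA record that authorizes
# one CA to issue certificates for NAME (flags 0, tag "issue").
# Usage: caa_issue_record NAME CA_DOMAIN
caa_issue_record() {
  printf '%s. IN CAA 0 issue "%s"\n' "$1" "$2"
}

# Example with a placeholder hostname:
caa_issue_record router.example.com letsencrypt.org
```

The printed line is what would go into the zone; whether a CAA record is required at all depends on the setup, as the following comments in this thread show.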

As @Makss39 also had the issue with OVH, I guess the change is on the Let's Encrypt side, where they must now be enforcing CAA.

Anyway. Hope it helps for others.

keithpl commented 5 months ago

@jwaes that is not necessary when I use the acme.sh script on a separate machine:

❯ export CF_Zone_ID="<zone_id>"
❯ export CF_Token="<token>"
❯ acme.sh --issue -d <my_domain> --dns dns_cf --server letsencrypt
[Sat Apr 13 10:12:10 AM EDT 2024] Using CA: https://acme-v02.api.letsencrypt.org/directory
[Sat Apr 13 10:12:10 AM EDT 2024] Single domain='<my_domain>'
[Sat Apr 13 10:12:10 AM EDT 2024] Getting domain auth token for each domain
[Sat Apr 13 10:12:11 AM EDT 2024] Getting webroot for domain='<my_domain>'
[Sat Apr 13 10:12:12 AM EDT 2024] Adding txt value: <record> for domain:  _acme-challenge.<my_domain>
[Sat Apr 13 10:12:13 AM EDT 2024] Adding record
[Sat Apr 13 10:12:13 AM EDT 2024] Added, OK
[Sat Apr 13 10:12:13 AM EDT 2024] The txt record is added: Success.
[Sat Apr 13 10:12:13 AM EDT 2024] Let's check each DNS record now. Sleep 20 seconds first.
[Sat Apr 13 10:12:34 AM EDT 2024] You can use '--dnssleep' to disable public dns checks.
[Sat Apr 13 10:12:34 AM EDT 2024] See: https://github.com/acmesh-official/acme.sh/wiki/dnscheck
[Sat Apr 13 10:12:34 AM EDT 2024] Checking <my_domain> for _acme-challenge.<my_domain>
[Sat Apr 13 10:12:35 AM EDT 2024] Domain <my_domain> '_acme-challenge.<my_domain>' success.
[Sat Apr 13 10:12:35 AM EDT 2024] All success, let's return
[Sat Apr 13 10:12:35 AM EDT 2024] Verifying: <my_domain>
[Sat Apr 13 10:12:35 AM EDT 2024] Pending, The CA is processing your order, please just wait. (1/30)
[Sat Apr 13 10:12:39 AM EDT 2024] Success
[Sat Apr 13 10:12:39 AM EDT 2024] Removing DNS records.
[Sat Apr 13 10:12:39 AM EDT 2024] Removing txt: <record> for domain: _acme-challenge.<my_domain>
[Sat Apr 13 10:12:40 AM EDT 2024] Removed: Success
[Sat Apr 13 10:12:40 AM EDT 2024] Verify finished, start to sign.
[Sat Apr 13 10:12:40 AM EDT 2024] Lets finalize the order.
[Sat Apr 13 10:12:40 AM EDT 2024] Le_OrderFinalize='https://acme-v02.api.letsencrypt.org/acme/finalize/<path>'
[Sat Apr 13 10:12:41 AM EDT 2024] Downloading cert.
[Sat Apr 13 10:12:41 AM EDT 2024] Le_LinkCert='https://acme-v02.api.letsencrypt.org/acme/cert/<path>'
[Sat Apr 13 10:12:42 AM EDT 2024] Cert success.
-----BEGIN CERTIFICATE-----
<blah>
-----END CERTIFICATE-----
[Sat Apr 13 10:12:42 AM EDT 2024] Your cert is in: $HOME/.acme.sh/<my_domain>_ecc/<my_domain>.cer
[Sat Apr 13 10:12:42 AM EDT 2024] Your cert key is in: $HOME/.acme.sh/<my_domain>_ecc/<my_domain>.key
[Sat Apr 13 10:12:42 AM EDT 2024] The intermediate CA cert is in: $HOME/.acme.sh/<my_domain>_ecc/ca.cer
[Sat Apr 13 10:12:42 AM EDT 2024] And the full chain certs is there: $HOME/.acme.sh/<my_domain>_ecc/fullchain.cer

keithpl commented 5 months ago

If it's helpful, this is the version of acme.sh that I tested with:

❯ acme.sh --version
https://github.com/acmesh-official/acme.sh
v3.0.7

keithpl commented 5 months ago

Same experience with opnsense 24.1.6 and os-acme-client 4.2.

mkerost commented 5 months ago

Same issue trying to use Cloudflare DNS-01. I get the same error: Can not find dns api hook for dns_cf

OPNsense 24.1.6-amd64 ACME 4.2

EDIT: I tried some debugging. These are the variables acme.sh uses when its _findHook function searches for the dns_cf.sh file, with the values they had when I ran /usr/local/sbin/acme.sh:

$_hookdomain = opnsense.********.com
$_hookcat = dnsapi
$_hookname = dns_cf
$_SCRIPT_HOME = /usr/local/sbin
$LE_WORKING_DIR = /var/etc/acme-client/home

If it can't find the file, you get the error message Can not find dns api hook for dns_cf. Searches use various combinations of subfolders and filenames built from $_hookdomain, $_hookcat, and $_hookname, but all assume either $_SCRIPT_HOME or $LE_WORKING_DIR as the base folder.

When I search for dns_cf.sh, the copies turn out to live here:

root@OPNsense:/usr/local/sbin # find / -name "dns_cf*"
/usr/local/share/examples/acme.sh/dnsapi/dns_cf.sh
/root/.acme.sh/dnsapi/dns_cf.sh

So maybe something to do with $_SCRIPT_HOME and $LE_WORKING_DIR not being set properly.

Maybe someone more knowledgeable can help out.

Here's the full _findHook function from https://github.com/acmesh-official/acme.sh/blob/master/acme.sh

_findHook() {
  _hookdomain="$1"
  _hookcat="$2"
  _hookname="$3"

  if [ -f "$_SCRIPT_HOME/$_hookcat/$_hookname" ]; then
    d_api="$_SCRIPT_HOME/$_hookcat/$_hookname"
  elif [ -f "$_SCRIPT_HOME/$_hookcat/$_hookname.sh" ]; then
    d_api="$_SCRIPT_HOME/$_hookcat/$_hookname.sh"
  elif [ "$_hookdomain" ] && [ -f "$LE_WORKING_DIR/$_hookdomain/$_hookname" ]; then
    d_api="$LE_WORKING_DIR/$_hookdomain/$_hookname"
  elif [ "$_hookdomain" ] && [ -f "$LE_WORKING_DIR/$_hookdomain/$_hookname.sh" ]; then
    d_api="$LE_WORKING_DIR/$_hookdomain/$_hookname.sh"
  elif [ -f "$LE_WORKING_DIR/$_hookname" ]; then
    d_api="$LE_WORKING_DIR/$_hookname"
  elif [ -f "$LE_WORKING_DIR/$_hookname.sh" ]; then
    d_api="$LE_WORKING_DIR/$_hookname.sh"
  elif [ -f "$LE_WORKING_DIR/$_hookcat/$_hookname" ]; then
    d_api="$LE_WORKING_DIR/$_hookcat/$_hookname"
  elif [ -f "$LE_WORKING_DIR/$_hookcat/$_hookname.sh" ]; then
    d_api="$LE_WORKING_DIR/$_hookcat/$_hookname.sh"
  fi

  printf "%s" "$d_api"
}
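
To see which locations the function above would probe for a given set of values, the lookup order can be replayed as a list. This is a minimal sketch and not part of acme.sh: the helper name list_hook_candidates is hypothetical, and it only prints the candidate paths in the same order as _findHook(), marking which ones exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of acme.sh): print the paths _findHook()
# checks, in order, each prefixed with "found" or "missing".
# Usage: list_hook_candidates SCRIPT_HOME WORKING_DIR HOOKDOMAIN HOOKCAT HOOKNAME
list_hook_candidates() {
  _sh="$1"; _wd="$2"; _dom="$3"; _cat="$4"; _name="$5"
  # Candidates under the script directory come first.
  set -- "$_sh/$_cat/$_name" "$_sh/$_cat/$_name.sh"
  # Per-domain candidates are only checked when a domain is given.
  if [ -n "$_dom" ]; then
    set -- "$@" "$_wd/$_dom/$_name" "$_wd/$_dom/$_name.sh"
  fi
  # Remaining candidates under the working directory.
  set -- "$@" "$_wd/$_name" "$_wd/$_name.sh" \
    "$_wd/$_cat/$_name" "$_wd/$_cat/$_name.sh"
  for _p in "$@"; do
    if [ -f "$_p" ]; then
      echo "found   $_p"
    else
      echo "missing $_p"
    fi
  done
}
```

Called with the values listed earlier in this comment, for example `list_hook_candidates /usr/local/sbin /var/etc/acme-client/home opnsense.example.com dnsapi dns_cf` (the domain is a placeholder), neither of the two locations reported by find appears among the candidates.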

mkerost commented 5 months ago

HACKY FIX. So based on my previous post, I did the following work around and symbolically linked to the dnsapi folder from LE working directory:

ln -s /root/.acme.sh/dnsapi /var/etc/acme-client/home

I then ran a cert update and this fixed the problem. Cert successfully issued!
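
The workaround above can be wrapped in a small guard so that it does nothing when the source directory is missing or the link already exists. This is a sketch under assumptions: the function name link_dnsapi is hypothetical, and the paths in the usage line are the ones reported in this thread, which may differ on other installs.

```shell
#!/bin/sh
# Hypothetical wrapper around the `ln -s` workaround above.
# Links SRC (the dnsapi directory) to HOME_DIR/dnsapi, but only when SRC
# exists and HOME_DIR/dnsapi is not already present.
# Usage: link_dnsapi SRC HOME_DIR
link_dnsapi() {
  _src="$1"
  _home="$2"
  if [ ! -d "$_src" ]; then
    echo "source directory not found: $_src" >&2
    return 1
  fi
  if [ -e "$_home/dnsapi" ] || [ -L "$_home/dnsapi" ]; then
    echo "already present: $_home/dnsapi"
    return 0
  fi
  ln -s "$_src" "$_home/dnsapi"
}
```

With the paths from this thread, the call would be `link_dnsapi /root/.acme.sh/dnsapi /var/etc/acme-client/home`. The other copy reported by find, /usr/local/share/examples/acme.sh/dnsapi, could be passed as the source instead; whether that copy is equivalent has not been confirmed here.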

BUG: Through this whole process, I noticed that setting the ACME log to debug doesn't work properly. Important info doesn't make it into syslog, specifically the exact error message from Cloudflare when verification fails. This seems to be down to OPNsense not passing the right --syslog number when I set logging to "debug 3". The log shows that OPNsense passed --syslog 7, but 7 is only debug level 1. It should be --syslog 9 for debug 3, --syslog 8 for debug 2, and --syslog 7 for debug 1. I will post this as a separate issue.
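
The mapping described above can be written down as a small lookup. This is only an illustration of the expected values, not the plugin's code; the function name syslog_value_for_debug is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a debug level (1-3) to the --syslog value
# described above (debug 1 -> 7, debug 2 -> 8, debug 3 -> 9).
# Usage: syslog_value_for_debug LEVEL
syslog_value_for_debug() {
  case "$1" in
    1) echo 7 ;;
    2) echo 8 ;;
    3) echo 9 ;;
    *)
      echo "unsupported debug level: $1" >&2
      return 1
      ;;
  esac
}
```

For "debug 3", `syslog_value_for_debug 3` prints 9, which is the value that would be expected after --syslog instead of the 7 seen in the log.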

andrewmooreio commented 4 months ago

Just adding that I'm also seeing this issue, although it wasn't just with the upgrade to OPNsense 24.1.5_3.

I had the issue on os-acme-client 3.5.0 (OPNsense 23.7.12_5) so upgraded to os-acme-client 4.1.0 but the issue persisted.

If I run the command manually I get the same: Can not find dns api hook for: dns_cf

Edit: Updated to OPNsense 24.1.7_4 and os-acme-client 4.3, same issue present.

jzcad1828 commented 4 months ago

HACKY FIX. So based on my previous post, I did the following work around and symbolically linked to the dnsapi folder from LE working directory:

ln -s /root/.acme.sh/dnsapi /var/etc/acme-client/home

I then ran a cert update and this fixed the problem. Cert successfully issued!

Awesome! Thanks for finding this "hacky fix," it seems to have resolved the "Can not find dns api hook for dns_cf" error I was seeing. Now to wait an hour to clear the failed validation attempts limit lol!

os-acme-client (installed) | 4.3 | 777KiB | 3 | OPNsense | ACME Client

acme.sh --version https://github.com/acmesh-official/acme.sh v3.0.7

Graffics commented 4 months ago

Same issue for all my OPNsense installs, which are kept updated in lock-step. The previous acme call was done on OPNsense 23.7.9 without issues. Given the roughly two months between renewals, I can't say in which version this started occurring.

Currently running:

OPNsense 24.1.5_3-amd64

os-acme-client (installed) | 4.1

acme.sh --version https://github.com/acmesh-official/acme.sh v3.0.7


So maybe something to do with $_SCRIPT_HOME and $LE_WORKING_DIR not being set properly.

Agreed

The acme.sh _findHook() function looks for the script under $LE_WORKING_DIR, and it is not found there unless the /root/.acme.sh/dnsapi/ directory is symlinked into $LE_WORKING_DIR.

Looking further into how acme.sh sets $LE_WORKING_DIR: it is done in the main function _process() when the --home flag is specified:

    --home)
      export LE_WORKING_DIR="$(echo "$2" | sed 's|/$||')"
      shift
      ;;

OPNsense calls acme.sh with --home '/var/etc/acme-client/home'.

If the --home flag is not specified, the function __initHome() sets $LE_WORKING_DIR to $DEFAULT_INSTALL_HOME:

__initHome() {
  if [ -z "$_SCRIPT_HOME" ]; then
    if _exists readlink && _exists dirname; then
      _debug "Lets find script dir."
      _debug "_SCRIPT_" "$_SCRIPT_"
      _script="$(_readlink "$_SCRIPT_")"
      _debug "_script" "$_script"
      _script_home="$(dirname "$_script")"
      _debug "_script_home" "$_script_home"
      if [ -d "$_script_home" ]; then
        export _SCRIPT_HOME="$_script_home"
      else
        _err "It seems the script home is not correct:$_script_home"
      fi
    fi
  fi

  if [ -z "$LE_WORKING_DIR" ]; then
    _debug "Using default home:$DEFAULT_INSTALL_HOME"
    LE_WORKING_DIR="$DEFAULT_INSTALL_HOME"
  fi
  export LE_WORKING_DIR

  if [ -z "$LE_CONFIG_HOME" ]; then
    LE_CONFIG_HOME="$LE_WORKING_DIR"
  fi
  _debug "Using config home:$LE_CONFIG_HOME"
  export LE_CONFIG_HOME

  _DEFAULT_ACCOUNT_CONF_PATH="$LE_CONFIG_HOME/account.conf"

  if [ -z "$ACCOUNT_CONF_PATH" ]; then
    if [ -f "$_DEFAULT_ACCOUNT_CONF_PATH" ]; then
      . "$_DEFAULT_ACCOUNT_CONF_PATH"
    fi
  fi

  if [ -z "$ACCOUNT_CONF_PATH" ]; then
    ACCOUNT_CONF_PATH="$_DEFAULT_ACCOUNT_CONF_PATH"
  fi
  _debug3 ACCOUNT_CONF_PATH "$ACCOUNT_CONF_PATH"
  DEFAULT_LOG_FILE="$LE_CONFIG_HOME/$PROJECT_NAME.log"

  DEFAULT_CA_HOME="$LE_CONFIG_HOME/ca"

  if [ -z "$LE_TEMP_DIR" ]; then
    LE_TEMP_DIR="$LE_CONFIG_HOME/tmp"
  fi
}

$DEFAULT_INSTALL_HOME is set in the first few lines of acme.sh

PROJECT_NAME="acme.sh"

PROJECT_ENTRY="acme.sh"

PROJECT="https://github.com/acmesh-official/$PROJECT_NAME"

DEFAULT_INSTALL_HOME="$HOME/.$PROJECT_NAME"

Seems as though the initial install places the dns scripts at the $DEFAULT_INSTALL_HOME location, then subsequent calls specify the --home flag with a differing location?
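
The two cases above can be condensed into one small function to make the mismatch visible. This is a simplified sketch based only on the excerpts quoted in this comment: the name resolve_working_dir is hypothetical, and it ignores the case where LE_WORKING_DIR is already set in the environment.

```shell
#!/bin/sh
# Hypothetical helper (not part of acme.sh): mimic how LE_WORKING_DIR ends
# up being set, based on the --home case and __initHome() excerpts above.
# Usage: resolve_working_dir [HOME_FLAG_VALUE]
resolve_working_dir() {
  if [ -n "$1" ]; then
    # --home given: use its value with one trailing slash stripped.
    echo "$1" | sed 's|/$||'
  else
    # No --home: fall back to DEFAULT_INSTALL_HOME, i.e. "$HOME/.acme.sh".
    echo "$HOME/.acme.sh"
  fi
}
```

With the value OPNsense passes, `resolve_working_dir /var/etc/acme-client/home` prints /var/etc/acme-client/home. Without an argument, and assuming HOME is /root, it prints /root/.acme.sh, which is where find located the dnsapi directory.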

polarstack commented 3 months ago

Hey folks

I had (almost) the same issue: it stopped working at some point after the upgrade from OPNsense 23.x.x to 24.x.x, with the same error messages as OP. Using the ACME Client with Cloudflare, I tried several things, like switching to the Global API key instead of specific CF tokens, or adding CAA entries for subdomains (my hostname is router.subdomain.domain.tld), all without success.

To explain the "almost" in the first sentence: in every failed run I was confused by these log entries:

[Fri Jun 28 14:59:55 CEST 2024] Adding record
[Fri Jun 28 14:59:56 CEST 2024] Added, OK
[Fri Jun 28 14:59:56 CEST 2024] The txt record is added: Success.
...
omitted
...
[Fri Jun 28 14:59:57 CEST 2024] Adding record
[Fri Jun 28 14:59:58 CEST 2024] Add txt record error.
[Fri Jun 28 14:59:58 CEST 2024] Error add txt for domain:_acme-challenge.router.<redacted_domain>

So why is it adding the TXT record two times and why does it fail the second time? After enabling debug log I stumbled across this log entry: [Fri Jun 28 14:59:51 CEST 2024] Multi domain='DNS:router.<redacted_domain>,DNS:router.<redacted_domain>'

After carefully reading the Description in "Edit certificate" I saw this explanation: Common Name (CN) and first Alt Name (subjectAltName) for this certificate.

So basically, whatever is entered in the "Common Name" field automatically becomes the first "Alt Name" as well. In my config, I had manually copy-pasted the Common Name from the first field into "Alt Names". This had worked for as long as I can remember, since installing OPNsense several years ago, but stopped after updating to 24.x.x.

So I removed the Alt Name, did "Reset ACME Client" and then force-renewed the certificate, et voilà, it was working as expected.

So if you have the same issue, quickly check your Common Name and Alt Names in the GUI. If they have the same value, remove the one in "Alt Names", as it is added automatically as the first SAN anyway.
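
The duplicate can also be spotted or filtered on the command line before the names are handed to acme.sh. The following is a minimal sketch with a hypothetical helper name, unique_domains; it compares names exactly as typed and does not normalize case.

```shell
#!/bin/sh
# Hypothetical helper: print each given domain once, keeping the order in
# which names first appear. Comparison is exact (case-sensitive).
# Usage: unique_domains DOMAIN...
unique_domains() {
  # Nothing to print when no names are given.
  [ "$#" -gt 0 ] || return 0
  printf '%s\n' "$@" | awk '!seen[$0]++'
}

# Example with a placeholder hostname given twice, as in the log entry above:
unique_domains router.example.com router.example.com
```

If the output has fewer lines than the number of names passed in, the Common Name is probably repeated in "Alt Names".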

For completeness, I've never had this issue here, so nothing to add: Can not find dns api hook for: dns_cf

And here is some information about my setup:

OPNsense: 24.1.9_4
os-acme-client: 4.3
acme.sh: v3.0.7
Primary DNS: 1.1.1.1

Hope this helps, good luck and best regards

MosheL commented 2 months ago

Same issue with GoDaddy, OPNsense 24.1.10_3.

I also checked on the GoDaddy side: the _acme-challenge TXT record is in the correct place in DNS.

acme2 log:

2024-07-28T15:40:27 | acme.sh | [Sun Jul 28 15:40:27 IDT 2024] Removed: Success
2024-07-28T15:40:27 | acme.sh | [Sun Jul 28 15:40:27 IDT 2024] The record does not exist, skip
2024-07-28T15:40:27 | acme.sh | [Sun Jul 28 15:40:27 IDT 2024] ret='0'
2024-07-28T15:40:26 | acme.sh | [Sun Jul 28 15:40:26 IDT 2024] _CURL='curl --silent --dump-header /var/etc/acme-client/home/http.header -L --trace-ascii /tmp/tmp.1zjSoxi0 -g '
2024-07-28T15:40:26 | acme.sh | [Sun Jul 28 15:40:26 IDT 2024] timeout=

system log:

2024-07-28T15:40:27 | opnsense | AcmeClient: validation for certificate failed: (domain)
2024-07-28T15:40:27 | opnsense | AcmeClient: domain validation failed (dns01)
2024-07-28T15:40:27 | opnsense | /usr/local/opnsense/scripts/OPNsense/AcmeClient/lecert.php: AcmeClient: The shell command returned exit code '1': '/usr/local/sbin/acme.sh - ......  --accountconf '/var/etc/acme-client/accounts/635252eee095d6.55260251_prod/account.conf'
2024-07-28T15:39:56 | opnsense | AcmeClient: using challenge type: dns
2024-07-28T15:39:56 | opnsense | AcmeClient: account is registered: acme
2024-07-28T15:39:56 | opnsense | AcmeClient: using CA: letsencrypt

Monviech commented 2 months ago

@MosheL godaddy has other issues: https://github.com/opnsense/plugins/issues/4041