vancluever / terraform-provider-acme

Terraform ACME provider
https://registry.terraform.io/providers/vancluever/acme/latest
Mozilla Public License 2.0

[REG 2.23.2->2.24.1] Registration is recreated #424

Closed orgads closed 4 months ago

orgads commented 4 months ago

After upgrading the provider from 2.23 to 2.24.1, the registration becomes invalid and is marked for replacement, but the replacement fails.

The new attributes should only apply (and trigger recreate) if account_key_pem is null.

Terraform will perform the following actions:

-/+ resource "acme_registration" "registration" {
      + account_key_algorithm   = "ECDSA" # forces replacement
      + account_key_ecdsa_curve = "P384" # forces replacement
      + account_key_rsa_bits    = 4096 # forces replacement
      ~ id                      = "https://acme-v02.api.letsencrypt.org/acme/acct/1234567890" -> (known after apply)
      ~ registration_url        = "https://acme-v02.api.letsencrypt.org/acme/acct/1234567890" -> (known after apply)
        # (2 unchanged attributes hidden)
    }

Plan: 2 to add, 0 to change, 1 to destroy.
╷
acme_registration.registration: Destroying... [id=https://acme-v02.api.letsencrypt.org/acme/acct/1234567890]
acme_registration.registration: Destruction complete after 3s
acme_registration.registration: Creating...
╷
│ Error: acme: error: 403 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-acct :: urn:ietf:params:acme:error:unauthorized :: An account with the provided public key exists but is deactivated
│
│   with acme_registration.registration,
│   on acme-cert.tf line 22, in resource "acme_registration" "registration":
│   22: resource "acme_registration" "registration" {
│
╵

vancluever commented 4 months ago

@orgads thanks for the report, I've been able to reproduce and am working on a fix.

> The new attributes should only apply (and trigger recreate) if account_key_pem is null.

The issue here is actually due to a lack of a state migration along with the new default values being written to state. This is a quirk of how Terraform works - the diff works from state for the most part, and configuration values (and defaults) get written there explicitly during the plan phase. Advanced diff customization is possible to handle more complex scenarios like what you're mentioning, but is unnecessary here.

orgads commented 4 months ago

Thank you!

vancluever commented 4 months ago

@orgads no problem! Release has been queued and should be available in an hour or so.

kingnarmer commented 4 months ago

@vancluever I tried the new provider version 2.24.2 and still get the same error. It tries to create a new registration and fails with the error below.

Terraform will perform the following actions:

  #f5-application-configurations-experimental["xxxxxxxxxxx"].acme_registration.registration[0] will be created
  + resource "acme_registration" "registration" {
      + account_key_algorithm   = "ECDSA"
      + account_key_ecdsa_curve = "P384"
      + account_key_pem         = (sensitive value)
      + account_key_rsa_bits    = 4096
      + email_address           = "xxxxxxxxxxx"
      + id                      = (known after apply)
      + registration_url        = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Error: acme: error: 403 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-acct :: urn:ietf:params:acme:error:unauthorized :: An account with the provided public key exists but is deactivated

vancluever commented 4 months ago

@kingnarmer unfortunately if you were affected by this before 2.24.2 (e.g., you upgraded to 2.24.0 or 2.24.1, and had a diff that resulted in your account being deactivated) you will need to re-create the registration with a new key.

You might want to try removing any external account_key_pem/tls_private_key entry if you have one and take the opportunity to have the resource manage it for you (unless you need to use an external key for any reason).

kingnarmer commented 4 months ago

@vancluever Would you please explain what I need to change in my config to make this work? Do I need to remove the state?

vancluever commented 4 months ago

@kingnarmer removing the account_key_pem attribute from your configuration should be all that is necessary.

If you were managing the private key using tls_private_key as per the old examples in the documentation, you can likely remove that as well (so long as it was not being used for anything else).

So if your configuration used to be similar to:

provider "acme" {
  server_url = "https://acme-staging-v02.api.letsencrypt.org/directory"
}

resource "tls_private_key" "private_key" {
  algorithm = "RSA"
}

resource "acme_registration" "reg" {
  account_key_pem = tls_private_key.private_key.private_key_pem
  email_address   = "nobody@example.com"
}

Change it to something like:

provider "acme" {
  server_url = "https://acme-staging-v02.api.letsencrypt.org/directory"
}

resource "acme_registration" "reg" {
  email_address = "nobody@example.com"
}

(These are the examples from https://registry.terraform.io/providers/vancluever/acme/latest/docs/resources/registration.)

kingnarmer commented 4 months ago

@vancluever Can I pin the provider version to the old one to keep it working until I test the changes to my code?

vancluever commented 4 months ago

@kingnarmer yep! Just add this to your config:

terraform {
  required_providers {
    acme = {
      source  = "vancluever/acme"
      version = "2.23.2"
    }
  }
}

That will lock it to 2.23.2.

You might also want to read TF's section on locking and upgrading providers.
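
Once you have tested your changes and want to move forward again, a looser constraint may be more convenient than an exact pin. The following is only a sketch, assuming that 2.24.0 and 2.24.1 are the only affected releases (as described earlier in this thread) and that you are happy to accept any other version from 2.23.2 onward:

```hcl
terraform {
  required_providers {
    acme = {
      source = "vancluever/acme"
      # Assumption: only 2.24.0 and 2.24.1 carry the registration diff bug,
      # so allow 2.23.2 and later while explicitly excluding those two.
      version = ">= 2.23.2, != 2.24.0, != 2.24.1"
    }
  }
}
```

After changing a version constraint, terraform init -upgrade is needed for the new selection to be recorded in the dependency lock file.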

vancluever commented 4 months ago

@kingnarmer just doing manual testing on this side as well, and I have confirmed that you may encounter issues if you are managing certificates in the same config as registration - this is due to certificates ultimately getting their key from the tls_private_key instance or other non-changing source.

You can fix this by adding revoke_certificate_on_destroy = false to your certificates, applying this change, and then applying any fixes you want to re-create the registration (either taint your tls_private_key instance, change the account_key_pem, or remove it altogether so that it can be managed by the resource going forward). You can then remove the revoke_certificate_on_destroy entry if you still want certificates to be revoked on destroy.

Note that you may also need to temporarily comment out your acme_registration resource instance and change certificate account_key_pem entries to point to the non-changed key directly to make sure your revoke_certificate_on_destroy change goes through initially.

plemelin commented 4 months ago

I seem to be stuck, unable to delete the certificate as it always tries to replace the certs

I'm not sure how to accomplish this:

> ... change certificate account_key_pem entries to point to the non-changed key directly ...

Could you elaborate? I extracted the value from the state and wrote it to a file but that does not seem to work. Each time I set the revoke_certificate_on_destroy to false, it tries to recreate the certificates.

At this point, my main concern is that I can't even use terraform destroy to remove ONLY the ACME-related resources, as it fails on that 403.

But I can't wipe my whole state, as there are non-ACME resources in there that I need to keep.

Struggling to clean up just the certs at this point...

orgads commented 4 months ago

You can remove only the certificate from the state using terraform state rm.
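
If you would rather do this from configuration than from the CLI, a removed block can achieve the same thing. This is a minimal sketch, assuming Terraform 1.7 or later (where removed blocks were introduced) and a hypothetical resource address of acme_certificate.certificate; substitute the address that terraform state list reports for your certificate:

```hcl
# Delete the original resource "acme_certificate" "certificate" block from
# your configuration, then add this block in its place.
removed {
  # Hypothetical address; use the one from your own state.
  from = acme_certificate.certificate

  lifecycle {
    # Forget the resource in state without destroying it, so no
    # revocation request is sent to the ACME server.
    destroy = false
  }
}
```

After a successful terraform apply, the certificate is no longer tracked in state and the removed block itself can be deleted.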

vancluever commented 4 months ago

@plemelin sounds like you were able to do the state surgery necessary to unblock you with the info from @orgads, but just in case, here's a more detailed breakdown of what you'd need to do if direct state modification is not possible.

Note that here, I'm assuming that you have already:

Now, assuming a sample config of:

resource "tls_private_key" "private_key" {
  algorithm = "RSA"
}

resource "acme_registration" "reg" {
  account_key_pem = tls_private_key.private_key.private_key_pem
  email_address   = "nobody@example.com"
}

resource "acme_certificate" "certificate" {
  account_key_pem           = acme_registration.reg.account_key_pem
  common_name               = "www.example.com"
  subject_alternative_names = ["www2.example.com"]

  dns_challenge {
    provider = "route53"
  }
}

The FIRST modifications you want to make are:

resource "tls_private_key" "private_key" {
  algorithm = "RSA"
}

# resource "acme_registration" "reg" {
#   account_key_pem = tls_private_key.private_key.private_key_pem
#   email_address   = "nobody@example.com"
# }

resource "acme_certificate" "certificate" {
  account_key_pem           = tls_private_key.private_key.private_key_pem
  common_name               = "www.example.com"
  subject_alternative_names = ["www2.example.com"]

  revoke_certificate_on_destroy = false

  dns_challenge {
    provider = "route53"
  }
}

After this is done, terraform apply should only modify the revoke_certificate_on_destroy entry in state so that it can take effect on subsequent applies after this one.

The NEXT modification you want to make depends on whether or not you want to keep the tls_private_key entry. For brevity, we cover the case in which you want to keep it (i.e., restore operations with minimal changes).

Restore the config to what it was, but KEEP the revoke_certificate_on_destroy = false entry in acme_certificate.certificate.

resource "tls_private_key" "private_key" {
  algorithm = "RSA"
}

resource "acme_registration" "reg" {
  account_key_pem = tls_private_key.private_key.private_key_pem
  email_address   = "nobody@example.com"
}

resource "acme_certificate" "certificate" {
  account_key_pem           = acme_registration.reg.account_key_pem
  common_name               = "www.example.com"
  subject_alternative_names = ["www2.example.com"]

  revoke_certificate_on_destroy = false

  dns_challenge {
    provider = "route53"
  }
}

After doing this, run terraform taint on tls_private_key.private_key. This will trigger the re-creation of the private key, unblocking acme_registration.reg. terraform apply will now succeed in re-creating all resources in this configuration. Certificates will not be revoked, but rather just dropped on the floor.

Note: if you want to move towards the new managed key setup instead, delete the tls_private_key.private_key instance and remove the account_key_pem entry from acme_registration.reg. In this case, terraform taint is not necessary; running terraform apply is all that is needed.
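
For the managed key route, the sample config would end up looking roughly like the sketch below. It assumes the same resource names as the sample config above and that acme_registration.reg still exposes account_key_pem when the provider generates the key itself; check the registration resource documentation for your provider version before relying on it.

```hcl
# tls_private_key.private_key has been deleted from the configuration.

resource "acme_registration" "reg" {
  # No account_key_pem here: the resource generates and manages the key.
  email_address = "nobody@example.com"
}

resource "acme_certificate" "certificate" {
  # Assumption: the generated key is readable from the registration.
  account_key_pem           = acme_registration.reg.account_key_pem
  common_name               = "www.example.com"
  subject_alternative_names = ["www2.example.com"]

  # Keep this until the re-creation has gone through.
  revoke_certificate_on_destroy = false

  dns_challenge {
    provider = "route53"
  }
}
```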

This should correct everything and get you back up and running.

After this is done, you can then remove the revoke_certificate_on_destroy = false entry if you don't want certificates to be simply dropped on resource destroy.