timohirt / terraform-provider-hetznerdns

Terraform provider for Hetzner DNS
Mozilla Public License 2.0
105 stars 21 forks source link

Error: Error creating DNS record ...: Error creating record ...: API returned HTTP 422 Unprocessable Entity error with message: '409 Conflict: ' #31

Open l-with opened 2 years ago

l-with commented 2 years ago

Terraform provider for Hetzner DNS is great, thanks!

I have a bug, that seems to be a timing problem. "long run" in my "real world" is an ansible playbook.

Source-Code Tree is:

.
├── main.tf
├── modules
│   └── dns
│       └── main.tf
├── terraform.tfstate
├── terraform.tfstate.backup
└── terraform.tfvars

2 directories, 5 files

terraform version :

Terraform v1.0.10
on linux_amd64
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/timohirt/hetznerdns v1.1.1

./main.tf :

variable "hetzner_dns_api_token" {
  type = string
}

module "dns_ipv4" {
    source = "./modules/dns"

    hetzner_dns_api_token = var.hetzner_dns_api_token

    dns_zone_name      = "with42.de"
    dns_record_name    = "test"
    dns_record_value   = "23.88.119.215"
    dns_record_type    = "A"
    dns_record_ttl     = 60
}

module "dns_ipv6" {
    source = "./modules/dns"

    hetzner_dns_api_token = var.hetzner_dns_api_token

    dns_zone_name      = "with42.de"
    dns_record_name    = "test"
    dns_record_value   = "2a01:4f8:c17:4d56::1"
    dns_record_type    = "AAAA"
    dns_record_ttl     = 60
}

resource "null_resource" "long_run" {
  provisioner "local-exec" {
    command = "sleep 30"
  }
}

./modules/dns/main.tf :

terraform {
  required_providers {
    hetznerdns = {
      source  = "timohirt/hetznerdns"
      version = "1.1.1"
    }
  }
  required_version = ">= 0.14"
}

variable "hetzner_dns_api_token" {
  type = string
}

provider "hetznerdns" {
  apitoken = var.hetzner_dns_api_token
}

variable "dns_zone_name" {
  type = string
}

data "hetznerdns_zone" "dns_zone" {
  name = var.dns_zone_name
}

variable "dns_record_name" {
  type = string
}

variable "dns_record_type" {
  type    = string
}

variable "dns_record_value" {
  type = string
}

variable "dns_record_ttl" {
  type    = number
}

resource "hetznerdns_record" "dns_record" {
  zone_id = data.hetznerdns_zone.dns_zone.id
  name    = var.dns_record_name
  value   = var.dns_record_value
  type    = var.dns_record_type
  ttl     = var.dns_record_ttl
}

terraform apply --auto-approve

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # null_resource.long_run will be created
  + resource "null_resource" "long_run" {
      + id = (known after apply)
    }

  # module.dns_ipv4.hetznerdns_record.dns_record will be created
  + resource "hetznerdns_record" "dns_record" {
      + id      = (known after apply)
      + name    = "test"
      + ttl     = 60
      + type    = "A"
      + value   = "23.88.119.215"
      + zone_id = "Q7FSkRDBKn7s4AcbyfsCS9"
    }

  # module.dns_ipv6.hetznerdns_record.dns_record will be created
  + resource "hetznerdns_record" "dns_record" {
      + id      = (known after apply)
      + name    = "test"
      + ttl     = 60
      + type    = "AAAA"
      + value   = "2a01:4f8:c17:4d56::1"
      + zone_id = "Q7FSkRDBKn7s4AcbyfsCS9"
    }

Plan: 3 to add, 0 to change, 0 to destroy.
null_resource.long_run: Creating...
null_resource.long_run: Provisioning with 'local-exec'...
null_resource.long_run (local-exec): Executing: ["/bin/sh" "-c" "sleep 30"]
module.dns_ipv4.hetznerdns_record.dns_record: Creating...
module.dns_ipv6.hetznerdns_record.dns_record: Creating...
module.dns_ipv4.hetznerdns_record.dns_record: Creation complete after 0s [id=c804985fe43c09aedc042c2ac091c313]
null_resource.long_run: Still creating... [10s elapsed]
null_resource.long_run: Still creating... [20s elapsed]
null_resource.long_run: Still creating... [30s elapsed]
null_resource.long_run: Creation complete after 30s [id=4482767035045673271]
│
│ Error: Error creating DNS record test: Error creating record test: API returned HTTP 422 Unprocessable Entity error with message: '409 Conflict: '
│
│   with module.dns_ipv6.hetznerdns_record.dns_record,
│   on modules/dns/main.tf line 43, in resource "hetznerdns_record" "dns_record":
│   43: resource "hetznerdns_record" "dns_record" {
│
│

Sorry for the long snippets, I can also provide the output with TF_LOG=trace if necessary.

ppacher commented 2 years ago

This seems to be the same issue i encountered in #30.

timohirt commented 2 years ago

Can you provide the Terraform source files, so I can reproduce on my machine?

l-with commented 2 years ago

I do that very gladly: https://github.com/l-with/Hetzner-DNS

l-with commented 2 years ago

sorry I pushed initially with the null_resource deactivated, the error only happens with the "long run", I pushed a new version right now

timohirt commented 2 years ago

Good news :-). I was able to reproduce the error on my local machine. It seems the provider made the two request to create the DNS records almost at the same time. The first that hits the API wins while the second is rejected. This is probably an Api input validation issue..

However, after updating to v2.0.0 of this provider it works. I upgraded to Terraform provider SDK version 2 and now the resources are created one after another.

I suggest you try version 2.0.0. In the demo project you shared (thanks a lot for the effort), just edit this line.

Looking forward to hearing from you.

l-with commented 2 years ago

I am very happy that you were able to reproduce the error and thank you very much for your effort and fast response.

I updated the provider version (s. my repo) but still get the error.

The log (TF_LOG=trace) filtered by dns.hetzner.com is:

2022-01-22T07:17:54.182+0100 [INFO]  provider.terraform-provider-hetznerdns_v2.0.0: 2022/01/22 07:17:54 [DEBUG] HTTP request to API GET https://dns.hetzner.com/api/v1/zones?name=with42.de: timestamp=2022-01-22T07:17:54.181+0100
2022-01-22T07:17:54.188+0100 [INFO]  provider.terraform-provider-hetznerdns_v2.0.0: 2022/01/22 07:17:54 [DEBUG] HTTP request to API GET https://dns.hetzner.com/api/v1/zones?name=with42.de: timestamp=2022-01-22T07:17:54.188+0100
2022-01-22T07:17:54.839+0100 [INFO]  provider.terraform-provider-hetznerdns_v2.0.0: 2022/01/22 07:17:54 [DEBUG] HTTP request to API POST https://dns.hetzner.com/api/v1/records: timestamp=2022-01-22T07:17:54.839+0100
2022-01-22T07:17:54.848+0100 [INFO]  provider.terraform-provider-hetznerdns_v2.0.0: 2022/01/22 07:17:54 [DEBUG] HTTP request to API POST https://dns.hetzner.com/api/v1/records: timestamp=2022-01-22T07:17:54.848+0100
2022-01-22T07:17:55.522+0100 [INFO]  provider.terraform-provider-hetznerdns_v2.0.0: 2022/01/22 07:17:55 [DEBUG] HTTP request to API GET https://dns.hetzner.com/api/v1/records/3377e0be7dc639c9ec5459aa587a9a3f: timestamp=2022-01-22T07:17:55.521+0100

This really looks like as if the posts to the API happen at nearly the same time. This correlates to the effect that occurs time and again when I configure hetzner-dns with ansible via the API and start the playbook in null_resource's with terraform lifecycle for all hosts of a cluster. The null_resource's are cerated by terraform simultaneously. This speaks for your opinion about the API.

I am sorry that I do not know enough about implementing terraform providers to help with coding.

Is there a chance for implementing retries for posts (at worst with an ALOHA like timing) to circumvent this problem?

l-with commented 2 years ago

BTW, right now just the same error (HTTP Errror 422: Unprocessable Entity) occured with the ansible playbook (on the second null_resource), I mentioned in the above comment: grafik

l-with commented 2 years ago

I solved the problem by using https://pkg.go.dev/github.com/hashicorp/go-retryablehttp and opened a pull request.

l-with commented 2 years ago

This is the debug log with the added retries:

2022-02-01T12:36:22.190+0100 [INFO]  provider.terraform-provider-hetznerdns: 2022/02/01 12:36:22 [DEBUG] Updating resource record: timestamp=2022-02-01T12:36:22.189+0100
2022-02-01T12:36:22.190+0100 [INFO]  provider.terraform-provider-hetznerdns: 2022/02/01 12:36:22 [DEBUG] HTTP request to API POST https://dns.hetzner.com/api/v1/records: timestamp=2022-02-01T12:36:22.190+0100
2022-02-01T12:36:22.190+0100 [DEBUG] provider.terraform-provider-hetznerdns: 2022/02/01 12:36:22 [DEBUG] POST https://dns.hetzner.com/api/v1/records
2022-02-01T12:36:22.191+0100 [INFO]  provider.terraform-provider-hetznerdns: 2022/02/01 12:36:22 [DEBUG] Updating resource record: timestamp=2022-02-01T12:36:22.190+0100
2022-02-01T12:36:22.191+0100 [INFO]  provider.terraform-provider-hetznerdns: 2022/02/01 12:36:22 [DEBUG] HTTP request to API POST https://dns.hetzner.com/api/v1/records: timestamp=2022-02-01T12:36:22.190+0100
2022-02-01T12:36:22.191+0100 [DEBUG] provider.terraform-provider-hetznerdns: 2022/02/01 12:36:22 [DEBUG] POST https://dns.hetzner.com/api/v1/records
2022-02-01T12:36:22.553+0100 [DEBUG] provider.terraform-provider-hetznerdns: 2022/02/01 12:36:22 [DEBUG] POST https://dns.hetzner.com/api/v1/records (status: 422): retrying in 1s (10 left)
2022-02-01T12:36:23.119+0100 [INFO]  provider.terraform-provider-hetznerdns: 2022/02/01 12:36:23 [DEBUG] Reading resource record: timestamp=2022-02-01T12:36:23.119+0100
2022-02-01T12:36:23.119+0100 [INFO]  provider.terraform-provider-hetznerdns: 2022/02/01 12:36:23 [DEBUG] HTTP request to API GET https://dns.hetzner.com/api/v1/records/f27f297bacc987fdb680ab757a7450e2: timestamp=2022-02-01T12:36:23.119+0100
2022-02-01T12:36:23.119+0100 [DEBUG] provider.terraform-provider-hetznerdns: 2022/02/01 12:36:23 [DEBUG] GET https://dns.hetzner.com/api/v1/records/f27f297bacc987fdb680ab757a7450e2

The Errror 422 returned by the second API call is healed by a retry automatically by retryablehttp.

l-with commented 2 years ago

@timohirt Can I help and do something to fix the failed checks of the merge request? (I executed the tests locally succesfull.)

timohirt commented 2 years ago

I had to merge it into another branch in this repository. The API token in stored in a GitHub Secret which wasn't shared with your fork. It's still not working unfortunately and I have no idea why. Need to look into it.

timohirt commented 2 years ago

Ok, got it. The Retryable Client was created outside of the factory method that is used to inject a fake client in tests. Fixed that and now the ci build is green.

timohirt commented 2 years ago

@l-with please check if https://github.com/timohirt/terraform-provider-hetznerdns/releases/tag/v2.1.0 fixes this for you.

l-with commented 2 years ago

I updated to the provider version 2.1.0 in https://github.com/l-with/Hetzner-DNS and tried a few times apply and destroy without any error. I am very happy to start refactoring my deployments to use the provider to create DNS record instead using the Hetzner API with Ansible. @timohirt Thank you very much!

timohirt commented 2 years ago

Thank you for your contribution @l-with :-)

l-with commented 2 years ago

I have project creating a lot of DNS entries for a mail server and get:

Post "https://dns.hetzner.com/api/v1/records": POST https://dns.hetzner.com/api/v1/records giving up after 11 attempt(s)

Thus 10 retries is not enough, perhaps other values for the retryableClient could help, the defaults are

    // Default retry configuration
    defaultRetryWaitMin = 1 * time.Second
    defaultRetryWaitMax = 30 * time.Second
    defaultRetryMax     = 4

I would prefer to simply increase retryableClient.RetryMax to a much higer value (30), since in normal cases this does not matter.

@timohirt: Shall I create a merge request for this simple change?

timohirt commented 2 years ago

Sure, go ahead

l-with commented 2 years ago

I'm sorry: In this case I mixed up timinig problems producing error "HTTP Errror 422: Unprocessable Entity" and real semantical errors from the Hetzner DNS API also responding "HTTP Errror 422: Unprocessable Entity".

For instance "HTTP Errror 422: Unprocessable Entity" is returned for creating a CNAME record with a name already existing as A record.

Thus 10 for retryMax is high enough, perhaps too high, since it produces a waiting time of 1s (10 left) + 2s (9 left) + 4s (8 left) + 8s (7 left) + 16s (6 left) + 30s (5 left) + 30s (4 left) + 30s (3 left) + 30s (2 left) + 30s (1 left), that is in sum 3 minutes 31 seconds also for semantical errors from the Hetzner DNS API.

5 retries would be my suggestion. Please comment.