thousandeyes / terraform-provider-thousandeyes

ThousandEyes Terraform Provider
Apache License 2.0

HTTP-Server tests detaching alerts #133

Open berkeli opened 1 year ago

berkeli commented 1 year ago

Hello,

We have encountered the following issue with thousandeyes_http_server resource:

When a test is created with alerts_enabled = false and an alert rule is listed in its alert_rules block, the rule is not attached to the test. Example code below:

## Alert Rules
resource "thousandeyes_alert_rule" "http_server_alert_rule" {
  rule_name  = "alert"
  alert_type = "HTTP Server"

  expression                = "((errorType != \"None\"))"
  minimum_sources           = 1
  rounds_violating_out_of   = 1
  rounds_violating_required = 1

  dynamic "notifications" {
    for_each = length(var.notification_emails) > 0 ? [1] : []

    content {
      dynamic "email" {
        for_each = length(var.notification_emails) > 0 ? [1] : []
        content {
          recipient = var.notification_emails
        }
      }
    }
  }
}

resource "thousandeyes_http_server" "http_server" {
  test_name      = "sample test"
  interval       = var.test_interval_seconds
  enabled        = var.enabled
  alerts_enabled = false

  url                  = var.url
  network_measurements = var.network_measurements
  bgp_measurements     = var.bgp_measurements
  mtu_measurements     = var.mtu_measurements

  dynamic "agents" {
    for_each = local.agents_to_use
    content {
      agent_id = var.te_data.agents[agents.value]
    }
  }

  alert_rules {
    rule_id = thousandeyes_alert_rule.http_server_alert_rule.rule_id
  }
}

When you run terraform plan, it shows the alert rule being created and attached to the test. After terraform apply, however, the rule is not attached. If you attach the rule from the dashboard and run terraform apply again, it is detached again.

When you then change the configuration to enable alerts, the apply fails with a 400 Bad Request error because no alert rules are attached to the test.

Let me know if you have any questions

Thanks

berkeli commented 1 year ago

The same thing seems to be happening for DNS tests as well:

18:15:37 │ Error: Failed call API endpoint. HTTP response code: 400. Error: 400 Bad Request
18:15:37 │ Cannot enable alerts if no alertRules are associated with the test
18:15:37 │ 
18:15:37 │   with module.thousandeyes-dns_commercial_a["****"].thousandeyes_dns_server.dns_lookup,
18:15:37 │   on ../../modules/slack/thousandeyes/dns/combined/main.tf line 86, in resource "thousandeyes_dns_server" "dns_lookup":
18:15:37 │   86: resource "thousandeyes_dns_server" "dns_lookup" {

pedro-te commented 1 year ago

Hi @berkeli ,

Thanks for raising this issue and I apologize for the inconvenience that this is causing. I have created an internal ticket to look into this. We will have a look and will get back to you as soon as possible.

Thanks, Pedro

berkeli commented 1 year ago

Hello @pedro-te,

Is there any update on this?

We just learned that modifying the notifications block of an alert rule also causes the same issue.

pedro-te commented 1 year ago

Hey @berkeli ,

Sorry, no update yet. Let me try to raise the priority of this.

Thanks, Pedro

pedro-te commented 1 year ago

Hey @berkeli,

We're not able to reproduce the issue you are describing. I tested using the following resources:

data "thousandeyes_agent" "lisbon" {
  agent_name = "Lisbon, Portugal"
}

resource "thousandeyes_alert_rule" "st-223" {
  rule_name = "ST-223: example rule"
  alert_type = "HTTP Server"

  expression = "((errorType != \"None\"))"
  minimum_sources = 1
  rounds_violating_out_of = 1
  rounds_violating_required = 1

  notifications {
    email {
      recipient = [
        "noreply@thousandeyes.com"
      ]
    }
  }
}

resource "thousandeyes_http_server" "st-223" {
  test_name      = "ST-223 Pedro: Example HTTP test"
  interval       = 120
  enabled        = true
  alerts_enabled = false

  url = "https://www.tesla.com"

  agents {
    agent_id = data.thousandeyes_agent.lisbon.agent_id
  }

  alert_rules {
    rule_id = thousandeyes_alert_rule.st-223.rule_id
  }
}

As you can see alerts_enabled is set to false and we're attempting to attach an alert rule to the test during its creation.

This is the terraform plan:

Terraform will perform the following actions:

  # thousandeyes_alert_rule.st-223 will be created
  + resource "thousandeyes_alert_rule" "st-223" {
      + alert_rule_id             = (known after apply)
      + alert_type                = "HTTP Server"
      + default                   = false
      + expression                = "((errorType != \"None\"))"
      + id                        = (known after apply)
      + minimum_sources           = 1
      + notify_on_clear           = true
      + rounds_violating_mode     = "ANY"
      + rounds_violating_out_of   = 1
      + rounds_violating_required = 1
      + rule_id                   = (known after apply)
      + rule_name                 = "ST-223: example rule"

      + notifications {
          + email {
              + recipient = [
                  + "noreply@thousandeyes.com",
                ]
            }
        }
    }

  # thousandeyes_http_server.st-223 will be created
  + resource "thousandeyes_http_server" "st-223" {
      + alerts_enabled         = false
      + api_links              = (known after apply)
      + auth_type              = "NONE"
      + bandwidth_measurements = false
      + content_regex          = ".*"
      + created_by             = (known after apply)
      + created_date           = (known after apply)
      + enabled                = true
      + follow_redirects       = true
      + http_target_time       = 1000
      + http_time_limit        = 5
      + http_version           = 2
      + id                     = (known after apply)
      + interval               = 120
      + live_share             = (known after apply)
      + modified_by            = (known after apply)
      + modified_date          = (known after apply)
      + network_measurements   = true
      + path_trace_mode        = "classic"
      + probe_mode             = "AUTO"
      + protocol               = "TCP"
      + saved_event            = (known after apply)
      + ssl_version            = (known after apply)
      + ssl_version_id         = 0
      + test_id                = (known after apply)
      + test_name              = "ST-223 Pedro: Example HTTP test"
      + type                   = (known after apply)
      + url                    = "https://www.tesla.com"
      + verify_certificate     = true

      + agents {
          + agent_id     = 98976
          + agent_type   = (known after apply)
          + ip_addresses = (known after apply)
        }

      + alert_rules {
          + rule_id = (known after apply)
        }
    }

Plan: 2 to add, 0 to change, 0 to destroy.

If I check the ThousandEyes App I see the following:

image

As you can see, the alert rule is attached even though alerts are disabled.

If I run terraform plan again, I get: No changes. Your infrastructure matches the configuration.

Not sure if you did something different or if I didn't exactly mimic what you did. Can you help us reproduce the issue? Can you also try the above snippet to check if it works for you?

Also, what values are you passing here?

  network_measurements = var.network_measurements
  bgp_measurements     = var.bgp_measurements
  mtu_measurements     = var.mtu_measurements

Thank you, Pedro

berkeli commented 1 year ago

Hi @pedro-te

Apologies for the delayed response. I have now run a few more tests and couldn't reproduce my original report, but I think I was able to pinpoint the issue.

The alert rule only detaches when the alert_rule resource is modified and the test is not modified in the same apply. Starting from your example, I did the following:

TF file

```
data "thousandeyes_agent" "lisbon" {
  agent_name = "Lisbon, Portugal"
}

resource "thousandeyes_alert_rule" "st-223" {
  rule_name  = "ST-223: example rule"
  alert_type = "HTTP Server"

  expression                = "((errorType != \"None\"))"
  minimum_sources           = 1
  rounds_violating_out_of   = 1
  rounds_violating_required = 1

  notifications {
    email {
      recipient = [
        "noreply@thousandeyes.com"
      ]
    }
  }
}

resource "thousandeyes_http_server" "st-223" {
  test_name      = "ST-223 Pedro: Example HTTP test"
  interval       = 120
  enabled        = true
  alerts_enabled = false

  network_measurements = true
  bgp_measurements     = false
  mtu_measurements     = false

  url = "https://www.tesla.com"

  agents {
    agent_id = data.thousandeyes_agent.lisbon.agent_id
  }

  alert_rules {
    rule_id = thousandeyes_alert_rule.st-223.rule_id
  }
}
```

terraform apply and terraform plan work as expected, matching your output, but mtu_measurements is set to true on the created test even though the configuration sets it to false. This is important because it makes the bug we are investigating difficult to catch.

Because I set mtu_measurements = false but the test is created with it as true by default, this is corrected on the second terraform apply:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # thousandeyes_http_server.st-223 will be updated in-place
  ~ resource "thousandeyes_http_server" "st-223" {
        id                     = "3726128"
      ~ mtu_measurements       = true -> false
        # (30 unchanged attributes hidden)

        # (2 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

After this terraform apply, mtu_measurements stays false.

Now, if we modify only the alert_rule (any field), it detaches from the test. As an example, I will modify notifications:

notifications {
    email {
      recipient = [
        "noreply@thousandeyes.com",
        "bhalmyradov@slack-corp.com"
      ]
    }
  }

This modifies the alert_rule, and terraform plan doesn't show the test being modified:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # thousandeyes_alert_rule.st-223 will be updated in-place
  ~ resource "thousandeyes_alert_rule" "st-223" {
        id                        = "6057192"
        # (10 unchanged attributes hidden)

      + notifications {
          + email {
              + recipient = [
                  + "bhalmyradov@slack-corp.com",
                  + "noreply@thousandeyes.com",
                ]
            }
        }
      - notifications {
          - email {
              - recipient = [
                  - "noreply@thousandeyes.com",
                ] -> null
            }
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

If we check the dashboard now, the alert_rule is detached:

image

And a subsequent terraform apply to enable alerts will fail.
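
As a sketch of that failing step, assuming the resources from the TF file above with only alerts_enabled changed on the test (the data source and alert rule are the ones already declared there). The failure is presumably the same 400 shown in the DNS log earlier in the thread ("Cannot enable alerts if no alertRules are associated with the test"); the exact output for this HTTP test was not posted.

```hcl
# Same test as in the TF file above; only alerts_enabled changes.
resource "thousandeyes_http_server" "st-223" {
  test_name      = "ST-223 Pedro: Example HTTP test"
  interval       = 120
  enabled        = true
  alerts_enabled = true # was false; this is the only change

  network_measurements = true
  bgp_measurements     = false
  mtu_measurements     = false

  url = "https://www.tesla.com"

  agents {
    agent_id = data.thousandeyes_agent.lisbon.agent_id
  }

  # Still declared here, but already detached on the ThousandEyes side
  # after the alert rule update described above.
  alert_rules {
    rule_id = thousandeyes_alert_rule.st-223.rule_id
  }
}
```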

I think the issue is in the test_ids field of the alert_rule resource, which is set to null or an empty list when the alert rule is updated. We cannot set this field ourselves because it causes a circular dependency (we have to provide the alert_rules block on tests when we enable alerts).
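
The circular dependency can be illustrated with a sketch like the one below. It reuses the st-223 resources and assumes test_ids accepts a list of test IDs (the thread names the field but not its type). With references in both directions, Terraform cannot order the two resources and rejects the configuration with a cycle error at plan time.

```hcl
data "thousandeyes_agent" "lisbon" {
  agent_name = "Lisbon, Portugal"
}

resource "thousandeyes_alert_rule" "st-223" {
  rule_name  = "ST-223: example rule"
  alert_type = "HTTP Server"

  expression                = "((errorType != \"None\"))"
  minimum_sources           = 1
  rounds_violating_out_of   = 1
  rounds_violating_required = 1

  # Assumption: test_ids is a list of test IDs.
  # This makes the rule depend on the test...
  test_ids = [thousandeyes_http_server.st-223.test_id]
}

resource "thousandeyes_http_server" "st-223" {
  test_name      = "ST-223 Pedro: Example HTTP test"
  interval       = 120
  enabled        = true
  alerts_enabled = true

  url = "https://www.tesla.com"

  agents {
    agent_id = data.thousandeyes_agent.lisbon.agent_id
  }

  # ...while the test already depends on the rule, closing the cycle.
  alert_rules {
    rule_id = thousandeyes_alert_rule.st-223.rule_id
  }
}
```

Dropping either reference breaks the cycle, which is why the association can only be declared on the test side when alerts are enabled.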

pedro-te commented 1 year ago

Hi @berkeli ,

Thank you for the detailed report and step by step instructions. We will have another look and get back to you as soon as possible.

Cheers, Pedro

pedro-te commented 1 year ago

Hi @berkeli ,

Just to give you a quick update, we have now been able to reproduce the issue. It is indeed related to the test_ids field in the alert_rule resource. We'll work on a fix for this.

Thanks, Pedro