rollbar / terraform-provider-rollbar

Terraform provider for Rollbar
https://rollbar.com
MIT License

More graceful rate limiting handling #330

Closed bwmetcalf closed 1 year ago

bwmetcalf commented 2 years ago

When using an account access token in our terraform pipelines, we are seeing errors that we believe are due to rate limiting:

Error: 0

  with module.path.rollbar_project_token[0].data.rollbar_project.default,
  on ../../../../../../module-path/rollbar/project-token/main.tf line 1, in data "rollbar_project" "default":
   1: data "rollbar_project" "default" {

Since, at this time, rate limits for account tokens are not configurable, there is little we can do to avoid this. However, the provider should handle this more gracefully with a better error message and a retry backoff. This type of logic is present in other terraform providers.

ghost commented 2 years ago

Hey, thanks for reporting this issue. We'll start root-causing it and get back to you when we have updates. Is this issue currently blocking you in any way?

ghost commented 2 years ago

Devs reported there's a more graceful retry mechanism in the recently released v1.8.0. Can you update the provider to the latest version and try again? Let me know if you still experience this error.

bwmetcalf commented 2 years ago

Thanks, @rollbar-bborsits! I've upgraded to v1.8.0. Let's go ahead and close this and revisit if needed.

bwmetcalf commented 2 years ago

Unfortunately, after upgrading we are seeing the same behavior. What information can I provide to help troubleshoot?

ghost commented 2 years ago

@bwmetcalf, I'm sorry to hear that. Let me check this with the devs and I will get back to you.

ghost commented 2 years ago

Can you share your .tf file somehow? I think that would be the easiest way to debug your issue. You don't need to share it publicly, we can find a confidential way to do that.

bwmetcalf commented 2 years ago

Here you go. This is part of our rollbar module that gets called for all of our microservices:

data "rollbar_project" "default" {
  name = var.project_name
}

resource "rollbar_project_access_token" "default" {
  name       = format("%s-%s-app-token", var.namespace, var.stage)
  project_id = data.rollbar_project.default.id
  scopes     = var.scopes

  lifecycle {
    ignore_changes = [
      status
    ]
  }
}

Please let me know if you need more information. We seem to always have the issue with the rollbar_project data source which we use to obtain the project id from the project name.

ghost commented 2 years ago

@bwmetcalf Thanks! This looks fine for now. I'll get back to you whenever we find something out.

pawelsz-rb commented 2 years ago

@bwmetcalf, just tested this and it looks fine on our end. How are you setting these Terraform variables? How are you passing/defining them?

bwmetcalf commented 2 years ago

Here is the code that calls the module that I previously posted:

locals {
  create_rollbar = var.enable_rollbar

  rollbar_project_token = module.rollbar_project_token[0].token

  rollbar_token_scopes = concat(var.rollbar_token_scopes, [
    "post_server_item"
  ])
}

module "rollbar_project_token" {
  source = "../../../../module-library/general/monitoring/rollbar/project-token"
  count  = var.enable_rollbar ? 1 : 0

  namespace    = module.label.namespace
  project_name = module.label.name
  scopes       = local.rollbar_token_scopes
  stage        = module.label.stage
}

namespace and stage come from the use of https://registry.terraform.io/modules/cloudposse/label/null/latest. project_name comes from a static definition in the microservice module that calls this module. rollbar_token_scopes is an empty list for most microservices; for two services we pass in ["post_client_item"], which gets concatenated as shown above. All of these variables are defined by us and do not require calling external resources, so there are no external dependencies in defining them.

ghost commented 2 years ago

@bwmetcalf Terraform can log each command in detail, which makes it easier to find what goes wrong. You can set the log level with the TF_LOG environment variable, and the output can be saved to a file via the TF_LOG_PATH variable, e.g., TF_LOG=TRACE TF_LOG_PATH=log.txt terraform apply. Can you run your command at the TRACE log level to ensure we catch everything? You can find detailed documentation here on Terraform debugging.

bwmetcalf commented 2 years ago

We have added debug logging to our terraform CI/CD pipelines and will report back with findings.

pawelsz-rb commented 2 years ago

@bwmetcalf, alternatively, please look at the debugging section of our README; it may help too.

bwmetcalf commented 2 years ago

This is the debug output from the provider when we get the error:

3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41
3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41
3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41
3:10AM ERR github.com/rollbar/terraform-provider-rollbar/client/client.go:108 >  ErrorResult={"Err":0,"Message":""} Status="502 Bad Gateway" StatusCode=502
3:10AM ERR github.com/rollbar/terraform-provider-rollbar/client/project.go:91 >  error="0 "
3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41
3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41

It appears that the Rollbar API is occasionally returning a 502, which triggers this problem.
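The log above suggests why Terraform shows only "Error: 0": the 502 response carries no usable error body, so the code and message are both empty. The Go sketch below shows one way a client could fall back to the HTTP status in that case. The `apiError` type and `describe` function are hypothetical; only the field names `Err` and `Message` are taken from the log line, and this is not the provider's actual implementation.

```go
package main

import (
	"fmt"
	"net/http"
)

// apiError mirrors the fields visible in the ErrorResult log line.
// Hypothetical type for illustration only.
type apiError struct {
	Err     int
	Message string
}

// describe builds a readable error string. When the API supplied no message
// (as with the 502 above), it reports the HTTP status instead of "0 ".
func describe(statusCode int, e apiError) string {
	if e.Message != "" {
		return fmt.Sprintf("rollbar API error %d: %s (HTTP %d)", e.Err, e.Message, statusCode)
	}
	return fmt.Sprintf("rollbar API returned HTTP %d %s with no error details",
		statusCode, http.StatusText(statusCode))
}
```

With the values from the log, `describe(502, apiError{})` yields "rollbar API returned HTTP 502 Bad Gateway with no error details", which points directly at a gateway failure rather than at the Terraform configuration.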

pawelsz-rb commented 2 years ago

@bwmetcalf, did you upgrade our plugin to the newest version? There should be a retry mechanism that helps with errors returned from the API.

bwmetcalf commented 2 years ago

Yes. We are running v1.8.0.

$ tf version|grep rollbar
+ provider registry.terraform.io/rollbar/rollbar v1.8.0

bwmetcalf commented 2 years ago

I have also opened a support ticket: 49807. It seems the provider is not gracefully handling these errors and the API is clearly having issues resulting in the 502 errors.

pawelsz-rb commented 2 years ago

@bwmetcalf, is the error always happening in the same place?

bwmetcalf commented 2 years ago

I believe the error always occurs in the rollbar_project data source, but I am not 100% sure. I will track this and provide an update here.

bwmetcalf commented 2 years ago

Any update on this? To clarify my previous comment, the error, I believe, always occurs in the rollbar_project data source. However, we have several projects that call the module where this data source is used. The error is not specific to any one of these projects.

bwmetcalf commented 2 years ago

This occurred again and is indeed in the rollbar_project data source.

bwmetcalf commented 1 year ago

This continues to occur with greater frequency and is really impacting our productivity. Any update?

ghost commented 1 year ago

We're still working on it, but now from both sides:

github-actions[bot] commented 1 year ago

:tada: This issue has been resolved in version 1.9.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

bwmetcalf commented 1 year ago

Thanks, @pawelsz-rb !

bwmetcalf commented 1 year ago

Just wanted to provide feedback that so far this fix has resolved our issue. If we see the 502s again I'll comment here or open another issue.

ghost commented 1 year ago

I'm glad it works without any problem. Don't hesitate to reopen this ticket in case this issue occurs again.