Closed bwmetcalf closed 1 year ago
Hey, thanks for reporting this issue. We'll start investigating the root cause and get back to you when we have updates. Is this issue currently blocking you in any way?
Devs reported there's a more graceful retry mechanism in the recently released v1.8.0. Can you update the provider to the latest version and try again? Let me know if you still experience this error.
Thanks, @rollbar-bborsits! I've upgraded to v1.8.0. Let's go ahead and close this and revisit if needed.
Unfortunately, after upgrading we are seeing the same behavior. What information can I provide to help troubleshoot?
@bwmetcalf, I'm sorry to hear that. Let me check this with the devs and I will get back to you.
Can you share your .tf file somehow? I think that would be the easiest way to debug your issue. You don't need to share it publicly, we can find a confidential way to do that.
Here you go. This is part of our rollbar module that gets called for all of our microservices:
data "rollbar_project" "default" {
  name = var.project_name
}

resource "rollbar_project_access_token" "default" {
  name       = format("%s-%s-app-token", var.namespace, var.stage)
  project_id = data.rollbar_project.default.id
  scopes     = var.scopes

  lifecycle {
    ignore_changes = [
      status
    ]
  }
}
Please let me know if you need more information. We seem to always have the issue with the rollbar_project data source which we use to obtain the project id from the project name.
@bwmetcalf Thanks! This looks fine for now. I'll get back to you whenever we find something out.
@bwmetcalf, I just tested this and it looks fine on our end. How are you setting these Terraform variables? How are you passing or defining them?
Here is the code that calls the module that I previously posted:
locals {
  create_rollbar        = var.enable_rollbar
  rollbar_project_token = module.rollbar_project_token[0].token
  rollbar_token_scopes = concat(var.rollbar_token_scopes, [
    "post_server_item"
  ])
}

module "rollbar_project_token" {
  source       = "../../../../module-library/general/monitoring/rollbar/project-token"
  count        = var.enable_rollbar ? 1 : 0
  namespace    = module.label.namespace
  project_name = module.label.name
  scopes       = local.rollbar_token_scopes
  stage        = module.label.stage
}
namespace and stage come from the use of https://registry.terraform.io/modules/cloudposse/label/null/latest. project_name comes from a static definition in the microservice module that calls this module. rollbar_token_scopes is an empty list for most microservices; for two services we pass in ["post_client_item"], which gets concatenated as shown above. All of these variables are defined by us and do not require calling external resources, so there are no external dependencies involved in defining them.
@bwmetcalf Terraform can log each command in detail, which makes it easier to find what goes wrong. You can set the log level with the TF_LOG environment variable, and the output can be saved to a file via the TF_LOG_PATH variable, e.g., TF_LOG=TRACE TF_LOG_PATH=log.txt terraform apply. Can you run your command at the TRACE log level to ensure we catch everything? You can find detailed documentation here on Terraform debugging.
We have added debug logging to our terraform CI/CD pipelines and will report back with findings.
@bwmetcalf, alternatively, please look at the debugging section of our README; it may help too.
This is the debug output from the provider when we get the error:
3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41
3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41
3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41
3:10AM ERR github.com/rollbar/terraform-provider-rollbar/client/client.go:108 > ErrorResult={"Err":0,"Message":""} Status="502 Bad Gateway" StatusCode=502
3:10AM ERR github.com/rollbar/terraform-provider-rollbar/client/project.go:91 > error="0 "
3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41
3:10AM DBG github.com/rollbar/terraform-provider-rollbar/client/project.go:109 > Successfully listed projects cleaned_projects=41 raw_projects=41
It appears that the Rollbar API is occasionally returning a 502, which triggers this problem.
@bwmetcalf, did you upgrade our plugin to the newest version? There should be a retry mechanism that helps with errors returned from the API.
Yes. We are running v1.8.0.
$ tf version|grep rollbar
+ provider registry.terraform.io/rollbar/rollbar v1.8.0
I have also opened a support ticket: 49807. It seems the provider is not gracefully handling these errors and the API is clearly having issues resulting in the 502 errors.
@bwmetcalf, is the error always happening in the same place?
I believe the error always occurs in the rollbar_project data source, but I am not 100% sure. I will track this and provide an update here.
Any update on this? To clarify my previous comment, the error, I believe, always occurs in the rollbar_project data source. However, we have several projects that call the module where this data source is used. The error is not specific to any one of these projects.
This occurred again and is indeed in the rollbar_project data source.
This continues to occur with greater frequency and is really impacting our productivity. Any update?
We're still working on it, but now from both sides:
:tada: This issue has been resolved in version 1.9.0 :tada:
The release is available on:
v1.9.0
Your semantic-release bot :package::rocket:
Thanks, @pawelsz-rb !
Just wanted to provide feedback that so far this fix has resolved our issue. If we see the 502s again I'll comment here or open another issue.
I'm glad it works without any problem. Don't hesitate to reopen this ticket in case this issue occurs again.
When using an account access token in our terraform pipelines, we are seeing errors that we believe are due to rate limiting:
Since, at this time, rate limits for account tokens are not configurable, there is little we can do to avoid this. However, the provider should handle this more gracefully with a better error message and a retry backoff. This type of logic is present in other terraform providers.
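For illustration, the kind of retry backoff requested above can be sketched as follows. This is a minimal, language-agnostic sketch in Python, not the provider's actual code (the log paths in this thread show the provider is written in Go). The names `call_with_backoff`, `TransientAPIError`, and `RETRYABLE_STATUSES` are hypothetical, and treating 429, 503, and 504 as retryable is an assumption; only the 502 is confirmed by the debug output in this thread.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: statuses treated as transient. Only 502 was actually observed
# in this thread; 429/503/504 are included as common transient responses.
RETRYABLE_STATUSES = {429, 502, 503, 504}


class TransientAPIError(Exception):
    """Raised when every attempt ended in a retryable status (hypothetical name)."""


def call_with_backoff(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `request` until it returns a non-retryable status or attempts run out.

    `request` is any zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status, body = request()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt == max_attempts:
            # Surface a descriptive message instead of an opaque error.
            raise TransientAPIError(
                f"giving up after {max_attempts} attempts; last status was {status}"
            )
        # Exponential backoff capped at max_delay, with full jitter so that
        # parallel pipelines do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))

    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry logic can be exercised without real waiting. A caller would wrap a single API request in a zero-argument function and pass it in; an error is raised only after the final attempt, and its message names the last status code seen.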