jfrog / terraform-provider-platform

Terraform provider to manage JFrog Platform
https://jfrog.com
Apache License 2.0

API 500 errors not reported to console during terraform apply in provider 10.7.1 #104

Closed jeidsath closed 1 month ago

jeidsath commented 2 months ago

We have been investigating an incident with JFrog support (case #302187), where 500 errors encountered by the provider were not reported back to the terraform console. Our provider version was 10.7.1.

Instead of reporting errors, the terraform apply hung with no error output until we eventually cancelled it; a second run then completed without errors. Logs are in the referenced case, but the output looked like the following:

<snip many entries like the below>
platform_permission.permissions["reponame"]: Still creating... [15m21s elapsed]
platform_permission.permissions["reponame_b"]: Still creating... [1m20s elapsed]
platform_permission.permissions["reponame_c"]: Still creating... [3m20s elapsed]
<no further output, job cancelled>

I see the somewhat recent "Fix http response error handling" commit by @alexhung from March that predates 10.7.1.

If this is still the current behavior of the provider, I hope that we can see API errors propagated to the console in the future. Because apply can apparently hang like this, it might be helpful to send errors to the console immediately instead of collating them for later reporting.

alexhung commented 1 month ago

@jeidsath Thanks for the report. At first glance, it looks like a bug that needs to be fixed.

alexhung commented 1 month ago

@jeidsath BTW, the resource you mentioned is in the jfrog/platform provider so I'm moving this issue to that repo.

alexhung commented 1 month ago

@jeidsath The error reporting mechanism is not entirely in the provider's control. The provider runs as a child process of the Terraform CLI, so errors reported by the provider are collected by the parent process through the Terraform plugin protocol. Terraform then outputs them to the console according to its own logic.

I'll fix the issue on the provider end but this will not ensure/guarantee realtime reporting of the errors in the logs by Terraform.

alexhung commented 1 month ago

@jeidsath Also, if the issue is reproducible, you can set the env var TF_LOG=DEBUG to get much more logging, including detailed HTTP requests and responses. This may give you (and us) more clues as to why the request stalled.

jeidsath commented 1 month ago

Thank you, @alexhung. I looked over this with the team on Friday, and we're excited to switch to the new provider when it is released.

As the 500 errors are not easily reproducible (we think there was a database connection leak in the prod Artifactory), we don't have any immediate plans to test with TF_LOG=DEBUG, but we will give it a try if the issue comes up again.

We appreciate the quick followup!