Closed: Oracen closed this issue 4 months ago
Hi @Oracen!
Thank you for the thorough issue report and steps to reproduce. @deanhuynh has just shipped a new version that hopefully addresses this. Can you try it out and let us know if it solves your problem?
To be clear, the change does not roll back the creation upon failure. Instead, it ensures that the resource is tainted so upon the next apply, it is deleted then recreated and the resulting state is as expected. This follows the same convention as in the AWS provider.
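For readers less familiar with the mechanics, here is a toy Python model (not provider code, purely illustrative) of the taint-on-partial-failure convention: the resource ID is recorded in state before the second API call, so a failure leaves a tainted entry that gets deleted and recreated on the next apply, instead of an untracked orphan piling up in the console.

```python
# Toy model of Terraform's taint-on-partial-failure convention.
# Not real provider code; it only illustrates the behaviour described above.

class Remote:
    """Stands in for the Segment API."""
    def __init__(self):
        self.plans = {}
        self.counter = 0

    def create_plan(self):
        self.counter += 1
        plan_id = f"plan-{self.counter}"
        self.plans[plan_id] = {"rules": []}
        return plan_id

    def update_rules(self, plan_id, rules):
        # Simulate the failing second call (the 400 response).
        raise RuntimeError("400: invalid rules")

    def delete_plan(self, plan_id):
        del self.plans[plan_id]


def apply_with_taint(remote, state, rules):
    """New behaviour: the ID is recorded in state before the failing call,
    so the resource lands in state as tainted rather than as an orphan."""
    if state.get("tainted"):
        remote.delete_plan(state["id"])   # tainted -> delete, then recreate
        state.clear()
    plan_id = remote.create_plan()
    state.update(id=plan_id, tainted=False)
    try:
        remote.update_rules(plan_id, rules)
    except RuntimeError:
        state["tainted"] = True           # kept in state, flagged for recreation


remote, state = Remote(), {}
apply_with_taint(remote, state, rules=[{"bad": True}])
apply_with_taint(remote, state, rules=[{"bad": True}])
# Re-applying replaces the one tainted plan instead of accumulating orphans.
print(len(remote.plans))  # 1
```

Under the old behaviour the ID was never recorded, so each `terraform apply` created another plan the state knew nothing about.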
We will also look into improving the error messages separately 🙂 thanks for raising!
Just upgraded, seems to be working as intended! Solid LGTM.
> Instead, it ensures that the resource is tainted so upon the next apply, it is deleted then recreated and the resulting state is as expected.
Thanks for the insight... I haven't touched the depths of Terraform since ~v0.11 so I appreciate the clarification on the mechanics.
Love the work team
Description
I've found an issue where Segment resources are created even when `terraform apply` fails and the state is not updated. I noticed this in the context of tracking plans, but it's likely duplicated in any "compound" resource that invokes more than one Segment endpoint.

To Reproduce
Example Terraform code that triggers the issue:
A minimal example that will fail but create a resource:
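(The original snippet did not survive into this report. As a stand-in, a config along the following lines should trigger the issue; resource and attribute names here are assumptions and may not match the provider schema exactly.)

```hcl
# Hypothetical sketch - names are illustrative, not the exact provider schema.
resource "segment_tracking_plan" "example" {
  name = "Example Plan"

  rules = [{
    key     = "Page Viewed"
    type    = "TRACK"
    version = 1
    # An invalid JSON Schema here makes the second API call
    # (Update Tracking Plan Rules) return a 400.
    json_schema = jsonencode({ type = "not-a-real-type" })
  }]
}
```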
This code will result in a 400 response (screenshot below), which I assume indicates that the tracking plan rule is invalid. However, the tracking plan itself will still be created in the console, without any rules attached.
My guess is this is because the tracking plan provider is hitting the `Create Tracking Plan` endpoint followed by `Update Tracking Plan Rules`, without any preflight checks. This operation is non-atomic, thus allowing the remote system to fall into a "halfway" state. The 400 response marks the `tf apply` operation as invalid, and so no resource is added to the state. When `tf apply` is run again, a new resource is created and the process repeats.

A recommended fix would be to align the TF resources with atomic operations in the way the AWS provider does (yes, the code will get a little ugly), or add some saga-like functionality to allow a rollback of a partially successful operation. The latter is probably much more complicated.
As a follow-up, it may also be helpful to pass back better debugging information in the error message. While there are security concerns about exposing too much information from the API, it would improve the DX to know the source of validation errors. Particularly where there are duplicate resources being created, guess-and-check is no longer a valid debugging strategy! :laughing: