Open majormoses opened 6 years ago
I had the same experience and logged it in #303 including a few options that I thought of. 👍
We do need to be careful to send/trap the signal to allow terraform to shut down gracefully or it could lead to a corrupted/incomplete state.
@majormoses Is that true even when only running plan
?
@smiller171 the short answer it depends, the longer answer is yes :wink:. If you run terraform with a plan -refresh=true
it can absolutely cause state corruption without allowing it to gracefully shut down. While this is not the default (anymore) we need to be careful about this as users are free to specify their own options. Additionally in #187 we also want to be able to abort an apply in the case of something missed during code review and to provide some damage control. I think we should solve the problem of how to trap the terraform process properly once which allows reuse rather than trying to take shortcuts and focus on delivering potentially dangerous quick wins.
I'm seeing this issue primarily in the context of this sequence of events:
In my mind, the way it should work is that after the second commit is pushed, a new plan should be 'queued' for execution directly after the first plan. Then the first plan can pass, set the check status to whatever that plan ended up as, then immediately begin working on the second plan.
The way things are now, I have to wait for all the plans to quiesce then issue atlantis plan again. It's a fairly annoying papercut for a lot of the amateur users of atlantis in my team because they're not aware of the state management (or lack thereof) inside atlantis.
It seems like the progression of this thread is focused on locking or aborting existing plans rather than enqueuing a new one. It seems like you can get correctness easiest by not trying to abort anything, but instead just getting a queueing logic setup properly.
Personally I'd prefer an abort so that I don't have to wait on the irrelevant plan
Personally I'd prefer an abort so that other tooling waiting on ✅doesn't start up (race condition).
Aren't those checks bound to specific commits?
The short answer is that it should be user configurable, in many cases all it does is clog up your pipeline to queue them. I think abort is a sane default and allow those that want to queue to do that.
I would suggest the implementation of this not depend on new commits, but rather a new plan/apply starting. This will broaden scope to include manually re-running plan or apply via comments.
I see your point but I think that we need to limit that only to plans and not applies. Canceling an apply can lead to state corruption even in some cases when they are properly trapped. On the matter of canceling a running apply that is a much larger issue and that needs to be tackled separately with a great deal of care.
That's a great point. Perhaps, instead, applies can block, or otherwise be stopped from running in parallel, but without interruption.
That's a great point. Perhaps, instead, applies can block, or otherwise be stopped from running in parallel, but without interruption.
If an apply is running you can't just run an apply again without running a plan. In addition to atlantis locks there are terraform locks and hopefully something dynamodb as a locking table. What is the use case of doing anything other than commenting back that there is an apply in progress and they will need to wait until it is finished before plans can be run again? Terraform plan and applies must be done serially if they are to be trusted. I might be missing something but the problem set here IMO is specific to plans. I think that even if there were a use case I would want to tackle the general problem of safe killing an apply which is already being tracked on #187 I would not do it in the same change set as this.
Not a correctness problem, then, so much as a clarity problem. It two people comment apply about the same time, it becomes unclear what's happening, as you see errors but no indication one of the runs is still running. I have seen that happen before.
Not a correctness problem, then, so much as a clarity problem. It two people comment apply about the same time, it becomes unclear what's happening, as you see errors but no indication one of the runs is still running. I have seen that happen before.
💯 % agree and I think that's actually not that hard to solve, but is a separate issue. An apply (at least if you use it sanely) is always based on a previously saved plan so we just need to detect when an apply comment is passed and to have a better comment message. This is focused specifically on having the ability to cancel a running plan when a new comment (love the addition) or commit essentially invalidates if configured to. In the context of apply there is no such thing as it being invalid because you add new things, you wait for the current one to finish and then do another plan / apply cycle. There is a use case for canceling applies and I linked to that already.
Is this in some roadmap to be worked on?
Current Behavior:
terraform fmt
for example)"the default workspace is currently locked by another command that is running for this pull request–wait until the previous command is complete and try again"
and marks the pull request status checks as failedDesired Behavior:
Related