runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io
Other
7.82k stars 1.06k forks source link

New commit or "plan comment" to cancel the currently running plan and rerun #305

Open majormoses opened 6 years ago

majormoses commented 6 years ago

Current Behavior:

  1. user opens PR
  2. plan runs and user waits for CR
  3. user needs to rebase/merge master into their branch to pick up already merged changes
  4. user commits again (terraform fmt for example)
  5. Atlantis comments back "the default workspace is currently locked by another command that is running for this pull request–wait until the previous command is complete and try again" and marks the pull request status checks as failed
  6. running plan finishes and outputs success but does not update the status check

Desired Behavior:

  1. cancels the current plan and comments back such
  2. new plan is run with the latest commit

Related

jolexa commented 6 years ago

I had the same experience and logged it in #303 including a few options that I thought of. 👍

majormoses commented 6 years ago

We do need to be careful to send/trap the signal to allow terraform to shut down gracefully or it could lead to a corrupted/incomplete state.

smiller171 commented 5 years ago

@majormoses Is that true even when only running plan?

majormoses commented 5 years ago

@smiller171 the short answer it depends, the longer answer is yes :wink:. If you run terraform with a plan -refresh=true it can absolutely cause state corruption without allowing it to gracefully shut down. While this is not the default (anymore) we need to be careful about this as users are free to specify their own options. Additionally in #187 we also want to be able to abort an apply in the case of something missed during code review and to provide some damage control. I think we should solve the problem of how to trap the terraform process properly once which allows reuse rather than trying to take shortcuts and focus on delivering potentially dangerous quick wins.

joshk0 commented 5 years ago

I'm seeing this issue primarily in the context of this sequence of events:

In my mind, the way it should work is that after the second commit is pushed, a new plan should be 'queued' for execution directly after the first plan. Then the first plan can pass, set the check status to whatever that plan ended up as, then immediately begin working on the second plan.

The way things are now, I have to wait for all the plans to quiesce then issue atlantis plan again. It's a fairly annoying papercut for a lot of the amateur users of atlantis in my team because they're not aware of the state management (or lack thereof) inside atlantis.

It seems like the progression of this thread is focused on locking or aborting existing plans rather than enqueuing a new one. It seems like you can get correctness easiest by not trying to abort anything, but instead just getting a queueing logic setup properly.

smiller171 commented 5 years ago

Personally I'd prefer an abort so that I don't have to wait on the irrelevant plan

jolexa commented 5 years ago

Personally I'd prefer an abort so that other tooling waiting on ✅doesn't start up (race condition).

smiller171 commented 5 years ago

Aren't those checks bound to specific commits?

majormoses commented 5 years ago

The short answer is that it should be user configurable, in many cases all it does is clog up your pipeline to queue them. I think abort is a sane default and allow those that want to queue to do that.

andrewring commented 4 years ago

I would suggest the implementation of this not depend on new commits, but rather a new plan/apply starting. This will broaden scope to include manually re-running plan or apply via comments.

majormoses commented 4 years ago

I see your point but I think that we need to limit that only to plans and not applies. Canceling an apply can lead to state corruption even in some cases when they are properly trapped. On the matter of canceling a running apply that is a much larger issue and that needs to be tackled separately with a great deal of care.

andrewring commented 4 years ago

That's a great point. Perhaps, instead, applies can block, or otherwise be stopped from running in parallel, but without interruption.

majormoses commented 4 years ago

That's a great point. Perhaps, instead, applies can block, or otherwise be stopped from running in parallel, but without interruption.

If an apply is running you can't just run an apply again without running a plan. In addition to atlantis locks there are terraform locks and hopefully something dynamodb as a locking table. What is the use case of doing anything other than commenting back that there is an apply in progress and they will need to wait until it is finished before plans can be run again? Terraform plan and applies must be done serially if they are to be trusted. I might be missing something but the problem set here IMO is specific to plans. I think that even if there were a use case I would want to tackle the general problem of safe killing an apply which is already being tracked on #187 I would not do it in the same change set as this.

andrewring commented 4 years ago

Not a correctness problem, then, so much as a clarity problem. It two people comment apply about the same time, it becomes unclear what's happening, as you see errors but no indication one of the runs is still running. I have seen that happen before.

majormoses commented 4 years ago

Not a correctness problem, then, so much as a clarity problem. It two people comment apply about the same time, it becomes unclear what's happening, as you see errors but no indication one of the runs is still running. I have seen that happen before.

💯 % agree and I think that's actually not that hard to solve, but is a separate issue. An apply (at least if you use it sanely) is always based on a previously saved plan so we just need to detect when an apply comment is passed and to have a better comment message. This is focused specifically on having the ability to cancel a running plan when a new comment (love the addition) or commit essentially invalidates if configured to. In the context of apply there is no such thing as it being invalid because you add new things, you wait for the current one to finish and then do another plan / apply cycle. There is a use case for canceling applies and I linked to that already.

igorjanevski commented 10 months ago

Is this in some roadmap to be worked on?