runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Prevent atlantis lock using `atlantis plan -lock=false` and disable lock on draft PRs #2237

Open lorenz-scalable opened 2 years ago

lorenz-scalable commented 2 years ago

Hi Atlantis community 👋

Problem

My feature proposal is related to a comment on this issue. We use Atlantis in my company to plan and apply Terraform changes. You can only run the plan command if the directory/project is not locked by another PR, and this slows our workflow down significantly. Issues related to the plan output can only be fixed once the previous PR has been merged.

Solution

That is why I would like to add an option to the plan command that lets you run a plan without acquiring any Atlantis locks. It would only run the plan, comment the output on GitHub and then forget about the plan. This would obviously also mean that this plan cannot be applied.

Example: atlantis plan --draft
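
To make the proposed semantics concrete, here is a minimal Go sketch, assuming an in-memory lock table keyed by repo and directory. `Server`, `ProjectKey`, `Plan` and `Apply` are hypothetical names for illustration, not Atlantis's internal API. A normal plan must win the project lock and is saved. A draft plan skips the lock and is never saved, so a later apply is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// ProjectKey identifies a lockable unit (hypothetical: repo + directory).
type ProjectKey struct {
	Repo string
	Dir  string
}

// Server is a toy model of the lock table and the saved, applyable plans.
type Server struct {
	locks map[ProjectKey]int          // project -> PR number holding the lock
	plans map[int]map[ProjectKey]bool // PR number -> projects with a saved plan
}

// NewServer returns an empty model.
func NewServer() *Server {
	return &Server{
		locks: map[ProjectKey]int{},
		plans: map[int]map[ProjectKey]bool{},
	}
}

// Plan runs a plan for pr. A normal plan must acquire (or already hold) the
// project lock and is saved; a draft plan skips both steps.
func (s *Server) Plan(pr int, p ProjectKey, draft bool) (string, error) {
	if draft {
		// Only the output would be commented on the PR; nothing is locked or saved.
		return fmt.Sprintf("PR #%d: draft plan for %s (cannot be applied)", pr, p.Dir), nil
	}
	if owner, locked := s.locks[p]; locked && owner != pr {
		return "", fmt.Errorf("%s is locked by PR #%d", p.Dir, owner)
	}
	s.locks[p] = pr
	if s.plans[pr] == nil {
		s.plans[pr] = map[ProjectKey]bool{}
	}
	s.plans[pr][p] = true
	return fmt.Sprintf("PR #%d: plan for %s saved", pr, p.Dir), nil
}

// Apply succeeds only if a normal (locked) plan was saved for pr and p.
// The lock itself is kept; releasing it on merge/close is not modelled.
func (s *Server) Apply(pr int, p ProjectKey) error {
	if !s.plans[pr][p] {
		return errors.New("no applyable plan found; run a normal (non-draft) plan first")
	}
	delete(s.plans[pr], p)
	return nil
}

func main() {
	s := NewServer()
	svc := ProjectKey{Repo: "example-org/infra", Dir: "services/service-a"}
	fmt.Println(s.Plan(1, svc, false)) // PR #1 takes the lock
	fmt.Println(s.Plan(2, svc, false)) // PR #2 is blocked by PR #1's lock
	fmt.Println(s.Plan(2, svc, true))  // PR #2 can still get a draft plan...
	fmt.Println(s.Apply(2, svc))       // ...but there is nothing to apply
}
```

In this model `Apply` never releases the lock. That mirrors Atlantis's default of holding a lock until the PR is merged or closed, which the sketch does not model.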

How to proceed

As I said, I would like to implement this feature myself and I have some questions:

nitrocode commented 2 years ago

When I first read the title of this, I thought you would want atlantis to ignore draft github PRs.

It sounds like the flag should be called

  • --plan-only
  • or perhaps --disable-lock similar to -lock=false

lorenz-scalable commented 2 years ago

> When I first read the title of this, I thought you would want atlantis to ignore draft github PRs.
>
> It sounds like the flag should be called
>
>   • --plan-only
>   • or perhaps --disable-lock similar to -lock=false

@nitrocode I appreciate your feedback! I would like to separate the function of the flag from its implementation. I am not sure yet how this will be implemented, and it might not only be disabling the lock. The purpose of the option would be to draft a plan. This might mean that it will disable locking or something else. However, it will definitely mean that the plan will not be apply-able. IMO this part is not obvious if we go for --plan-only or --disable-lock. I will try to familiarise myself with the general architecture of the project this weekend; then I will know more. I will also add you to the potential PR, as I value your feedback.

nitrocode commented 2 years ago

It's possible for Terraform to only lock on apply and not on plan, which may be the simplest solution.

For now, a workaround might be to update your workflow for plans to prevent locking.

# atlantis.yaml

version: 3
projects:
- name: project1-staging
  dir: project1
  workflow: staging

workflows:
  staging:
    plan:
      steps:
      - init
      - plan:
          extra_args: ["-lock=false"]

After you dig more into the code to familiarize yourself, I'd be curious to hear what other helpful additions could improve your team workflow.

nitrocode commented 2 years ago

Another somewhat related issue https://github.com/runatlantis/atlantis/issues/1125 (Allow for draft prs to not lock state)

lorenz-scalable commented 2 years ago

I started working on an implementation and I am slowly realising that this will require changes in a lot of places. Of course you would have to add the flag to the command, to the project context and the command runner. But you would also have to add it to a lower level to avoid overriding existing plans. It is definitely possible but I am not sure if my PR would ever be merged, because I would either have to change the architecture or introduce more coupling.

Writing a custom workflow sounds like a good alternative. I think extra_args: ["-lock=false"] would not have the desired effect, though, since it is a Terraform option and has nothing to do with the Atlantis lock. Maybe creating a new workspace, copying the Terraform state, planning on the new workspace and then removing the workspace again would work 🤔?

I will keep working on this 👍

jeffgran commented 2 years ago

I think another approach to this problem is to disable locking (a feature that already exists) and instead invalidate all other plans for the same project/directory when one plan is applied. So:

  1. PR #1 is opened, auto-plans but does not make an Atlantis lock.
  2. PR #2 is opened, auto-plans but does not make an Atlantis lock.
  3. PR #1 gets an atlantis apply comment.
  4. The first thing the Atlantis server does is delete/invalidate all other plans for this directory/project, in this case PR #2. Then it comments "this plan has been invalidated by a plan on PR #1, please re-plan".

I am not sure if this has been proposed already but I have not seen it.
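
The sequence above can be sketched in Go as a plan store with no locks at all. `PlanStore`, `Plan`, `Apply` and `HasPlan` are hypothetical names chosen for this illustration. PR comments are collected in a map instead of being posted to a VCS.

```go
package main

import "fmt"

// PlanStore is a toy model: plans never take a lock, and applying one
// invalidates every other PR's plan for the same project.
type PlanStore struct {
	plans    map[string]map[int]bool // project -> PRs holding a current plan
	Comments map[int][]string        // PR -> comments that would be posted
}

// NewPlanStore returns an empty store.
func NewPlanStore() *PlanStore {
	return &PlanStore{plans: map[string]map[int]bool{}, Comments: map[int][]string{}}
}

// Plan records a plan for pr without taking any lock (steps 1 and 2).
func (s *PlanStore) Plan(pr int, project string) {
	if s.plans[project] == nil {
		s.plans[project] = map[int]bool{}
	}
	s.plans[project][pr] = true
}

// HasPlan reports whether pr currently holds a valid plan for project.
func (s *PlanStore) HasPlan(pr int, project string) bool {
	return s.plans[project][pr]
}

// Apply handles step 3: it rejects PRs without a current plan, then (step 4)
// invalidates all other plans for the project and notifies those PRs before
// consuming pr's own plan.
func (s *PlanStore) Apply(pr int, project string) error {
	if !s.HasPlan(pr, project) {
		return fmt.Errorf("PR #%d has no valid plan for %s, please re-plan", pr, project)
	}
	for other := range s.plans[project] {
		if other == pr {
			continue
		}
		delete(s.plans[project], other) // deleting during range is safe in Go
		s.Comments[other] = append(s.Comments[other],
			fmt.Sprintf("this plan has been invalidated by a plan on PR #%d, please re-plan", pr))
	}
	// The real `terraform apply` would run here.
	delete(s.plans[project], pr)
	return nil
}

func main() {
	s := NewPlanStore()
	s.Plan(1, "services/service-a") // PR #1 auto-plans, no lock
	s.Plan(2, "services/service-a") // PR #2 auto-plans, no lock
	if err := s.Apply(1, "services/service-a"); err != nil {
		fmt.Println(err)
	}
	fmt.Println(s.Comments[2])                    // PR #2 is told to re-plan
	fmt.Println(s.Apply(2, "services/service-a")) // rejected until PR #2 re-plans
}
```

This sketch is not safe for concurrent use. A real server would need to make the invalidate-then-apply step atomic (for example, behind a mutex). It would also still rely on the Terraform state lock to stop two applies from running at once.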

nitrocode commented 1 year ago

In order to disable Atlantis locking, you'd have to find a way to keep your PRs always up to date with the default branch. Otherwise PR 1 could get merged first, and PR 2 may have a stale plan that was created before PR 1 was merged.

I suppose it's likely that PR 2's atlantis apply would result in a stale plan error, and PR 2 would be forced to run another atlantis plan.
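
Terraform itself guards against this scenario. A saved plan file remembers the state it was computed against. If you apply it after that state has changed, Terraform fails with a "Saved plan is stale" error. The Go sketch below is a simplified model of that check. It assumes a state version is identified by its lineage and serial. `StateMeta`, `SavedPlan` and `CheckStale` are illustrative names, not Terraform's API.

```go
package main

import "fmt"

// StateMeta identifies one version of a Terraform state: the lineage is fixed
// when the state is first created, and the serial increases whenever a
// changed state is written.
type StateMeta struct {
	Lineage string
	Serial  uint64
}

// SavedPlan remembers which state version it was computed against.
type SavedPlan struct {
	PR   int
	Base StateMeta
}

// CheckStale is a simplified model of the check done when a saved plan is
// applied: any change to lineage or serial since planning makes it stale.
func CheckStale(p SavedPlan, current StateMeta) error {
	if p.Base != current {
		return fmt.Errorf("plan from PR #%d is stale (planned against %s/%d, state is now %s/%d); run atlantis plan again",
			p.PR, p.Base.Lineage, p.Base.Serial, current.Lineage, current.Serial)
	}
	return nil
}

func main() {
	state := StateMeta{Lineage: "example-lineage", Serial: 7}
	pr1 := SavedPlan{PR: 1, Base: state}
	pr2 := SavedPlan{PR: 2, Base: state} // both PRs planned against serial 7

	// PR 1 applies first; writing the new state bumps the serial.
	if err := CheckStale(pr1, state); err == nil {
		state.Serial++
	}
	fmt.Println(CheckStale(pr2, state)) // PR 2's plan is now rejected
}
```

This check only catches changes that reached the state. If PR 1 is merged without changing any resources, PR 2's plan is not flagged, even though its branch may be behind the default branch.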

nitrocode commented 1 year ago

I would prefer a flag to disable locking for all atlantis plans if a PR is a "draft" PR.

nitrocode commented 1 year ago

Or a flag on atlantis plan such as --disable-lock. And, like you said @lorenz-scalable, if it's not obvious that this plan is not applyable, perhaps we can craft the comment template to state that when the plan is posted.

jimsmith commented 1 year ago

I came across this from https://github.com/runatlantis/atlantis/issues/1212, and after reading it I think this really should be first-class functionality.

Given that env0 has listened to customer feedback, this will sway a lot of developers to look at alternatives, as not having this feature is a bottleneck that adds friction to draft PRs and slows down development.

https://www.env0.com/blog/implement-atlantis-style-terraform-and-terragrunt-workflows-in-env0

> A key difference with our implementation is that PR Plans do not lock the Terraform state, and can be run concurrently. The reason for this comes directly from customer feedback. In multiple discussions, it was clear that locking the directory or workspace until merge blocks other devs from working on the same project, and slows teams down without adding much value

I'm looking at Atlantis and so far without this basic feature I need to feedback that this is not suitable at all.

The use case is exactly ephemeral PRs and short-lived infrastructure PR stacks that are spun up for spikes, development, experiments and testing things out.

Developers no longer build a single stack. The approach is to build ephemeral infrastructure stacks so they can reproduce and freely fix IaC coding issues. A single stack that everyone shares, and queues up for to get time to make changes, is antiquated.

This GitHub pull request matches the current use cases our developers and engineers need across multiple Git repositories and short-lived projects: https://github.com/runatlantis/atlantis/pull/3275

Developers aren't constrained to a single infrastructure stack until the change reaches the integration stage, then staging, and is finally deployed to production. With Terraform workspaces, each ephemeral stack has its own isolated tfstate, but equally there are use cases where Terraform workspaces aren't used in some projects and Git repositories.

Please reconsider supporting terraform plan without locking.

lorenz-scalable commented 1 year ago

Nice to see that there is some progress on the topic with #3275. I have to apologise, as I forgot to keep the issue updated myself. I started working on an implementation but quickly realised that it would require a bigger refactoring, and my Go skills are not suitable for that. Instead, we are currently experimenting with a special Terraform configuration setup that allows us to create draft plans without taking the lock. Those plans cannot be applied, which ensures a valid state. The idea is to add draft projects for the existing projects and use a custom workflow to override the apply command. For example:

version: 3

workflows:
  draft:
    plan:
      steps:
      - init:
          extra_args: ["-lock=false"]
      - plan:
          extra_args: ["-lock=false"]
    apply:
      steps:
      - run: echo "not allowed for draft" && exit 1

projects:
  # normal projects
  - dir: services/service-a
    name: services/service-a
  - dir: services/service-b
    name: services/service-b

  # draft projects
  - dir: services/service-a
    name: services/service-a/draft
    repo_locking: false
    workflow: draft
    autoplan:
      enabled: false
  - dir: services/service-b
    name: services/service-b/draft
    repo_locking: false
    workflow: draft
    autoplan:
      enabled: false

The draft plan can be created by specifying the draft project: atlantis plan -p services/service-a/draft. Autoplan will still create a normal plan that will take the lock and can be applied.

brandon-fryslie commented 1 year ago

@jimsmith

> I'm looking at Atlantis and so far without this basic feature I need to feedback that this is not suitable at all.

Disabling both Atlantis locks (via a server flag) and Terraform locks (via a custom workflow) is already possible and has been for a long time. Are you asking for a server flag to disable Atlantis locking only on plan but not on apply? I agree that would be useful. But as long as you have the Terraform lock on apply, that will prevent 2 applies from running at the same time, which is the behavior that matters. I have used Atlantis before, so I'm familiar with how to use it; it's possible you missed the server flag --disable-repo-locking and Atlantis custom workflows.

env0 is a paid product, and for me the procurement process for a vendor takes so long it's not usually worth it. Pretty much every feature of env0 (except SSO and a fancy UI) can be implemented in Atlantis with custom workflows. Not only that, I can integrate with all of the tools my company is already using, and implement it in less time than it would take just to get a contract signed with env0.

I'm sure env0 is better in a lot of ways, but the fact that it doesn't support a fully self-hosted option even if I have an enterprise license means it's not even worth looking at for us. Sure you can self-host agents, but:

> The agent requires an internet connection but no inbound network access.

Automatic fail. It's a hard no on any solution that requires outbound internet access from a prod/prod adjacent environment. If you're only using it to deploy dev stacks and use something else for production then I'm sure it works great.