runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Drift Detection #3245

Open jamengual opened 1 year ago

jamengual commented 1 year ago


Describe the user story

As a user, I would like to be able to detect drift in my infrastructure automatically. Atlantis could detect it by running a plan for all the projects defined in my atlantis.yaml file against the main branch and creating a PR for those that have pending changes.

Describe the solution you'd like

I would like Atlantis to support drift detection, using some sort of configurable schedule/cron job to run plan against all the projects defined in my atlantis.yaml on the main branch and create PRs for all projects where changes were found. I would also like to be able to configure whether those drift PRs are auto-applied or require human intervention, with Slack alerting.

Describe the drawbacks of your solution

The API might need some adjustments to make this possible: https://www.runatlantis.io/docs/api-endpoints.html

It would need to be compatible with GitHub, GitLab, and Bitbucket, but it could be released incrementally.

Atlantis does not create PRs, so that would have to be implemented to make this work; alternatively, something could be added to the UI to manage the drift feature.

Describe alternatives you've considered

There is already a GitHub Action implementation of this that I have tested, and it works: https://github.com/cresta/atlantis-drift-detection

It requires two actions plus dependencies on lesser-known actions, so it would be ideal to implement this inside Atlantis instead of relying on separate actions to do the job.

motatoes commented 1 year ago
The API might need some adjustments to make this possible.
https://www.runatlantis.io/docs/api-endpoints.html

Why would it be needed?

From my understanding, it means adding extra configuration flags to atlantis.yml; once those are picked up, Atlantis can run the jobs on a cron in its own backend and then create a PR every time there is drift.

The interaction with the PR itself then follows the usual Atlantis flow, right?
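A minimal Python sketch of the scheduling piece, assuming the simple daily cron expressions proposed later in this thread (for example 0 8 * * *, daily at 8am). It only understands the M H * * * form with naive datetimes; a real scheduler would more likely rely on a full cron library. The function name next_daily_run is hypothetical.

```python
from datetime import datetime, timedelta


def next_daily_run(cron: str, now: datetime) -> datetime:
    """Return the next time an 'M H * * *' cron expression fires after `now`.

    Only the minute and hour fields are honoured; the day-of-month, month and
    day-of-week fields must all be '*'.
    """
    fields = cron.split()
    if len(fields) != 5 or any(f != "*" for f in fields[2:]):
        raise ValueError(f"only 'M H * * *' expressions are supported: {cron!r}")
    minute, hour = int(fields[0]), int(fields[1])
    # Today's firing time; if it has already passed, move to tomorrow.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


print(next_daily_run("0 8 * * *", datetime(2023, 4, 1, 9, 30)))
# prints: 2023-04-02 08:00:00
```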

nitrocode commented 1 year ago

Why would it be needed?

I believe you're correct. If drift detection is built-in then Atlantis won't need to hit its own API.


Features I'd like to see


I kind of like having this feature outside of Atlantis since it's less to maintain. It would be cool to maintain a GitHub Action like the one you linked to and make better use of the API.

I wonder if we could take advantage of Renovate when hitting the Atlantis API?

https://github.com/renovatebot/renovate

If we went this route, API changes might be needed.
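If drift detection lived outside Atlantis, an external scheduler would trigger plans through the API linked in the issue. The sketch below only builds such a request with Python's standard library. The field names (Repository, Ref, Type, Paths with Directory/Workspace) and the X-Atlantis-Token header reflect our reading of the API endpoints page and should be checked against it; the host, secret, repository and the helper name atlantis_plan_request are placeholders. This does not address the locking or PR-creation concerns raised in this thread.

```python
import json
import urllib.request
from typing import Iterable, Tuple


def atlantis_plan_request(
    base_url: str,
    api_secret: str,
    repository: str,
    ref: str,
    vcs_type: str,
    paths: Iterable[Tuple[str, str]],
) -> urllib.request.Request:
    """Build (but do not send) a POST to the Atlantis /api/plan endpoint."""
    body = {
        "Repository": repository,
        "Ref": ref,
        "Type": vcs_type,
        # One entry per (directory, workspace) project to plan.
        "Paths": [{"Directory": d, "Workspace": w} for d, w in paths],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/plan",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Atlantis-Token": api_secret, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values only.
req = atlantis_plan_request(
    "https://atlantis.example.com",
    "example-secret",
    "example-org/infrastructure",
    "main",
    "Github",
    [("components/terraform/ecs-service", "ue1-dev-ecs-service-metro")],
)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```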


If we were to build this in, it would be nice to have a couple of settings available in the server configuration:

ATLANTIS_DRIFT_DETECTION=true
# daily at 8am
ATLANTIS_DRIFT_DETECTION_CRON="0 8 * * *"

and in the atlantis.yaml config

# repo global to override server config
drift_detection:
  enabled: true
  cron: 0 9 * * *
projects:
  - name: ue1-dev-ecs-service-titan
    dir: components/terraform/ecs-service
    workspace: ue1-dev-ecs-service-titan
    # per project override
    drift_detection:
      enabled: false

  - name: ue1-dev-ecs-service-metro
    dir: components/terraform/ecs-service
    workspace: ue1-dev-ecs-service-metro
    drift_detection:
      enabled: true
      cron: 0 9 * * *
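To illustrate how the proposed override hierarchy could resolve (server env first, then the repo-level block, then each project), here is a Python sketch. The ATLANTIS_DRIFT_DETECTION* variables and drift_detection keys are the ones proposed above, not existing Atlantis settings, and the YAML is assumed to be already parsed into dicts so the example needs no third-party parser. All helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class DriftSettings:
    enabled: bool
    cron: Optional[str]


def from_server_env(env: Mapping[str, str]) -> DriftSettings:
    """Server-wide defaults from the proposed ATLANTIS_DRIFT_DETECTION* variables."""
    return DriftSettings(
        enabled=env.get("ATLANTIS_DRIFT_DETECTION", "false").lower() == "true",
        cron=env.get("ATLANTIS_DRIFT_DETECTION_CRON"),
    )


def apply_override(base: DriftSettings, block: Optional[Mapping]) -> DriftSettings:
    """Apply a drift_detection block; keys it omits keep the inherited value."""
    if not block:
        return base
    return DriftSettings(
        enabled=bool(block.get("enabled", base.enabled)),
        cron=block.get("cron", base.cron),
    )


def effective_settings(env: Mapping[str, str], repo_cfg: Mapping) -> Dict[str, DriftSettings]:
    """Resolve server -> repo -> project settings for every project in atlantis.yaml."""
    repo_level = apply_override(from_server_env(env), repo_cfg.get("drift_detection"))
    return {
        project["name"]: apply_override(repo_level, project.get("drift_detection"))
        for project in repo_cfg.get("projects", [])
    }


# The atlantis.yaml example above, already parsed into Python data.
repo_cfg = {
    "drift_detection": {"enabled": True, "cron": "0 9 * * *"},
    "projects": [
        {"name": "ue1-dev-ecs-service-titan", "drift_detection": {"enabled": False}},
        {"name": "ue1-dev-ecs-service-metro",
         "drift_detection": {"enabled": True, "cron": "0 9 * * *"}},
    ],
}
env = {"ATLANTIS_DRIFT_DETECTION": "true", "ATLANTIS_DRIFT_DETECTION_CRON": "0 8 * * *"}
settings = effective_settings(env, repo_cfg)
for name, resolved in settings.items():
    print(name, resolved)
```

In this sketch enabled and cron are resolved independently, so the disabled titan project still inherits the repo-level cron; that is harmless, but a real implementation would need to decide whether that is the intended behaviour.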

Atlantis would need to skip locking while it runs plans for each directory, or it may unintentionally block the developer flow.

If drift was detected (the plan contains changes), then for Atlantis to open a PR, it would have to modify a file in the directory with some commented metadata.

Perhaps a drift.tf or similar per directory, which could be appended to or overwritten whenever drift is detected:

# atlantis detected changes on 2023-03-20T12:42:14+00:00

Once the file is modified or added, a PR can be created.
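A minimal Python sketch of the drift.tf marker idea, assuming the file lives in the project's dir and one comment line is written per detection. The helper name record_drift is hypothetical, and committing the file and opening the PR are left out.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_drift(project_dir: Path, when: datetime, overwrite: bool = False) -> Path:
    """Append (or, with overwrite=True, replace) a comment line in drift.tf.

    A .tf file containing only comments is valid Terraform, so the commit adds a
    diff for the PR without changing what Terraform will plan or apply.
    """
    marker = project_dir / "drift.tf"
    line = f"# atlantis detected changes on {when.isoformat()}\n"
    with marker.open("w" if overwrite else "a", encoding="utf-8") as fh:
        fh.write(line)
    return marker


# Usage against a throwaway directory standing in for a project dir.
with tempfile.TemporaryDirectory() as tmp:
    path = record_drift(Path(tmp), datetime(2023, 3, 20, 12, 42, 14, tzinfo=timezone.utc))
    print(path.read_text(encoding="utf-8"), end="")
    # prints: # atlantis detected changes on 2023-03-20T12:42:14+00:00
```

Appending keeps a history of detections in the file, while overwriting keeps the diff to a single line; which one is preferable is exactly the open question in the comment above.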

nitrocode commented 1 year ago

This may be a duplicate of https://github.com/runatlantis/atlantis/issues/1035

jamengual commented 1 year ago

As @nitrocode explained, there are two paths for this. Initially I thought about doing it internally first, so we can get it working and stable, and then adding changes to the API (PR with no changes, no locking, etc.) so that any webhook-type system can trigger drift detection and users can decide how to handle or create the reconcile PRs. This way users have more control over how they deal with change. The reason is that I can see many users preferring to trigger this by other means due to control policies, auditing, security scanning, etc.

motatoes commented 1 year ago

Thanks @nitrocode @jamengual. I'm going to implement an initial version based on @nitrocode's atlantis config example and won't worry about API changes for this one. Then we can iterate from there.

server

ATLANTIS_DRIFT_DETECTION=true
# daily at 8am
ATLANTIS_DRIFT_DETECTION_CRON="0 8 * * *"

and atlantis.yml

# repo global to override server config
drift_detection:
  enabled: true
  cron: 0 9 * * *
projects:
  - name: ue1-dev-ecs-service-titan
    dir: components/terraform/ecs-service
    workspace: ue1-dev-ecs-service-titan
    # per project override
    drift_detection:
      enabled: false

  - name: ue1-dev-ecs-service-metro
    dir: components/terraform/ecs-service
    workspace: ue1-dev-ecs-service-metro
    drift_detection:
      enabled: true
      cron: 0 9 * * *

jamengual commented 1 year ago

Sounds good to me.

nitrocode commented 1 year ago

I'm thinking that if this is built into Atlantis, it may overload this single-threaded machine, so I'm in favor of doing drift detection as a helper outside of Atlantis.

If we can support running a single atlantis plan from the CLI and posting the result somewhere (like Slack or any webhook), then we can do the following.

Use a workflow or k8s cron to:

  1. get all projects/dirs and start loop
  2. run atlantis plan locally for the current project/dir
    • then it should run the plan for a specific project
  3. If a plan shows changes, hit the web hook (which could be slack or other) with some templated text
  4. Repeat from step 2

I believe the above is basically https://github.com/cresta/atlantis-drift-detection
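A rough Python sketch of the loop above. It swaps the proposed single-project atlantis plan CLI run (which does not exist today) for plain terraform plan -detailed-exitcode, whose exit code is 0 for no changes, 1 for an error and 2 when the plan has changes. It assumes each directory is already initialised and selects the workspace via the TF_WORKSPACE environment variable; the webhook call is passed in as notify so the sketch stays transport-agnostic. Function names are hypothetical.

```python
import os
import subprocess
from typing import Callable, Iterable, List, Tuple

# Exit codes documented for `terraform plan -detailed-exitcode`.
NO_CHANGES, PLAN_ERROR, HAS_CHANGES = 0, 1, 2

Project = Tuple[str, str]  # (directory, workspace)


def terraform_plan(directory: str, workspace: str) -> int:
    """Plan one project and return terraform's detailed exit code.

    Assumes `terraform init` has already run in `directory`.
    """
    env = dict(os.environ, TF_WORKSPACE=workspace)
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=directory,
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode


def detect_drift(
    projects: Iterable[Project],
    notify: Callable[[str, str], None],
    run_plan: Callable[[str, str], int] = terraform_plan,
) -> List[Project]:
    """Plan every project and call `notify` for each one whose plan has changes."""
    drifted: List[Project] = []
    for directory, workspace in projects:      # 1. loop over every project/dir
        code = run_plan(directory, workspace)  # 2. plan the current project
        if code == HAS_CHANGES:                # 3. plan shows changes: hit the webhook
            notify(directory, workspace)
            drifted.append((directory, workspace))
        elif code != NO_CHANGES:
            print(f"plan failed for {directory} [{workspace}] (exit code {code})")
    return drifted                             # 4. the for-loop repeats step 2
```

Keeping run_plan injectable means the loop can be exercised without Terraform installed; running each plan in its own job, as suggested later in the thread, would only change what run_plan does.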

jamengual commented 1 year ago

I don't like that approach much, since you are then at the mercy of the VCS's options for doing that, and if you are on Bitbucket, where there is no community of shared actions, you will need to build all of that yourself.

If you try the cresta action's approach, you will see how much the user needs to do to get it to work.

Plus, you could potentially run another Atlantis instance just for drift detection and nothing else.


jukie commented 1 year ago

I recently built a version similar to https://github.com/cresta/atlantis-drift-detection for GitLab, with the intention of making the VCS client bits pluggable and reducing lock-in to a specific service. I need to clean it up before making it public, but if that helps, maybe we could create a separate repo under Atlantis for a drift-detector service that could run alongside it.

Running as a scheduled pipeline, it will:

  1. Run an API plan against all projects (supports explicitly including/excluding projects)
  2. If drift is found, open a PR with a change to a dummy file (currently commits a timestamp to a file)
  3. Comment atlantis plan project1|project2|project3 based on the drifted projects
  4. Support adding assignees/reviewers

Personally, I like having it as a scheduled GitLab pipeline, but that could easily be extended to a long-running service or another form of trigger. We can start with an opinionated design but still allow for user choice.
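The project filtering and comment steps in that list might look roughly like this in Python. The glob-based include/exclude semantics and the helper names are assumptions, not taken from jukie's tool; the comments use Atlantis's atlantis plan -p <project> syntax, one comment per drifted project.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence


def select_projects(
    names: Iterable[str],
    include: Sequence[str] = ("*",),
    exclude: Sequence[str] = (),
) -> List[str]:
    """Keep names matching any include glob and no exclude glob."""
    return [
        name
        for name in names
        if any(fnmatchcase(name, pat) for pat in include)
        and not any(fnmatchcase(name, pat) for pat in exclude)
    ]


def plan_comments(drifted: Iterable[str]) -> List[str]:
    """One `atlantis plan -p <project>` comment per drifted project."""
    return [f"atlantis plan -p {name}" for name in drifted]


names = ["ue1-dev-ecs-service-titan", "ue1-dev-ecs-service-metro", "ue1-prod-vpc"]
# Here we pretend every selected project drifted.
print(plan_comments(select_projects(names, include=["ue1-dev-*"], exclude=["*-titan"])))
# prints: ['atlantis plan -p ue1-dev-ecs-service-metro']
```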

jamengual commented 1 year ago

The most annoying part of the cresta implementation is that you need yet another action to create the pull request; if that were built into Atlantis, it would be very easy to integrate.


nitrocode commented 1 year ago

I don't know if I'd want it to create a pull request. I'd rather it hit a webhook (e.g. mention the drift in Slack with a link) with custom data. The link could navigate to the project in the UI and show a status of drifted. The UI could expose an apply button to run terraform apply and resolve the drift. This is how other SaaS platforms do it.

Ideally, each Atlantis run would execute separately so the server isn't overloaded.
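A sketch of the webhook-notification alternative, assuming a Slack incoming webhook, which accepts a JSON body with a text field and Slack's <url|label> link syntax. The UI link is hypothetical, since the Atlantis UI does not currently show a drift status, and the helper names are illustrative only.

```python
import json
import urllib.request


def drift_message(project: str, ui_url: str) -> dict:
    """Slack incoming-webhook payload: *bold* and <url|label> are Slack mrkdwn."""
    return {"text": f":warning: Drift detected in *{project}*. <{ui_url}|View in Atlantis>"}


def post_webhook(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


msg = drift_message("ue1-dev-ecs-service-metro", "https://atlantis.example.com/")
# post_webhook("https://hooks.slack.com/services/...", msg) would send it.
```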

jukie commented 1 year ago

Still a WIP, but I've made this, which users can orchestrate however they like: https://github.com/jukie/atlantis-drift-detection

sadminriley commented 1 year ago

Any updates on this one? It would be super nice to have this in Atlantis. I noticed the related PR has been outstanding for quite some time: https://github.com/runatlantis/atlantis/pull/3269

jamengual commented 1 year ago

Sadly, both developers who volunteered to build this feature never replied, so it is on pause.

We need committed community contributors to make this happen and hopefully supported by their companies to do so.

marcportabellaclotet-mt commented 9 months ago

I have also been working on a custom tool to manage drift. It is mainly focused on GitHub, and it works by auto-discovering the repositories. The list of repositories to check can be passed as an argument if needed.

It works quite well, but it lacks some functionality like checking for atlantis locks. I will work on this.

But as shared here, maybe it would be a better approach to integrate this feature as part of atlantis core. The related PR in this thread seemed promising, but unfortunately discontinued.

motatoes commented 9 months ago

Hi guys, I'm sorry for the silence here over the last couple of weeks. I got caught up with work and couldn't give the PR much attention. I'm going to take a look at it in the next couple of days over the New Year holidays, so I hope to make good progress on it :)

gaurav517 commented 6 months ago

It would be so nice to have this feature.

PScoriae commented 6 months ago

doing my part to show interest :)

seifrajhi commented 5 months ago

+1 Looking forward to seeing this feature implemented soon! :)

djsingh23 commented 5 months ago

Will this feature be available anytime soon?

jamengual commented 5 months ago

Not without community contributions.


nitrocode commented 5 months ago

Another way to set this up is similar to how atmos does it: it runs the plans across root dirs and opens GitHub issues when drift is found.

https://atmos.tools/integrations/github-actions/atmos-terraform-drift-detection/
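Roughly what the issue-based approach boils down to, sketched in Python against GitHub's REST create-issue endpoint (POST /repos/{owner}/{repo}/issues). It only builds the request; the owner, repo, token, label and the helper name drift_issue_request are placeholders, and this is not how the atmos action is implemented internally.

```python
import json
import urllib.request
from typing import Sequence


def drift_issue_request(
    owner: str,
    repo: str,
    token: str,
    project: str,
    plan_summary: str,
    labels: Sequence[str] = ("drift",),
) -> urllib.request.Request:
    """Build (but do not send) a GitHub 'create an issue' request for one drifted project."""
    body = {
        "title": f"Drift detected: {project}",
        "body": f"Scheduled plan found changes for {project}:\n\n{plan_summary}",
        "labels": list(labels),
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = drift_issue_request("example-org", "infrastructure", "example-token",
                          "ue1-dev-ecs-service-metro", "Plan: 0 to add, 1 to change, 0 to destroy.")
# urllib.request.urlopen(req) would create the issue.
```

A real job would probably search for an existing open drift issue before creating a new one, so repeated scheduled runs do not pile up duplicates.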

raghulkrishna commented 4 months ago

+1