Connect multiple Atlantis servers with a single repo

amitrana93 commented 3 years ago

Hi guys, we have the following scenario: 2 AWS accounts and 1 repo for the DEV, QA and PROD environments.

One AWS account is for DEV. The second AWS account is for QA and PROD.

Right now we manage a single repo for deployments to DEV, QA and PROD. We want to keep it that way and connect multiple Atlantis servers to that single repo.

Can you suggest another way to manage multiple Atlantis servers for multiple accounts without using IAM assume roles? We don't want to use an AWS multi-account setup.

jamengual commented 3 years ago

So far Atlantis does not have functionality for that. I have been working on a branch for multienv; it is not complete, but it will basically allow you to have one repo and multiple Atlantis servers.

If you are curious, the branch is: https://github.com/runatlantis/atlantis/tree/multiserver

kitos9112 commented 3 years ago

@jamengual we'd love to see something like that implemented. That model would truly support a multi-account environment within a monorepo approach. It is perfectly suited to people like me who use Terragrunt in production.

@jamengual Do you plan to keep working on it any time soon?

Many thanks!

tinder-tder commented 3 years ago

You can do it now, but the issue I have is that multiple comments show up in git (one for each Atlantis server). We have a wrapper set up to only act if files changed in certain paths (the path matching the env the Atlantis server is in), so the other accounts just leave a NOOP comment. I think the only way to prevent that right now is to have another wrapper in front of the webhook to eat it if it's not applicable.

jamengual commented 3 years ago

yes, my plan is to start working on this again in March, I'm too busy to do it right now.

jamengual commented 3 years ago

@kitos9112 @amitrana93 you guys could build atlantis from that branch and use it and give me feedback, it will be pretty useful.

patrickjahns commented 3 years ago

@tinder-tder Could you elaborate a bit on the wrapper setup? Quite curious on how it works

tinder-tder commented 3 years ago

@patrickjahns sure, sorry for the delayed response. It's a simple script that Atlantis calls instead of calling terraform/terragrunt directly. We check the changed path against a regexp, and if it doesn't match, the script exits with NOOP (this is a Puppet template):

#!/bin/bash
# Wraps terragrunt to limit what gets applied based on the allowed path value.
# VALID is an extended regex filled in by the Puppet template.
VALID="<%= @allowed_path -%>"
# grep -c counts matching lines: 1 if the project dir matches, 0 otherwise.
COUNT=$(echo "$REPO_REL_DIR" | grep -Ec "${VALID}")
# Log which host ran this step.
hostname
if [ "$COUNT" -eq 0 ]; then
   echo "NOOP $1: $REPO_REL_DIR not a match for $VALID"
   exit 0
else
   echo "$1: $REPO_REL_DIR matched for $VALID"
fi
case $1 in
  plan)
    terragrunt plan -no-color -out="$PLANFILE"
    ;;
  apply)
    terragrunt apply -no-color "$PLANFILE"
    ;;
  *)
    echo "unknown command $1"
    exit 1
    ;;
esac

the repo yaml looks like

# atlantis server side repo config
repos:
- id: "<%= @repo_whitelist -%>"
  workflow: terragrunt
workflows:
  terragrunt:
    plan:
      steps:
      - run: tgwrapper.sh plan
    apply:
      steps:
      - run: tgwrapper.sh apply
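
For anyone adapting the wrapper above, here is a minimal sketch of just the path check, runnable outside Atlantis. The function name `path_allowed`, the sample directories, and the regex are assumptions for illustration; in the real wrapper the regex comes from the Puppet template and the directory from `REPO_REL_DIR`.

```shell
#!/usr/bin/env bash
# Sketch only: path_allowed is a hypothetical helper, not part of Atlantis.
# Returns 0 when the project dir matches the allowed-path extended regex.
path_allowed() {
  local rel_dir="$1" allowed_regex="$2"
  printf '%s\n' "$rel_dir" | grep -Eq -- "$allowed_regex"
}

# Assumed regex for a "production" server: match the path segment exactly,
# so a directory such as "preproduction" would not slip through.
allowed='(^|/)production(/|$)'

for dir in "consul/production" "vault/qa" "."; do
  if path_allowed "$dir" "$allowed"; then
    echo "run:  $dir"
  else
    echo "NOOP: $dir"
  fi
done
```

Anchoring on whole path segments is a design choice in this sketch; a looser regex such as `production` would also match unrelated directories that merely contain the word.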

ipeacocks commented 3 years ago

@tinder-tder is $REPO_REL_DIR an internal Atlantis variable?

jasonrberk commented 3 years ago

@ipeacocks - yes

REPO_REL_DIR - The relative path of the project in the repository. For example if your project is in dir1/dir2/ then this will be set to "dir1/dir2". If your project is at the root this will be ".".

https://www.runatlantis.io/docs/custom-workflows.html#reference

jasonrberk commented 3 years ago

I got this set up, and just to be clear: even with a wrapper in place, you still get an empty comment in the PR.

I have a prod, pre-prod and non-prod atlantis running and that's what happened when the script "no-op" exits while using atlantis to update atlantis :-)

spuder commented 2 years ago

We also would very much like support for multiple atlantis servers. We have 1 repo with 4 environments, and each of those environments is completely isolated. We want to point all production terraform at the production atlantis server.

consul
  \_ sand
  \_ qa
  \_ integration
  \_ production
vault
  \_ sand
  \_ qa
  \_ integration
  \_ production

Using the wrapper script suggested above does allow for multiple Atlantis servers. However, without the ability to hide empty runs, the discussion quickly becomes unreadable.

Leooo commented 2 years ago

I managed to make it work without many problems using existing Atlantis options:

[x] Use one Atlantis instance per environment per repo.
[x] Use Atlantis repo-config-json.
[x] Use Atlantis pre workflow hooks so that each Atlantis instance pre-generates atlantis.yaml by copying from its atlantis-[env].yaml file, when that file exists.
[x] Use Atlantis silence-no-projects.

          env {
            name  = "ATLANTIS_SILENCE_NO_PROJECTS"
            value = true
          }
          env {
            name = "ATLANTIS_REPO_CONFIG_JSON"
            value = jsonencode(
              {
                "repos" : [
                  {
                    "id" : "github.com/myorg/my-repo",
                    "pre_workflow_hooks" : [{
                      "run" : "cp atlantis-${each.value.env}.yaml atlantis.yaml"
                    }]
                  }
                ]
              }
            )
          }

# atlantis-dev.yaml
version: 3
projects:
  - name: dev
    dir: env/dev
    autoplan:
      when_modified: ["env/dev/modules/**/*.tf", "*.tf", "*.yaml"]

# atlantis-prd.yaml
version: 3
projects:
  - name: prd
    dir: env/prd
    autoplan:
      when_modified: ["env/prd/modules/**/*.tf", "*.tf", "*.yaml"]

[x] Add one Atlantis status check per project using the --vcs-status-name flag.

The only thing we can't use now is the automerge feature (because it could merge the PR when some plans have been applied and others have not).
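
The "when they exist" part of the pre-workflow hook can be made explicit. A minimal sketch, assuming a hypothetical helper `generate_atlantis_yaml` that takes the environment name and the repo checkout directory; what Atlantis does when no atlantis.yaml is produced is outside this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: generate_atlantis_yaml is a hypothetical helper name.
# Copies atlantis-<env>.yaml to atlantis.yaml if the per-env file exists.
generate_atlantis_yaml() {
  local env_name="$1" repo_dir="${2:-.}"
  local src="${repo_dir}/atlantis-${env_name}.yaml"
  if [ -f "$src" ]; then
    cp "$src" "${repo_dir}/atlantis.yaml"
    echo "generated atlantis.yaml from atlantis-${env_name}.yaml"
  else
    # Still return success so a repo without a file for this env
    # does not fail the hook.
    echo "no atlantis-${env_name}.yaml found, nothing generated"
  fi
}

# Demo in a throwaway directory standing in for the repo checkout.
demo_dir="$(mktemp -d)"
printf 'version: 3\nprojects:\n  - name: dev\n    dir: env/dev\n' > "${demo_dir}/atlantis-dev.yaml"
generate_atlantis_yaml dev "$demo_dir"
generate_atlantis_yaml prd "$demo_dir"
rm -rf "$demo_dir"
```

Unlike a bare `cp`, this does not error out when a repo has no file for the instance's environment, which may or may not be what you want depending on how strictly each repo should be onboarded.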

FlorianNeacsu commented 2 years ago

The solution I implemented is similar to what @Leooo did:

one atlantis per env in a single repo: dev, qa and prod

pre_workflow_hooks with terragrunt-atlantis-config filtering per env - this autogenerates the atlantis.yaml

terragrunt-atlantis-config generate --automerge=true --autoplan --parallel=true --create-workspace --create-project-name --output ./atlantis.yaml --filter ${ENV_NAME}

Atlantis silence-no-projects

The problem with this is when changes are made to multiple environments under one PR, and auto-merge is configured. One atlantis can close the PR before the others have started/finished.

Leooo commented 2 years ago

@FlorianNeacsu exactly. I'm looking for a way to add one github status check per atlantis instance atm (so have atlantis/apply-dev and atlantis/apply-prod names for status checks, say), so that the PR can't be merged until all checks / jobs pass. Not sure where this code sits on Atlantis side for now.

Short term mitigation is to use atlantis apply --auto-merge-disabled until all plans apply successfully (maybe alias it on atlantis apply), then use a final atlantis apply for the final merge. meh.

EDIT:

That should solve everything (one more time, Atlantis has all the options we need, just need to find it in the documentation). Testing it now and if it works I will update the process above.

jasonrberk commented 2 years ago

I built a custom Atlantis proxy that confirms all GHE status checks are green before making the PR mergeable. My GHE hooks point to the proxy, and the proxy forwards each request to the correct Atlantis server. We have three AWS accounts basically tied to one GHE repo, where the repo has a folder for dev, staging, and production. The proxy adds status checks to the PRs and handles cases where things like the README in the repo change and Atlantis shouldn't get involved. This repo is basically a hub with Terragrunt files that point to other repos where the TF actually lives.

Leooo commented 2 years ago

@jasonrberk happy to get your code - although I was reaching in the above for a simple solution using standard Atlantis options (no terragrunt etc.), and it's pretty close now.

jasonrberk commented 2 years ago

I can't share the code base as it's not open source, but I can give a general overview

in my container running atlantis, I have a pre workflow and a custom workflow that looks like this: (ghe is a node script that takes args)

  pre_workflow_hooks:
        - run: ghe setCommitStatus ${HEAD_COMMIT} success

plan)
    ghe configureBranchProtection
    ghe dismissApprovals ${PULL_NUM}
    ghe setCommitStatus ${HEAD_COMMIT} pending
    terragrunt plan -out=$PLANFILE $DESTROY_PARAMETER | sed -E 's/^( *)([-+~]|-\/\+)/\2\1/;s/^~ /! /'
    ;;

https://docs.github.com/en/enterprise-server@3.0/rest/reference/repos#create-a-commit-status

The idea is that any time Atlantis plans something, it dismisses the approvals so the plan needs to be re-approved, and it sets a commit status that prevents a human from clicking the merge button after an approval in case of a failed plan (i.e. rubber-stamping without validating the plan).

in order to merge your PR, you have to comment atlantis apply. The proxy will see that comment and clean the status check.

(Yes, a human could comment atlantis apply and manually merge before atlantis actually applies.....but we've found that smart people don't do this..... this was more to block the Pavlovian response to click the big green merge button in GHE)
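
The `ghe setCommitStatus` script itself is not shown here, but the GitHub endpoint linked above takes a small JSON body. A minimal sketch of such a call, assuming hypothetical function names `status_payload` and `set_commit_status`, assumed environment variables `GHE_API_URL`, `GHE_TOKEN` and `REPO_SLUG`, and an illustrative context name:

```shell
#!/usr/bin/env bash
# Sketch only: not the actual ghe script from this thread.
# Builds the JSON body for GitHub's "create a commit status" endpoint.
# Inputs are assumed to contain no characters that need JSON escaping.
status_payload() {
  local state="$1" context="$2"
  printf '{"state":"%s","context":"%s"}' "$state" "$context"
}

# Hypothetical wrapper around the REST call.
# GHE_API_URL (e.g. https://HOSTNAME/api/v3), GHE_TOKEN and
# REPO_SLUG (owner/repo) are assumed to be set by the caller.
set_commit_status() {
  local sha="$1" state="$2" context="$3"
  curl -fsS -X POST \
    -H "Authorization: token ${GHE_TOKEN}" \
    -H "Accept: application/vnd.github+json" \
    -d "$(status_payload "$state" "$context")" \
    "${GHE_API_URL}/repos/${REPO_SLUG}/statuses/${sha}"
}

# Print the payload only; nothing is sent to any server here.
status_payload pending "atlantis/apply-dev"
echo
```

Giving each Atlantis instance its own context string is what lets branch protection require every one of them separately.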

the gist of the proxy is:

import express from 'express';
import httpContext from 'express-http-context';
import { PORT } from './utils/config.js';

import * as middleware from './middleware.js';

const app = express();

app.use('/events',
    express.json(),
    middleware.validateRequest,         // validate the request came from GHE
    httpContext.middleware,             // https://www.npmjs.com/package/express-http-context
    middleware.storeRequestMetadata,    // so we can associate the logs with a specific GHE event
    middleware.filterComments,          // ignore any comment on the PR that doesn't start with 'atlantis'
    middleware.extractPullNumber,       // normalize the location of the pull number on the req object
    middleware.addPrFilesToRequest,     // get all the files in the PR so subsequent filters can make decisions
    middleware.handleNonTfPulls,        // escape hatch for PRs that don't change TF files

    // from this point forward, at least _some_ of the files in the PR are actionable (ie .hcl) files
    middleware.handleCrossAccountPulls, // filter out cross account / non-applicable PRs
    middleware.forwardToAtlantis,       // finally, forward the request to atlantis and return the response
);

app.get('/health', (req, res) => {
    res.sendStatus(200);
});

app.listen(PORT, () => console.log(`starting Atlantis proxy on port ${PORT}`));

clear as mud?

I'm sure my use case is very specific to how my org does things in AWS and how we have TF / TG and Terragrunt Frontend configured.

as with anything, I'm sure there's room for improvement, but this is working for us in that it:

prevents PR merges by humans clicking big green buttons by creating a status check AND adding it to the branch protection
allows three different atlantis instances to be tied to a single repo where folders denote AWS accounts
prevents users from making changes to multiple AWS accounts in a single PR
prevents atlantis instances in unaffected accounts from commenting on the PR (ie: when you PR a change for dev we don't want any comments from staging or production atlantis instances)
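
The routing decision behind the last three points can be sketched independently of the proxy. This is not the proxy's code (which is Node and not public); it is a rough shell illustration, assuming the dev, staging and production folders sit at the top level of the repo, with hypothetical function names and output keywords.

```shell
#!/usr/bin/env bash
# Sketch only: function names and output keywords are hypothetical.
# Reads changed file paths on stdin, prints the distinct account folders touched.
touched_accounts() {
  grep -E '^(dev|staging|production)/' | cut -d/ -f1 | sort -u
}

# Reads changed file paths on stdin and prints one routing decision:
#   skip        no account folder touched (e.g. only the README changed)
#   reject      more than one account touched in the same PR
#   <account>   forward the webhook to that account's Atlantis server
route_pull() {
  local accounts count
  accounts="$(touched_accounts)"
  if [ -z "$accounts" ]; then
    echo "skip"
    return 0
  fi
  count="$(printf '%s\n' "$accounts" | grep -c '')"
  if [ "$count" -gt 1 ]; then
    echo "reject"
  else
    echo "$accounts"
  fi
}

printf '%s\n' "README.md" | route_pull
printf '%s\n' "dev/vpc/terragrunt.hcl" "dev/iam/terragrunt.hcl" | route_pull
printf '%s\n' "dev/vpc/terragrunt.hcl" "production/vpc/terragrunt.hcl" | route_pull
```

Because only one Atlantis server ever receives a given webhook, the servers for the other accounts have nothing to comment on.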

jamengual commented 2 years ago

this is possible by doing https://github.com/runatlantis/atlantis/issues/1345#issuecomment-1002625950

jamengual commented 2 years ago

please report back otherwise

lvthao commented 1 year ago

@Leooo: I have two instances of Atlantis (dev and prod) connected to one repo. I followed your approach to set up the pre-workflow hook, but when I run atlantis plan -p dev I get an error on the prod instance saying the workflow dev is not defined on the prod Atlantis. How did you solve this issue?

Leooo commented 1 year ago

@lvthao not sure, it looks like your dev file hasn't been parsed. Have you declared one like the below in your repo, together with the ATLANTIS_REPO_CONFIG_JSON env var? Note that the name of the project in the file must be dev. The Atlantis logs will probably give you more details. Can you also check whether a simple atlantis plan fails with the same error?

# atlantis-dev.yaml
version: 3
projects:
  - name: dev
    dir: env/dev
    autoplan:
      when_modified: ["env/dev/modules/**/*.tf", "*.tf", "*.yaml"]

lvthao commented 1 year ago

The atlantis.yaml file was parsed. When I ran atlantis plan -p dev, both the dev and prod instances ran the command. On prod we only define atlantis-prod.yaml, which has no dev project, so I got the error from the prod Atlantis instance. On the prod Atlantis instance, I configured /etc/atlantis/repos.yaml:

    ---
    repos:
    - id: "xxx"
      apply_requirements: ["approved", "mergeable", "undiverged"]
      allowed_overrides: ["workflow"]
      allow_custom_workflows: true
      allowed_workflows: [dev,prod]
      pre_workflow_hooks:
        - run: cp atlantis-prod.yaml atlantis.yaml
    workflows:
      prod:
         plan: xxx
         run: xxx

atlantis prod:

version: 3
projects:
- name: prod
  dir: .
  workspace: prod
  workflow: prod

Leooo commented 1 year ago

@lvthao it looks like you have a different setup than the one I described. On my side I have separate atlantis-dev.yaml, atlantis-prod.yaml files in my repo (and no atlantis.yaml), and I dynamically create an atlantis.yaml file before processing in my pre_workflow_hook by copying one of the atlantis-xx.yaml files (depending on the env. being run)

lvthao commented 1 year ago

I don't have an atlantis.yaml file in the repo. I create it with the pre-workflow hook:

        - run: cp atlantis-prod.yaml atlantis.yaml

Leooo commented 1 year ago

not sure then tbh. The atlantis logs should help you zoom into the error

lvthao commented 1 year ago

@Leooo did you define a custom workflow in your Atlantis repos.yaml?

Leooo commented 1 year ago

@lvthao I don't have any custom workflows / workflows object defined, either in the server or in the repos. This is the content of atlantis-prd.yaml, for example, in the repo:

version: 3
projects:
  - name: prd
    dir: env/prd
    terraform_version: v1.3.2
    autoplan:
      when_modified: [ "../../modules/**/*.tf",  "../../modules/**/*.yaml", "*.tf", "../../*.yaml"]
