runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Debian Image errors out while creating .git-credentials #4247

Open ugurcancaykara opened 5 months ago

ugurcancaykara commented 5 months ago

Overview of the Issue

With the Alpine-based image ghcr.io/runatlantis/atlantis:v0.27.1, writing the git credentials succeeds with the following messages:

{"level":"info","ts":"2024-02-15T15:00:49.449Z","caller":"vcs/git_cred_writer.go:29","msg":"wrote git credentials to /home/atlantis/.git-credentials","json":{}}
{"level":"info","ts":"2024-02-15T15:00:49.451Z","caller":"vcs/git_cred_writer.go:71","msg":"successfully ran git config --global credential.helper store","json":{}}
{"level":"info","ts":"2024-02-15T15:00:49.452Z","caller":"vcs/git_cred_writer.go:77","msg":"successfully ran git config --global url.https://x-access-token/@github.com.insteadOf ssh://git@github.com","json":{}}

With the Debian-based image ghcr.io/runatlantis/atlantis:v0.27.1-debian, however, writing the git credentials fails with the following message:

Error: initializing server: could not write credentials: Writing ~/.git-credentials file: writing generated .git-credentials file with user, token and hostname to /run/sshd/.git-credentials: open /run/sshd/.git-credentials: no such file or directory
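
The path in the error suggests that the home directory of the runtime user resolved to /run/sshd rather than /home/atlantis. Below is a minimal sketch for checking that mapping. It assumes the home directory comes from the passwd entry of the runtime uid, which is not confirmed from the Atlantis source. The sample passwd lines and the helper names `home_for_uid` and `credentials_path` are hypothetical and only illustrate the lookup.

```python
import posixpath

# Hypothetical passwd content for illustration only; real images will differ.
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
sshd:x:100:65534::/run/sshd:/usr/sbin/nologin
atlantis:x:1000:1000::/home/atlantis:/bin/sh
"""


def home_for_uid(passwd_text, uid):
    """Return the home directory recorded for uid, or None if there is no entry."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # passwd format: name:password:uid:gid:gecos:home:shell
        if len(fields) >= 7 and fields[2] == str(uid):
            return fields[5]
    return None


def credentials_path(passwd_text, uid):
    """Path where a .git-credentials file would land for uid, or None."""
    home = home_for_uid(passwd_text, uid)
    return None if home is None else posixpath.join(home, ".git-credentials")


if __name__ == "__main__":
    print(credentials_path(SAMPLE_PASSWD, 100))   # /run/sshd/.git-credentials
    print(credentials_path(SAMPLE_PASSWD, 1000))  # /home/atlantis/.git-credentials
```

Inside the container, the same functions can be pointed at the real data by passing `open("/etc/passwd").read()` and `os.getuid()`.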

Reproduction Steps

Switch image tag from v0.27.1 to v0.27.1-debian

Environment details

env:
    - name: ATLANTIS_DATA_DIR
      value: /atlantis-data
    - name: ATLANTIS_REPO_ALLOWLIST
      value: github.com/$COMPANY/*
    - name: ATLANTIS_PORT
      value: "4141"
    - name: ATLANTIS_REPO_CONFIG
      value: /etc/atlantis/repos.yaml
    - name: ATLANTIS_ATLANTIS_URL
      value: http://atlantis/.$COMPANY.com
    - name: ATLANTIS_GH_APP_ID
      value: "REDACTED"
    - name: ATLANTIS_GH_APP_SLUG
      value: "REDACTED"
    - name: ATLANTIS_WRITE_GIT_CREDS
      value: "true"
    - name: ATLANTIS_GH_WEBHOOK_SECRET
      valueFrom:
        secretKeyRef:
          key: github_secret
          name: atlantis-webhook
    - name: ATLANTIS_GH_APP_KEY_FILE
      value: /var/github-app/key.pem
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: eu-west-1
    - name: AWS_REGION
      value: eu-west-1
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::REDACTED:role/REDACTED
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token

Atlantis server-side config file:

Undefined

Repo atlantis.yaml file (note: terragrunt is unused):

workflows:
      withsubmodules:
        plan:
          steps:
            - run: git config --global url."https://x-access-token/@github.com/".insteadOf "git@github.com:"
            - run: git submodule update --recursive --init
            - init
            - plan
        apply:
          steps:
            - apply
      terragrunt:
        plan:
          steps:
            - env:
                name: TERRAGRUNT_TFPATH
                command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
            - env:
                # Reduce Terraform suggestion output
                name: TF_IN_AUTOMATION
                value: "true"
            - env:
                name: TERRAGRUNT_NON_INTERACTIVE
                value: "true"
            - env:
                name: TERRAGRUNT_INCLUDE_EXTERNAL_DEPENDENCIES
                value: "true"
            - env:
                name: TERRAGRUNT_SOURCE_UPDATE
                value: "true"
            - env:
                name: TF_PLUGIN_CACHE_DIR
                command: 'echo "${ATLANTIS_DATA_DIR}/plugin-cache"'
            - run:
                command:
                  terragrunt run-all init -input=false -no-color
                  #output: hide
            - run:
                command:
                  terragrunt run-all plan -input=false -no-color
                  #output: hide
        apply:
          steps:
            - env:
                name: TERRAGRUNT_TFPATH
                command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
            - env:
                # Reduce Terraform suggestion output
                name: TF_IN_AUTOMATION
                value: "true"
            - env:
                name: TERRAGRUNT_NON_INTERACTIVE
                value: "true"
            - env:
                name: TERRAGRUNT_INCLUDE_EXTERNAL_DEPENDENCIES
                value: "true"
            - env:
                name: TF_PLUGIN_CACHE_DIR
                command: 'echo "${ATLANTIS_DATA_DIR}/plugin-cache"'
            - run: terragrunt run-all apply
        import:
          steps:
            - env:
                name: TERRAGRUNT_TFPATH
                command: 'echo "terraform${DEFAULT_TERRAFORM_VERSION}"'
            - env:
                name: TF_VAR_author
                command: 'git show -s --format="%ae" $HEAD_COMMIT'
            # Allow for imports as not supported for Terraform wrappers by default
            - run: terragrunt import -input=false $(printf '%s' $COMMENT_ARGS | sed 's/,/ /' | tr -d '\\')
        state_rm:
          steps:
            - env:
                name: TERRAGRUNT_TFPATH
                command: 'echo "terraform${DEFAULT_TERRAFORM_VERSION}"'
            # Allow for state removals as not supported for Terraform wrappers by default
            - run: terragrunt state rm $(printf '%s' $COMMENT_ARGS | sed 's/,/ /' | tr -d '\\')
    repos:
      - id: /inf.applications/
        workflow: withsubmodules
      - id: /inf.shared-infra/
        workflow: terragrunt
        pre_workflow_hooks:
          - run: terragrunt-atlantis-config generate --output atlantis.yaml --workflow terragrunt --automerge --autoplan --create-workspace
        import_requirements: [approved]
        allowed_overrides: [workflow]

cvirtucio commented 2 months ago

I'm curious where this landed. seems like there was a PR to ensure the directory exists, but it was closed because an alternate resolution was found. I tried searching through the atlantis codebase and couldn't find any references to sshd anywhere.

cvirtucio commented 2 months ago

ok, I think I've found the issue: the wrong uid is being used at runtime. we've been hard-coding the runtime user to 100:1000 because of this issue, and I guess AWS ECS was assigning 100 to the /run/sshd user. as a workaround, we did something more or less equivalent to this PR:

  1. forcefully set the uid of the atlantis user to 1000
  2. chown its home directory to 1000:1000
  3. set the runtime user to 1000:1000

the PR I mentioned would probably do away with the need for this workaround.
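
One way to confirm that a workaround like the three steps above took effect is to check, from inside the container, that the home directory exists, is owned by the runtime uid, and is writable. The sketch below assumes the home directory is /home/atlantis, the path shown in the Alpine log. The helper name `check_home` is hypothetical.

```python
import os


def check_home(home, uid):
    """Return a list of problems that would stop uid from writing into home."""
    problems = []
    if not os.path.isdir(home):
        problems.append(f"{home} does not exist")
        return problems
    owner = os.stat(home).st_uid
    if owner != uid:
        problems.append(f"{home} is owned by uid {owner}, not {uid}")
    # os.access tests the permissions of the current process,
    # so run this as the same user that Atlantis runs as.
    if not os.access(home, os.W_OK):
        problems.append(f"{home} is not writable by the current process")
    return problems


if __name__ == "__main__":
    # /home/atlantis is an assumption taken from the Alpine log output.
    for line in check_home("/home/atlantis", os.getuid()) or ["ok"]:
        print(line)
```

An empty result means the credentials file should be writable. Any reported problem points at the step of the workaround that still needs attention.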