runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

terraform lock file committed on arm, linux amd deploy, init command throws error with tf 0.14 #1408

Open ghostsquad opened 3 years ago

ghostsquad commented 3 years ago

I ran into the following issue:

running "/atlantis/data/bin/terraform0.14.6 init -input=false -no-color -upgrade" in "/atlantis/data/repos/tunein/atlantis/16/default/deploy/environments/production": exit status 1

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Finding hashicorp/aws versions matching "3.28.0"...
- Using hashicorp/aws v3.28.0 from the shared cache directory

Error: Failed to install provider from shared cache

Error while importing hashicorp/aws v3.28.0 from the shared cache directory:
the provider cache at .terraform/providers has a copy of
registry.terraform.io/hashicorp/aws 3.28.0 that doesn't match any of the
checksums recorded in the dependency lock file.

After looking up the error (https://www.terraform.io/docs/cli/commands/providers/lock.html), I think that -upgrade is the problem here, but I can't be sure.

ghostsquad commented 3 years ago

Changing the workflow to look like this:

+      "workflows": {
+        "default": {
+          "apply": {
+            "steps": [
+              "apply"
+            ]
+          },
+          "plan": {
+            "steps": [
+              {
+                "run": "terraform init -input=false -no-color"
+              },
+              "plan"
+            ]
+          }
+        },

I now get this:

exit status 1: running "terraform init -input=false -no-color" in "/atlantis/data/repos/tunein/atlantis/16/default/deploy/environments/production": 

Error: Unsupported Terraform Core version

  on main.tf line 15, in terraform:
  15:   required_version = "0.14.6"

This configuration does not support Terraform version 0.13.0. To proceed,
either choose another supported Terraform version or update this version
constraint. Version constraints are normally set for good reason, so updating
the constraint may lead to other errors or unexpected behavior.

despite having my .atlantis.yaml set as:

version: 3
projects:
  - name: production
    dir: ./deploy/environments/production
    terraform_version: 0.14.6

and main.tf with:

terraform {
  ...

  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "3.28.0"
    }
  }

  required_version = "0.14.6"
}

ghostsquad commented 3 years ago

I was able to fix the version issue by changing the workflow's run command to terraform${ATLANTIS_TERRAFORM_VERSION} init -input=false -no-color. I think this needs to be called out more clearly in the documentation: right now, it makes it seem that simply using terraform in a custom workflow will do the right thing, but it won't.
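A minimal sketch of the same fix as a reusable script, assuming (as found above) that Atlantis puts the per-project binary on the PATH as terraform<version> and exposes the selected version to run steps as ATLANTIS_TERRAFORM_VERSION. The helper names tf_bin and run_init are hypothetical, and falling back to plain terraform when the variable is empty is an assumption, not documented Atlantis behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a custom-workflow run step: call the Terraform
# binary Atlantis selected for the project instead of whatever "terraform"
# happens to be first on the PATH.
set -euo pipefail

# Print the binary name, e.g. "terraform0.14.6" when
# ATLANTIS_TERRAFORM_VERSION=0.14.6. Falls back to "terraform" (assumption).
tf_bin() {
  if [[ -n "${ATLANTIS_TERRAFORM_VERSION:-}" ]]; then
    printf 'terraform%s\n' "$ATLANTIS_TERRAFORM_VERSION"
  else
    printf 'terraform\n'
  fi
}

# Run init with the same flags as the workflow above, plus any extra args.
run_init() {
  "$(tf_bin)" init -input=false -no-color "$@"
}
```

For example, after sourcing the script with ATLANTIS_TERRAFORM_VERSION=0.14.6, run_init -upgrade=false runs terraform0.14.6 init -input=false -no-color -upgrade=false.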

nishkrishnan commented 3 years ago

If you use the built-in init step with extra_args, it'll use the version you specify in your atlantis.yaml.

Referencing the binary directly in a custom run command doesn't work if you're using a non-default Terraform version. We can make this clearer in our docs.

bryankaraffa commented 3 years ago

+1 on this issue, as we encountered it with Terraform v0.14. It seems the two workarounds are:

ghostsquad commented 3 years ago

Are extra args deduplicated? That is, if I specify an argument that is already a default (but with a different value), are both passed to Terraform, or does the last arg win?

davidmontoyago commented 3 years ago

They don't seem to get deduplicated. Adding extra_args: ["-upgrade=false"] duplicates the -upgrade flag:

"/atlantis/bin/terraform0.14.7 init -input=false -no-color -upgrade -upgrade=false"

davidmontoyago commented 3 years ago

To follow up on this one: with the fix in https://github.com/runatlantis/atlantis/pull/1651, the -upgrade flag is deduplicated. However, Atlantis still fails with the error below when .terraform.lock.hcl is committed:

Error: Failed to install provider from shared cache

Pluies commented 2 years ago

For what it's worth, I ran into the same issue, and it appears the root cause is that the Terraform lock file was generated on OS X while Atlantis was running on linux_amd64.

Running the following command added checksums for the linux_amd64 builds of the providers:

terraform providers lock -platform=linux_amd64

After committing and pushing this change to the lockfile, Atlantis is happy to use the cached version of the provider and runs without issues.
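To work out which platform string to pass, a small helper can translate uname output into Terraform's OS_ARCH naming. This is a hypothetical sketch (the tf_platform name is made up) that only maps the common cases: Linux, macOS, and Windows shells such as Git Bash, on x86_64 or ARM64. If you generate the lock file on another machine, run uname -s and uname -m on the Atlantis host and pass those values in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Terraform platform string (OS_ARCH, e.g.
# linux_amd64) for a machine. Defaults to the current machine; pass the
# output of "uname -s" and "uname -m" from another host to override.
set -euo pipefail

tf_platform() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os" in
    Linux) os=linux ;;
    Darwin) os=darwin ;;
    MINGW*|MSYS*|CYGWIN*) os=windows ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
  case "$arch" in
    x86_64|amd64) arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
  printf '%s_%s\n' "$os" "$arch"
}
```

For example, tf_platform Linux x86_64 prints linux_amd64, so terraform providers lock -platform=linux_amd64 -platform="$(tf_platform)" records hashes for both the Atlantis host and the local machine.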

(I discovered this thanks to https://zenn.dev/shonansurvivors/scraps/7dd3ab1188c956. I assume it's the same issue based on the error messages and the fix, even though I don't read Japanese 😄)

tomharrisonjr commented 2 years ago

Thanks @Pluies, that was our issue. It was the sole reason we were using custom workflows for all of our root modules, and custom workflows don't work with the new streaming output in the Atlantis UI. So now we can have our 🍰 and 😮‍💨 it too 😄

It's possible to generate the checksums for multiple platforms in a single go, so that the lock file works on old (Intel) and new (Apple Silicon) Macs as well as on x86_64 (Intel/AMD) and ARM (Graviton) Linux instances. I added a script, terraform_lockfile.sh, to our repo like this:

#!/usr/bin/env bash
#
# Generates a .terraform.lock.hcl file with hashes for each platform we run on
# https://www.terraform.io/cli/commands/providers/lock
set -euo pipefail

terraform providers lock \
  -platform=darwin_arm64 \
  -platform=darwin_amd64 \
  -platform=linux_amd64 \
  -platform=linux_arm64

nitrocode commented 1 year ago

Sounds like the workaround is to either

Thanks to everyone for investigating this and coming up with a solution that works.

It would be nice to add a doc explaining how to commit this file properly.

cilindrox commented 1 year ago

Just chiming in: we're not vendoring/committing the lock files, and we're still running into this.

The workaround is to delete the plugin cache dir, or to vendor/commit the lock file with hashes for the platform Atlantis is running on (plus any local environments, etc.).

vincentgna commented 1 year ago

Is there a regression affecting this workaround in v0.25.0?

Ref:

I tried upgrading (listing all changes to highlight that the issue seems related to v0.25.0):

         Atlantis   Terraform   TF provider AWS
from     v0.24.4    v1.5.4      ~> v4
to       v0.25.0    v1.5.7      ~> v5
revert   v0.24.4    v1.5.7      ~> v5

I have this in my atlantis.env (snippet):

# Atlantis issues with TF 1.4+
# https://github.com/runatlantis/atlantis/issues/3201
TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE=true
# ...

Note: I run Atlantis as a systemd unit on an EC2 instance (no containers, no k8s ConfigMaps or Secrets), and everything works in v0.24.4.

I am considering finding a way to make sure the Terraform lock files are committed. We run across Windows/Linux/Mac and amd64/arm64 machines, so we're not committing lock files yet, but if anyone has pre-commit checks that help validate the lock file, I'll make sure the lock files are added to resolve this issue instead.
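One possible shape for such a pre-commit check, offered as a hypothetical sketch rather than an existing hook: re-run terraform providers lock for every supported platform and fail if the lock file changes, since a change means the committed file was missing hashes. It assumes terraform is on the PATH, the provider registry is reachable, and re-locking leaves an already-complete file byte-for-byte unchanged; the PLATFORMS list and the check_lock_file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit check (not an official Terraform or Atlantis hook).
# Re-runs "terraform providers lock" for every platform below and fails if
# that changes .terraform.lock.hcl, i.e. hashes were missing.
set -euo pipefail

# Assumed platform list; edit it to match the machines you actually run on.
PLATFORMS=(darwin_arm64 darwin_amd64 linux_amd64 linux_arm64 windows_amd64)

# Return 0 if re-locking leaves <dir>/.terraform.lock.hcl unchanged, 1 otherwise.
check_lock_file() {
  local dir="$1"
  local lock="$dir/.terraform.lock.hcl"
  local before p args=()
  for p in "${PLATFORMS[@]}"; do
    args+=("-platform=$p")
  done
  # Keep a copy of the committed file to compare against.
  before="$(mktemp)"
  cp -- "$lock" "$before"
  (cd -- "$dir" && terraform providers lock "${args[@]}")
  if cmp -s "$before" "$lock"; then
    rm -f -- "$before"
    return 0
  fi
  echo "$lock was missing hashes; review and commit the updated file" >&2
  rm -f -- "$before"
  return 1
}
```

For example, a hook could call check_lock_file deploy/environments/production for each root module and block the commit when it returns 1; the updated lock file is left in place so it can be reviewed and committed.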

The only changelog entries mentioning lock files in the v0.25.0 release seem to be:

vincentgna commented 1 year ago

I was storing the plugin cache on an EBS volume, and while doing provider upgrades there were issues with the provider versions stored in it.

So perhaps there's no regression, and I just had to rm -rf the plugin cache and force a fresh copy by running terraform init.
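A cautious way to do that reset, sketched under the assumption that the cache path is in TF_PLUGIN_CACHE_DIR or passed explicitly (the clear_plugin_cache name is hypothetical). It refuses to run with an empty path and recreates the directory afterwards, because Terraform does not create the plugin cache directory itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: empty a Terraform plugin cache so the next
# "terraform init" downloads fresh provider copies.
set -euo pipefail

clear_plugin_cache() {
  # Use $1 if given, else TF_PLUGIN_CACHE_DIR; abort if neither is set.
  local cache_dir="${1:-${TF_PLUGIN_CACHE_DIR:?TF_PLUGIN_CACHE_DIR is not set}}"
  # ${...:?} guards against expanding to "" and deleting the wrong path.
  rm -rf -- "${cache_dir:?}"
  # Terraform will not create the cache directory, so recreate it empty.
  mkdir -p -- "$cache_dir"
}
```

Run it while no plans are in progress, then let the next terraform init (or Atlantis plan) repopulate the cache.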