microsoft / azure-pipelines-terraform

Azure Pipelines tasks for installing Terraform and running Terraform commands in a build or release pipeline.
MIT License

TerraformTask fails in multistage pipelines with Google provider 4.x.x #145

Open dmaltsiniotis opened 1 year ago

dmaltsiniotis commented 1 year ago

Hello,

I believe I have identified a behavior of this Terraform Task that prevents it from being used in multistage pipelines with a GCP/Google provider (from at least v4.0.0 onward).

Context What I'm trying to do is set up a multistage YAML pipeline where one stage runs a Terraform plan and the next stage runs a Terraform apply using the plan generated by the previous stage. This is accomplished by using the Terraform Task with the -out planfile command parameters in the plan stage, then publishing the plan file as an artifact. In the apply stage, this plan is downloaded, and an attempt is made to apply it.

Error Unfortunately, what I have found is that the apply fails with the following error:

╷ 
│ Error: the string provided in credentials is neither valid json nor a valid file path 
│ 
│ 
╵ 
##[error]Error: The process '/opt/hostedtoolcache/terraform/1.4.4/x64/terraform' failed with exit code 1 

Analysis This was very perplexing at first because this process works when all the tasks are in a single stage. Additionally, no change was made to the GCP service connection properties or name being passed into the Terraform Task.

After some time tracing the task source code and adding debug steps to the YAML pipeline, I have identified the root cause of the error, but am unsure as to how to proceed to fix the issue:

When the Plan stage runs (using a Microsoft-hosted ubuntu-latest image), the Terraform task reads the GCP for Terraform service connection and generates a JSON file in the workspace named credentials-<uuid>.json. Then, when the Terraform binary is invoked, an environment variable called "GOOGLE_CREDENTIALS" is set to the path of this JSON file. When the plan is generated, this credential parameter is saved/hard-coded into the plan file.
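A minimal Python sketch of the behavior described above, to make the moving parts concrete. It is not the task's source code: the function names `write_credentials` and `terraform_env` and the placeholder key are hypothetical, and the two temporary directories merely stand in for the workspaces of the plan and apply agents.

```python
import json
import os
import tempfile
import uuid


def write_credentials(workspace: str, service_account_key: dict) -> str:
    """Write the key to credentials-<uuid>.json in the workspace and return its path."""
    path = os.path.join(workspace, f"credentials-{uuid.uuid4()}.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(service_account_key, handle)
    return path


def terraform_env(credentials_path: str) -> dict:
    """Copy the current environment and point GOOGLE_CREDENTIALS at the file."""
    env = dict(os.environ)
    env["GOOGLE_CREDENTIALS"] = credentials_path
    return env


# Placeholder key for illustration only, not a real credential.
fake_key = {"type": "service_account", "project_id": "example-project"}

# Two temporary directories stand in for the plan-stage and apply-stage workspaces.
with tempfile.TemporaryDirectory() as plan_ws, tempfile.TemporaryDirectory() as apply_ws:
    plan_path = write_credentials(plan_ws, fake_key)
    apply_path = write_credentials(apply_ws, fake_key)
    plan_env = terraform_env(plan_path)
    apply_env = terraform_env(apply_path)
    # The random UUID makes the file name different on every run.
    paths_differ = plan_path != apply_path
```

Because each run draws a fresh UUID, the path handed to Terraform in the apply stage never matches the one that was in effect when the plan was created.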

On the next stage, which is most likely running on a different ephemeral image/VM, the apply stage runs. Once again, a new credential file with a random UUID is generated, but this time the apply fails with the error that it can't find the credential path (the original path embedded in the plan file). This is because, as of v4.0.0 of the provider, the order of precedence for where to look for credentials no longer favors the environment variable; the path stored in the plan file is used first.

This behavior seems to have been a change with the 4.x.x version of the Google provider:

From 4.0.0 onward, config takes precedence over environment variables

Now, we find ourselves in a situation where the apply step is looking for a credential file path (embedded in the plan) that does not exist, and we get the error above.
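The analysis above can be modeled in a few lines of Python. This is a simplified sketch of the reasoning, not the Google provider's actual code: `resolve_credentials` and `check_credentials` are hypothetical names, and the stale path is a made-up stand-in for the credential file that existed only on the plan-stage agent.

```python
import json
import os
import tempfile
from typing import Optional


def resolve_credentials(config_value: Optional[str], env_value: Optional[str]) -> Optional[str]:
    """Simplified 4.x ordering: a configured value wins over the environment variable."""
    return config_value if config_value else env_value


def check_credentials(value: str) -> None:
    """Accept a path to an existing file or inline JSON, otherwise raise."""
    if os.path.isfile(value):
        return
    try:
        json.loads(value)
    except ValueError:
        raise ValueError(
            "the string provided in credentials is neither valid json nor a valid file path"
        ) from None


# Hypothetical path recorded during the plan stage; it does not exist on this machine.
stale_path = os.path.join(tempfile.gettempdir(), "plan-stage-agent-gone", "credentials-old.json")

# A fresh, valid file is available through the environment in the apply stage.
with tempfile.NamedTemporaryFile(suffix=".json") as fresh:
    chosen = resolve_credentials(stale_path, fresh.name)
    try:
        check_credentials(chosen)
        error = None
    except ValueError as exc:
        error = str(exc)
```

In this model the fresh file supplied through the environment is never consulted, because the stale value takes precedence and then fails validation.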

Possible solutions

  1. Modify the logic in the GCP Terraform task section that generates the credentials JSON file to use a deterministic name, rather than appending a random uuid to the name. Perhaps something like: credentials-<serviceConnectionName>-<BuildID>.json instead of credentials-<new uuid>.json? This would have the effect of creating the same credential file path across stages and builds. I am unsure what, if any, security implications there would be in doing this.
  2. Check if using command line parameters to override the credential path instead of environment variables overcomes the order of precedence issue, and if so, modify the Terraform invocation step to use command line arguments instead.
  3. Rather than use a multistage pipeline with Environments controlling security, use a single stage pipeline with the ManualValidation task.
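As a rough illustration of option 1, the naming logic could look like the Python sketch below. The function name and the character sanitizing are assumptions made for the example, not part of the task. Note that a deterministic file name only yields the same full path if the directory it is written to is also identical on the plan and apply agents.

```python
import os
import re


def credentials_file_name(service_connection_name: str, build_id: str) -> str:
    """Build credentials-<serviceConnectionName>-<BuildID>.json with a file-system-safe name."""
    # Replace anything outside a conservative character set (assumption for this sketch).
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", service_connection_name)
    return f"credentials-{safe_name}-{build_id}.json"


# The same inputs give the same name in every stage of a build.
plan_stage_name = credentials_file_name("my gcp/conn", "1234")
apply_stage_name = credentials_file_name("my gcp/conn", "1234")

# The full path still depends on the workspace directory of each agent.
plan_stage_path = os.path.join("workspace", plan_stage_name)
```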

Thank you,

Demetri

mericstam commented 1 year ago

Hi, sorry for the late reply, swamped with day-job tasks. I will investigate whether we can fix this in some way.

thegooddalton commented 1 year ago

Bumping, I've also encountered this and lost a day trying to figure out what's going on. @dmaltsiniotis have you managed to work around this, and if so, how?

mericstam commented 8 months ago

Hi, started to look into this. I never worked with GCP before so it will take a little effort to get going.

raf-tdc commented 6 months ago

Same issue over here, any news on the investigation or resolution?

mericstam commented 6 months ago

I tried a bit but got stuck on general GCP know-how. I have little time before the xmas break. If anyone would like to contribute, I will be happy to review and help out.

dmaltsiniotis commented 6 months ago

Hi @mericstam, I'll take a stab at a PR for this. I think I have a fairly good understanding of the fix. Thanks.

dmaltsiniotis commented 6 months ago

> Bumping, i've also encountered this, lost a day trying to figure out what's going on. @dmaltsiniotis have you managed to work around this, if so how?

Sorry for the super late reply here. Yes, I did end up working around the issue, but the workaround is really not ideal:

The risk here of course is that if the state of the infrastructure changes between the stages, Terraform will NOT error out as it's expected to. Instead, the new plan will take over and happily apply the changes. In practice though, the likelihood of this happening is very low on cloud infrastructure that is managed by Terraform anyway.