hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Terraform Cloud Agent to cache provider packages between runs #32851

Open LaurentLesle opened 1 year ago

LaurentLesle commented 1 year ago

Terraform Version

Terraform v1.4.0
on linux_amd64

But also tested with v1.3.6

TFE Agent version 1.7.0

Terraform Configuration Files

docker run -it -e TFC_AGENT_TOKEN -e TFC_AGENT_NAME -e TF_DATA_DIR='~/agent/.tfc-agent' hashicorp/tfc-agent:1.7.0
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
agent: Starting: name=caf_runner_macos version=1.7.0
core: Starting: version=1.7.0
core: Agent registered successfully with Terraform Cloud: agent.id=agent-iMKqP8GXY7fcemkM agent.pool.id=apool-mGhZCGVd3L3ZXBba
agent: Core version is up to date: version=1.7.0
core: Waiting for next job
core: Job received: job.type=plan job.id=run-4djmFTxHNrCZxwbL
terraform: Handling run: run.id=run-4djmFTxHNrCZxwbL run.operation=plan organization.name=aztfmod workspace.name=sandpit_level0_debug
terraform: Extracting Terraform from release archive
terraform: Terraform CLI details: version=1.4.0
terraform: Downloading Terraform configuration
terraform: Running terraform init
terraform: Running terraform plan
terraform: Generating and uploading plan JSON
terraform: Finished handling run with errors: error="operation failed: failed generating plan JSON: failed running command (exit 1)" error_class=user
2023-03-15T09:19:32.690Z [INFO]  core: Waiting for next job

or run with trace logs

docker run -it -e TFC_AGENT_TOKEN -e TFC_AGENT_NAME -e TFC_AGENT_LOG_LEVEL=trace -e TF_DATA_DIR='~/agent/.tfc-agent' hashicorp/tfc-agent:1.7.0

working commands:

docker run -it -e TFC_AGENT_TOKEN -e TFC_AGENT_NAME -e TFC_AGENT_LOG_LEVEL=trace -e TF_DATA_DIR= hashicorp/tfc-agent:1.7.0

or

docker run -it -e TFC_AGENT_TOKEN -e TFC_AGENT_NAME -e TFC_AGENT_LOG_LEVEL=trace hashicorp/tfc-agent:1.7.0
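
One detail worth checking in the failing command is the single-quoted tilde in TF_DATA_DIR. The sketch below only illustrates shell quoting: inside single quotes the shell does not expand `~`, so the container would receive the literal string. The HOME value is taken from the paths in the agent log and is used here purely for illustration; the thread does not establish whether this plays any role in the failure.

```shell
# Assumed home directory, for illustration only (matches the paths in the agent log).
HOME=/home/tfc-agent

quoted='~/agent/.tfc-agent'   # single quotes: the tilde stays literal
unquoted=~/agent/.tfc-agent   # no quotes: the shell expands the tilde to $HOME

printf '%s\n' "$quoted"     # ~/agent/.tfc-agent
printf '%s\n' "$unquoted"   # /home/tfc-agent/agent/.tfc-agent
```

If the literal value is not what was intended, passing an absolute path avoids the ambiguity.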

Debug Output

An extract from the log shows that terraform init and terraform plan work, and that the provider cache directory is populated successfully when TF_DATA_DIR is set. However, when the agent runs terraform show -json, the TFE job fails on the self-hosted agent because terraform show looks for providers in the default cache folder '.terraform/providers' rather than in the one under TF_DATA_DIR.

terraform: Generating and uploading plan JSON
2023-03-15T09:24:52.749Z [DEBUG] terraform: Running command: cmd="/home/tfc-agent/.tfc-agent/component/terraform/runs/run-z8pbc5jANcbrKQU3/bin/terraform show -json /home/tfc-agent/.tfc-agent/component/terraform/runs/run-z8pbc5jANcbrKQU3/config/terraform.tfplan"
2023-03-15T09:24:53.351Z [DEBUG] terraform: Closing output stream
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: [removed]
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: [removed]
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: [removed]
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: [removed]
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: Failed generating plan JSON
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: Exit code: 1
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: 
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: ╷
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: │ Error: registry.terraform.io/hashicorp/azurerm: there is no package for registry.terraform.io/hashicorp/azurerm 3.36.0 cached in .terraform/providers
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: │ 
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: │ 
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: ╵
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: 
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: Operation failed: failed generating plan JSON: failed running command (exit 1)
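
To make the mismatch concrete, here is a minimal sketch of where the provider cache is expected to live. It assumes Terraform keeps provider packages in a `providers` subdirectory of its data directory, which defaults to `.terraform` when TF_DATA_DIR is unset or empty. The function name `provider_cache_dir` is hypothetical and only models that rule; it is not part of Terraform or the agent.

```shell
# Hypothetical helper: prints the directory where provider packages are
# expected, given the current value of TF_DATA_DIR.
provider_cache_dir() {
  # An unset or empty TF_DATA_DIR falls back to the default ".terraform".
  printf '%s/providers\n' "${TF_DATA_DIR:-.terraform}"
}

# Default location, as named in the error message above.
(unset TF_DATA_DIR; provider_cache_dir)              # .terraform/providers

# Location when TF_DATA_DIR is overridden (example path only).
(TF_DATA_DIR=/data/tf; provider_cache_dir)           # /data/tf/providers
```

The error message names the default location, which suggests that the terraform show invocation did not see the overridden data directory that init and plan used.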

Expected Behavior

When TF_DATA_DIR is set in the TFE Agent's environment, terraform show -json is expected to generate the plan JSON in TFE jobs.

Actual Behavior

Failed generating plan JSON
Exit code: 1

╷
│ Error: registry.terraform.io/hashicorp/azurerm: there is no package for registry.terraform.io/hashicorp/azurerm 3.36.0 cached in .terraform/providers
│ 
│ 
╵

Operation failed: failed generating plan JSON: failed running command (exit 1)

Steps to Reproduce

1 - Register a self-hosted agent with TFE and set TF_DATA_DIR

docker run -it -e TFC_AGENT_TOKEN -e TFC_AGENT_NAME -e TFC_AGENT_LOG_LEVEL=trace -e TF_DATA_DIR='~/agent/.tfc-agent' hashicorp/tfc-agent:1.7.0

2 - Using the API flow, create a simple config with the azurerm provider and reference a data.client_config in an output

3 - Trigger the job on the self-hosted runtime.
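
A configuration for step 2 might look like the sketch below, written out from the shell so it can be uploaded through the API flow. It assumes "data.client_config" refers to the azurerm_client_config data source; the resource label `current`, the output name `client_id`, and the temporary directory are illustrative choices, not taken from the original report. The provider version matches the one in the error message.

```shell
# Write a minimal configuration into a scratch directory (location is arbitrary).
workdir="$(mktemp -d)"

cat > "$workdir/main.tf" <<'EOF'
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.36.0"
    }
  }
}

provider "azurerm" {
  features {}
}

# Assumed to be the data source meant by "data.client_config".
data "azurerm_client_config" "current" {}

output "client_id" {
  value = data.azurerm_client_config.current.client_id
}
EOF

echo "Wrote $workdir/main.tf"
```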

Additional Context

This is not a permission issue within the container. The cache is populated by terraform init when TF_DATA_DIR is set, and terraform plan works as expected.

References

No response

apparentlymart commented 1 year ago

Hi @LaurentLesle! Thanks for reporting this.

I don't think overriding the data directory is intended as a supported configuration for running Terraform in the Terraform Cloud Agent; that environment variable is an implementation detail intended for those implementing systems like the agent, but as a user of the agent you are supposed to leave Terraform Cloud in control of this so that it can lay out the directories in the filesystem the way it expects.

However, I assume you had a particular goal in mind when you tried to set this, and if you can say more about what that is then I might either be able to suggest a different way to achieve it or pass the request on to the Terraform Cloud teams as a feature request if it isn't something the Terraform Cloud Agent currently supports.

Thanks!

LaurentLesle commented 1 year ago

Sure. The goal is to map an external data volume that stores the provider data, so that providers are not downloaded again on each run.

apparentlymart commented 1 year ago

Thanks for the extra detail, @LaurentLesle!

Reusing the entire data dir isn't really a suitable way to achieve that kind of caching because there are other values in the data directory that are not related to providers which could cause quirky behavior if you preserve them between runs.

However, I think a way to cache provider packages between runs on the same agent is a reasonable desire, and so I'm going to reframe this issue as a feature request for Terraform Cloud to support that for self-hosted agents, and I'll pass it on to the Terraform Cloud teams for consideration.

Thanks again!
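
For reference, outside the agent Terraform CLI already documents a provider plugin cache that is separate from the data directory, enabled through the TF_PLUGIN_CACHE_DIR environment variable (or plugin_cache_dir in the CLI configuration file). The sketch below shows only that plain-CLI setup. The cache path is a hypothetical placeholder, and whether the Terraform Cloud Agent honors this variable is not confirmed in this thread, so treat that part as an assumption.

```shell
# Hypothetical cache location; in a container this would typically be a mounted volume.
cache_dir="${CACHE_DIR:-${TMPDIR:-/tmp}/terraform-plugin-cache}"

# Terraform does not create the cache directory itself, so create it first.
mkdir -p "$cache_dir"

# Point Terraform CLI at the shared provider cache.
export TF_PLUGIN_CACHE_DIR="$cache_dir"

echo "TF_PLUGIN_CACHE_DIR=$TF_PLUGIN_CACHE_DIR"
```

Unlike reusing the whole data directory, this cache holds only provider packages, so it does not carry other per-run values between runs.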

LaurentLesle commented 1 year ago

Yes, an environment variable for an external cache would help with agents deployed on Kubernetes.

@apparentlymart I still want to raise the original problem, as I think there is an issue with the terraform show CLI command not picking up TF_DATA_DIR or TFC_AGENT_DATA_DIR. Outside of the TFC agent, terraform show works fine when TF_DATA_DIR is set. It is difficult for me to tell whether this is a Terraform CLI issue or the TFC agent not passing the variable value through.

The TFC agent calls the Terraform CLI, but terraform show does not pick up TF_DATA_DIR or TFC_AGENT_DATA_DIR (terraform plan does):

terraform: Running command: cmd="/home/tfc-agent/.tfc-agent/component/terraform/runs/run-z8pbc5jANcbrKQU3/bin/terraform show -json /home/tfc-agent/.tfc-agent/component/terraform/runs/run-z8pbc5jANcbrKQU3/config/terraform.tfplan"

..

terraform.cli: Failed generating plan JSON
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: Exit code: 1
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: 
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: ╷
2023-03-15T09:24:53.352Z [TRACE] terraform.cli: │ Error: registry.terraform.io/hashicorp/azurerm: there is no package for registry.terraform.io/hashicorp/azurerm 3.36.0 cached in .terraform/providers

It tries to pick up the provider from the default location:

there is no package for registry.terraform.io/hashicorp/azurerm 3.36.0 cached in .terraform/providers

apparentlymart commented 1 year ago

Hi @LaurentLesle,

I understand that you saw this not work but my original point is that it isn't clear that it's even supposed to work because when using a Terraform Cloud Agent it is supposed to be entirely in control of how Terraform CLI gets run, or else your extra settings can break its assumptions about how Terraform CLI ought to behave when running in this situation.

When configuring the agent you should consider it to be a closed box and only use the agent's own documented environment variables to configure it. The fact that it's directly running Terraform CLI with particular arguments and environment variables set is an implementation detail that is subject to change at any time.

I reclassified this as a feature request because you want the agent to do something it doesn't currently support. I would not recommend trying to trick it into doing what you want despite the lack of a feature because that seems likely to cause you to be broken in future if the details of how the agent runs Terraform Core change in some way that overrides your unsupported workaround.