Terraform CLI logs support

katronquillo commented 3 months ago

Description of your changes

Added an optional spec.logConfig to the ProviderConfig resource to enable writing Terraform Plan and Apply output to a log file:

- spec.logConfig.enableLogging: Specifies whether logging is enabled (true) or disabled (false). When enabled, Terraform CLI command logs will be written to a file. Default is false.
- spec.logConfig.backupLogFilesCount: Specifies the number of archived log files to retain. When a new log file is created due to a change detected by terraform diff, the previous log file is archived and renamed with a timestamp. This parameter controls how many archived log files are kept before older ones are deleted. Default is 0.

By default, Terraform CLI stores log files in the workspace directory. The default log file name is terraform.log. When a backup is taken (e.g., due to a new change detected by terraform diff), the current log file is renamed to terraform.log.<timestamp>, where <timestamp> is a placeholder for the actual timestamp of the backup.
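
The rotation behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Go code in this PR: the function name rotate_log, the timestamp format, and the pruning order are assumptions. Only the terraform.log name and the terraform.log.<timestamp> pattern come from the description.

```python
import time
from pathlib import Path

LOG_NAME = "terraform.log"  # default log file name from the PR description


def rotate_log(workspace_dir, backup_count, now=None):
    """Archive terraform.log as terraform.log.<timestamp>, then prune old archives.

    Returns the archive path, or None if nothing was archived or kept.
    """
    directory = Path(workspace_dir)
    current = directory / LOG_NAME
    if not current.exists():
        return None

    # Assumed timestamp format; the description only says "<timestamp>".
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    archive = directory / f"{LOG_NAME}.{stamp}"
    current.rename(archive)

    # These timestamps sort lexically in chronological order,
    # so the oldest archives come first.
    archives = sorted(directory.glob(f"{LOG_NAME}.*"))
    excess = max(len(archives) - backup_count, 0)
    for old in archives[:excess]:
        old.unlink()
    return archive if archive.exists() else None
```

With backup_count set to 0 this sketch deletes the archive it has just created, so no history is kept. That is one possible reading of the default, not a statement about what the PR does.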

Fixes #163

I have:

[x] Run make reviewable to ensure this PR is ready for review.
[x] Run make e2e with UPTEST_EXAMPLE_LIST="examples/workspace-random-generator.yaml"

How has this code been tested

On a local cluster created using make run, applied the following ProviderConfig:

apiVersion: tf.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  configuration: |
      terraform {
        backend "kubernetes" {
          secret_suffix     = "providerconfig-aws-eu-west-1"
          namespace         = "upbound-system"
          in_cluster_config = true
        }
      }
  logConfig:
    backupLogFilesCount: 1
    enableLogging: True

To the same cluster, applied the following Workspace:

apiVersion: tf.upbound.io/v1beta1
kind: Workspace
metadata:
  name: example-random-generator
  annotations:
    meta.upbound.io/example-id: tf/v1beta1/workspace
    # The terraform workspace will be named 'random'. If you omit this
    # annotation it would be derived from metadata.name - e.g. 'example-random-generator'.
    crossplane.io/external-name: random
spec:
  forProvider:
    source: Inline
    module: |
      resource "random_id" "example_id" {
        byte_length = 4
      }
      resource "random_password" "password" {
        length = 16
        special = true
      }
      // Non-sensitive Outputs are written to status.atProvider.outputs and to the connection secret.
      output "random_id_hex" {
        value       = random_id.example_id.hex
      }
      // Sensitive Outputs are only written to the connection secret
      output "random_password" {
        value = random_password.password
        sensitive = true
      }
      // Terraform has several other random resources, see the random provider for details

  writeConnectionSecretToRef:
    namespace: default
    name: terraform-workspace-example-random-generator

- Observed that the terraform.log file was populated with the expected Terraform Plan and Apply logs
- Made changes to the Workspace YAML and re-applied
- Observed that the terraform.log file was updated with the new Terraform Plan and Apply logs
- Observed that no archived files were maintained, since spec.logConfig.backupLogFilesCount was set to 1
  - When this was set to a number greater than 1, observed that previous log files had the datetime appended and that a new terraform.log file was created with the most recent Terraform Plan and Apply logs
  - Also observed that the oldest archived files were deleted to maintain the specified backupLogFilesCount

Upbound-CLA commented 3 months ago

All committers have signed the CLA.

ccrockatt commented 3 months ago

Hello @ytsarev @bobh66, I'm wondering if anyone would be able to share a timeline for the review of this PR? Thanks so much!

ytsarev commented 3 months ago

Sorry for the delay, KubeCon took some energy :) I plan to take care of the review this week.

ytsarev commented 3 months ago

/test-examples="examples/workspace-inline-aws.yaml"

negz commented 3 months ago

@katronquillo Could you provide an example of what this log output looks like? Is it essentially just typical terraform apply output?

I'm coming into this conversation late (sorry!) but I'm pretty wary of writing logs to file paths inside the container, and especially of implementing our own log rotation logic. Ideally we'd outsource log processing to the many existing tools that handle it already (systemd etc).

I'm wondering if there's a more lightweight option here.

For example, I'm wondering if it would be possible to emit Terraform logs to the provider's container stdout. I'm imagining something like this:

{"ts": "...", "level": "info", "workspace": "some-ws", "source": "tf-cli", "msg": "<terraform log line goes here>"}

If we took this approach, Terraform CLI logs would be interspersed with the provider's other logs, but given we already support structured logging it would be possible for any log parsing backend to reconstruct the Terraform logs for a particular workspace.
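
A minimal sketch of what wrapping a single Terraform CLI line in such a record might look like. It is written in Python purely for illustration (the provider itself is Go), the helper name tf_log_record is hypothetical, and the ISO 8601 timestamp format is an assumption. The field names are taken from the example record above.

```python
import json
from datetime import datetime, timezone


def tf_log_record(line, workspace, level="info"):
    """Wrap one Terraform CLI output line in a JSON log record."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),  # assumed format
        "level": level,
        "workspace": workspace,
        "source": "tf-cli",
        "msg": line.rstrip("\n"),
    })
```

Because each record carries the workspace and source fields, lines from different workspaces can be interleaved on stdout without losing track of which workspace produced them.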

bobh66 commented 3 months ago

> If we took this approach Terraform CLI logs would be interspersed with the provider's other logs, but given we already support structured logging it would be possible for any log parsing backend to reconstruct the Terraform logs for a particular workspace.

If we take this approach we may also want to be able to control the log output at the Workspace level to limit the amount of data being sent to the pod logs.

negz commented 3 months ago

> If we take this approach we may also want to be able to control the log output at the Workspace level to limit the amount of data being sent to the pod logs.

Using an MR or ProviderConfig to control how a provider process logs to stdout does seem a little unusual to me. My inclination would be for it to be an all-or-nothing CLI flag.

What would be the consequences of lots of Terraform logs being written to stdout?

bobh66 commented 3 months ago

> My inclination would be for it to be an all or nothing CLI flag.

Meaning it gets set by a DeploymentRuntimeConfig? The downside of that is it requires a pod restart to enable/disable, and for large numbers of Workspaces a pod restart can take a long time to recover since it has to run terraform init on every Workspace. This can be mitigated by using a PVC for /tf so the workspace directories persist across the reboot, but that's not the default configuration and I suspect most users don't do that.

> What would be the consequences of lots of Terraform logs being written to stdout?

Mostly the sheer volume of data to sift through and the tendency of the pod logs to roll over just when you need them. There are ways to work around this but typical development debugging scenarios will just be using kubectl logs and not relying on external log collection systems.
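
For the kubectl logs case, sifting could in principle be done with a small filter over the pod's output. The following is a rough Python sketch that assumes the structured record shape proposed earlier in this thread ("source": "tf-cli" plus a "workspace" field). The function name workspace_lines is hypothetical, and nothing here is part of the provider.

```python
import json


def workspace_lines(lines, workspace, source="tf-cli"):
    """Yield Terraform messages for one workspace from mixed pod log lines."""
    for raw in lines:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(rec, dict):
            continue  # skip JSON values that are not log records
        if rec.get("source") == source and rec.get("workspace") == workspace:
            yield rec.get("msg", "")
```

Piping kubectl logs output for the provider pod into a script that passes sys.stdin to workspace_lines would print only one workspace's Terraform lines. This does not help with the rollover problem, since lines already rotated out of the pod log are gone.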

katronquillo commented 2 months ago

Hi @negz @bobh66 @ytsarev! Thank you for reviewing our PR. We agree with your feedback/concerns and we'll go ahead and implement the changes you've proposed. To be specific, we'll create a new PR where we will:

- Add a boolean flag to the Workspace spec to enable/disable logging
- If the flag is true, write the Terraform CLI logs to container stdout as structured records, similar to what @negz suggested
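
A rough sketch of how that per-Workspace gating might behave, in Python for illustration only. The enable_logging parameter stands in for the proposed Workspace flag, whose actual field name is not given in this thread. The function name run_with_logging and the record's timestamp format are likewise assumptions.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone


def run_with_logging(cmd, workspace, enable_logging, out=sys.stdout):
    """Run cmd; if enable_logging is set, emit each output line as a JSON record."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors are captured too
        text=True,
    )
    for line in proc.stdout:
        if not enable_logging:
            continue  # still drain the pipe so the child cannot block
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),  # assumed format
            "level": "info",
            "workspace": workspace,
            "source": "tf-cli",
            "msg": line.rstrip("\n"),
        }
        out.write(json.dumps(record) + "\n")
    return proc.wait()
```

Streaming line by line, instead of buffering the whole run, keeps records flowing to stdout while a long apply is still in progress.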

suramasamy commented 2 months ago

Hi @negz @ytsarev @bobh66, we have created this new PR based on the above discussions. Kindly review when you get a chance.

bobh66 commented 2 weeks ago

One other note on this implementation: for Remote Workspaces we delete the directory on every reconcile due to issues with go-getter, so the logs for the previous reconcile will be removed and the current logs will only persist until the next reconciliation happens. I'm thinking that may be another reason to go with #258 instead.

bobh66 commented 2 days ago

> One other note on this implementation - for Remote Workspaces we delete the directory on every reconcile due to issues with go-getter so the logs for the previous reconcile will be removed and the current logs will only persist until the next reconciliation happens. I'm thinking that may be another reason to go with #258 instead

This is no longer true after #276

upbound / provider-terraform

Terraform CLI logs support #248
