spacelift-io / terraform-provider-spacelift

Terraform provider to interact with Spacelift
MIT License
74 stars 28 forks source link

Removing `terraform_version` from `terragrunt` block results in "No changes" #540

Closed lorengordon closed 4 weeks ago

lorengordon commented 5 months ago

Following https://github.com/spacelift-io/terraform-provider-spacelift/issues/524, I went ahead and removed the terraform_version from my stack configs:

   terragrunt {
-    terraform_version      = "1.5.6"
-    terragrunt_version     = ">= 0.48.6"
     use_smart_sanitization = true
     use_run_all            = true
   }

When I ran the plan, it claimed "No changes":

No changes. Your infrastructure matches the configuration.

I don't think that is correct. The goal is to remove the terraform_version from the stack config, so that (per the docs) the stack will use either the latest version, or the version set in .spacelift/config.yaml. If no changes are made to the stack config when I remove the argument, then the stack config will continue to be pinned to the version still set in its config.

Also note, this is using v1.12.0 of the spacelift provider...

Initializing provider plugins...
- Finding latest version of spacelift-io/spacelift...
- Installing spacelift-io/spacelift v1.12.0...
- Installed spacelift-io/spacelift v1.12.0 (self-signed, key ID E302FB5AA29D88F7)

So I tested it by promoting the run claiming "No changes" and then updated the spacelift config.yaml:

stack_defaults:
  terraform_version: "1.5.7"

And confirmed that the stacks are still using 1.5.6...

[...] Downloading Terraform 1.5.6...
[...] Terraform 1.5.6 download is GO (/bin/terraform)
adamconnelly commented 4 months ago

@lorengordon thanks a lot for the report.

When I ran the plan, it claimed "No changes"

This is the expected behaviour right now. We only set the versions during stack creation, and then don't update them after that point. If you want to always use the latest version, probably the best option is to use a SemVer version range instead. That way you can explicitly tell us to always use the latest version.

The goal is to remove the terraform_version from the stack config, so that (per the docs) the stack will use either the latest version, or the version set in .spacelift/config.yaml

Currently we don't support specifying the Terraform or Terragrunt versions for Terragrunt stacks via the config.yaml file. We're taking a look at implementing this functionality, and will post an update once it's available.

lorengordon commented 4 months ago

Thanks @adamconnelly. Comments inline:

We only set the versions during stack creation, and then don't update them after that point.

This doesn't seem correct? Can you clarify? When I edit the terraform_version in the stack config:

   terragrunt {
-    terraform_version      = "1.5.6"
+    terraform_version      = "1.5.7"
     terragrunt_version     = ">= 0.48.6"

I do get a planned change:

      ~ terragrunt {
          ~ terraform_version      = "1.5.6" -> "1.5.7"
            # (4 unchanged attributes hidden)
        }

Also, the spacelift docs indicate that when the Terraform version is not set, then latest is used. That implies a stack may set nothing at all for the Terraform version argument, and also that it should then be possible to remove the argument from the stack config, as I am trying to do, to restore the default behavior.

In order to determine the version of the Terraform binary to mount, Spacelift will first look at the runtime configuration. If it does not contain the version setting for the current stack, the stack setting is then considered. If there is no version set on the current stack, the newest supported Terraform version is used. https://docs.spacelift.io/vendors/terraform/version-management#terraform-versions-in-spacelift

This use case is important, due to the way the runtime config is evaluated:

Considering the precedence of settings, below is the order that will be followed, starting from the most important to the least important:

  1. The configuration for a specified stack defined in config.yml
  2. The stack configuration set in the Spacelift UI.
  3. The stack defaults defined config.yml https://docs.spacelift.io/concepts/configuration/runtime-configuration/#runtime-configuration

Once the terraform version is set in the stack config, that's it. The stack default from the runtime config gets ignored. So we must be able to remove the argument from the stack config, in order to use the runtime config defaults.

And regarding:

Currently we don't support specifying the Terraform or Terragrunt versions for Terragrunt stacks via the config.yaml file. We're taking a look at implementing this functionality, and will post an update once it's available.

I was actually quite confused that the terraform_version argument is duplicated in the terragrunt block. Why not just use the top-level argument? Then it would match against the config.yaml also, and my use case would just work. 😁 Terragrunt support is still in beta, so maybe just deprecate the argument out of the terragrunt block? Same for the new tool argument, also...

adamconnelly commented 4 months ago

@lorengordon

Also, the spacelift docs indicate that when the Terraform version is not set, then latest is used. That implies a stack may set nothing at all for the Terraform version argument, and also that it should then be possible to remove the argument from the stack config, as I am trying to do, to restore the default behavior.

Apologies - the wording in my previous reply was a bit sloppy. What I should have said is that we only default the version during stack creation, and after that point the stack has a specific version set. You can 100% change the version as you say, but if you remove the terraform_version property from your Terraform code we won't reset to the latest available version - instead we keep the current version you have set.

The part that would be tricky about changing this is that it could break backwards compatibility.

Also, the spacelift docs indicate that when the Terraform version is not set, then latest is used.

You're 100% correct here, but I actually think the docs are slightly misleading here. The current behaviour for Terraform stacks where no version is specified is that the version is not set until after the first successful run completes, at which point it gets pinned to that version. Future runs will continue to use that version even if a new version of Terraform is released.

Thanks for raising this - we'll discuss it internally and figure out what to do here. I suspect changing that behaviour at this point might be quite difficult, so we may just have to make the docs clearer and suggest the SemVer option if you always want to use the latest version.

This is also quite interesting because of the next point you raised:

So we must be able to remove the argument from the stack config, in order to use the runtime config defaults.

Is this behaviour that you currently use successfully? I'm just surprised it works given that previous point, so if that's the case we can double check what's going on.

I was actually quite confused that the terraform_version argument is duplicated in the terragrunt block. Why not just use the top-level argument?

This is just one of those situations where there's no great approach. For legacy reasons the Terraform vendor settings are top-level items on the spacelift_stack resource, but for all the other vendors we use blocks for the vendor-specific settings. We decided to be consistent with that approach for Terragrunt when adding Terragrunt support to the provider, rather than special casing it just because it's related to Terraform.

The other thing that we were slightly concerned about was that the config.yml file could be confusing in a monorepo situation if you had both Terraform and Terragrunt stacks, but wanted to use a different version of Terraform depending on whether it was a plain Terraform or Terragrunt stack.

lorengordon commented 4 months ago

So we must be able to remove the argument from the stack config, in order to use the runtime config defaults.

Is this behaviour that you currently use successfully? I'm just surprised it works given that previous point, so if that's the case we can double check what's going on.

Lol well no, I don't know if it works, but I was expecting it to, going off how the docs were written. And you're saying the docs are misleading in how it works, so... I haven't actually been able to test it, because of this issue and the prior one I opened...

I have a monorepo using terragrunt with dozens of Spacelift stacks linked to the one repo. We pin the terraform version in the terraform config, using the terraform block and required_version argument. That requires Spacelift to use the same version. If we set the version directly in the Spacelift stack config, then we have to update the stack config to the new version, deploy that so the stacks will use the new version, then separately update the terraform config, and deploy again. I find this really cumbersome, and confusing for anyone else since it creates a discrepancy between the stack version and the repo version. And also if someone "forgets" to follow through, then everything is broke until we get back to it.

Where if we have spacelift use the version from the runtime config defaults, we can update both the terraform config and the spacelift runtime config at the same time, and we don't have to list every stack in the runtime config. Everything works and everyone is happy.


Ok, I understand on the terraform version in the terragrunt block. I guess I'll just have to wait until you get support for the terragrunt block in the runtime config.

lorengordon commented 4 months ago

The part that would be tricky about changing this is that it could break backwards compatibility.

I would be content if the change in behavior were gated by a separate argument that I could set on the stack config, terraform_version_null_means_null or something.

Edit: Less cheeky, maybe, terraform_version_null_behavior with some kind of enum values... maybe KEEP_EXISTING and DEFAULT_USE_LATEST...

lorengordon commented 4 months ago

Oh, that's interesting, cross-posting for visibility, https://github.com/spacelift-io/terraform-provider-spacelift/pull/552

adamconnelly commented 4 months ago

@lorengordon we've discussed this internally, and we've got another proposal I'd like to run past you. It sounds like the main issue here is caused by the precedence of settings when using the config.yml file. Right now the precedence goes (from highest priority to least priority):

  1. config.yml stack-specific settings.
  2. Stack settings.
  3. config.yml stack defaults.

So this means that you can't use stack defaults to override the Terraform version for all stacks in a repo. The reason we haven't changed this in the past was to avoid breaking things for customers who may now be relying on the current precedence behaviour.

What we're thinking we could do is introduce a new config.yml version, and at that point we could flip the precedence to:

  1. config.yml stack-specific settings.
  2. config.yml stack defaults.
  3. Stack settings.

So in this case the settings in the config.yml file would "always win". All that you'd need to do is specify a different version in the config.yml file, so instead of:

version: "1"
stack_defaults:
  ...

You'd do:

version: "2"
stack_defaults:
  ...

And we're also planning on adding support for configuring the Terragrunt stack settings to the config.yml file.

Would that work for your scenario?

lorengordon commented 4 months ago

That would work for me also, certainly.

Precedence preference could be an odd thing. I can see someone considering a stack setting to be more specific than something labeled a "default", and so to their mind the stack setting should have higher precedence. Which would match the current behavior.

So, I don't entirely know if a config version would be the "most correct" approach, vs maybe somehow explicitly allow the user to choose/set the precedence order? Maybe as a "stack-only" setting, to avoid precedence-selection recursion 😅

adamconnelly commented 4 months ago

That's a completely valid concern. I think in this case the change makes sense, on the grounds that we already use the proposed precedence order for module test cases, so we'd actually be fixing a long running inconsistency between stacks and modules.

I think if it became clear that there were use-cases for both precedence orders, we could revisit and think about adding some kind of configuration option to adjust precedence.

lorengordon commented 4 months ago

Sounds like a plan!

adamconnelly commented 4 months ago

@lorengordon that's the v2 config format available now with the updated precedence rules. The docs have been updated here.

We've started working on configuring the Terragrunt settings via runtime config, and we'll let you know once it's available.

lorengordon commented 4 months ago

Ok, great, thanks, I'll be waiting for the terragrunt support so I can update our project and test the new config version!

truszkowski commented 4 months ago

@lorengordon We've added support for Terragrunt in config.yml (documentation).

For example:

stacks:
  my-stack:
    terragrunt:
      terragrunt_tool: OPEN_TOFU

Terragrunt and OpenTofu versions are not specified, so they default to the latest.

In settings, we have: Screenshot 2024-05-29 at 13 19 15

And we've used: Screenshot 2024-05-29 at 13 18 42

lorengordon commented 4 months ago

Thanks @truszkowski, it does appear to be working! Perhaps one suggestion for the doc...

What isn't clear from the current writeup, is what happens when a monorepo has both terraform and terragrunt stacks associated, and the merge interaction with stack_defaults. I was a bit concerned that if I specify:

stack_defaults:
  terragrunt:
    terraform_version: "1.5.7"
    terragrunt_version: ">= 0.48.6"
    use_smart_sanitization: true
    use_run_all: true

then my terraform stacks would suddenly become terragrunt stacks, since they'd all merge in the terragrunt config. Particularly the way the warning is written, about how it only merges the top-level key, not a deep-merge.

But it seems to be "doing the right thing," where it only merges the terragrunt block if the stack is already configured as a terragrunt stack. The terraform stack remains a terraform stack.

So with that in mind, for the workflow I want, I think I have to set the terragrunt block in the stack config, so the stacks get configured to use terragrunt in the first place:

  terragrunt {
    terragrunt_version     = ">= 0.48.6"
    use_smart_sanitization = true
    use_run_all            = true
  }

And then also duplicate that config into the config.yaml stack_defaults in order to set the terraform_version?

I guess the doc suggestion would be to somehow cover this interaction a little better between the stack config and the config.yaml. The terragrunt block will only merge if the stack is already configured as a terragrunt stack. And to create a terragrunt stack, I must set something the terragrunt block of the stack config, I can't just rely on the config.yaml entirely, and remove the terragrunt block from the stack config (because then it would become a terraform stack).

truszkowski commented 4 weeks ago

Doc suggestion is on the way, closing this issue.