try() function errors on nonexistent resource

hashicorp / terraform

try() function errors on nonexistent resource #24402

Open danieladams456 opened 4 years ago

danieladams456 commented 4 years ago

Terraform Version

Terraform v0.12.23
+ provider.aws v2.53.0

Terraform Configuration Files

provider "aws" {}

output "nonexistent_file" {
  value = try(file("nonexistent"), "no file")
}

output "nonexistent_role" {
  value = try(aws_iam_role.nonexistent.arn, "no role")
}

Debug Output

https://gist.github.com/danieladams456/3037dd17100be21f816806450aba6ef8

Expected Behavior

The try function should catch the error and return "no role" as the output value.

Actual Behavior

Error: Reference to undeclared resource

  on main.tf line 8, in output "nonexistent_role":
   8:   value = try(aws_iam_role.nonexistent.arn, "no role")

A managed resource "aws_iam_role" "nonexistent" has not been declared in the
root module.

Steps to Reproduce

terraform init
terraform apply

Additional Context

My use case is a multi-module Terraform project generated from a standard template. One resource in one of the modules is sometimes not needed. I would like to gracefully detect that the resource is absent and return null in the output. Other modules that consume that output would implement null handling, but I don't want to have to edit my main.tf to stop passing that value around when the resource isn't there. This would also let me simply delete the HCL file containing that single resource, instead of setting a flag variable and making both the resource and the output conditional on it.

References

none I could find

DavidGamba commented 2 years ago

It would be great if data sources could take a flag so that a failed lookup does not fail the entire plan and can instead be handled with a conditional.

sami12rom commented 1 year ago

any news on this?

soniadas0210 commented 1 year ago

Any update on this requirement?

scalp42 commented 1 year ago

Another perfect example for this is the chicken/egg issue with Route53 private zones and associations to VPCs, where you have different states (say at VPC level and a global level like Route53).

elduds commented 1 year ago

My god I can't believe it took me so long googling to find this.

What is try() for except conditionally catching and handling errors!?

Use case is conditionally adding a workspace identifier tag for resources provisioned from a TFC workspace, typically added to AWS provider default_tags {}. Feels like a pretty obvious requirement:

If a tfe_workspace data source is passed to the module, resolve the value of the technical:terraform:workspace_url tag to the workspace URL from that data source.
Otherwise, such as for a local execution, ignore the tag.

locals {
  tfe_workspace_html_uri = try(
    data.tfe_workspace.current.html_url, null)

  tf_context_tags = {
    "technical:terraform:workspace_url": local.tfe_workspace_html_uri
  }
}

provider "aws" {
  region = var.region
  default_tags {
    tags = merge(var.resourcetags, local.tf_context_tags)
  }  
}

algo7 commented 1 year ago

> Another perfect example for this is the chicken/egg issue with Route53 private zones and associations to VPCs, where you have different states (say at VPC level and a global level like Route53).

I am currently facing exactly the same condition as you mentioned

andrewmackett commented 11 months ago

Has anybody found a workaround for this issue?

rpgd60 commented 11 months ago

Like others, I need this for data sources: fail gracefully if the underlying query returns no values.

Pasqual24 commented 9 months ago

Same issue on Azure. How do you handle a case where an Azure resource has been deleted and Terraform doesn't "see" that during terraform plan?

The generated plan looks good and Terraform will deploy a child resource, but there's no validation of whether the parent resource really exists. Apply then fails because the parent resource was deleted outside of Terraform.

I'd love to use a data source query to explicitly check if the parent resource exists, since Terraform can't handle it, but then an error is generated and the whole deployment stops :-/
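
Editorial sketch, not from the thread: assuming Terraform v1.5 or later, a check block with a scoped data source may cover the "check without aborting" part of this. A failed lookup inside the check is reported as a warning instead of halting the run. The variable name and check name below are hypothetical, and the scoped data source can only be referenced inside the check block, so it cannot drive a conditional elsewhere in the configuration.

```hcl
# Hypothetical variable: name of the parent resource group expected to exist.
variable "parent_resource_group_name" {
  type = string
}

check "parent_resource_group_exists" {
  # Scoped data source: visible only inside this check block.
  # If the lookup fails, Terraform reports a warning rather than an error.
  data "azurerm_resource_group" "parent" {
    name = var.parent_resource_group_name
  }

  assert {
    condition     = data.azurerm_resource_group.parent.id != ""
    error_message = "Parent resource group ${var.parent_resource_group_name} was not found."
  }
}
```

Because checks only warn, this surfaces the missing parent but does not by itself stop the child resource from being planned.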

Nyque commented 7 months ago

The workaround so far (not a very good one though) is to use AWS CLI to get the necessary info.

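data "external" "example" {
  # Call the AWS CLI in a shell script. The external data source requires the
  # script to print a JSON object whose values are all strings,
  # e.g. {"exists": "true"} (the "exists" key is an assumed name).
  program = ["bash", "${path.module}/example.sh"]
}

output "nonexistent_role" {
  # result is a map of strings, so compare against "true" explicitly
  value = data.external.example.result.exists == "true" ? aws_iam_role.nonexistent.arn : null
}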

omry-arpaly commented 5 months ago

Adding my support for looking into this issue. try() should be able to handle "Error: Reference to undeclared resource" errors: it's one of its major use cases. The documentation provides examples that just happen to involve accessing potentially missing parts of an existing structure (e.g., an array member) and never highlights that limitation (= bug), which only increases the confusion.

MrTrustworthy commented 5 months ago

➕ 1 on this feature request

Not sure if this would work for all of the use cases mentioned above, but there's a proposal for a (hopefully) simple and backwards compatible approach that might work for all(?) use cases:

Add an attribute to data blocks that behaves like nullable = optional(bool, false)
If set to false (the default), it behaves as it currently does
If set to true, a failure of the data block to look up the corresponding resource will lead to the data being set to null, and will not cause a failure and abort.
All attempts to read from a nullable = true data block can either use simple null-checks or try(data.x.myattribute), depending on what they want/need to do, and it will just work as expected.

It's similar to how nullable works within modules/variables, so it's just an extension of this concept to data blocks and wouldn't be a completely new & unexpected mechanic.

It also doesn't rely on try() retroactively being able to catch hard errors during the data evaluation, so at least this sounds easier to implement.

apparentlymart commented 4 months ago

Hi all,

The try function is specifically for catching dynamic errors, by which I mean errors that occur based on invalid types or values rather than on references to undeclared objects. This is similar to how in many general-purpose languages the exception handling mechanism cannot "catch" statically-invalid code such as a reference to a variable that wasn't declared, or a syntax error.

While I can see that code generation does blur the line between "static" and "dynamic", in most cases there's no reason to dynamically check whether a resource is declared because that decision cannot be made dynamically based on runtime data. The try function design prioritizes still returning an error for situations where something cannot possibly ever be valid for the current configuration, because that reduces the chance of someone making a mistake where try would always fail but then not notice that. Changing that would give worse feedback in the common case in support of a relatively-rare situation.

It's also not really feasible to catch static errors with a function, because a function call is itself a dynamic operation. A statically invalid expression causes a validation error long before Terraform even begins expression evaluation. If there is something to be solved here then we'll need to solve it in a different way.
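
Editorial sketch, not part of the original comment: the kind of dynamic error that try does catch looks like the following. The local names and the JSON content are hypothetical. The reference local.settings is statically valid because the local is declared; only the attribute lookup on its value fails at evaluation time, so try can fall back.

```hcl
locals {
  # Hypothetical input decoded at evaluation time; it has no "role_arn" key.
  settings = jsondecode("{\"name\": \"example\"}")

  # Dynamic error: the attribute is missing from the decoded value,
  # so try() moves on to the fallback expression.
  role_arn = try(local.settings.role_arn, "no role")
}

output "role_arn" {
  value = local.role_arn
}
```

By contrast, try(aws_iam_role.nonexistent.arn, "no role") from the original report refers to an object that was never declared, which is rejected during validation before try is ever called.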

If you are using code generation to decide whether or not a particular resource block is generated, a possible solution is to also make the code generator produce a local value whose definition varies depending on whether the resource block was generated.

For example, if the code generator decides to generate the resource then it could also generate a local value that refers to it:

resource "aws_iam_role" "example" {
  # ...
}

locals {
  example_iam_role_arn = aws_iam_role.example.arn
}

...but if the code generator decides not to generate the resource then it would still generate a declaration for the same local value name but set it to null instead:

locals {
  example_iam_role_arn = null
}

Then other code in the module can refer to local.example_iam_role_arn regardless of the code generation decision, as long as it's able to deal with the value possibly being null:

output "example_role_arn" {
  value = local.example_iam_role_arn
}

If you're already doing code generation anyway then I would expect the code generator to use techniques like this to deal with its differences as a code generation concern, rather than using a weird mix of code generation and dynamic decision making together.

The other examples given in subsequent comments don't seem to be about code-generation, but I don't really understand what they are about. If you shared a non-code-generation-related use-case in the comments above I'd appreciate if you could share a fuller example of what you are doing so that I can understand more.

I suspect that this issue has come to conflate multiple different use-cases just because they led you all to the same error message, and so I'd like to understand those use-cases so that the team can think about potential solutions.

MrTrustworthy commented 2 months ago

> If you shared a non-code-generation-related use-case in the comments above I'd appreciate if you could share a fuller example of what you are doing so that I can understand more.

I can give one example:

Let's say I have 2 resources. For example, a VM and a Service Account. To create the VM, I need to configure it to use the service account - in short, I need a reference to the Service Account.
Now, I need to create this VM + Service Account pair for each of my applications (or users, or ...). So, I have a list of applications in my locals, and I do a for_each over the list/set of applications to generate those 2 resources for each.
But I don't actually manage the Service Accounts in Terraform - or, at least not in this TF workspace/repo. Let's assume this is generated/managed somewhere centrally for the whole organisation. So, to get the reference of the Service account, I can't use a resource, I have to use a data block to import it. I can easily do that based on the application name, so that's easy as well. So far, so good.
Now the issue: There might not always be a Service Account for each application. My app bananas might simply not have a Service Account, for some organisational reasons. In cases like those, I want to have a slightly different behaviour - for simplicity, let's say I have one central, shared "catch-all" Service Account that I use for all VMs that don't have a dedicated one.

That example is simplified, but I hope it gives you a situation that's not code-generation related. Ultimately, it's about being able to use a data block (generally in dynamic situations that combine it with for_each) when you can't guarantee that the lookup succeeds for every element of the list.

Again, the issue here is not that the entire "block of text" like data.myresource.myinstance is not defined at all, but that the lookup to that entity at data.myresource.myinstance["bananas"] returns a "non existent" status. I'd like my TF code to be able to deal with this situation, instead of forcing it to be a complete abort. Ultimately, I don't even need the try function to catch it - a data block that's nullable would work already.
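
Editorial sketch, not from the thread: where a provider offers a plural data source that returns an empty result instead of an error, the fallback described above can be approximated today. This assumes AWS IAM roles stand in for the service accounts, and that the AWS provider version in use includes the aws_iam_roles data source. The variable names and the "<app>-service" naming pattern are hypothetical.

```hcl
# Hypothetical inputs: the application names and the shared catch-all role.
variable "applications" {
  type    = set(string)
  default = ["apples", "bananas"]
}

variable "shared_role_arn" {
  type = string
}

# Plural data source: an application without a matching role yields an
# empty set of ARNs rather than a failed lookup.
data "aws_iam_roles" "per_app" {
  for_each   = var.applications
  name_regex = "^${each.key}-service$"
}

locals {
  # Indexing an empty list is a dynamic error, so try() can fall back
  # to the shared role for applications with no dedicated one.
  role_arn_by_app = {
    for app, roles in data.aws_iam_roles.per_app :
    app => try(tolist(roles.arns)[0], var.shared_role_arn)
  }
}
```

This only works when such a plural data source exists for the object type in question, and nothing in this configuration creates the roles being looked up.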

sworisbreathing commented 11 hours ago

@apparentlymart my situation is pretty similar to what @MrTrustworthy described, especially with respect to combining the data block with for_each and also to "thing a central team must manage".

To add to that, there are other situations where the tech you're interfacing with has "virtual" resources. For example, in AWS LakeFormation you can assign permissions to various principals such as IAM roles, IAM users, etc, but also there's this magic principal called IAM_ALLOWED_PRINCIPALS which doesn't really exist. With the current terraform behavior we have to engineer special logic to handle this edge case when managing LF permissions at scale.

There are other cases where third-party tech insists on managing some of the resource lifecycle on its own, meaning there will be inevitable conflicts if you try to import the resources into terraform. You might need to still manage other aspects of it though, so you might do something like:

resource "myresource" "foo" {
  # ...
}

data "myresource" "bar" {
  for_each   = # ...
  must_exist = false # or nullable = true
}

resource "myresource" "foo_bar_association" {
  for_each = data.myresource.bar
  foo_id   = myresource.foo.id
  bar_id   = each.value.id
}

I think having a nullable (defaulting to false) or must_exist (defaulting to true) attribute on data sources would be a reasonable feature addition. In either case, Terraform would default to the current behavior (which is to throw an error when the lookup fails), but with the option turned on it would just emit an appropriate warning message instead. In the above example, as soon as Terraform picks up the fact that data.myresource.bar["x"] had ceased to exist, the plan would correctly attempt to destroy myresource.foo_bar_association["x"] rather than bombing out with an error.

apparentlymart commented 11 hours ago

Hi @MrTrustworthy and @sworisbreathing,

I appreciate you taking the time to answer my question.

Unfortunately, I don't work on the Terraform team at HashiCorp anymore, so I can't personally do anything to act on your responses, but I did want to note that both of you seem to be discussing the use-case of https://github.com/hashicorp/terraform/issues/16380 rather than the use-case that this issue was about, and so maybe the discussion over there will give you some ideas about different ways to solve your problems.

The following is just a personal response and not a statement on behalf of the Terraform team, but for what it's worth...

The main challenge with using the existence of something to decide whether to declare something else is that it's an inherent contradiction. You can see this for yourself using the following configuration that uses an existing data source that is already capable of returning an empty result:

data "aws_vpcs" "maybe" {
  tags = {
    Name = "exists"
  }
}

resource "aws_vpc" "exists" {
  count = length(data.aws_vpcs.maybe.ids) == 0 ? 1 : 0

  cidr_block = "10.111.0.0/16"
  tags = {
    Name = "exists"
  }
}

If you plan and apply this when no VPC exists, the first plan/apply round will indeed detect that data.aws_vpcs.maybe.ids is empty and so propose to create aws_vpc.exists.

But then if you run another plan/apply round it will then find out that the VPC exists, and so data.aws_vpcs.maybe.ids won't be empty anymore, and so Terraform will propose to destroy aws_vpc.exists.

And then if you run again, it'll propose to create again, and so on. This configuration can never converge because it contradicts itself... it says "this aws_vpc should exist if it doesn't exist", which is an impossible state to reach.

Therefore if there is to be a solution for your use-cases, it's gotta be something other than a data source that returns an empty result, or a decision made based on the failure of a data source. However, I can't say if the Terraform team is open to discussing anything like that since the existing issue for this suggestion was already closed. 🤷