hashicorp / terraform


Allow "count" for non-null check on resource attributes that are always present #26755

Closed sgrimm closed 3 months ago

sgrimm commented 4 years ago

Current Terraform Version

Terraform v0.13.4
+ provider registry.terraform.io/hashicorp/aws v3.10.0
+ provider registry.terraform.io/hashicorp/local v1.4.0
+ provider registry.terraform.io/hashicorp/template v2.2.0

Use-cases

I want a module to conditionally create a particular AWS Route53 entry if a zone ID is passed in. I'm using count and checking whether the variable is null. I'd like Terraform to be able to create everything, including the zone, in one plan/apply so a CI system can do the actual plan application.

variable "public_zone_id" {
  type    = string
  default = null
}

resource "aws_route53_record" "public_cname" {
  count = var.public_zone_id != null ? 1 : 0

  zone_id = var.public_zone_id
  name    = var.hostname
  type    = "CNAME"
  ttl     = 1800
  records = [aws_instance.this.public_dns]
}

And then in the calling project, I pass in the zone ID, like so:

module "foo" {
  source = "my_module"

  hostname       = "example"
  public_zone_id = module.vpc.public_zone_id
}

But because the count depends on a value from a resource that isn't created yet, terraform plan gives me

Error: Invalid count argument

  on ../modules/ec2/main.tf line 40, in resource "aws_route53_record" "public_cname":
  40:   count = var.public_zone_id != null ? 1 : 0

The "count" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the count depends on.

Attempted Solutions

In the calling project, I have tried various tricks to make it impossible for the variable to ever be null regardless of whether or not the original resource exists, all with the same result as above.

public_zone_id = coalesce(module.vpc.public_zone_id, "dummy")

public_zone_id = trimsuffix("${module.vpc.public_zone_id}!", "!")

public_zone_id = module.vpc.public_zone_id != null ? module.vpc.public_zone_id : "dummy"

Proposal

Although the specific value of the zone ID is unknown here, it seems like Terraform ought to be able to tell that there will definitely be some value that will definitely not be null, and thus that the count should be 1, even if the actual value can't be filled in until later. There are a lot of attributes on a lot of resources that are required to be present if the resource creation succeeds, and since Terraform defines null as the absence of a value, checking for != null should only require the presence of a value.

So the proposal is that in the specific case of count = some_resource.some_attribute != null ? X : Y where the resource will be created in the same Terraform invocation, Terraform evaluate the expression based on whether or not the attribute is guaranteed to be present, rather than based on the actual attribute value. Obviously the "guaranteed" part is important; for optional attributes the current behavior would be unavoidable and expected. But for attributes that are always exported and can never possibly be null, it should be possible to evaluate that expression statically at plan time.

References

apparentlymart commented 4 years ago

Hi @sgrimm! Thanks for sending in this enhancement request.

I think the root problem here is that the Terraform schema model doesn't have any explicit sense of an attribute that is guaranteed to be non-null, and so Terraform conservatively assumes that any attribute could potentially end up being null. To fix this would require giving providers some way to either declare that a particular attribute is never null or to say dynamically in the plan "unknown value but definitely not null". The first seems significantly easier to implement than the second, but neither is trivial because they both involve changes to the provider wire protocol, and thus we must coordinate with the provider SDK team and the provider teams to make the change.

In the meantime, I think you could make this work by having the calling module handle the decision about whether it's null. You didn't show how module.vpc decides whether or not to return a zone id, but let's say for example that it's an enable_public_dns_zone variable on your root module, something like this:

variable "enable_public_dns_zone" {
  type = bool
}

module "vpc" {
  source = "./modules/vpc"

  enabled = var.enable_public_dns_zone
}

module "foo" {
  source = "./modules/foo"

  hostname       = "example"
  public_zone_id = var.enable_public_dns_zone ? module.vpc.public_zone_id : null
}

The key here is that Terraform does know the value for var.enable_public_dns_zone, so we can rely on that to take the module.vpc.public_zone_id value out of the decision entirely when var.enable_public_dns_zone is false, and thus it will be known in the case where it's null.

Another potential answer is to design your modules to deal in lists rather than single values, and then you can use an empty list to represent it not being active and a single-element list to represent it being active. This is consistent with Terraform's usual model of disabling things by having zero of them, and is advantageous because it separates the information about whether it's enabled (the length of the list) from the final value (the value of the first element of the list, which might be unknown). Terraform's language is primarily oriented around repetition of objects based on lists, and so modules designed around that principle tend to gel better with other language features.
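
A rough sketch of that list-based shape, adapting the zone ID example from the original comment (the plural variable name here is just illustrative):

# In the module:
variable "public_zone_ids" {
  type    = list(string)
  default = []  # empty list = feature disabled, one element = create the record
}

resource "aws_route53_record" "public_cname" {
  count = length(var.public_zone_ids)  # the list length is known even when the element value isn't

  zone_id = var.public_zone_ids[count.index]
  name    = var.hostname
  type    = "CNAME"
  ttl     = 1800
  records = [aws_instance.this.public_dns]
}

# In the caller:
module "foo" {
  source = "my_module"

  hostname        = "example"
  public_zone_ids = [module.vpc.public_zone_id]  # one statically-declared element
}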

lievertz commented 3 years ago

It appears that a count depending directly on a variable is fine, but if the count depends on a variable through a transformation function like templatefile, it likewise hits this issue. Perhaps I am wrong, but I think templatefile always gives a non-null result (or raises an error)? I wanted to bring up this situation because there is no provider involved.

The specific situation I hit this with: I have a module that creates an IAM policy only if a policy document is provided. I could use both a var.json_iam_doc AND a var.create_policy, but the boolean is implied by the presence of the document, so the interface is (arguably) better without the explicit var.create_policy. Instead I just have count = var.json_iam_doc == null ? 0 : 1. This has been working, but I recently called this module from another module which provides the json_iam_doc variable as the result of a templatefile, which itself accepts a string variable -- that's where I hit this issue.
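
To illustrate the shape of that interface, a minimal sketch (the module path, policy name, template file, and bucket_arn input are placeholders):

# Inner module: create the policy only when a document is provided
variable "json_iam_doc" {
  type    = string
  default = null
}

resource "aws_iam_policy" "this" {
  count  = var.json_iam_doc == null ? 0 : 1
  name   = "example"
  policy = var.json_iam_doc
}

# Wrapping module: templatefile() itself can never return null, but its
# result is unknown whenever its inputs are, so the count above still errors.
module "policy" {
  source = "./modules/policy"

  json_iam_doc = templatefile("${path.module}/policy.json.tpl", {
    bucket_arn = var.bucket_arn
  })
}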

I confirmed this behavior on terraform v0.14.3 and v0.14.4.

To summarize -- it seems like the enhancement here also applies to function results which could arguably be known to be non-null. Since provider dependency was raised as an implementation barrier, it seemed relevant to raise that there are a number of relevant conditions that do not rely on the provider, but are functions within terraform itself.

I hope this is useful input -- thanks for all of your work!

Cajga commented 3 years ago

I ran into the same issue, on Terraform 0.14.7. The simplest workaround I found for using a freshly created resource ID in count is the following. (@apparentlymart, may I ask you to verify it, just to make sure? :))

Put this into the foo module:

locals {
  # note: putting [*] behind a single value will create a list (if it is null then it will be an empty list)
  some_id_list = var.some_id[*]
}

resource "something_which_depends_on_the_fresh_input_id" "this" {
  count = length(local.some_id_list)
.
.
}

variable "some_id" {
  type        = string
  default     = null
}

Then you call the module with the fresh id:

resource "aws_eip" "test" {
}

module "foo" {
  source = "../../"

  some_id = aws_eip.test.id
}

This will run with a single terraform apply.

Nuru commented 2 years ago

@Cajga I tried your solution, and it generally appears to work, but it is dangerous. When var.unknown is a string that is unknown at plan time, such as aws_vpc.vpc.id:

  count = var.unknown == null ? 0 : 1         # fails: "count" value depends on resource attributes that cannot be determined until apply
  count = length(var.unknown[*]) > 0 ? 1 : 0  # succeeds

However, it appears to me this works due to a bug or bad trade-off in Terraform. It looks to me like when var.unknown is unknown, length(var.unknown[*]) always evaluates to 1. This causes weird behavior when var.unknown ultimately evaluates to null, such as requiring terraform apply to be run twice in order to apply all the changes. See #29973 for more details.

Cajga commented 2 years ago

@Nuru, Thanks for checking it and opening a related bug report. I must say that I stopped using this as I also had the feeling that this behavior is not correct/intended. I am curious about the outcome for this.

Cajga commented 2 years ago

Just to have it here as well: the Terraform 1.1.0 update breaks my "recommended" solution: https://github.com/hashicorp/terraform/issues/29973#issuecomment-983835758

RyanS-C commented 2 years ago

I have this same issue on:

PS > terraform.exe --version
Terraform v1.1.5
on windows_amd64
+ provider registry.terraform.io/betr-io/mssql v0.2.4
+ provider registry.terraform.io/cloudflare/cloudflare v3.5.0
+ provider registry.terraform.io/hashicorp/azurerm v2.71.0
+ provider registry.terraform.io/hashicorp/http v2.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0

Your version of Terraform is out of date! The latest version
is 1.1.6. You can update by downloading from https://www.terraform.io/downloads.html

I spent ages trying to resolve this and I can't. Having this functionality would be a huge help!

apparentlymart commented 2 years ago

I previously hinted at an approach of using a complex type (I said specifically a list at the time, but any collection or structural type will work) to allow separating the null-ness of the collection itself from the unknown-ness of the value inside it. It looks like the subsequent discussion was trying to build on that but I wasn't clear enough about what I meant.

I want to follow up with a practical example, though coming to this with fresh eyes I think using an object type as the container would make for a more intuitive API for the module caller, rather than a list of zero or one items. This is based on the example use-case shown in the original comment for this issue:

variable "dns_record" {
  type = object({
    hostname        = string
    route53_zone_id = string
  })
  default = null
}

resource "aws_route53_record" "public_cname" {
  count = var.dns_record != null ? 1 : 0

  zone_id = var.dns_record.route53_zone_id
  name    = var.dns_record.hostname
  type    = "CNAME"
  ttl     = 1800
  records = [aws_instance.this.public_dns]
}

In this example, the null-ness of the whole object is what decides whether to declare a route53 record. That null-ness is independent of the known-ness of var.dns_record.route53_zone_id, and so this will work regardless of whether the zone ID is known during planning.

From the caller's perspective, it would look like this in the case where the caller wants to include a DNS record:

module "foo" {
  source = "my_module"

  dns_record = {
    hostname        = "example"
    route53_zone_id = module.vpc.public_zone_id
  }
}

This echoes a common design decision in providers where a resource type will have an attribute or block type whose mere presence means to activate a feature, but whose contents provide the configuration once enabled. Because the hostname and route53_zone_id attributes of that object type are both required, it would be an error to declare dns_record without setting both of them, but the caller can omit the argument entirely to disable that module feature and not provide any of its required values.

This also echoes the usual design pattern for for_each when unknown values are involved, where you construct a map which has keys statically defined in the configuration, but can have unknown values because those don't contribute in any way to the number of instances or the instance identifiers.
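
A rough sketch of that for_each pattern, building on the same example (the private_zone_id output is hypothetical, just to show a second static key):

resource "aws_route53_record" "cname" {
  # The keys ("public" and "private") are decided statically here, so the
  # number of instances is known during planning even though the zone IDs
  # themselves are not known yet.
  for_each = {
    public  = module.vpc.public_zone_id
    private = module.vpc.private_zone_id
  }

  zone_id = each.value
  name    = var.hostname
  type    = "CNAME"
  ttl     = 1800
  records = [aws_instance.this.public_dns]
}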

While I do understand that this isn't exactly what was requested -- solving that still requires finding some way to statically prove that an unknown value can't possibly be null -- the above is an approach that can work with the Terraform language as it exists today, and subjectively I think it's a reasonably intuitive design from the perspective of the module caller, where each component of the declaration serves a particular distinct purpose and nothing is overloaded.

Nuru commented 2 years ago

@apparentlymart wrote above:

In this example, the null-ness of the whole object is what decides whether to declare a route53 record. That null-ness is independent of the known-ness of var.dns_record.route53_zone_id, and so this will work regardless of whether the zone ID is known during planning.

The problem with your example, as in the original, is when the caller passes

dns_record = module.other.output

there is no guarantee that module.other.output is non-null. This is closer to the use cases I am usually solving for. I have adopted your suggestion of taking a list, which helps when the author can assert that the output is always non-null:

dns_record = [module.other.output]

but of course still fails when null is a possibility:

dns_record = module.other.output == null ? [] : [module.other.output]
# Error: Invalid count argument

Originally I thought Terraform could just get away with assuming any unknown but provided input would be non-null, but then we saw how that turned out: #29973.

Of course, your suggestion would be an improvement over the current situation, where if any member of an object has an unknown value, the entire object is tainted as having an unknown value and object == null is treated as unknown at plan time even though it is definitely known to be non-null.

gygitlab commented 2 years ago

I've been struggling with this a lot recently, and it's certainly quite frustrating when trying to handle dynamic setups with clean code. To get around it, my code has had to become more complicated and harder to maintain.

I appreciate there's complexity here in trying to determine nulls, but to me a Terraform function that checks merely for the existence of something, without checking its contents, could help massively with counts. It would be something like try, except that it only tries the reference and not the contents (which is what I thought try was doing until recently) -- a safe-navigation function, in other words.

For example, say if we had an exists function that checks only if a call to a resource value is valid:

  count = exists(aws_iam_policy.policy.arn) ? 1 : 0

exists in this context merely checks that the reference to aws_iam_policy.policy.arn is valid and will return something at runtime. It would fail if aws_iam_policy.policy isn't declared anywhere, since we know that during the initial phase. It could still return null later, but that's OK and up to the user to handle.

apparentlymart commented 2 years ago

Unless I'm misunderstanding, I think this hypothetical function exists runs into the same underlying problem I mentioned before: if a provider tells Terraform Core that something will be "known after apply" during planning, the provider is allowed to finally set that value to null during the apply step, if the situation it was trying to represent is that it didn't know whether something would exist or not. That can arise in situations like IP addresses assigned to networking objects, where e.g. an object may or may not have a public IP address depending on whether it was placed in a network with Internet access.

In order to meaningfully answer the question "will this be set after apply?", providers would need to give Terraform more information and Terraform would have to preserve that information through derived expressions so that what we currently call "known after apply" can be split into two states: "known after apply but definitely set" and "known after apply but possibly null".

I must admit I'm not sure I'm following the situation in the objection to my recent example. The point of that example was that the decision about whether to include the object would be made statically by whoever is writing that module block; the variable as a whole would not itself be derived from some other dynamic value, but rather it would be written out statically and only the attributes inside would be dynamic. You're right that if you conditionally build that object based on unknown values then you're back to square 1; my suggestion was to not do that, and instead to design your module interactions so that the object value is always known during planning (but possibly null) and only the contents be potentially unknown.
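
A minimal sketch of that distinction, reusing the dns_record variable and the enable_public_dns_zone flag from the earlier examples:

# Works: whether dns_record is null is decided by a value that is known
# during planning, even though route53_zone_id is not known yet.
module "foo" {
  source = "my_module"

  dns_record = var.enable_public_dns_zone ? {
    hostname        = "example"
    route53_zone_id = module.vpc.public_zone_id
  } : null
}

# Back to square one: whether dns_record is null now depends on a value
# that is itself unknown during planning.
#
#   dns_record = module.vpc.public_zone_id != null ? { ... } : null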

Nuru commented 2 years ago

@apparentlymart wrote:

I must admit I'm not sure I'm following the situation in the objection to my recent example.

If you are referring to my objection, the situation is that your solution covers only a tiny fraction of the situations I have encountered in practice (or so I thought).

Using the same example module:

variable "dns_record" {
  type = object({
    hostname        = string
    route53_zone_id = string
  })
  default = null
}

resource "aws_route53_record" "public_cname" {
  count = var.dns_record != null ? 1 : 0

  zone_id = var.dns_record.route53_zone_id
  name    = var.dns_record.hostname
  type    = "CNAME"
  ttl     = 1800
  records = [aws_instance.this.public_dns]
}

this invocation fails (or so I thought):

locals {
  dns_record = {
    hostname = "example"
    route53_zone_id = aws_route53_zone.example.id
  }
}

module "example" {
  source = "example/source"
  dns_record = local.dns_record
}

When I tried something like this in the past (though now I cannot find a simple way to reproduce it), it failed with Invalid count argument because aws_route53_zone.example.id would be unknown at plan time, which tainted local.dns_record as unknown, which generated the count error. That is why I said "your suggestion would be an improvement over the current situation". I guess your suggestion works today, and I was wrong about that.

My objection to your solution was not that it does not work, but that (if object tainting worked the way I thought it did) it would only cover a case that is rare in my experience. Most often we want to trigger a count off a single value. For example, we want to create a security group if the user does not supply one, so we have (using your list solution):

variable "target_security_group_id" {
  type        = list(string)
  default     = []
}

resource "aws_security_group" "default" {
  count = length(var.target_security_group_id) > 0 ? 1 : 0
  ...
}

Most naturally we would have target_security_group_id be a string instead of a list, and check whether it is null or empty. That did not work, so I have been using lists per your suggestion, and it works, although it is a bit confusing and implies that you could supply more than one item.

It does feel a bit more natural to do it the way you now suggest, but as I said before, I thought it would not work due to object tainting:

variable "target_security_group_id" {
  type        = object({
    id = string
  })
  default     = null
}

resource "aws_security_group" "default" {
  count = var.target_security_group_id == null ? 0 : 1
  ...
}

apparentlymart commented 2 years ago

I would expect the example you shared at the top of your comment to work, FWIW. The known-ness of an attribute value is independent of the known-ness of the object value it's contained within.

What you might've tried before is putting an unknown value in the key or attribute position, rather than in the value position. In that case it would indeed make the entire container unknown, because keys/attribute names need to be unique per the definition of a map, and so a map with individual unknown keys can't work. Terraform treats that as an unknown map (or, for an object type, an unknown value of unknown type).
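
A quick illustration of that difference (aws_route53_zone.example here is just a stand-in for any resource whose id is unknown during planning):

locals {
  # Fine: the attribute value is unknown, but the object itself is known,
  # so known_object != null can be decided during planning.
  known_object = {
    zone_id = aws_route53_zone.example.id
  }

  # Not fine for for_each: an unknown key makes the whole map unknown,
  # because Terraform cannot yet prove the keys are unique.
  unknown_map = {
    (aws_route53_zone.example.id) = "enabled"
  }
}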

johnjelinek commented 1 year ago

@apparentlymart regarding your example, how would you then make sure that resources don't get created when the string values of your object are empty?

I think the intent of the OP is not just to bypass the error, but also to get the desired result of making sure resources only get provisioned when all the properties of the object are not "". If you changed this line:

count = var.dns_record != null ? 1 : 0

to

count = var.dns_record != null && var.dns_record.hostname != "" ? 1 : 0

you would get the error in the OP, yes?

apparentlymart commented 1 year ago

The point of my suggestion is that only the nullness of the containing object is used to make the decision, and then the containing object is always known to be either null or not.

I expect that if the object isn't null then the attributes inside must always have valid values and are not used as part of the decision.

johnjelinek commented 1 year ago

That is fine for non-null, but how should decisions be made based on content?

apparentlymart commented 1 year ago

If the content is something that will be decided only during the apply phase then the answer is that you just shouldn't do that. The point of this alternative approach is to avoid the need to make decisions based on values that won't be known until the apply step, by wrapping them in a value that will be known during the plan step.

haytham-salhi commented 1 year ago

Hi @apparentlymart and guys, is there a way to check if a variable is unknown but not null to avoid the error: The "count" value depends on resource attributes that cannot be determined until apply,...?

theherk commented 1 year ago

Yes @haytham-salhi. @apparentlymart shows an example in this comment above. Use a complex type and count on it being null or not. Then you can access the resource attribute contained within, without the count depending on its value.

haytham-salhi commented 1 year ago

Thanks @theherk. Yes, I noticed that solution. The issue with making a wrapper object is that you need to modify the variable definition, which goes against the open/closed principle (OCP). Moreover, I do believe it would be a good idea to provide a built-in function to check whether a variable is unknown but not null at plan time. Another workaround (the one I went with) is to introduce a boolean flag indicating that the variable is set, and then use that flag in the count argument.
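
A minimal sketch of that flag-based workaround (the variable and resource names are just illustrative):

variable "zone_id" {
  type    = string
  default = null
}

variable "zone_id_provided" {
  type    = bool
  default = false
}

resource "aws_route53_record" "this" {
  # The flag is always known during planning, so the count no longer depends
  # on the possibly-unknown zone_id value itself.
  count = var.zone_id_provided ? 1 : 0

  zone_id = var.zone_id
  name    = "example"
  type    = "CNAME"
  ttl     = 1800
  records = ["target.example.com"]
}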

theherk commented 1 year ago

There are other workarounds. You can use for_each over a list with enumeration, keying each instance on its index rather than on the value. I'm not disagreeing at all, just throwing out some methods.
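
Roughly like this (a sketch; the variable and resource names are illustrative):

variable "zone_ids" {
  type    = list(string)
  default = []
}

resource "aws_route53_record" "this" {
  # Key each instance by its list index, which is known during planning,
  # rather than by the (possibly unknown) value itself.
  for_each = { for i, id in var.zone_ids : tostring(i) => id }

  zone_id = each.value
  name    = "example"
  type    = "CNAME"
  ttl     = 1800
  records = ["target.example.com"]
}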

This is discussed a bit in this community discussion, where @apparentlymart does a great job explaining why the solution isn't straightforward. In it I actually say:

It is, in my view, the most frustrating shortcoming in terraform after years of use.

david-wb commented 1 year ago

I encountered the exact same issue in terraform v1.3.2. Changing the variable from a string to an object stops the Invalid count argument error. This bug is very confusing.

In case it's helpful, here is my solution:

Change the input variable to an object:

variable "x" { // Results in "invalid count argument" error
  type = string
  default = null
}

->

variable "x" { // Works
  type = object({
    val = string,
  })
  default = null
}

Then the count based on a local variable will work:

locals {
  foo = var.x != null
}

resource "foo" "bar" {
   count = local.foo ? 0 : 1
   ...
}
apparentlymart commented 1 year ago

In the forthcoming Terraform 1.6 there is a new concept at the language level which I hope we will use to improve this in future versions of Terraform.

This new concept is the ability to track certain extra details about unknown values that constrain what final values the unknown could possibly be a placeholder for. And for this issue in particular one of these extra details is particularly useful: the language can track if it knows that a particular unknown value is definitely not null, in which case foo != null can return true instead of unknown when foo is unknown.

This will not have immediate benefit because the rule for whether a resource attribute might be null would need to be decided by the provider that the resource type belongs to, rather than by the core language runtime, and so the full benefit of this new mechanism won't be apparent until this concept is also somehow integrated into the provider plugin protocol and the libraries that provider developers use to implement that protocol.

However, in the short term this means that there will be some additional ways to influence Terraform's treatment of unknown values via explicit workarounds, for those who find the idea of wrapping a primitive value into an object objectionable. For example, since the definition of the coalesce function is that it fails with an error if all of its arguments are null, Terraform should be able to infer that its result cannot possibly be null and so coalesce(anything) would, if "anything" were an unknown value, return an unknown value that is known not to be null. The same would be true for any other function or operation that by definition cannot produce null, assuming that we've already implemented the extra logic to propagate that information. (It probably won't have 100% coverage for the initial release.)
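
Applied to the original example in this issue, that could eventually allow something like the following. This is a sketch assuming the zone ID can never actually be null (coalesce would fail at apply time if it were) and that the refinement carries through the module's input variable:

module "foo" {
  source = "my_module"

  hostname = "example"
  # coalesce() can never return null, so Terraform can treat the unknown
  # zone ID as "unknown but definitely not null", letting the module's
  # count = var.public_zone_id != null ? 1 : 0 be decided during planning.
  public_zone_id = coalesce(module.vpc.public_zone_id)
}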

I don't consider this issue resolved until this functionality is fully implemented at least to the provider development libraries so that provider developers can start to make use of it and thus you wouldn't need any extra weird "hints" in the configuration anymore. I'm sharing the above only to give an update that there has been some initial work towards this but since it's a cross-cutting concern we will need to iterate in multiple steps rather than complete this all in one round.

jason-johnson commented 1 year ago

I don't think it should be a null check (i.e. var.x == null). As others have mentioned above, the value could end up being null after the apply.

In the case where we really must know if the final value is null or not, then the recommendation of doing apply with a specified target is correct. However, the case I am usually dealing with is simply “did the user of this module specify something?”. In that case, a new function “isUnknown” would actually be the best solution IMO. The providers already have access to this functionality.

For example we could do:

count = isUnknown(var.x) || length(var.x) > 0 ? 1 : 0  # can only be unknown if the user specified a value

EDIT: an “exists” function, as mentioned above, that simply tells us if something was set or not would also work.

apparentlymart commented 1 year ago

If var.x were an unknown list that turns out to be an empty list then is_unknown(var.x) || length(var.x) > 0 would return true during plan and then false during apply. It isn't acceptable for a known value to change between plan and apply.

The possible lengths of a collection are another detail that Terraform v1.6 can track for unknown values, and so if in future any providers or functions become capable of promising "this unknown list definitely has a length of at least one" then length(var.x) > 0 would return true during planning without any need to explicitly program with the unknown values.

jason-johnson commented 1 year ago

The point is: we sometimes only want to know if the user set a value. Today, you can if you write a provider but not if you write a module.

apparentlymart commented 1 year ago

As far as I know, providers have exactly the same "problem" today: if they receive an unknown value then they have no way to know whether the final known value could possibly be null. In many cases, as in the main Terraform language, providers handle that by being optimistic and just deferring any error handling until the apply phase. In some specific cases, just as with count and for_each in the Terraform language, providers choose to fail because they are missing some crucial information about how to proceed.

The new possibility of tracking a constrained range for an unknown value that I described above is the beginning of the solution to both of these situations: the Terraform language itself will be able to return true from something != null if we know that something cannot possibly be null, and providers will be able to use the information that an unknown value is definitely not null to make decisions earlier if that would give useful information sooner, and avoid some situations where unknown values cause errors.

It is true that a provider can "program with unknowns" in a way that the Terraform language doesn't allow, but a provider attempting to make assumptions beyond what the Terraform language guarantees is likely to fail to uphold Terraform's evaluation rules, and thus cause an error message from Terraform Core saying that the provider is buggy. Providers have more access to implementation details of the language, but they must still follow the language's evaluation rules in order for their outcomes to be considered valid.

As I mentioned above, work on this issue is in progress. The first step comes in Terraform v1.6, in the form of being able to track additional information about unknown values. The remaining steps belong to the Terraform plugin framework and SDK (to provide the means for provider developers to inspect refined unknown values and to refine the unknown values they are returning) and to the individual provider codebases (to make use of those new framework/SDK features). That other work will follow once v1.6 is released and we've got some experience with the new capabilities in the main Terraform language, since it'll be more practical to quickly fix any bugs and quirks only in Terraform Core rather than having to negotiate fixes across both Terraform Core and the provider ecosystem all at the same time.

apparentlymart commented 1 year ago

Terraform v1.6 now contains the language-level building block that would be required to fully solve this.

One way you can use it today is to declare that your module's input variables are non-nullable:

variable "example" {
  type     = string
  nullable = false
}

Terraform itself now knows that this declaration means that var.example cannot be null -- if it were then that would be reported as an error inside the calling module block -- and so var.example != null would return true even if var.example's final value isn't known yet.

The same effect emerges if you use any built-in function that Terraform knows cannot possibly return null, although relying on that would of course be more a workaround than a solution since it is depending on a side-effect of a function rather than the function's documented purpose. For example, lower returns an error if given a null value as its argument, so if you know your string is case-insensitive or guaranteed to be all lowercase anyway then you could pass it through lower and then Terraform will infer that the result of that function cannot possibly be null.

output "instance_id" {
  # Terraform knows that the value of this cannot possibly
  # be null even if aws_instance.example.id isn't known yet,
  # because lower would fail if given a null value.
  value = lower(aws_instance.example.id)
}

(If you decide to use this temporary workaround, please do so sparingly and leave comments nearby for future maintainers of your configuration to see. This is very implicit behavior and so likely to be non-obvious to future maintainers.)

The next step here is to teach providers themselves to be able to indicate via the provider protocol when an unknown value is guaranteed not to be null. That's going to take some more design work since it changes part of the external API of Terraform (the provider protocol) rather than just its implementation. Part of that effort is to get more experience using the inference mechanisms built into the language itself first, so that we can feel confident that these new mechanisms are a good enough fit for the problem before making the design be frozen by its inclusion in a public API.

rubenvw-ngdata commented 1 year ago

@apparentlymart I would expect that this also works with a validation on the input variable, e.g.

variable "example" {
  default     = null
  validation {
    condition     = length(var.example) > 0
    error_message = "The variable example should not be null when provided"
  }
}

But I am still getting the same error when I supply an output of another module as its value :(

apparentlymart commented 1 year ago

length(null) is invalid, because null represents the absence of a value and an absent value does not have a length.

You can make that work by testing the length of the string only if it isn't null:

variable "example" {
  type        = string
  default     = null
  validation {
    condition     = var.example != null ? length(var.example) > 0 : true
    error_message = "The variable example should not be empty if provided."
  }
}

The above guarantees that the value of var.example elsewhere in the module will be either null or a string containing at least one character. However, Terraform cannot automatically infer that an unknown value in this variable cannot be null, because validation rules only prevent progress if they are not met; they do not modify the given value in any way.

This example is not related to what I shared in my previous comment because this variable can be null, and so your validation rule must describe how to handle that situation. If you were to change the default to "" and then set nullable = false you would be guaranteed a value that cannot be null, but that means you would no longer be able to tell the difference between the variable being totally unset and it being set to the empty string.
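
For reference, a minimal sketch of that variant:

variable "example" {
  type     = string
  default  = ""  # "" now stands in for "not set"; the value can never be null
  nullable = false
}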

apparentlymart commented 3 months ago

At this point I think all of the work in Terraform Core for this is as complete as it can be, and so the remaining work is to update the plugin framework and SDK to allow providers to report that attributes are definitely not null, and for the providers themselves to make use of those new features.

Building on the workaround I shared earlier -- using a function whose result definitely cannot be null to allow Terraform to infer "definitely not null" even though the provider doesn't announce it -- I subsequently built a provider that more explicitly represents that workaround, called apparentlymart/assume.

My earlier workaround of using the lower function could therefore be replaced with the following, if you use my "assume" provider by including it in your module's required_providers block:

output "instance_id" {
  value = provider::assume::notnull(aws_instance.example.id)
}
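
For completeness, the calling module would also need to declare that provider requirement, roughly like this (assuming the provider is published under the source address implied by its name):

terraform {
  required_providers {
    assume = {
      source = "apparentlymart/assume"
    }
  }
}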

In the long run I hope it will cease to be necessary for module authors to hint this explicitly, with the providers themselves annotating their results with equivalent refinements. If they do start adding such annotations then using my notnull helper function will no longer be needed, but would also be harmless because it would just confirm the refinement that was already present on the input value.

The Plugin Framework already has https://github.com/hashicorp/terraform-plugin-framework/issues/869 to represent the work that needs to complete before providers could participate in this, so I'm going to close this issue in favor of that one since it's a better representation of what work remains to fully complete this, and because there's no further work planned in this repository and so this issue would likely get forgotten and just remain open forever if I don't close it today.

If you're interested in the more automatic solution where providers can include their own annotations of values that cannot possibly be null then I suggest adding your vote to that other issue in the Plugin Framework repository, or if you've found this after that has already been resolved then consider opening provider feature requests for specific situations where you think such annotations would be useful. Thanks!

github-actions[bot] commented 2 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.