Feature: Allow circular dependencies in resources

hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.

https://www.terraform.io/

Other

42.29k stars 9.49k forks source link

Feature: Allow circular dependencies in resources #27188

Open dansimau opened 3 years ago

dansimau commented 3 years ago

Current Terraform Version

v0.14.0-rc1

Use-cases

I have a aws_cognito_user_pool resource
On this resource there is a block where you can configure the ARN of Lambda triggers
I want to configure a Lambda trigger to have the Cognito user pool ID as an environment variable

When I configure this in Terraform, it obviously doesn't work. I get:

$ terraform apply

Error: Cycle: module.auth.aws_lambda_function.presignup, module.auth.aws_cognito_user_pool.default

However, the use case above is a real-world circular dependency that is legitimate. Outside of Terraform, it would be a 3-step process to configure this, e.g. one of the ways would be:

1) Create Cognito user pool 2) Create Lambda function, using Cognito user pool ID as an input 3) Update Cognito user pool to add Lambda trigger

(You could also create the other resource first, but the steps are the same: Create resource A; Create resource B; Update resource A).

Attempted Solutions

As far as I know today, there is no built in way for Terraform to handle this type of situation
Providers can do things like create a "virtual" resource in Terraform, that connects two resources and performs some kind of update. However, this requires a common use case to appear and dev work to happen to support it in each case.
Otherwise, this problem is just pushed to the user to do workarounds: they can configure a data source for example, but it will require two Terrrform runs and the first one may result in an error. The user then has to know to ignore this and run Terraform again, which is not very elegant or good for CI.

Proposal

Is there a way in which Terraform could attempt to resolve cycles automatically by doing a create A, create B, then update A?

Sorry if this suggestion seems naïve, I admit I'm not familiar with Terraform internals. However, I imagine it to require:

Tracking specific attributes that are causing the cycle and deferring setting that during create
Taking into consideration which attributes can be updated in-place (e.g. attributes that require the resource to be recreated would not be candidates for this)
Being able to manage this behaviour per provider or per resource to overcome edge cases

The idea here is that this would be a general solution. The observation here is that resource cycles are a legitimate and real-world use case that need to be dealt with in a general way.

References

I did a search to try and find prior discussions on this but I couldn't find any specific feature request around representing or allowing resource dependency cycles.

apparentlymart commented 3 years ago

Hi @dansimau! Thanks for sharing this use-case.

As you've noted, the typical way to deal with this today is for the provider to explain to Terraform that "update cognito user pool to add lambda trigger" is a separate operation by representing it as a separate resource. That creates a relatively easy to explain execution model: there is only one action for each resource per plan (with the special exception of "replace", which is internally a combined destroy/create), and the ordering of those actions is derived from the dependencies between those resources.

Off the top of my head I'm not able to imagine a general solution to this which doesn't require the provider to give Terraform enough information to understand that, in your case, it's allowable and reasonable to create a cognito user pool without a lambda trigger at first and then update it later. Any design that requires additional information in the provider schema would not meet the use-case as you framed it, where additional work in the provider was your criteria for failing the current design as a suitable solution.

Since your request here explicitly excludes the current design as a possible answer, but there isn't yet a candidate new design to evaluate, I'm going to leave this open for the moment but I want to be explicit that it will likely be closed unless someone suggests a concrete technical design for further discussion, because we (the Terraform team at HashiCorp) consider this problem already "solved" in the sense that there is a way for a provider to represent the sequence of three operations you described.

In our conception of Terraform's architecture, we consider it the provider's primary responsibility to map from the concepts of the remote API onto Terraform's workflow, and so although it would be nice to find some way to "automate away" this design problem, architecturally there is no particular need to do so, and if the AWS provider doesn't offer a way to associate a lambda trigger with a cognito pool as a separate operation then I expect it will be far more expedient to work on a specific technical design for the AWS provider to address that than to try to design a generalized solution for hypothetical additional problems that we are not yet aware of. At the very least, we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize.

dansimau commented 3 years ago

Thanks for the considered reply @apparentlymart.

we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize.

Indeed, I'd be interested to know how often this comes up. Judging by the fact that nobody filed an issue before, maybe not as much as I originally assumed when I hit this use case.

okaros commented 3 years ago

I don't have a proposed solution, but I can provide another example. This circular dependency scenario happens in the AzureRM provider with app services and Azure-managed SSL certificates.

The azurerm_app_service resource can be given a custom hostname and SSL certificate via the azurerm_app_service_custom_hostname_binding resource. You normally specify the SSL certificate to use via the 'fingerprint' attribute, which is the SSL fingerprint of the desired certificate.

If you wish to use a free Azure Managed Certificate via the azurerm_app_service_managed_certificate resource, a circular dependency is created: azurerm_app_service_managed_certificate requires an azurerm_app_service_custom_hostname_binding, but azurerm_app_service_custom_hostname_binding requires the fingerprint from azurerm_app_service_managed_certificate in order to attach the certificate.

(I work around the problems with a judicious use of a local-exec provisioner and ignoring changes to some attributes, so I bring this up just to provide another example use-case)

Edit: Ironically, as I was typing this out a new release of the AzureRM provider eliminated this particular circular dependency. 🤣

okaros commented 3 years ago

There's also the more general use-case of wanting to build resources that communicate with each other in Terraform. Consider: You want to create two Azure App Services that need to communicate with each other via connection strings configured in their environment variables. This is a circular dependency in Terraform, regardless of provider/platform, since you cannot have their resource objects reference each other unless they already exist, and they don't. You can cheat this in any number of ways (multi-stage terraform deployments with variables to control how far along you are in the process, for example, or by importing manually-created resources, or...) but it would terribly nice to not have this limitation.

From a what-does-a-solutiosn-look-like perspective, perhaps a configuration block similar in structure/functionality to a provisioner that fires after resource creation, but used to provide for a delayed attribute update/change instead?

As a vague example for such a post_create block:

resource "azurerm_app_service" "example1" {
  name                = "example-app-service"
...
...
  app_settings = {
    "SOME_KEY" = "some-initialvalue"
  }

  post_create {
    app_settings = merge(self.app_settings, { "SOME_OTHER_KEY" = azreurm_app_service.example2.default_site_hostname }
  }
}

resource "azurerm_app_service" "example2" {
  name                = "example-app-service"
...
...
  app_settings = {
    "SOME_KEY" = "some-value"
  }

  post_create {
    app_settings = merge(self.app_settings, { "SOME_OTHER_KEY" = azreurm_app_service.example1.default_site_hostname }
  }
}

The end-result being that each resource gets created first without SOME_OTHER_KEY being present in app_settings{}, then updated post-creation in the same plan to add it.

Referencing the other resource like this would allow for appropriate dependency ordering, hopefully? And after successful creation the results of the post_create can be merged into the regular state so future plans work normally? This would solve almost all of the use-cases for circular dependencies I've run into, I think, including the original Cognito-/Lambda-oriented presented here, and would also allow for more natively-Terraform workarounds for cases where providers haven't caught up with addressing circular resource dependencies like the one I mentioned in my previous comment.

apparentlymart commented 3 years ago

That's an interesting idea, @okaros, and reminds me a bit of functional reactive programming where programs react to events by merging the event data in with a previous value.

It does seem like an idea worth researching in some more detail. Some initial thoughts I have for questions to consider would be:

How might this work if the value to be updated later is in a nested block rather than a top-level attribute? That would presumably require a way to refer to a specific nested block, and nested blocks aren't always directly addressable.
Are there situations that might require more than one subsequent update, where two separate operations elsewhere in the configuration contribute separate updates?
How would this be best presented in the plan output? Currently we show only one operation per resource instance but this creates the possibility of multiple. (It also means changing Terraform's internal models a bit, but I think better to figure out how it looks externally and then work backwards to the internals from there.)

It does seem like a promising direction to investigate, but also not an easy thing to prototype with Terraform as it exists today. 🤔 I would like to consider it more though, so thanks for suggesting it.

okaros commented 3 years ago

I don't have full answers, @apparentlymart, but some thoughts from my end-user perspective :

An initial or even final implementation might simply say "You can't do that" with regards to nested blocks that aren't addressable, or even nested blocks altogether. Other Terraform functionality has limitations on what can be interacted with ("destroy" provisioners come immediately to mind as an example of something with heavy restrictions), so such limitations wouldn't be unprecedented. A solution that worked with everything would of course be ideal, but for me, at least, even a limited solution be a welcome improvement.

Multiple post_create updates would be interesting, although I'm not sure I can see a use-case where they'd be needed (at least, not without introducing additional layers and the concept of post_post_create, which strikes me as being... too much) . But if they are, perhaps they could be handled in the same fashion as provisioners, with multiple blocks simply being handled serially both in the written HCL blocks and in planning/execution? I'd envisioned the post_create block as being limited to addressing attributes on the attached resource and not being capable of adjusting other resources, and two different resources with post_create blocks would only be able to reference attribute values available during the initial creation. i.e. If my example2 app service from above tries to reference example1.app_settings, it only sees SOME_KEY and not SOME_OTHER_KEY (but, once the resources were created SOME_OTHER_KEY would be available and indistinguishable from SOME_KEY).

Currently provisioners aren't shown at plan-time at all, and those are my closest analogue to this idea, so.... 🤣 More reasonably, I think that the standard (known after apply) message would probably be appropriate? My limited understand of how the plan is built suggests it ought to be possible to know which attributes will be modified in the second pass and simply include those as attributes being changed/set on the resource, but with the circular-dependency scenarios we're talking about here I don't think there would be many cases where the final values could be known at plan time. After all, if they were know-able it really wouldn't be a circular dependency... And generally, I'm not sure there would be much use for knowing details of the intermediate stage except during a failed apply, which should probably just be treated the same as a provisioner that failed (taint the entire resource so it's recreated on the next attempt and that recreation can cascade changes out to other, dependent resources appropriately). I think that from the perspective of the person running terraform the fact that it's two disparate operations "under the hood" rather than one to get to the end-state probably doesn't matter very much, and the actual details of how the execution behaves can simply be in the documentation for the post_create block.

lmmattr commented 3 years ago

@okaros I know it doesn't solve this particular issue but I feel in needs pointing out that the event data passed to the lambda trigger does include the Cognito user pool id.

jeffg-hpe commented 3 years ago

we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize

I have two Okta orgs managed via https://registry.terraform.io/providers/oktadeveloper/okta and I want to set up a SAML based trust between them.

Creating the resources in step 1 and 2 generates unique identifiers that must be exchanged, and cannot be known in advance.

Steps:

Create resource okta_saml_idp in org1
Create resource okta_app_saml in org2, using values from step 1
Update okta_saml_idp, using values from step 2

Using the post_create proposed above, it might look something like this.

# idp is created first, with placeholder for argB
resource "okta_saml_idp" "external-idp" {
  argA = "value"
  argB = "placeholder"
  post_create {
    argB = okta_app_saml.sp.some.value
  }
}

# sp is created later, due to dependency on idp.somevalue
resource "okta_app_saml" "idp-provider" {
  argA = okta_saml_idp.idp.someother.value
  argB = "value"
}

# finally, post_create can execute as its dependency is satisfied now.

Note, I simplified for brevity (removed the needed multiple providers, used example arg/attrib names).

I see three alternatives to solving this via post_create block:

ignore_changes + local-exec block and call out APIs directly for step 3
multistage terraform deployment, with a resource import
new provider resource that configures a subset of okta_saml_idp for an existing instance

dzrtc commented 3 years ago

we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize

AWS Transit Gateway provides routing between multiple VPCs, replacing VPC Peering. Setting this up involves circular dependencies because the TGW must be explicitly attached to the VPCs (requiring knowledge of the vpc_id) while the VPCs must setup routes through the TGW (requiring knowledge of the ec2_transit_gateway_id).

It makes a lot of sense to manage the (many) VPCs with their TGW routes independently (note1) of the (one) TGW with its VPC attachments. However, if you break the dependency cycle by setting up the VPC route tables after the VPCs and TGW exist, then you can't manage the VPC because the "new" routes are discovered in subsequent plans.

On the other hand, if you setup the TGW first without any attachments, then manage the attachments and route tables from inside each VPC, then that undermines the value of using TGW to centrally administer routes between VPCs.

I'm not sure how I could use the proposed post_create to solve this problem.

note1: By "independently", I mean, "resources managed in distinct tfstate files".

bmilesp commented 1 year ago

I've been working around this issue using a blue/green and dev environments for my app, but then I ran into an AWS issue that left appsync domains in a state where it was unusable for hours (seperate Terraform issue regarding Custom domain disassociation).

This was the original stacks:

GlobalStack (route53 hosted zone, SES, IAM, ACM)
  |       \------------------------------
  |                                      | 
LiveDataStack                      DevDataStack (databases and Cognito user pools)
  |                                      |
  |                                      |
Blue/GreenAppStacks                DevAppStack (Appsync, StepFunctions, Lambdas, Cloudwatch)

This worked well as i could change a config var and point the route53 domain name to either blue or green stacks easily, but to help illustrate the point below, notice that the global resources and datastack resources are required by other resources in the stacks downstream.

So now were to the key problem. I wanted to safeguard against this aforementioned issue (and potential others like downed resources between regions, etc) by creating a "region" stack layer, so that i could replicate the LiveDataStack, DevDataStack, Blue/Green/Dev AppStacks into another/multiple regions like this:

GlobalStack
    |  \----------------------------------
    |                                     |
RegionalStack us-east-2          RegionalStack us-west-1
    |                                     |
Live/Dev DataStacks                  Live/Dev DataStacks 
    |                                     |
Blue/Green/Dev AppStacks       Blue/Green/Dev AppStacks

But because of the circular dependencies, this is not possible (or at least I have not been able to find a way to do this).

griffinator76 commented 1 year ago

we'll need several more examples of similar problems in order to start to analyze what they all have in common and thus how the problem might generalize

I tried to use Terraform to set up a Snowflake "Storage Integration" object that links to an AWS S3 bucket using the "chanzuckerberg" Snowflake provider from the Terraform registry in addition to the standard AWS provider.

Part of the process to create the integration requires the following sequence of actions:

Create an IAM Role with S3 access policy
Create a Snowflake Storage Integration object, specifying the IAM Role created in step 1
Modify the IAM Role access policy using values from the Storage Integration object (complete list of steps here)

Hence there is a circular dependency between the IAM Role and Storage Integration. Steps 1 and 2 are straightforward but step 3 involves modifying an object's state after it has been created.

The IAM Role access policy cannot be modified separately from the role itself.

spectria-limina commented 1 year ago

I've run another use case: trying to manage content with a series of messages, each with a "back to top" link which would link to the table of contents. The table of contents, of course, needs to be able to link to all the other posts. This is another instance of "I need mutually referential identifiers".

My alternative suggestion is that two-phase created could be explicitly supported at the platform level, as there are APIs that allow reservation of resources much more cheaply than full creation. Something like:

A Resource can optionally be a ResourceWithReserve. If it is, it must have at least one Attribute with a new Reservable flag, and may also set a new RequiredForReserve flag (or maybe this one should be inverted).
When the resource is to be created, it yields two nodes in the dependency graph: one for the reservation step, and one for the creation step. Dependencies through RequiredForReserve or Reservable are applied to the reservation node; dependencies through other attributes are applied to the creation node. The creation node depends on the reservation node. I hope this is similar to how an update can be sometimes internally represented as a create and a delete.
If there are any nodes strictly between the reservation and creation on the dependency graph, then the creation must be two-phase. Variables filled in by the reservation are marked as "(known after reservation)" in the plan.
If there are no such nodes, then the planner might still to do a two-phase create as an optimization, if it would unblock other resources' application. Perhaps a resource could hint when it can reserve much more quickly than it can actually create.
When a resource supports reservation, there is a new reservation meta-argument. It can have some depends_on and lifecycle meta arguments applied, as well as be the target of depends_on—these are simply ignored if the reservation step is never performed. There could also be reserve = always or reserve = never to control behaviour when needed (the latter being a hard error if it creates a loop).

This design could be extended to multiple phases, but it's not immediately clear you'd want that.

mm-col commented 1 year ago

I'm running into this in OCI.

Creating a custom route table and assigning that route table to the subnet works without issue. The problem comes when I also want to create route rules in that route table.

For example, I have a subnet defined and that subnet will have an Ubuntu instance in it along with a Palo Alto firewall instance. I need a route table assigned to the subnet that makes the trust interface IP of the firewall the default gateway for the subnet.

Here are the components that need to work together: subnet private IP route table route rules

The problem is the circular dependencies. The route rule depends on the network_entity_id of the private IP. That private IP depends on the subnet. The subnet depends on the route table at creation.

Everything works until a route rule is specified that includes the id of the private IP as that creates the circular reference. The subnet can't use the route table because the route table has a rule in it that points to the private IP which can't be created before the subnet is created.

Huang-W commented 1 year ago

https://github.com/hashicorp/terraform-provider-aws/pull/1824

Another valid use case is cycles in AWS security groups or prefix lists.

BWeesy commented 1 year ago

We bumped into this while trying to pass the invokeURL of an AWS gateway resource to a lambda as an environment variable because the gateway has endpoints that route to the lambda.

sgal-dm commented 1 year ago

I have the same use case that jeffg-hpe posted above, and to expand on it a little, the provider can't cleanly handle this one because building it requires multiple instances of the provider, one targeting the IdP tenant, the other targeting the SP tenant each with distinct API endpoints and auth tokens.

So the typical approach of adding a virtual resource to the provider that manages multiple resources under the hood doesn't work here because those resources exist in disparate environments.

His third alternative approach, while novel, seems sloppy for a provider. It'd require something like:

Define resourceA, ignoring attributes w & x.
Define resourceB, which depends on A, and reads attributes y & z from it.
Define the new resourceC, which depends on A & B, and under the hood is an imported replica of A that sets attributes w & x based on the output of B.

So we're left with local-exec or multi-stage deployment unless this can be handled as a feature of Terraform.

patmaddox commented 1 year ago

the typical way to deal with this today is for the provider to explain to Terraform that "update cognito user pool to add lambda trigger" is a separate operation by representing it as a separate resource. That creates a relatively easy to explain execution model: there is only one action for each resource per plan (with the special exception of "replace", which is internally a combined destroy/create), and the ordering of those actions is derived from the dependencies between those resources.

Do you have an example of this typical way? I am researching and not sure how two separate operations (to the same resource I assume?) are modeled as separate resources.

Our use case is configuring snowflake. The manual process is:

Create an AWS role with a dummy account ID
Create a Snowflake integration, referring to the role ARN
Update the role with the account ID generated by the integration

Some possible mechanisms I've heard referenced in my research are dynamic data sources, dynamic variables, and now this multiple operations. But I haven't worked out yet how to implement any of them.

apparentlymart commented 1 year ago

One example of this pattern that I can think of quickly is in the hashicorp/aws provider:

There are separate resource types for aws_s3_bucket and aws_s3_bucket_policy, which allows the policy to refer to the arn attribute of the bucket itself when describing rules about specific sub-paths inside the bucket, which typically involves writing an ARN whose prefix is the arn attribute of the bucket as a whole.

dbaynard commented 1 year ago

There are separate resource types for aws_s3_bucket and aws_s3_bucket_policy, which allows the policy to refer to the arn attribute of the bucket itself when describing rules about specific sub-paths inside the bucket, which typically involves writing an ARN whose prefix is the arn attribute of the bucket as a whole.

Oh, is that why so many aws features have separate resource types?

Does that mean that in places where there are blocks that could be separate resources, the direction of travel is towards the latter?

apparentlymart commented 1 year ago

There is a separate team responsible for the hashicorp/aws provider and so I don't know all of what motivates their design decisions, but in this particular case (the S3 operations) the structure with separate resource types for different features matches the structure of the underlying API, which has separate write operations for the two resource types I mentioned: s3:CreateBucket for aws_s3_bucket and s3:PutBucketPolicy for aws_s3_bucket_policy.

I suspect you're recalling that earlier versions of the provider just had a single aws_s3_bucket resource type which covered a large portion of the Amazon S3 API surface. And indeed, the lesson learned from that initial design is that providers should typically follow as closely as possible the separation of concerns in the underlying API, because the finer details of the API typically rely on characteristics of the coarser decisions. We can see that in the example I shared, where the underlying API assumes you can create a bucket to find out its ARN before you create a policy for that bucket. The Terraform provider merging those two into a single operation therefore made that particular detail of the API design not work properly in Terraform.

From discussions from the provider teams my understanding is that their modern design approach is to closely match the structure of the underlying API to avoid this sort of design inconsistency in the fine details. That goal might explain other API changes where certain single resource types were split into many separate resource types in later releases, but I'm not involved with the detailed planning of that and I only know about the S3 example because I've previously helped folks in the community who had problems caused by the old design.

If you'd like to discuss more about how the hashicorp/aws provider is designed then I suggest doing so in its own repository, because the folks who monitor this repository are not directly involved in the design or implementation of that provider.

Thanks!

glerb commented 1 year ago

Another use case analogous to @apparentlymart 's S3 case above: locking KMS keys to any resource that uses them with a Resource restriction in the key policy:

resource "aws_sns_topic" "log_processing" {
  name = "LogProcessingTopic"
  kms_master_key_id = aws_kms_key.log_processing.arn

with a key policy for the KMS key of:


data "aws_iam_policy_document" "log_processing_kms_key" {
  statement {
    actions = [
      "kms:GenerateDataKey*",
      "kms:Decrypt"
    ]
    resources = [aws_sns_topic.log_processing.arn]
    effect    = "Allow"

    principals {
      type = "Service"
      identifiers = [
        "sns.amazonaws.com",
      ]
    }
  }

Vingtoft commented 1 year ago

Any update?

nibblesnbits commented 9 months ago

3 years later. Any updates here?

crw commented 9 months ago

@nibblesnbits Based on scanning @apparentlymart's comments, I would not expect this behavior to change in Terraform v1.x. This is the type of issue the team likes to leave open to generate ideas and use cases for a "hypothetical v2."

For future viewers, if you are viewing this issue and would like to indicate your interest, please use the 👍 reaction on the issue description to upvote this issue. Thanks!

stevemckenney commented 3 months ago

i'll add my use case to the pile.

Using the respective hashicorp AWS modules for lambda and eventbridge. Dependency exists when creating a lambda with a eventbridge rule as a trigger. The lambda needs the ARN of the rule to add the required resource permissions to the lambda so it can be invoked by the event tule. The event rule also needs the lambda function ARN to be able to create the lambda target

apparentlymart commented 3 months ago

Hi @stevemckenney! Thanks for sharing that feedback.

Could you link to the specific modules you're referring to? I'm not aware of any HashiCorp-maintained modules for either Lambda or EventBridge, and I wasn't able to find any relevant-seeming modules in the partner-maintained AWS modules. I'd like to be able to see exactly how those modules are configuring Lambda and EventBridge to understand how the circular dependency arises.

In the hashicorp/aws provider itself -- ignoring any modules that might wrap it for the moment -- the aws_lambda_permission resource type is separate from the aws_lambda_function resource type to allow for the following order of operations when "creating from nothing":

Create the function
Create the other object in some other AWS service that will call the function
Create the permission, using both the ARN of the object from step 2, and the name of the function from step 1

That sequence therefore avoids any circular dependency because the permission is modeled as a separate resource.

I'm guessing that the modules you are trying to use make it hard or impossible to declare that sequence of events. Therefore I'd like to study those modules to understand why that is, and thus what specific changes we might potentially make to the Terraform language to avoid that problem.

Thanks again!

stevemckenney commented 3 months ago

Sorry, these lines are a bit blurry when it comes to who maintains these, it isn't hashicorp

https://github.com/terraform-aws-modules/terraform-aws-lambda https://github.com/terraform-aws-modules/terraform-aws-eventbridge

To your point, I may just have to call the permission resource separately after the creation via the modules.

Steve McKenney | BISSELL Homecare, Inc. | Sr. ITS Enterprise Solutions Architect | p. 616.791.0690

From: Martin Atkins @.> Sent: Thursday, May 16, 2024 1:18 PM To: hashicorp/terraform @.> Cc: McKenney, Steve @.>; Mention @.> Subject: Re: [hashicorp/terraform] Feature: Allow circular dependencies in resources (#27188)

External Sender

Hi @stevemckenneyhttps://cas5-0-urlprotect.trendmicro.com/wis/clicktime/v1/query?url=https%3a%2f%2fnam04.safelinks.protection.outlook.com%2f%3furl%3dhttps%253A%252F%252Fgithub.com%252Fstevemckenney%26data%3d05%257C02%257Csteve.mckenney%2540bissell.com%257Ce7ebf52bbde94c697b9608dc75cc3491%257C18b2408003c742fe8ceecd8304b76d88%257C0%257C0%257C638514767131487473%257CUnknown%257CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%253D%257C0%257C%257C%257C%26sdata%3dXUmrkUb2lTqn4QT3Pvfm5wzbYivtXYakvVxVfv5W2%252Fo%253D%26reserved%3d0&umid=3e747425-f3c9-4325-81f3-06cb4b563d65&auth=9d7db5738c0ab76d2bceec349a50854f80a7b638-9bc406895c59101d58b2aaa3603c92c879852d54! Thanks for sharing that feedback.

Could you link to the specific modules you're referring to? I'm not aware of any HashiCorp-maintained modules for either Lambda or EventBridge, and I wasn't able to find any relevant-seeming modules in the partner-maintained AWS moduleshttps://registry.terraform.io/browse/modules?partner=true&provider=aws. I'd like to be able to see exactly how those modules are configuring Lambda and EventBridge to understand how the circular dependency arises.

In the hashicorp/aws provider itself -- ignoring any modules that might wrap it for the moment -- the aws_lambda_permissionhttps://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission resource type is separate from the aws_lambda_functionhttps://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function resource type to allow for the following order of operations when "creating from nothing":

Create the function
Create the other object in some other AWS service that will call the function
Create the permission, using both the ARN of the object from step 2, and the name of the function from step 1

That sequence therefore avoids any circular dependency because the permission is modeled as a separate resource.

Thanks again!

- Reply to this email directly, view it on GitHubhttps://cas5-0-urlprotect.trendmicro.com/wis/clicktime/v1/query?url=https%3a%2f%2fnam04.safelinks.protection.outlook.com%2f%3furl%3dhttps%253A%252F%252Fgithub.com%252Fhashicorp%252Fterraform%252Fissues%252F27188%2523issuecomment-2115806464%26data%3d05%257C02%257Csteve.mckenney%2540bissell.com%257Ce7ebf52bbde94c697b9608dc75cc3491%257C18b2408003c742fe8ceecd8304b76d88%257C0%257C0%257C638514767131516701%257CUnknown%257CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%253D%257C0%257C%257C%257C%26sdata%3dx9j83L0qrEEC7sBGWpaki6w9hspfY8Qg%252BgLaOHWLO0E%253D%26reserved%3d0&umid=3e747425-f3c9-4325-81f3-06cb4b563d65&auth=9d7db5738c0ab76d2bceec349a50854f80a7b638-3c88ef3b6587942613b329a66da03d5593dc6877, or unsubscribehttps://cas5-0-urlprotect.trendmicro.com/wis/clicktime/v1/query?url=https%3a%2f%2fnam04.safelinks.protection.outlook.com%2f%3furl%3dhttps%253A%252F%252Fgithub.com%252Fnotifications%252Funsubscribe-auth%252FASEMA3MBQIQDMQEIWPHESU3ZCTS6LAVCNFSM4USCTEU2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMJRGU4DANRUGY2A%26data%3d05%257C02%257Csteve.mckenney%2540bissell.com%257Ce7ebf52bbde94c697b9608dc75cc3491%257C18b2408003c742fe8ceecd8304b76d88%257C0%257C0%257C638514767131522249%257CUnknown%257CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%253D%257C0%257C%257C%257C%26sdata%3dMQdWlsGffMp1BywpCt25C0ccU4wxyRIUHRDcyBZMh18%253D%26reserved%3d0&umid=3e747425-f3c9-4325-81f3-06cb4b563d65&auth=9d7db5738c0ab76d2bceec349a50854f80a7b638-37fe448b2baef905cdb4664e85567566918647f6. You are receiving this because you were mentioned.Message ID: @.**@.>>

This communication is the property of BISSELL Homecare, Inc. and affiliates ("BISSELL") and is considered confidential. Price and other contract terms discussed in this email are not final and remain subject to BISSELL internal approval.

apparentlymart commented 3 months ago

Thanks for those links, @stevemckenney.

I'm not super familiar with these modules, but from peeping in the source code for a little while it seems like something like this might work:

module "lambda" {
  source = "terraform-aws-modules/lambda/aws"

  # ...

  runtime     = "..."
  source_path = "..."

  # ...

  allowed_triggers = {
    for k, arn in module.events.eventbridge_rule_arns : k => {
      service    = "events"
      source_arn = arn
      # ...
    }
  }
}

module "events" {
  source = "terraform-aws-modules/eventbridge/aws"

  # ...

  targets = {
    crons = [
      {
        name  = "lambda-cron"
        arn   = module.lambda.lambda_function_arn
        input = jsonencode({ "job" : "cron-by-rate" })
      }
    ]
  }

  # ...
}

This relies on the fact that the parts of the Lambda module that configure the function itself don't refer to var.allowed_triggers, and so therefore module.lambda.lambda_function_arn also doesn't depend on the triggers, and so there shouldn't be a dependency cycle here. This should be able to achieve the same order of operations I described in my previous comment, just with two of the resources in the Lambda module and one of them in the EventBridge module.

I don't have an AWS account handy with which to test this right now, but I notice that the with-lambda-scheduling example in the EventBridge module's repository seems to show a similar construction.

Of course, if that doesn't work then putting a permission resource separately outside both of the modules ought to work, as you said.

esirK commented 3 months ago

I'm having an issue where a lambda function is adding the ARN of a stepfunction as it's environment variable

{
    "STATE_MACHINE_ARN" = var.state_machine_arn
  }

However, the stepfunction also needs the ARN of this lambda function

Type     = "Task",
Resource = var.x_handler_lambda_arn,

How can I go about this? Is it possible to set the environment variable after the stepfunction has been created?

apparentlymart commented 3 months ago

Hi @esirK,

For situations like that a typical strategy would be to add some sort of indirection. That means that instead of passing the step function ARN directly to the Lambda function, you'd instead pass some information that the Lambda function can use to find the step function dynamically at runtime. Of course, you will need to be able to tolerate there being a brief period at the start of the Lambda function's life when the step function doesn't exist yet.

Another possibility would be to split your function into two functions, where one is triggered by the step function and the other uses the step function itself.

Unfortunately, the AWS Lambda API expects environment variables to be set in the same API call that creates the Lambda function and so the hashicorp/aws provider follows this convention and expects you to provide the environment variables as part of the function's settings. The API does not treat individual environment variable names as independent objects that can be managed separately from the overall variable table or the function's other configuration settings. The provider in turn does not offer capabilities that are not reflected in the underlying API.

esirK commented 3 months ago

Hi @esirK,

For situations like that a typical strategy would be to add some sort of indirection. That means that instead of passing the step function ARN directly to the Lambda function, you'd instead pass some information that the Lambda function can use to find the step function dynamically at runtime. Of course, you will need to be able to tolerate there being a brief period at the start of the Lambda function's life when the step function doesn't exist yet.

Another possibility would be to split your function into two functions, where one is triggered by the step function and the other uses the step function itself.

Unfortunately, the AWS Lambda API expects environment variables to be set in the same API call that creates the Lambda function and so the hashicorp/aws provider follows this convention and expects you to provide the environment variables as part of the function's settings. The API does not treat individual environment variable names as independent objects that can be managed separately from the overall variable table or the function's other configuration settings. The provider in turn does not offer capabilities that are not reflected in the underlying API.

Thanks @apparentlymart I went with the first approach. Thanks ❤️