gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License

Terragrunt Plan requiring a real value for mock_outputs with resource azurerm_windows_function_app #2324

Open bdorplatt opened 2 years ago

bdorplatt commented 2 years ago

Running a plan against the new azurerm_windows_function_app resource fails with the error below; the plan does not accept the mock output and continue. The exact same setup with the deprecated azurerm_function_app resource works as expected and honors the mock values. Based on the error details, the new resource appears to need to look up which tier the service plan is on, and it can't get that from a mock value. Both tests used the azurerm_service_plan resource for the App Service Plan.

Error: could not read new Service Plan to check tier Service Plan: (Serverfarm Name "mock-asp" / Resource Group "mock-rg"): web.AppServicePlansClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'mock-rg' could not be found."

A plan does run successfully when the resource ID of an existing App Service Plan (an actual resource in Azure) is entered as the mock output in the Function App's terragrunt.hcl. For reference: https://github.com/hashicorp/terraform-provider-azurerm/issues/17376

Since the whole point of mock outputs is to not require resources that already exist, Terragrunt will probably need to adjust how it handles this resource type. We shouldn't have to use a real value for a mock output. Using a real ASP ID is a partial workaround, but it isn't sustainable: if the referenced App Service Plan is deleted, the code breaks. It also doesn't work if the app is being deployed to a new subscription, or to a subscription that doesn't have any App Service Plans deployed yet.

Also of note: all 3 options for mock_outputs_merge_strategy_with_state result in the same error: Error: could not read new Service Plan to check tier Service Plan
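For reference, here is a sketch of the dependency block from the configuration below, with the three values documented for mock_outputs_merge_strategy_with_state listed in comments (descriptions paraphrased from the Terragrunt documentation). Because app_service_plan01 has never been applied and has no state, each strategy should end up passing the same mock asp_id to the provider, which is consistent with all three producing the same error.

dependency "app_service_plan01" {
  config_path = "../../appserviceplan/appserviceplan01"

  mock_outputs = {
    asp_id = "/subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/mockrg/providers/Microsoft.Web/serverfarms/mockasp"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]

  # Set exactly one of the three documented strategies:
  #   "no_merge"      - use existing state as is; mocks only when there is no state (default)
  #   "shallow"       - use mock keys only where the output is missing from state
  #   "deep_map_only" - like "shallow", but also merges keys inside map outputs
  # With no state for app_service_plan01 yet, all three resolve to the mock asp_id.
  mock_outputs_merge_strategy_with_state = "shallow"
}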

Function App resource:

resource "azurerm_windows_function_app" "function_app" {
  app_settings        = var.fa_app_settings
  location            = var.region
  name                = "test-azf"
  resource_group_name = var.resource_group_name
  service_plan_id     = var.fa_service_plan_id
}

Dependency and input in terragrunt.hcl:

include {
  path = find_in_parent_folders()
}

dependency "app_service_plan01" {
  config_path = "../../appserviceplan/appserviceplan01"
  mock_outputs = {
    asp_id = "/subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/mockrg/providers/Microsoft.Web/serverfarms/mockasp"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
  mock_outputs_merge_strategy_with_state  = "shallow"
}

inputs = {
  fa_service_plan_id = dependency.app_service_plan01.outputs.asp_id
}

Output from the App Service Plan module:

output "asp_id" {
  description = "ID of ASP"
  value       = module.asp.asp_id
}

denis256 commented 2 years ago

Hi, I suspect the existing resource ID should be imported into the Terraform state so that it is picked up correctly during Terragrunt execution.

https://developer.hashicorp.com/terraform/cli/import

bdorplatt commented 2 years ago

We are not working with existing infrastructure here that needs to be imported. These are net new deployments that need the mock outputs to work without requiring real resources to exist in advance.

Felipewdc commented 2 years ago

Seeing the same issue.

bargokr commented 2 years ago

Any update on this? We are seeing this issue as well, and it is a massive pain point when deploying to different subscriptions.

bargokr commented 1 year ago

bump - this is a priority for our organization. Is anyone actively working to resolve this?

fabianboerner commented 1 year ago

Now ran into the same issue: I can't execute terragrunt run-all plan to actually deploy a service function without an existing deployment.

nnsense commented 1 year ago

Same here: a deployment split across 2 folders. A secret for Secrets Manager is created in the first folder, and the Terraform code in the second folder is supposed to read it and use it in an RDS deployment. A mock value was added as part of the outputs from the first folder:

Error: Secrets Manager Secret "mock-secret" not found

(mock-secret is the name of the secret set as output mock)

bdorplatt commented 1 year ago

@denis256 Could you take a second look at this? I believe this was originally misdiagnosed as something needing to be imported. These are net new deployments that need the mock outputs to work without requiring real resources to exist in advance.

Most recently, one of the "real" app service plans that we had our mock outputs pointing to as a workaround was decommissioned. This resulted in further deployments failing until we pointed the mock output to another "real" ASP.

We need to be able to use a dummy value instead of a real resource ID of an existing resource. That is the entire purpose of mock outputs and they are not working here as they work with every other resource we have deployed thus far.

Instead of a real value, we need to be able to do this:

mock_outputs = {
  asp_id = "/subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/mockrg/providers/Microsoft.Web/serverfarms/mockasp"
}

dave0783 commented 1 year ago

I'm experiencing the same issue, please fix.

denis256 commented 1 year ago

Hi, can you share example code or a repository where this issue happens?

bdorplatt commented 1 year ago

We don't have the code in a public repo but the example provided should hopefully be enough to reproduce:

Function App resource, Main.tf:

resource "azurerm_windows_function_app" "function_app" {
  app_settings        = var.fa_app_settings
  location            = var.region
  name                = "test-azf"
  resource_group_name = var.resource_group_name
  service_plan_id     = var.fa_service_plan_id
}

Dependency and input, terragrunt.hcl:

include {
  path = find_in_parent_folders()
}

dependency "app_service_plan01" {
  config_path = "../../appserviceplan/appserviceplan01"
  mock_outputs = {
    asp_id = "/subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/mockrg/providers/Microsoft.Web/serverfarms/mockasp"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
  mock_outputs_merge_strategy_with_state  = "shallow"
}

inputs = {
  fa_service_plan_id = dependency.app_service_plan01.outputs.asp_id
}

bdorplatt commented 1 year ago

@denis256 It's great to see this added to the roadmap. Any timeline on getting this one fixed?

yhakbar commented 3 months ago

Hey @bdorplatt,

I'd like to understand this issue better. Please confirm if my understanding is correct, and let me know if my response makes sense.

My understanding of the issue is that the azurerm_windows_function_app resource throws an error during plans with mocked values for the service_plan_id attribute, because the Azure provider tries to look up a service plan with an ID that doesn't exist in Azure.

It attempts to make a network request to Azure to fetch information about the service_plan_id, and returns the following error (formatted to make it easier to parse):

Error: could not read new Service Plan to check tier
Service Plan: (Serverfarm Name "mock-asp" / Resource Group "mock-rg"):
web.AppServicePlansClient#Get: Failure responding to request: StatusCode=404 --
Original Error: autorest/azure: Service returned an error. Status=404
Code="ResourceGroupNotFound"
Message="Resource group 'mock-rg' could not be found."

This is an error returned from the OpenTofu/Terraform Azure provider based on the value passed in at the OpenTofu/Terraform layer. So on plan, OpenTofu/Terraform attempts to make a network request using the value set on service_plan_id.

Assuming I'm understanding the issue correctly, what you're looking to accomplish with Terragrunt is to populate fa_service_plan_id with a value the provider can use such that the network request it sends to Azure won't return a 404 HTTP status code, as it does with the https://.../mock-rg value taken from the mock.

In order to accomplish this, you'll want to use Terragrunt to populate the input such that the provider will submit a network request and get a 200 HTTP status code back, which will allow you to finish the plan.

There are two ways you can accomplish this.

First, you can hard-code a valid asp_id:

dependency "app_service_plan01" {
    config_path = "../../appserviceplan/appserviceplan01"
    mock_outputs = {
        asp_id = "/subscriptions/<a real id>/resourceGroups/<a real resource group id>/providers/Microsoft.Web/serverfarms/<a real value>"
    }
    mock_outputs_allowed_terraform_commands = ["validate", "plan"]
    mock_outputs_merge_strategy_with_state = "shallow"
}

inputs = {
    fa_service_plan_id = dependency.app_service_plan01.outputs.asp_id
}

This makes the network request sent by the provider always target that hard-coded ID, regardless of the context in which the configuration is used.

Note that if you do this, the plan that you will see will show the value of the service_plan_id set to that hard-coded value. During applies, the real value will be fetched from the dependency, so you will end up with the real provisioned resource being referenced.

If the resource is going to be deployed in many different contexts, and you cannot use a single hard-coded value, see if you can leverage something like run_cmd to solve this problem.

I don't know much about the Azure CLI, and I don't have access to an Azure account, but you would presumably be able to combine the command az appservice plan show with run_cmd to accomplish something like the following:

dependency "test" {
  config_path = "test"

  mock_outputs = {
    mock_id = run_cmd("--terragrunt-quiet", "date")
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
  mock_outputs_merge_strategy_with_state = "shallow"
}

generate "main" {
  path = "main.tf"
  if_exists = "overwrite"
  contents = <<EOF
variable "mock_id" {}

output "mock_id" {
  value = var.mock_id
}
EOF
}

inputs = {
  mock_id = dependency.test.outputs.mock_id
}

In this example, the value used for mock_id is dynamic: the output of the date command. When the dependency test doesn't have any outputs, Terragrunt uses the result of running date as the value for mock_id. Thus, my plan looks like this:

❯ terragrunt plan

Changes to Outputs:
  + mock_id = "Fri Jul 19 14:43:51 EDT 2024"

In your use case, you would replace mock_id = run_cmd("--terragrunt-quiet", "date") with something that uses the Azure CLI to return a valid fa_service_plan_id, then pass that through when mocked.
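A sketch of what that could look like, assuming the Azure CLI is installed and logged in to the subscription being planned. existing-asp and existing-rg are hypothetical names standing in for an App Service Plan that really exists; --query id --output tsv makes az print only the plan's resource ID. This still depends on a real plan; it just looks the ID up at run time instead of hard-coding it.

dependency "app_service_plan01" {
  config_path = "../../appserviceplan/appserviceplan01"

  mock_outputs = {
    # Hypothetical plan/resource group names: replace with a plan that exists
    # in the subscription the Azure CLI is currently logged in to.
    asp_id = run_cmd(
      "--terragrunt-quiet",
      "az", "appservice", "plan", "show",
      "--name", "existing-asp",
      "--resource-group", "existing-rg",
      "--query", "id",
      "--output", "tsv"
    )
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
  mock_outputs_merge_strategy_with_state  = "shallow"
}

inputs = {
  fa_service_plan_id = dependency.app_service_plan01.outputs.asp_id
}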

Let me know if that helps, or if I've misunderstood your problem.

bdorplatt commented 3 months ago

Hello @yhakbar, the solution you are suggesting, using a real resource ID, is the exact workaround we are currently implementing. We would like a real solution in which a generic mock value does not return an error. This works correctly with every other Azure resource type we have written mock inputs for. azurerm_windows_function_app is the only resource we see this issue with, and as mentioned in my original submission, the exact same setup with the deprecated azurerm_function_app works as expected and honors the mock values. So this should certainly be something that can be fixed, rather than relying on the workaround as a permanent solution.

yhakbar commented 3 months ago

Hey @bdorplatt,

Is it clear that the thing returning an error is the Azure OpenTofu/Terraform provider?

This is where that would be adjusted: https://github.com/hashicorp/terraform-provider-azurerm

Terragrunt provides tooling for orchestrating updates to Terraform/OpenTofu, but it's the providers that actually send the network requests based on the inputs they're given.

If you would like to have it so that the provider will accept a mock value as input to the service_plan_id attribute of azurerm_windows_function_app, that's something the team managing the Azure provider will have to change. We have no control over that.

bdorplatt commented 3 months ago

Why does this work for every other resource? Why is this one different from all of the others? If this is a provider issue interacting with Terragrunt, can someone from Gruntwork team engage the AzureRM provider team to find a fix?

yhakbar commented 3 months ago

According to the explanation in https://github.com/hashicorp/terraform-provider-azurerm/issues/17376, this works on other resources because they don't need to perform a schema check based on the service plan's configuration to determine whether or not the value is appropriate.

Maybe I'm misunderstanding, but didn't the maintainers of the AzureRM provider indicate that this behavior was intentional in that issue? I don't think they're open to discussing a fix, as they want the resource to work this way.

To confirm, you can replicate this issue using OpenTofu/Terraform directly, correct? Terragrunt calls one of those tools installed locally on your machine, which uses the AzureRM provider to dispatch API requests to Azure.
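One way to check this without Terragrunt is a minimal root module like the one below, run with tofu plan (or terraform plan) after az login. This is a sketch, not a confirmed repro: it assumes the azurerm 3.x provider (where site_config is a required block), and the location and resource_group_name values are hypothetical placeholders. If the provider-side explanation in terraform-provider-azurerm issue 17376 holds, the plan should stop at the same "could not read new Service Plan to check tier" lookup.

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumption: 3.x, where site_config is required
    }
  }
}

provider "azurerm" {
  features {}
}

# Same mock ID the Terragrunt dependency block passes in; nothing here exists in Azure.
variable "fa_service_plan_id" {
  type    = string
  default = "/subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/mockrg/providers/Microsoft.Web/serverfarms/mockasp"
}

resource "azurerm_windows_function_app" "function_app" {
  name                = "test-azf"
  location            = "eastus"  # hypothetical
  resource_group_name = "test-rg" # hypothetical
  service_plan_id     = var.fa_service_plan_id

  site_config {}
}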

What Terragrunt is going to be able to do is help you set up appropriate values expected by the provider in OpenTofu/Terraform.

bdorplatt commented 3 months ago

That doesn't make sense, since this also works for azurerm_windows_web_app, which interacts with azurerm_service_plan in the same way. In the referenced Terraform issue that we opened, the conclusion was that this was a Terragrunt issue: "this is specific to Terragrunt mocking and isn't a provider bug". @denis256 added this issue to the roadmap, so surely something can be done to fix it on the Terragrunt side.

yhakbar commented 3 months ago

I'll make sure to double-check with him, but as far as I can tell, there isn't actually a fix that Terragrunt can make that will result in the provider accepting an invalid value for the service_plan_id on the azurerm_windows_function_app resource.

How are you looking to have this issue solved, though? The provider needs a valid value for service_plan_id, right? Would a solution that fetches the ID for a different service plan as the mocked value be sufficient?
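For the case where no specific plan can be named, a hedged variant is to take the first App Service Plan the current subscription returns. This assumes at least one plan exists there; if none do, the query returns no ID, the mock asp_id would presumably be empty, and the plan would still fail, which matches the new-subscription concern raised earlier.

dependency "app_service_plan01" {
  config_path = "../../appserviceplan/appserviceplan01"

  mock_outputs = {
    # Resource ID of the first App Service Plan visible in the current az subscription.
    # Assumes at least one plan exists there.
    asp_id = run_cmd(
      "--terragrunt-quiet",
      "az", "appservice", "plan", "list",
      "--query", "[0].id",
      "--output", "tsv"
    )
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
  mock_outputs_merge_strategy_with_state  = "shallow"
}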

yhakbar commented 3 months ago

Hey @bdorplatt,

I've confirmed with @denis256, and we added this to the roadmap to investigate why the issue happened and to document potential workarounds. I can see why that was confusing, and I agree that putting it on the roadmap can reasonably give the impression that there's a fix that can be made when there isn't.

We'll update our process so that it's clear whether an issue is going on the roadmap to be worked on in the future, or, if it's being placed on the roadmap for another reason, we'll explain why on the issue.

bdorplatt commented 3 months ago

If Gruntwork can't fix this, then who can? Is it Microsoft or HashiCorp? The inconsistency still hasn't been addressed: why does azurerm_windows_web_app, which interacts with azurerm_service_plan, work with a mock (not real) value, while azurerm_windows_function_app, which interacts with azurerm_service_plan in the same way, does not?

Using a real value as a mock input, which we have been using as a workaround all along, defeats the purpose of using mock inputs. If the real service plan that is referenced goes away, our code becomes broken.

odgrim commented 3 months ago

Hello! I tried to reach out to Felipe to try to schedule a call about this.

If you can, please reach out to support@gruntwork.io so we can set up a call. We want to chat about what the problem is and why it's happening and why it feels inconsistent in a medium that has a little more interaction.

FWIW, I just ran into the exact same issue plotting multi-account examples using mock outputs for subnet IDs in the AWS provider.