hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Inconsistent refresh order between an update-only/no change apply, and an apply with destroy operations #31437

Open magodo opened 2 years ago

magodo commented 2 years ago

Terraform Version

v1.2.5

Terraform Configuration Files

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "test" {
  name     = "acctestRG-220714100747456849"
  location = "westeurope"
}

resource "azurerm_virtual_network" "test" {
  name                = "acctestvirtnet220714100747456849"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.test.location
  resource_group_name = azurerm_resource_group.test.name
  subnet {
    name           = "subnet1"
    address_prefix = "10.0.1.0/24"
  }
}

resource "azurerm_subnet" "test" {
  name                 = "internal"
  resource_group_name  = azurerm_resource_group.test.name
  virtual_network_name = azurerm_virtual_network.test.name
  address_prefixes     = ["10.0.2.0/24"]
}

Steps to Reproduce

  1. Run terraform apply -auto-approve
  2. Run terraform plan or terraform apply and observe the refresh order, which is: azurerm_resource_group -> azurerm_virtual_network -> azurerm_subnet
  3. Comment out the azurerm_virtual_network and azurerm_subnet resources (see the abridged configuration after these steps)
  4. Run terraform plan or terraform apply again and observe the refresh order, which is now azurerm_resource_group -> azurerm_subnet -> azurerm_virtual_network (in fact, the latter two are read in parallel)
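
For reference, after step 3 the configuration is the same as above, just with the two resources commented out (abridged here; only the resource group remains active, while the virtual network and subnet still exist in state):

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "test" {
  name     = "acctestRG-220714100747456849"
  location = "westeurope"
}

# resource "azurerm_virtual_network" "test" {
#   ...
# }

# resource "azurerm_subnet" "test" {
#   ...
# }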

I'm not sure whether this is a bug or by design. In my specific use case, I'd want the order in step 4 to be the same as in step 2: during the "refresh" stage, Terraform should honor the dependencies recorded in the state file when reading each resource. For the azurerm provider this ordering is critical for working around some issues, e.g. https://github.com/hashicorp/terraform-provider-azurerm/issues/11059, where we implemented a cache that stores information about a parent resource; the child resource (which depends on the parent) then reads from that cache to avoid calling a problematic API.
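
To illustrate what I mean by the dependencies recorded in the state file: each resource instance in the state tracks the addresses it depended on at the last apply. An abridged excerpt of what terraform state pull would show for the subnet above might look roughly like this (attributes omitted; exact layout depends on the state format version):

{
  "version": 4,
  "resources": [
    {
      "mode": "managed",
      "type": "azurerm_subnet",
      "name": "test",
      "instances": [
        {
          "attributes": {},
          "dependencies": [
            "azurerm_resource_group.test",
            "azurerm_virtual_network.test"
          ]
        }
      ]
    }
  ]
}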

jbardin commented 2 years ago

Hi @magodo,

Thanks for filing the issue. This is working as intended, and I'd like to see if we can figure out exactly what the underlying problem is here before deciding a new behavior is warranted.

When you remove the configuration for the existing resources, the plan phase, which is driven by configuration, has no references between these resources and hence no order. This should not be a problem, because the order of operations should only matter when an operation has side-effects, and ReadResource should have no side-effects. The dependencies stored in the state are not used for planning, since that is when those stored dependencies are updated. A case might be made for using these dependencies during plan only when the configuration has been removed, but that would take some research to determine whether it still creates a valid plan graph in all cases.

I'm not sure I understand how this relates to the linked issue. Could you provide a summary of what was concluded over there?

Thanks!

jbardin commented 2 years ago

OK, I re-read your initial comment and I think I get what you are trying to do -- using the order of operations to cache some state from one resource for the next, hence creating a hidden side-effect which Terraform does not expect. This is definitely not something we want to encourage, but it's still not clear to me whether there is another solution that works within the existing provider protocol. What exactly is the information you need to cache between resources? Could it be stored in the state from the prior apply operation (either as a computed attribute or out of band in the private state data)?

apparentlymart commented 2 years ago

@jbardin already said most of what I was going to say... I also wanted to add: refreshing "respecting dependencies" during a normal plan is, I think, just a side-effect of us having combined the refresh step into the plan step in an earlier version of Terraform. The goal of that was to order data resource reads correctly with regard to the managed resource planning operations they depend on, but not explicitly to make the managed resource refreshes be ordered.

Our current protocol considers every managed resource instance to be independent from any other with regard to refreshing, so any controlled ordering Terraform currently does is an implementation detail which has changed over time and is likely to change in the future, simply because these details often change as a side-effect of other changes even when we aren't intentionally changing the refresh order in particular. (The switch to doing refreshing during planning, mentioned above, is a prominent example, but not the only one.)

Trying for opportunistic optimization when requests happen to arrive in a suitable order is a reasonable idea for operations that would otherwise be particularly slow and where the data tends to be requested all together. But I would recommend making sure that you can still support the operations arriving in a less-ideal order too, because there are various reasons beyond Terraform's graph traversal process why things might happen in an unusual order.

It seems like the underlying need here is for a provider to be able to explicitly tell Terraform Core in the schema that there is an optimization opportunity if read requests for all of these resource types can be combined into a single provider call. That would then be a particular example of the "batch request" functionality we've been discussing for many years now, but sadly I could not quickly find the open issue that describes it in order to tag this new example onto it.

magodo commented 2 years ago

Thanks @apparentlymart and @jbardin! I totally understand that refreshes are not necessarily ordered. I also agree that it is not ideal for one resource to depend on a side effect of another resource's change.

What exactly is the information you need to cache between resources?

In our case, there is a key vault and a key vault key (abbreviated as "key" below). The key vault's ID is in its management-plane form, while the key's ID is in its data-plane form. So when doing CRUD on the key, we need the key vault's ID. This is straightforward for create, since the key resource already has a key_vault_id property. For read/update/delete, we normally resolve the parent ID (the key vault) from the key's own ID. Since the key's ID carries only one piece of information about the parent, its name, we have to call the Azure resource list API with a query to retrieve the exact resource ID of the key vault. Because that API has several issues, we are trying to avoid calling it. For update and delete, as @jbardin mentioned, we should be able to just read key_vault_id from the state (note that the property is required and ForceNew); I have a PR for this: https://github.com/hashicorp/terraform-provider-azurerm/pull/17536. The last bit is the read: if the resource was created by Terraform, it should still be able to read the key vault ID from state.
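
For reference, a minimal sketch of the parent/child shape described above, using the azurerm_key_vault and azurerm_key_vault_key resources; the names and values here are purely illustrative, and the sketch reuses the resource group from the reproduction config:

data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "example" {
  name                = "examplekv220714"
  location            = azurerm_resource_group.test.location
  resource_group_name = azurerm_resource_group.test.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"
  # In practice an access policy or RBAC assignment granting key permissions would also be needed.
}

resource "azurerm_key_vault_key" "example" {
  name         = "examplekey"
  # Management-plane parent ID; required and ForceNew, so it is always present in state.
  key_vault_id = azurerm_key_vault.example.id
  key_type     = "RSA"
  key_size     = 2048
  key_opts     = ["decrypt", "encrypt", "sign", "unwrapKey", "verify", "wrapKey"]
}

As described above, the key's own ID stored in state is its data-plane form, which identifies the vault only by name rather than by its full management-plane resource ID.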