timwebster9 opened this issue 2 years ago (Status: Open)
Hi @timwebster9, I tried to reproduce this issue dozens of times with the following configuration, but it never occurred. How often do you run into it? Could you provide an executable configuration that reliably reproduces it? Then I can investigate further. Thanks.
resource "azurerm_resource_group" "test" {
name = "demo-rg"
location = "westus2"
}
resource "azurerm_virtual_network" "test" {
name = "acctest-vnet-demo"
location = azurerm_resource_group.test.location
resource_group_name = azurerm_resource_group.test.name
address_space = ["10.0.0.0/16"]
}
resource "azurerm_subnet" "public" {
name = "acctest-sn-public-demo"
resource_group_name = azurerm_resource_group.test.name
virtual_network_name = azurerm_virtual_network.test.name
address_prefixes = ["10.0.1.0/24"]
delegation {
name = "acctest"
service_delegation {
name = "Microsoft.Databricks/workspaces"
actions = [
"Microsoft.Network/virtualNetworks/subnets/join/action",
"Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
"Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
]
}
}
}
resource "azurerm_subnet" "private" {
name = "acctest-sn-private-demo"
resource_group_name = azurerm_resource_group.test.name
virtual_network_name = azurerm_virtual_network.test.name
address_prefixes = ["10.0.2.0/24"]
delegation {
name = "acctest"
service_delegation {
name = "Microsoft.Databricks/workspaces"
actions = [
"Microsoft.Network/virtualNetworks/subnets/join/action",
"Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
"Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
]
}
}
}
resource "azurerm_subnet" "privatelink" {
name = "acctest-snpl-demo"
resource_group_name = azurerm_resource_group.test.name
virtual_network_name = azurerm_virtual_network.test.name
address_prefixes = ["10.0.3.0/24"]
enforce_private_link_endpoint_network_policies = true
}
resource "azurerm_network_security_group" "nsg" {
name = "acctest-nsg-demo"
location = azurerm_resource_group.test.location
resource_group_name = azurerm_resource_group.test.name
}
resource "azurerm_subnet_network_security_group_association" "public" {
subnet_id = azurerm_subnet.public.id
network_security_group_id = azurerm_network_security_group.nsg.id
}
resource "azurerm_subnet_network_security_group_association" "private" {
subnet_id = azurerm_subnet.private.id
network_security_group_id = azurerm_network_security_group.nsg.id
}
resource "azurerm_databricks_workspace" "test" {
name = "acctestDBW-demo"
resource_group_name = azurerm_resource_group.test.name
location = azurerm_resource_group.test.location
sku = "premium"
managed_resource_group_name = "acctestRG-DBW-demo-managed"
public_network_access_enabled = false
network_security_group_rules_required = "NoAzureDatabricksRules"
custom_parameters {
no_public_ip = true
public_subnet_name = azurerm_subnet.public.name
private_subnet_name = azurerm_subnet.private.name
virtual_network_id = azurerm_virtual_network.test.id
public_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.public.id
private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.private.id
}
tags = {
Environment = "Production"
Pricing = "Premium"
}
}
resource "azurerm_private_endpoint" "databricks" {
name = "acctest-endpoint-demo"
location = azurerm_resource_group.test.location
resource_group_name = azurerm_resource_group.test.name
subnet_id = azurerm_subnet.privatelink.id
private_service_connection {
name = "acctest-psc-demo"
is_manual_connection = false
private_connection_resource_id = azurerm_databricks_workspace.test.id
subresource_names = ["databricks_ui_api"]
}
}
Hi @ms-zhenhua,
I've included the exact configuration in the attached zip file. The only changes I've made are the tag values and removing the tenant ID from the Key Vault ARM template (which you can substitute with your own).
I haven't tried to reproduce this in a different environment, so hopefully you can reproduce it. I was hoping the 5-minute delay would jog someone's memory, as it looks like a deliberate timeout/wait setting somewhere.
This was normally run from a private Azure DevOps build agent using a private-linked storage account for Terraform state, and it is the only configuration to have the issue. It was just a testing pipeline, so I've disabled it for now (it also had other general flakiness issues).
I also have the same issue with a private endpoint. For me, though, it happens on any action after the endpoint is created, including terraform plan.
I see a similar issue here: https://github.com/hashicorp/terraform-provider-azurerm/issues/16754
Terraform version: v1.3.5, azurerm provider version: v3.39.1
When checking the logs to see the time delay, I see a message that I do not see for any other resource, and it relates to timeouts:

.timeouts: planned for existence but config wants absence

The full warning entry, with the next log message appearing almost 5 minutes later:

2023-03-01T17:44:00.168Z [WARN] Provider "registry.terraform.io/hashicorp/azurerm" produced an invalid plan for module.uks_recovery_services_vaults_backup["1"].azurerm_private_endpoint.secondary_recovery_services_vault_private_endpoints["AzureBackup"], but we are tolerating it because it is using the legacy plugin SDK. The following problems may be the cause of any confusing errors from downstream operations:
- .custom_network_interface_name: planned value cty.StringVal("") for a non-computed attribute
- .timeouts: planned for existence but config wants absence
- .private_service_connection[0].request_message: planned value cty.StringVal("") for a non-computed attribute
- .private_service_connection[0].private_connection_resource_alias: planned value cty.StringVal("") for a non-computed attribute

2023-03-01T17:48:56.950Z [WARN] ..... (next log entry)
If I look in the state file, timeouts for every other resource looks like this:
"timeouts": null
but timeouts for the private endpoint looks like this:
"timeouts": {
"create": null,
"delete": null,
"read": null,
"update": null
}
I can't see any difference in implementation in the provider though.
Not sure if any of this helps.
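As an aside, the .timeouts attribute in that warning and in the state refers to the resource's operation timeouts block, which can also be set explicitly in configuration. Here's a minimal sketch based on the private endpoint from the configuration earlier in this thread (the timeout values are only illustrative, not recommendations):

resource "azurerm_private_endpoint" "databricks" {
  name                = "acctest-endpoint-demo"
  location            = azurerm_resource_group.test.location
  resource_group_name = azurerm_resource_group.test.name
  subnet_id           = azurerm_subnet.privatelink.id

  private_service_connection {
    name                           = "acctest-psc-demo"
    is_manual_connection           = false
    private_connection_resource_id = azurerm_databricks_workspace.test.id
    subresource_names              = ["databricks_ui_api"]
  }

  # Explicit operation timeouts; this block is what the ".timeouts"
  # attribute in the plan warning and the state file refers to.
  timeouts {
    create = "60m"
    read   = "5m"
    update = "60m"
    delete = "60m"
  }
}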
In terms of 5-minute delays, I've seen this when you hit Azure API rate limits. So, depending on what you're doing, if you exceed the allotted number of API calls within 5 minutes, you can get throttled, possibly for 5 minutes. Just a guess, but it's the only place I've seen a '5 minute' delay before. I get it with Azure Storage Explorer, or when I'm making too many API calls creating/destroying storage accounts/blobs: Terraform hangs until the API opens up again and lets more calls through.
This is related to #20257. If you cannot connect to the storage account over the data plane (e.g. because it's only reachable through a private endpoint and/or DNS resolution fails), Terraform hangs for 5 minutes and then seems to carry on without error.
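For anyone trying to reproduce this, a minimal sketch of the setup being described (all names here are hypothetical, not the original configuration): the Terraform state lives in a storage account whose public data-plane access is disabled, so it is reachable only through a private endpoint, and that account is used as the azurerm backend. If the machine running Terraform cannot resolve or reach the private endpoint, the data-plane calls to the state storage account are what stall for roughly 5 minutes before Terraform carries on.

# Hypothetical state storage account, reachable only over a private endpoint
# (public data-plane access disabled).
resource "azurerm_storage_account" "tfstate" {
  name                          = "tfstatedemo"
  resource_group_name           = "tfstate-rg"
  location                      = "westus2"
  account_tier                  = "Standard"
  account_replication_type      = "LRS"
  public_network_access_enabled = false
}

# Backend pointing at that storage account (normally provisioned separately).
# If the agent cannot reach the account's private endpoint, or its DNS name
# does not resolve to it, state operations over the data plane hang here.
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstatedemo"
    container_name       = "tfstate"
    key                  = "demo.tfstate"
  }
}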
Terraform (and AzureRM Provider) Version
Affected Resource(s)
azurerm_private_endpoint
Terraform Configuration Files
I'm not certain this matters - it usually seems to be the private endpoint that it hangs on, but in the log snippets below you'll see hangs on different resource types.
Debug Output
I have a complete trace file, but I cannot include it here for privacy reasons. Below are snippets of what it's stuck on.
NOTE: these are from two different builds. In the first one the delay seems to be on a databricks workspace, and the second one seems to be on a private endpoint.
without trace (note the 5 minute delay between log messages):
with trace (this continues for exactly 5 minutes):
Panic Output
n/a
Expected Behaviour
It shouldn't hang
Actual Behaviour
It hangs at this point for pretty much exactly 5 minutes. After the 5-minute delay, it continues successfully.
Steps to Reproduce
Run `terraform destroy` on the configuration.
Important Factoids
n/a
References
n/a