hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

Azurerm | CosmosDB | Failure sending request: StatusCode=504 -- Original Error: context deadline exceeded #19455

Open andigwandi opened 1 year ago

andigwandi commented 1 year ago

Terraform Version

1.2.9

AzureRM Provider Version

3.22.0

Affected Resource(s)/Data Source(s)

azurerm_cosmosdb_sql_container

Terraform Configuration Files

module "cosmos_db_container_master_data" {
  source     = "../shared/cosmos/container"
  depends_on = [module.cosmos_db]

  env_config = local.env_config
  container_config = {
    container_name         = "MasterData"
    account_name           = module.cosmos_db.db_config.account_name
    db_id                  = module.cosmos_db.db_config.db_id
    db_name                = module.cosmos_db.db_config.db_name
    connection_string      = module.cosmos_db.db_config.connection_strings[0]
    primary_key            = module.cosmos_db.db_config.primary_key
    read_endpoint          = module.cosmos_db.db_config.read_endpoints[0]
    write_endpoint         = module.cosmos_db.db_config.write_endpoints[0]
    partition_key_version  = local.cosmos_container_master_data_config.partition_key_version
    throughput             = local.cosmos_container_master_data_config.throughput
    default_ttl            = local.cosmos_container_master_data_config.default_ttl
    analytical_storage_ttl = local.cosmos_container_master_data_config.analytical_storage_ttl
    partition_key_path     = local.cosmos_container_master_data_config.partition_key_path
    autoscale_settings     = local.cosmos_container_master_data_config.autoscale_settings
  }

  indexing_policy = [{
    excluded_path = [{
      path = "/*"
    }]
    included_path = [{
      path = "/type/?"
    }]
    indexing_mode = "consistent"
  }]
}

Debug Output/Panic Output

I am seeing different errors related to 'context deadline' while re-running the same pipeline

╷
│ Error: reading CosmosDB Account "azr-ps2-cdb-dev10-r1" (Resource Group "azr-ps2-rg-01-r1"): documentdb.DatabaseAccountsClient#Get: Failure sending request: StatusCode=504 -- Original Error: context deadline exceeded
│ 
│   with module.ps2.module.cosmos_db_container_master_data.azurerm_cosmosdb_sql_container.container,
│   on ../modules/shared/cosmos/container/main.tf line 7, in resource "azurerm_cosmosdb_sql_container" "container":
│    7: resource "azurerm_cosmosdb_sql_container" "container" {

###################################

╷
│ Error: reading Throughput on Cosmos SQL Container PendingSalesTransactions (Account: "azr-ps2-cdb-dev10-r1", Database: "ps2") ID: documentdb.SQLResourcesClient#GetSQLContainerThroughput: Failure sending request: StatusCode=504 -- Original Error: context deadline exceeded
│ 
│   with module.ps2.module.cosmos_db_container_pending_sales_transactions.azurerm_cosmosdb_sql_container.container,
│   on ../modules/shared/cosmos/container/main.tf line 7, in resource "azurerm_cosmosdb_sql_container" "container":
│    7: resource "azurerm_cosmosdb_sql_container" "container" {

Expected Behaviour

Terraform Plan should generate the changes in the resources without any issue

Actual Behaviour

Stage: Terraform Plan

I am seeing different 'context deadline' errors when re-running the same pipeline. Both errors involve Cosmos DB, and after 2-3 retries the plan proceeds.

│ Error: reading CosmosDB Account "azr-ps2-cdb-dev10-r1" (Resource Group "azr-ps2-rg-01-r1"): documentdb.DatabaseAccountsClient#Get: Failure sending request: StatusCode=504 -- Original Error: context deadline exceeded
│ 
│   with module.ps2.module.cosmos_db_container_master_data.azurerm_cosmosdb_sql_container.container,
│   on ../modules/shared/cosmos/container/main.tf line 7, in resource "azurerm_cosmosdb_sql_container" "container":
│    7: resource "azurerm_cosmosdb_sql_container" "container" {

│ Error: reading Throughput on Cosmos SQL Container PendingSalesTransactions (Account: "azr-ps2-cdb-dev10-r1", Database: "ps2") ID: documentdb.SQLResourcesClient#GetSQLContainerThroughput: Failure sending request: StatusCode=504 -- Original Error: context deadline exceeded
│ 
│   with module.ps2.module.cosmos_db_container_pending_sales_transactions.azurerm_cosmosdb_sql_container.container,
│   on ../modules/shared/cosmos/container/main.tf line 7, in resource "azurerm_cosmosdb_sql_container" "container":
│    7: resource "azurerm_cosmosdb_sql_container" "container" {

Steps to Reproduce

No response

Important Factoids

No response

References

No response

sinbai commented 1 year ago

@andigwandi thanks for opening this issue here. Could you provide the raw Terraform config and repro steps? The Terraform module alone is not enough for reproduction and troubleshooting.

Besides, since timeouts can be defined in the tf config as follows, could you increase the read timeout (e.g. to 20m) to see if that fixes the issue?

resource "azurerm_cosmosdb_sql_container" "container" {
  ...
  ...
  ...

  timeouts {
    read = "20m"
  }
}
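Because the configuration in this issue goes through a module, the timeouts block would presumably need to go on the resource inside the module itself. A minimal sketch, assuming an azurerm 3.x provider and a module layout roughly like the one below; var.container_config matches the name used in the module call, but var.resource_group_name is a hypothetical input that is not shown in the issue:

```hcl
# Hypothetical contents of ../modules/shared/cosmos/container/main.tf.
# var.resource_group_name is an assumed input, not taken from the reporter's module.
resource "azurerm_cosmosdb_sql_container" "container" {
  name                = var.container_config.container_name
  resource_group_name = var.resource_group_name
  account_name        = var.container_config.account_name
  database_name       = var.container_config.db_name
  partition_key_path  = var.container_config.partition_key_path

  # Raise the timeout applied to read operations during refresh and plan.
  timeouts {
    read = "20m"
  }
}
```

Whether a longer read timeout helps here is exactly what the test is meant to show; if the 504 is returned by the Azure API itself, the plan may still fail.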
sam-h-bean commented 1 year ago

This has been happening to me as well. There definitely seems to have been some regression with refreshing the state of Cosmos infrastructure via Terraform.

I've been seeing errors like

Error: [ERROR] Unable to List connection strings for CosmosDB Account my-account: documentdb.DatabaseAccountsClient#ListConnectionStrings: Failure sending request: StatusCode=504 -- Original Error: context deadline exceeded

andigwandi commented 1 year ago

Here is the configuration for the example given in the issue:

cosmos_container_config_master_data = {
  analytical_storage_ttl = -1
  autoscale_enabled      = false
  autoscale_settings     = [{ max_throughput = 1000 }]
  throughput             = 400
}

Other settings, such as name and db_name, can be hardcoded.

The error occurs when I run terraform plan to generate the changes.
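For anyone trying to reproduce this without the module, a standalone sketch with hardcoded values might look like the following. It assumes an azurerm 3.x provider; the resource group, account, and database names and the partition key path are placeholders, since the real values are not shown in the issue. The throughput, analytical TTL, and indexing policy are the values posted above.

```hcl
# Standalone repro sketch. All names and the partition key path are placeholders.
resource "azurerm_cosmosdb_sql_container" "master_data" {
  name                = "MasterData"
  resource_group_name = "example-rg"      # placeholder
  account_name        = "example-account" # placeholder
  database_name       = "example-db"      # placeholder
  partition_key_path  = "/id"             # placeholder; real path not shown in the issue

  # autoscale_enabled is false in the posted config, so manual throughput
  # is used and no autoscale_settings block is set.
  throughput             = 400
  analytical_storage_ttl = -1

  indexing_policy {
    indexing_mode = "consistent"

    included_path {
      path = "/type/?"
    }

    excluded_path {
      path = "/*"
    }
  }
}
```

Note that analytical_storage_ttl only applies when analytical storage is enabled on the Cosmos DB account; drop that line if it is not.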

andigwandi commented 1 year ago

I also received a similar error in one of my pipelines:

Error: [ERROR] Unable to List read-only keys for CosmosDB Account my-account: documentdb.DatabaseAccountsClient#ListReadOnlyKeys: Failure sending request: StatusCode=504 -- Original Error: context deadline exceeded

philspencer-owd commented 6 days ago

Oddly, I am getting this with azurerm_security_center_contact. The error is:

retrieving Contact: (Security Contact Name "Platform Team"): security.ContactsClient#Get: Failure sending request: StatusCode=504 -- Original Error: context deadline exceeded

prietolu commented 5 days ago

On Monday, June 24th, I started getting the same error with azurerm_security_center_contact when running the same Terraform script against different Azure subscriptions that host different environments (PROD, PREPROD, etc.):

Error: Reading Security Center Contact: security.ContactsClient#Get: Failure sending request: StatusCode=504 -- Original Error: context deadline exceeded

However, as of today, Wednesday the 26th, it seems to be working again, and we no longer receive that error in any environment.

It looks like there was a temporary problem retrieving information for azurerm_security_center_contact resources. @philspencer-owd, can you confirm whether you're still seeing this error today?

philspencer-owd commented 4 days ago

@prietolu Confirmed this is now working again for all our environments as well!