microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps
MIT License

Deployment fails with timeout, unable to find out the reason why #936

Open prutsert opened 1 year ago

prutsert commented 1 year ago

Issue description

I created a Container Apps Environment (CAE) with internal VNet only connectivity:

{
    "properties": {
        "vnetConfiguration": {
            "infrastructureSubnetId": mySubnetId,
            "internal": true
        }
    }
}

I created a Key Vault and added a secret for the Container Apps Job I was about to deploy. I enabled network access restrictions on the Key Vault, so that only the VNets/subnets on the allow list have access. However, as I found out after a couple of hours of troubleshooting, I had forgotten to add mySubnetId to that allow list.
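For reference, a minimal Bicep sketch of the allow-list entry that was missing. The parameter names (`keyVaultName`, `infrastructureSubnetId`) are hypothetical, the API version and RBAC authorization mode are assumptions rather than what was actually deployed, and it assumes the subnet has the Microsoft.KeyVault service endpoint enabled:

```bicep
// Hypothetical parameters, named for illustration only.
param keyVaultName string
param location string = resourceGroup().location
// Resource ID of the subnet used by the Container Apps Environment (mySubnetId above).
param infrastructureSubnetId string

resource keyVault 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: keyVaultName
  location: location
  properties: {
    tenantId: subscription().tenantId
    sku: {
      family: 'A'
      name: 'standard'
    }
    // Assumption: RBAC authorization; a vault using access policies would set accessPolicies instead.
    enableRbacAuthorization: true
    networkAcls: {
      // Deny everything that is not explicitly on the allow list.
      defaultAction: 'Deny'
      bypass: 'AzureServices'
      virtualNetworkRules: [
        {
          // The entry that was missing: the CAE infrastructure subnet.
          id: infrastructureSubnetId
        }
      ]
    }
  }
}
```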

I created a Managed Identity that has Get Secret permission on the Key Vault.

I am using Bicep to deploy a Container Apps Job to the CAE, with secrets referencing the Key Vault (using the Managed Identity I just created). The deployment fails after 10 minutes with the message: "Failed to provision revision for container app 'my-container-apps-job'. Error details: Operation expired. (Code: ContainerAppOperationError)". Nothing in the message hints at the actual reason the deployment failed.

Similar behaviour is mentioned as a side effect in #646, where deploying a Container App with an identity that has no access to the container registry ends with the same timeout message, without saying what is actually wrong.

Steps to reproduce

  1. Deploy Container Apps Environment with VNet support and no external IP address (not sure if this matters).
  2. Deploy Key Vault, set access restrictions, but do not add the VNet/subnet that is used by the CAE to the allow list. Create a secret in the Key Vault.
  3. Create a Managed Identity, and grant it Get Secret access on the Key Vault.
  4. Deploy Container Apps Job using Bicep, with the Managed Identity added to the user assigned identities, and with a secret referencing the secret in the Key Vault.
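Step 4 can be sketched roughly as follows. This is not the exact template from the failing deployment: the parameter names, the secret name `my-secret`, the environment variable `MY_SECRET`, the container name, the manual trigger settings, and the API version are all assumptions chosen for illustration.

```bicep
// Hypothetical parameters, named for illustration only.
param location string = resourceGroup().location
param environmentId string      // resource ID of the Container Apps Environment
param identityId string         // resource ID of the user-assigned Managed Identity
param keyVaultSecretUri string  // URI of the secret in the Key Vault
param image string              // container image to run

resource job 'Microsoft.App/jobs@2023-05-01' = {
  name: 'my-container-apps-job'
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${identityId}': {}
    }
  }
  properties: {
    environmentId: environmentId
    configuration: {
      // Assumed trigger settings; the issue does not state which trigger was used.
      triggerType: 'Manual'
      replicaTimeout: 300
      replicaRetryLimit: 0
      manualTriggerConfig: {
        parallelism: 1
        replicaCompletionCount: 1
      }
      secrets: [
        {
          // Secret resolved from the Key Vault using the Managed Identity.
          name: 'my-secret'
          keyVaultUrl: keyVaultSecretUri
          identity: identityId
        }
      ]
    }
    template: {
      containers: [
        {
          name: 'main'
          image: image
          env: [
            {
              name: 'MY_SECRET'
              secretRef: 'my-secret'
            }
          ]
        }
      ]
    }
  }
}
```

With the Key Vault firewall blocking the CAE subnet, a deployment along these lines is what ends in the timeout described above.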

Expected behavior

A descriptive error message, stating that the deployment failed because the Key Vault couldn't be accessed. Preferably including the reason why, "code": "ForbiddenByFirewall" in this case.

Actual behavior

No relevant error message at all, just a deployment timeout.

Additional context

The problem occurs when deploying the Container Apps Job using Bicep. I have not tested what happens when the deployment is done using az containerapp job ....

cachai2 commented 10 months ago

@prutsert, just to confirm: this scenario is working for you currently, and the main ask here is to improve our error messaging?

prutsert commented 10 months ago

Hi @cachai2, that's correct. I eventually managed to get things up and running, but it took a lot of troubleshooting. Detailed logging was also missing in ContainerAppSystemLogs_CL.