microsoft / azure-container-apps

Health Probe configuration of initial delay #1186

Open pwinstead opened 3 weeks ago

Issue description

Health probes configured on a container app revision do not appear to respect `initialDelaySeconds` or `periodSeconds`. In ContainerAppSystemLogs, the Liveness and Readiness probes start reporting failures immediately after the new container revision is created.

Steps to reproduce

  1. Use an existing Container Apps environment and a container app running a .NET Web API.
  2. Create a new revision of the container app that defines the following probes:

    "probes": [
    {
        "type": "Liveness",
        "httpGet": {
            "path": "/health/Liveness",
            "port": 8080,
            "scheme": "HTTP"
        },
        "initialDelaySeconds": 60,
        "periodSeconds": 20,
        "timeoutSeconds": 5
    },
    {
        "type": "Readiness",
        "httpGet": {
            "path": "/health/Readiness",
            "port": 8080,
            "scheme": "HTTP"
        },
        "initialDelaySeconds": 60,
        "periodSeconds": 5,
        "timeoutSeconds": 5
    },
    {
        "type": "Startup",
        "failureThreshold": 5,
        "httpGet": {
            "path": "/health/Startup",
            "port": 8080,
            "scheme": "HTTP"
        },
        "initialDelaySeconds": 10,
        "periodSeconds": 30,
        "timeoutSeconds": 15
    }
    ]

  3. Review the system logs as the new revision is created.
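To make the expectation concrete, here is a minimal Python sketch that computes the earliest time each probe should first run. It assumes `initialDelaySeconds` is counted from container start, as it is for Kubernetes probes; whether Azure Container Apps behaves identically is exactly what this issue is asking. The helper name `earliest_probe_times` is hypothetical, and the start time is the "Started container scc-reporting" timestamp from the system logs.

```python
import json
from datetime import datetime, timedelta

# Timing fields copied from the probe definitions above (httpGet omitted).
PROBES_JSON = """
[
  {"type": "Liveness",  "initialDelaySeconds": 60, "periodSeconds": 20, "timeoutSeconds": 5},
  {"type": "Readiness", "initialDelaySeconds": 60, "periodSeconds": 5,  "timeoutSeconds": 5},
  {"type": "Startup",   "initialDelaySeconds": 10, "periodSeconds": 30, "timeoutSeconds": 15,
   "failureThreshold": 5}
]
"""


def earliest_probe_times(probes, container_started):
    """Hypothetical helper: earliest first-run time per probe type.

    Assumption: the initial delay is measured from container start.
    """
    return {
        p["type"]: container_started + timedelta(seconds=p["initialDelaySeconds"])
        for p in probes
    }


# "Started container scc-reporting" was logged at 10:04:34.792 PM on 6/3/2024.
started = datetime(2024, 6, 3, 22, 4, 34, 792000)
expected = earliest_probe_times(json.loads(PROBES_JSON), started)

for probe_type, when in expected.items():
    print(f"{probe_type}: not before {when.time().isoformat(timespec='milliseconds')}")
```

Under that assumption, no Liveness or Readiness result should be logged before 22:05:34.792, and no Startup result before 22:04:44.792.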

Expected behavior

The logs should show that the Readiness and Liveness probes are not attempted until after the configured initial delay.

Actual behavior

The logs show that the Readiness and Liveness probes are attempted (and fail) immediately after the new revision's container starts. The first Readiness failure is logged about 3 seconds after the "Started container" event.

Query example:

ContainerAppSystemLogs
| where ContainerAppName == "scc-reporting"
| project TimeGenerated,  Log, Type, Reason, RevisionName

Query results:

| TimeGenerated | Log | Type | Reason | RevisionName |
| --- | --- | --- | --- | --- |
| 6/3/2024, 10:04:24.238 PM | Successfully updated containerApp: scc-reporting | Normal | ContainerAppReady | |
| 6/3/2024, 10:04:24.246 PM | Updating containerApp: scc-reporting | Normal | ContainerAppUpdate | |
| 6/3/2024, 10:04:24.277 PM | Updating revision : scc-reporting--standard-health | Normal | RevisionUpdate | scc-reporting--standard-health |
| 6/3/2024, 10:04:24.461 PM | Setting traffic weight of '100%' for revision 'scc-reporting--fnrgckg' | Normal | RevisionUpdate | |
| 6/3/2024, 10:04:24.486 PM | Deactivating old revisions for ContainerApp 'scc-reporting' | Normal | RevisionDeactivating | |
| 6/3/2024, 10:04:24.497 PM | Successfully provisioned revision 'scc-reporting--standard-health' | Normal | RevisionReady | scc-reporting--standard-health |
| 6/3/2024, 10:04:24.511 PM | Successfully updated containerApp: scc-reporting | Normal | ContainerAppReady | |
| 6/3/2024, 10:04:34.331 PM | Pulling image "xxxxxxxxx.azurecr.io/image.reporting:2024.06.03.56461" | Normal | PullingImage | scc-reporting--standard |
| 6/3/2024, 10:04:34.728 PM | Successfully pulled image "xxxxxxxxx.azurecr.io/image.reporting:2024.06.03.56461" in 638.615398ms (638.627008ms including waiting) | Normal | ImagePulled | scc-reporting--standard |
| 6/3/2024, 10:04:34.733 PM | Created container scc-reporting | Normal | ContainerCreated | scc-reporting--standard |
| 6/3/2024, 10:04:34.792 PM | Started container scc-reporting | Normal | ContainerStarted | scc-reporting--standard |
| 6/3/2024, 10:04:34.803 PM | Created dapr component container daprd. Please check https://learn.microsoft.com/en-us/azure/container-apps/dapr-overview | Normal | ContainerCreated | scc-reporting--standard |
| 6/3/2024, 10:04:34.872 PM | Started dapr component container daprd. Please check https://learn.microsoft.com/en-us/azure/container-apps/dapr-overview | Normal | ContainerStarted | scc-reporting--standard |
| 6/3/2024, 10:04:37.777 PM | Readiness probe failed: HTTP probe failed with statuscode: 500 | Warning | ReplicaUnhealthy | scc-reporting--standard |
| 6/3/2024, 10:04:38.785 PM | Readiness probe failed: HTTP probe failed with statuscode: 500 | Warning | ReplicaUnhealthy | scc-reporting--standard |
| 6/3/2024, 10:04:41.708 PM | Liveness probe failed: HTTP probe failed with statuscode: 500 | Warning | ReplicaUnhealthy | scc-reporting--standard |
| 6/3/2024, 10:04:41.717 PM | Readiness probe failed: HTTP probe failed with statuscode: 500 | Warning | ReplicaUnhealthy | scc-reporting--standard |
| 6/3/2024, 10:05:43.010 PM | Updating containerApp: scc-reporting | Normal | ContainerAppUpdate | |
| 6/3/2024, 10:05:43.055 PM | Updating revision : scc-reporting--standard-health | Normal | RevisionUpdate | scc-reporting--standard-health |
| 6/3/2024, 10:05:43.181 PM | Setting traffic weight of '100%' for revision 'scc-reporting--standard-health' | Normal | RevisionUpdate | |
| 6/3/2024, 10:05:44.194 PM | Deactivating old revisions for ContainerApp 'scc-reporting' | Normal | RevisionDeactivating | |
| 6/3/2024, 10:05:44.229 PM | Successfully provisioned revision 'scc-reporting--standard-health' | Normal | RevisionReady | scc-reporting--standard-health |
| 6/3/2024, 10:05:44.246 PM | Successfully updated containerApp: scc-reporting | Normal | ContainerAppReady | |
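
For reference, the gap between container start and each probe failure can be worked out from the timestamps above. The following is a rough Python sketch, not part of the original report: it hard-codes the relevant log rows, and it assumes `TimeGenerated` is close to the time each event actually occurred. The helper `seconds_after_start` is a hypothetical name.

```python
from datetime import datetime

# Timestamp layout used in the query results, e.g. "6/3/2024, 10:04:34.792 PM".
FMT = "%m/%d/%Y, %I:%M:%S.%f %p"

# Configured initialDelaySeconds for both the Liveness and Readiness probes.
INITIAL_DELAY_SECONDS = 60

# "Started container scc-reporting" row.
container_started = "6/3/2024, 10:04:34.792 PM"

# The four "probe failed" rows, in log order.
probe_failures = [
    ("Readiness", "6/3/2024, 10:04:37.777 PM"),
    ("Readiness", "6/3/2024, 10:04:38.785 PM"),
    ("Liveness", "6/3/2024, 10:04:41.708 PM"),
    ("Readiness", "6/3/2024, 10:04:41.717 PM"),
]


def seconds_after_start(start, event):
    """Hypothetical helper: seconds elapsed between two log timestamps."""
    delta = datetime.strptime(event, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


offsets = [(kind, seconds_after_start(container_started, ts)) for kind, ts in probe_failures]

for kind, offset in offsets:
    early = offset < INITIAL_DELAY_SECONDS
    print(f"{kind} probe failed {offset:.3f}s after start (before initial delay: {early})")
```

All four failures fall between roughly 3 and 7 seconds after the container started, well inside the 60-second initial delay configured for both probes.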

Additional context

This occurred in the Portal. I used the Portal to create the new revision and to query and stream the system logs.