Open maskati opened 11 months ago
Unfortunately I'm seeing the same issue. One of our APIs went down for the same reason; restarting the container fixes the issue. Is there anyone from Azure who can take a look at it?
Could be the same issue in my case!
After a health-check ProbeFailure, it couldn't restart the container because of an ImagePullFailure.
My Terraform:
// Identity to handle Environment access to other resources
resource "azurerm_user_assigned_identity" "containerapps_environment_identity" {
  name                = "${data.azurerm_container_app_environment.containerapps_environment.name}-identity"
  location            = var.location
  resource_group_name = azurerm_resource_group.dataprocessing_rg.name
  tags                = var.tags_default
}

// Allow pulling images from the shared Container Registry
resource "azurerm_role_assignment" "containerapps_environment_identity_registry_rbacs" {
  scope = data.azurerm_container_registry.sharedprodacr.id
  // https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#acrpull
  role_definition_name = "AcrPull"
  principal_id         = azurerm_user_assigned_identity.containerapps_environment_identity.principal_id
  depends_on           = [azurerm_user_assigned_identity.containerapps_environment_identity]
}

resource "azapi_resource" "containerapps_environment_app_my-svc-prod" {
  type      = "Microsoft.App/containerApps@2023-05-01"
  name      = "my-svc-prod"
  parent_id = azurerm_resource_group.dataprocessing_rg.id
  location  = var.location
  tags      = var.tags_default

  body = jsonencode({
    properties = {
      environmentId       = data.azurerm_container_app_environment.containerapps_environment.id
      workloadProfileName = "Consumption"
      configuration = {
        // In Single mode, a single revision is in operation at any given time.
        activeRevisionsMode = "Single"
        ingress = {
          external = true
          // Container target port
          targetPort = 80
          // http2-only does not work
          transport = "auto"
          stickySessions = {
            affinity = "none"
          }
          // Custom domain bindings for the Container App's hostnames.
          customDomains = [
            {
              bindingType   = "SniEnabled"
              certificateId = data.azurerm_container_app_environment_certificate.containerapps_environment_certificate.id
              name          = "my-svc-prod${var.environment_short == "prod" ? "" : "-${var.environment_short}"}.mydomain.com"
            }
          ]
        }
        registries = [
          {
            server = data.azurerm_container_registry.sharedprodacr.login_server
            // Resource ID of the user-assigned managed identity to use when pulling
            // from the Container Registry. It must also be listed in identity_ids below.
            identity = azurerm_user_assigned_identity.containerapps_environment_identity.id
          }
        ]
        secrets = [
          // ...
        ]
      }
      template = {
        containers = [
          {
            name  = "my-svc-prod"
            image = "sharedprodacr.azurecr.io/my-svc-prod:${var.environment_short}"
            resources = {
              cpu    = 1
              memory = "2Gi"
            }
            probes = [
              {
                type = "Liveness"
                httpGet = {
                  path   = "/health"
                  port   = 80
                  scheme = "HTTP"
                }
                periodSeconds = 10
              }
            ]
            env = [
              // ...
            ]
          }
        ]
        scale = {
          // Always exactly 1 instance
          minReplicas = 1
          maxReplicas = 1
        }
      }
    }
  })

  identity {
    type = "SystemAssigned, UserAssigned"
    identity_ids = [
      azurerm_user_assigned_identity.containerapps_environment_identity.id
    ]
  }

  depends_on = [
    //...
    azurerm_user_assigned_identity.containerapps_environment_identity
  ]
}
After 15 retries it stopped. The next day I restarted the revision manually and then it worked.
SourceSystem | TimeGenerated [UTC] | time_t [UTC] | _timestamp_d | EventSource_s | Reason_s | ReplicaName_s | Type_s | RevisionName_s | EnvironmentName_s | Log_s | Count_d | ContainerAppName_s | Level | TimeStamp_s | Type
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
RestAPI | 22.11.2023 09:16 | 22.11.2023 09:16 | 1700644592 | ContainerAppController | ContainerCreated | my-svc-prod--eu2u6is-6c7ff49cb4-8f2d8 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Created container 'my-svc-prod' | 1 | my-svc-prod | info | 2023-11-22 09:16:31 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 22.11.2023 09:16 | 22.11.2023 09:16 | 1700644592 | ContainerAppController | ContainerStarted | my-svc-prod--eu2u6is-6c7ff49cb4-8f2d8 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Started container 'my-svc-prod' | 1 | my-svc-prod | info | 2023-11-22 09:16:31 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 22.11.2023 09:16 | 22.11.2023 09:16 | 1700644591 | ContainerAppController | PulledImage | my-svc-prod--eu2u6is-6c7ff49cb4-8f2d8 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Successfully pulled image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' in 9.8839824s | 1 | my-svc-prod | info | 2023-11-22 09:16:31 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 22.11.2023 09:16 | 22.11.2023 09:16 | 1700644580 | ContainerAppController | AssigningReplica | my-svc-prod--eu2u6is-6c7ff49cb4-8f2d8 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Replica 'my-svc-prod--eu2u6is-6c7ff49cb4-8f2d8' has been scheduled to run on a node. | 0 | my-svc-prod | info | 2023-11-22 09:16:20 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:55 | 21.11.2023 22:55 | 1700607330 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 15 | my-svc-prod | info | 2023-11-21 22:55:30 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:55 | 21.11.2023 22:55 | 1700607330 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 15 | my-svc-prod | info | 2023-11-21 22:55:30 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:50 | 21.11.2023 22:50 | 1700607030 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 15 | my-svc-prod | info | 2023-11-21 22:50:30 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:50 | 21.11.2023 22:50 | 1700607030 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 15 | my-svc-prod | info | 2023-11-21 22:50:30 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:45 | 21.11.2023 22:45 | 1700606730 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 14 | my-svc-prod | info | 2023-11-21 22:45:30 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:45 | 21.11.2023 22:45 | 1700606730 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 14 | my-svc-prod | info | 2023-11-21 22:45:30 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:40 | 21.11.2023 22:40 | 1700606430 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 13 | my-svc-prod | info | 2023-11-21 22:40:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:40 | 21.11.2023 22:40 | 1700606430 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 13 | my-svc-prod | info | 2023-11-21 22:40:30 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:35 | 21.11.2023 22:35 | 1700606130 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 12 | my-svc-prod | info | 2023-11-21 22:35:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:35 | 21.11.2023 22:35 | 1700606130 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 12 | my-svc-prod | info | 2023-11-21 22:35:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:30 | 21.11.2023 22:30 | 1700605830 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 11 | my-svc-prod | info | 2023-11-21 22:30:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:30 | 21.11.2023 22:30 | 1700605830 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 11 | my-svc-prod | info | 2023-11-21 22:30:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:25 | 21.11.2023 22:25 | 1700605529 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 10 | my-svc-prod | info | 2023-11-21 22:25:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:25 | 21.11.2023 22:25 | 1700605530 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 10 | my-svc-prod | info | 2023-11-21 22:25:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:20 | 21.11.2023 22:20 | 1700605229 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 9 | my-svc-prod | info | 2023-11-21 22:20:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:20 | 21.11.2023 22:20 | 1700605229 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 9 | my-svc-prod | info | 2023-11-21 22:20:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:15 | 21.11.2023 22:15 | 1700604929 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 8 | my-svc-prod | info | 2023-11-21 22:15:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:15 | 21.11.2023 22:15 | 1700604929 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 8 | my-svc-prod | info | 2023-11-21 22:15:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:10 | 21.11.2023 22:10 | 1700604629 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 7 | my-svc-prod | info | 2023-11-21 22:10:28 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:10 | 21.11.2023 22:10 | 1700604629 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 7 | my-svc-prod | info | 2023-11-21 22:10:29 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:05 | 21.11.2023 22:05 | 1700604329 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 6 | my-svc-prod | info | 2023-11-21 22:05:28 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:05 | 21.11.2023 22:05 | 1700604329 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 6 | my-svc-prod | info | 2023-11-21 22:05:28 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:00 | 21.11.2023 22:00 | 1700604029 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 5 | my-svc-prod | info | 2023-11-21 22:00:28 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 22:00 | 21.11.2023 22:00 | 1700604029 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 5 | my-svc-prod | info | 2023-11-21 22:00:28 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 21:57 | 21.11.2023 21:57 | 1700603868 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 4 | my-svc-prod | info | 2023-11-21 21:57:48 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 21:57 | 21.11.2023 21:57 | 1700603868 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 4 | my-svc-prod | info | 2023-11-21 21:57:48 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 21:56 | 21.11.2023 21:56 | 1700603788 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 3 | my-svc-prod | info | 2023-11-21 21:56:28 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 21:56 | 21.11.2023 21:56 | 1700603788 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 3 | my-svc-prod | info | 2023-11-21 21:56:28 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 21:55 | 21.11.2023 21:55 | 1700603748 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 2 | my-svc-prod | info | 2023-11-21 21:55:48 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 21:55 | 21.11.2023 21:55 | 1700603748 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 2 | my-svc-prod | info | 2023-11-21 21:55:48 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 21:55 | 21.11.2023 21:55 | 1700603728 | ContainerAppController | PullingImage | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Normal | my-svc-prod--eu2u6is | blackbay-65a2960e | Pulling image 'sharedacr.azurecr.io/my-svc-prod-api:cb06b467b2019eb46bf1cb2677c164fe6150c99c' | 1 | my-svc-prod | info | 2023-11-21 21:55:27 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 21:55 | 21.11.2023 21:55 | 1700603728 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ImagePullFailure' | 1 | my-svc-prod | info | 2023-11-21 21:55:28 +0000 UTC | ContainerAppSystemLogs_CL
RestAPI | 21.11.2023 21:55 | 21.11.2023 21:55 | 1700603715 | ContainerAppController | ContainerTerminated | my-svc-prod--eu2u6is-846bcdff67-fvf74 | Warning | my-svc-prod--eu2u6is | blackbay-65a2960e | Container 'my-svc-prod' was terminated with exit code '' and reason 'ProbeFailure' | 1 | my-svc-prod | info | 2023-11-21 21:55:15 +0000 UTC | ContainerAppSystemLogs_CL
I might also be incorrect about the time to failure; it could in fact be 1 hour instead of 24 hours. I don't have the patience to wait long periods when testing things.
We identified an issue where, when a customer app runs on the Consumption workload profile with a managed identity, the container does not come up again after it exits. We have fixed the issue, and the fix has been deployed to all regions.
Let us know if you still see issues.
@chinadragon0515 seems to be working now, thanks!
@chinadragon0515 since last week we're experiencing this issue again in West Europe. Any chance it was somehow re-introduced?
Hi, we are experiencing this issue as of last week; it started on 13th Jan at 8:30am UTC. My setup is a container app environment and a container app running in West Europe (workload profiles).
We have been having the same issue for two days. This is urgent. Container apps are randomly failing to pull images in production.
@chinadragon0515 could you have a look at this?
WTF!?
Experiencing the same ATM.
Seeing this in UK South too. You can see our container was OOM-killed at 03:59:23, followed by an hour of PullingImage/ImagePullFailure messages (I only included the first few).
Please fix this. It is causing major downtime for our servers in US West 2. Any workaround?
The only known workaround is to not use managed identity but tokens.
Thanks, but we are using admin credentials, not managed identities.
I have only experienced this issue with managed-identity-authenticated image pulls, which is also the topic of this bug report. If you are experiencing issues in other contexts, it would probably be advisable to create a separate issue.
It is also not working in China (CHN).
@bqstony @maskati @ericxl @klemmchr @nimro @dtcos @jellehellmann @davidkarlsen We are not aware of any known issue right now. Can you please send us an email to acasupport at microsoft dot com with your container app and environment info so we can follow up with you?
@chinadragon0515 Will it be debugged properly, or is it the usual Mindtree Ltd support (in which case I cannot be bothered)? Sent an email just now; hopefully we can get to the bottom of this...
All, we investigated the issue and identified the root cause. We made a long-term fix, but part of it has not been deployed yet, which caused the regression. We are working to revert to the short-term fix and expect to deploy it to all regions in the next two days.
In the meantime, we have already set up an auto-detect and mitigation workflow, so all impacted container apps should have been auto-mitigated.
Let me know if you still see the issue. Sorry for the inconvenience. Thanks.
@chinadragon0515 at least the mitigation has not fixed the failed replicas (one of three replicas is still down due to the issue). I will perform a restart, which generally fixes the issue for some time, usually 24 hours. I will report back if the issue reoccurs on fresh replicas.
@maskati can you please send us an email to acasupport at microsoft dot com with your container app and environment info? I want to check why it was not mitigated and whether it is the same issue.
Note the issue I mentioned occurs when MSI is used to pull the image and the replica of the container app is somehow terminated, e.g. OOM-killed; the replica can then get stuck in a bad state.
If you do not use MSI, it is a different issue, and you can send your container app and environment info to us so we can investigate further.
I am having the same issue. I had a container app job that worked a few days ago, but now it's unable to pull the image from my private ACR, with the reason ImagePullFailure. I deleted the job and created a fresh one but received the same error. I have the job connecting to the private ACR with admin credentials. Any thoughts?
Update on my end: I created a managed identity, gave it ACR push/pull rights, and had the job use that identity to pull the container. It then worked. I checked the job again and it had automatically switched back to admin credentials, but it's working now. Really odd.
@chinadragon0515 thanks for the fix. On our systems the fix has mitigated the total crash of the replica after 15 image-pull errors.
We're still seeing some image-pull errors in the logs on some environments, with a datetime of 2024-01-25T06:42:14.4108994Z and a failure Count_d of up to 6 or 7.
Using this KQL to query for the image-pull issues:
ContainerAppSystemLogs_CL | where Log_s contains "was terminated with exit code '' and reason 'ImagePullFailure'" and Count_d > 0
We had one container app running multiple replicas (3) where more pull errors also happened up until today. We recreated it and now it seems to work.
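For anyone triaging exported logs offline, here is a rough Python equivalent of the KQL query above. It assumes the ContainerAppSystemLogs_CL rows have been exported as dicts; `pull_failure_counts` is a hypothetical helper, not part of any Azure SDK:

```python
# Filter string matching the ImagePullFailure termination message in Log_s.
FAILURE = "was terminated with exit code '' and reason 'ImagePullFailure'"


def pull_failure_counts(rows):
    """Return the highest observed retry count (Count_d) per replica
    among rows that record an ImagePullFailure termination."""
    counts = {}
    for row in rows:
        if FAILURE in row.get("Log_s", "") and row.get("Count_d", 0) > 0:
            name = row["ReplicaName_s"]
            counts[name] = max(counts.get(name, 0), row["Count_d"])
    return counts
```

A replica whose count keeps climbing toward 15 is the pattern described earlier in this thread.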
I'm still having the same issue.
I get `1/1 Pending:ImagePullBackOff on legion` when I try to pull the image from a private registry (this registry is in another resource group, by the way, but I do send the credentials). The same issue occurs when I run `az containerapp up` with the `--source` argument instead of `image`.
Do we have any news? @chinadragon0515
24/02/08: Issue solved.
Hello, the issue still exists for me... UK South region.
We are also seeing image pull failures in apps using managed identity. Multiple container app environments, all in West Europe.
Also seeing the "Pending:ImagePullBackOff on legion" in West Europe across multiple container apps right now. We are using user assigned managed identity for ACR pull rights and using Consumption Workload Profile in the app environment. It fails immediately after deployment.
Same for us in West Europe. Multiple fails after deployment
We've also been seeing the post-deployment image pull failures that @JonasSamuelsson, @technight, and @jehell25 mentioned. It's not the same behaviour as the lead post for this issue: it happens immediately during deployment of a new revision with a new image tag, rather than after ~24 hours as before.
Using ACR with managed identity auth. Deployments via Azure Pipelines task.
Restarting the failed revision manually does allow it to successfully start up and pull the image, so the issue appears to be isolated to the initial deployment.
Hi - we have identified a race condition as the root cause of the issue and are in the process of producing and rolling out a fix.
Details:
The root cause of the issue with users experiencing ImagePullBackoff errors is a race condition within the platform, specifically impacting apps running in the Consumption workload profile. This condition occurs when the system inaccurately updates the in-memory token intended for image pulls to an empty value for some replicas during replica creation.
Will update this issue once the rollout with the fix is completed.
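The race described above (an in-memory pull token transiently overwritten with an empty value while a replica is being created) can be sketched conceptually. This is purely illustrative and is not the platform's actual code; `TokenCache`, `refresh_nonatomic`, and `refresh_atomic` are made-up names:

```python
class TokenCache:
    """Toy model of an in-memory registry-credential cache."""

    def __init__(self, token):
        self.token = token

    def refresh_nonatomic(self, new_token, observe):
        # Buggy pattern: clear the cached token first, write the new one
        # later. A replica created in between reads an empty credential
        # and its image pull fails.
        self.token = ""
        seen = observe()          # replica creation happens here
        self.token = new_token
        return seen

    def refresh_atomic(self, new_token, observe):
        # Fixed pattern: the cached value is replaced in one assignment,
        # so a concurrent reader sees either the old token or the new
        # one, never an empty value.
        seen = observe()          # replica creation happens here
        self.token = new_token
        return seen
```

The `observe` callback stands in for the interleaved replica-creation read, which makes the interleaving deterministic for illustration.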
We are also experiencing this on consumption-based Container App jobs with admin credentials in North Europe. If we run a single execution it works, but when running multiple executions, the majority fail to pull the image. The same happens when running a single execution with parallelism.
Any update on the fix?
Hello, we are waiting for this as well. Any update on ETA?
We're also experiencing this issue, which only arose after a .NET 8 upgrade of the App Service / Dockerfile. Any more information / ETA? @vinisoto
We are also experiencing the same issue
We are experiencing this issue from time to time using Container App Jobs. We have a scheduled execution that runs once per hour. Most runs are fine but on some days there are single or even multiple runs failing because they cannot pull the image.
We are also experiencing this issue for one of our container app environments. @vinisoto when can we expect the problem to be solved?
Issue description
ACA with (user assigned) managed identity authorized with AcrPull to ACR. Initial container startup is fine, and also works if the replica crashes and restarts within a day. If the replica crashes and restarts 24 hours after initial start, the restart fails with an image pull error.
Microsoft.App/containerApps/revisions state:
ContainerAppSystemLogs shows repeating log entries every 5 minutes:
In ACR ContainerRegistryRepositoryEvents there are no Pull operations logged as part of restart.
After restarting the revision things start working again, and this can be also seen as a successful image pull in ACR ContainerRegistryRepositoryEvents.
My assumption is that ACA authenticates using MSI and acquires a token for ACR authentication that is valid for 24 hours, but does not renew the token after the 24 hours have passed. Restarting the revision forces reacquisition of this token and the image pull works.
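If that assumption is right, staleness comes down to comparing the cached token's `exp` claim against the current time. A minimal Python sketch of that check, using an unsigned toy JWT rather than a real ACR token (`jwt_expiry`, `token_is_stale`, and `make_toy_token` are illustrative names):

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the exp claim (Unix seconds) from a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT payloads are base64url-encoded without padding; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]


def token_is_stale(token, now=None):
    """True if the token's exp has passed, i.e. it must be reacquired
    before the next image pull instead of being reused from cache."""
    now = time.time() if now is None else now
    return now >= jwt_expiry(token)


def make_toy_token(lifetime_seconds, issued_at):
    """Build an unsigned toy JWT for illustration (not a real ACR token)."""
    def b64(obj):
        return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()
    header = b64({"alg": "none", "typ": "JWT"})
    payload = b64({"exp": issued_at + lifetime_seconds, "iat": issued_at})
    return f"{header}.{payload}."
```

With a 24-hour lifetime, a pull one hour after issue succeeds under this check, while a restart just past the 24-hour mark would need a fresh token — matching the observed behavior if the platform skips the staleness check.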
Steps to reproduce
Expected behavior: managed identity based image pull should work on replica restart regardless of how much time has passed since the initial start.
Actual behavior: managed identity based image pull on replica restart fails 24 hours after the initial start.