Closed AndyThurgood closed 1 week ago
Thanks for raising this. Can you please send the blocked-out content to acasupport at microsoft.com? We would need your subscription, app, and environment names.
Hi @simonjj I've sent that detail across. Thanks
I'm having the same issue, also in UK South.
I am also seeing the same problem on multiple container apps in the uksouth region
Just to update the thread. We're investigating this issue and will update once it's resolved.
Same issue here in West Europe.
In our case, reducing the max number of container instances resolves the problem. With 3 instances we would end up with 6 cores, which is more than the allowed 4.
With a max of 2 container instances, both instances start without any problems.
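The arithmetic behind that workaround can be sketched as below. The numbers (2 vCPU per replica, a 4 vCPU cap) follow this thread's example, not universal limits:

```shell
# Check whether a replica count fits the environment's vCPU quota.
# 2 vCPU per replica and a 4 vCPU cap mirror the example above.
cpu_per_replica=2
quota_vcpu=4

for replicas in 2 3; do
  total=$((replicas * cpu_per_replica))
  if [ "$total" -le "$quota_vcpu" ]; then
    echo "$replicas replicas -> $total vCPU: fits"
  else
    echo "$replicas replicas -> $total vCPU: over quota"
  fi
done
```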
Scratch that thought: 3 instances are running now.
Looks like something got fixed in Azure
Any update on this? This is a major problem.
I'm seeing something very similar, specifically errors such as this:
```
{"TimeStamp":"2024-07-12 14:46:40 \u002B0000 UTC","Type":"Normal","ContainerAppName":"{CONTAINERAPPNAME}","RevisionName":"{REVISIONNAME}","ReplicaName":"{REPLICANAME}","Msg":"Replica {REPLICANAME} has been scheduled to run on a node.","Reason":"AssigningReplica","EventSource":"ContainerAppController","Count":0}
{"TimeStamp":"2024-07-12 14:47:23 \u002B0000 UTC","Type":"Warning","ContainerAppName":"{CONTAINERNAME}","RevisionName":"{REVISIONNAME}","ReplicaName":"","Msg":"ScaledObject doesn\u0027t have correct triggers specification","Reason":"ScaledObjectCheckFailed","EventSource":"KEDA","Count":9}
```
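For anyone trying to capture the same system events (KEDA, ContainerAppController), they can be pulled with the Azure CLI; a sketch, where MYAPP and MYGROUP are placeholder names:

```shell
# Stream the system event log (as opposed to the app's console log)
# for a container app; MYAPP and MYGROUP are placeholders.
az containerapp logs show \
  --name MYAPP \
  --resource-group MYGROUP \
  --type system \
  --follow
```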
While Terraform is used to deploy the containers, nothing has been changed with min/max replicas. For everything deployed or changed since July 9th, the new revision isn't deploying and sits in an activating state, showing 0/0 ready with -Infinity restarts.
Increasing to 1 min/2 max did not fix my issue
Edit: If it matters, our CAE and containers are on the consumption model.
An update from the original reporter. We have mitigated this issue for now by forcing our containers to never scale to zero, but we are fearful of those containers being force-restarted by Azure.
If it helps, the only way we were able to get those services back online was to repeatedly scale the revision's minimum instance count up and down. Even then, it sometimes took 15+ minutes for the revision to activate.
From our perspective, this was a really challenging issue because we had no way to diagnose what was happening other than the limited logs that were (sometimes) generated in a container's system console, and even those gave no indication of the cause.
When a container eventually failed to activate, the Azure portal had no information in any of the logs, meaning we were pretty stuck.
This seems like a capacity issue, which is (I think) why restarting the containers again and again eventually got our containers online.
The only way we were able to get the container to activate was to delete the container app and redeploy. The issue did eventually reappear when the container scaled to 0 and then tried to scale back up, but it's been intermittent; it's been working "ok" today.
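For anyone applying the same never-scale-to-zero mitigation, the minimum replica count can be pinned with the Azure CLI; a sketch, where MYAPP and MYGROUP are placeholder names:

```shell
# Keep at least one replica alive so the app never scales to zero;
# MYAPP and MYGROUP are placeholders, and the max of 2 is arbitrary.
az containerapp update \
  --name MYAPP \
  --resource-group MYGROUP \
  --min-replicas 1 \
  --max-replicas 2
```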
This issue should be resolved now. The impact should have been limited to uksouth. Please notify us here if there continue to be issues with spinning up new revisions or replicas.
This impacted us in Central US as well. Looks to be resolved now
We are facing this issue here in Western Europe.
This issue is happening on our various container apps environments as well (dev, quality, production). We are also using the consumption model and are hosting this in West Europe. We opened a Microsoft support case, hopefully it gets resolved soon.
Same issue on all environments in West-Europe. Consumption workload profiles
We have the same issue in all environments in West Europe. New replicas are not activating.
I'm also having this issue in West Europe; all my replicas are down and won't start. Scaling up and down, as somebody suggested, seems to work sometimes.
Seeing this error:
```
"ScaledObject doesn't have correct triggers specification","Reason":"ScaledObjectCheckFailed","EventSource":"KEDA"
```
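The ScaledObjectCheckFailed warning complains about the trigger spec; since the apps in this thread hadn't been changed, it was likely a platform symptom rather than a config error, but one way to rule out a missing or malformed trigger is to set an explicit HTTP scale rule. A sketch, where MYAPP, MYGROUP, and the concurrency value of 50 are placeholders:

```shell
# Define an explicit HTTP scale rule so KEDA has a concrete trigger;
# MYAPP/MYGROUP and the concurrency of 50 are placeholders.
az containerapp update \
  --name MYAPP \
  --resource-group MYGROUP \
  --scale-rule-name http-rule \
  --scale-rule-type http \
  --scale-rule-http-concurrency 50
```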
I also had this issue with a customer's prod deployment in West Europe yesterday. Only thing that helped was throwing the whole CA away and redeploying (which I know might not be feasible for some unfortunately).
There might be a few more regions which exhibited this behavior, but we've mostly cleaned this up across the globe. Please open a new issue if it pops up again. Thank you all for being patient/diligent/friendly with us.
Issue description
We appear to be seeing the same issue as raised in this issue last week.
We have 4 container app environments, and since this morning none of our apps (20+ per env) are able to start a new revision. This started in our test environments, where the container apps scale to zero: when traffic hit those environments, none of the apps were able to start.
It is now impacting our production environment, as those apps scale down overnight.
The limited logs generated on a revision are as per below: every 2 minutes the environment appears to attempt to assign a replica, until the startup process fails.
If we tweak the scale rules to force at least one active revision, we see a new revision spawn, but the new revision will not start, and it appears that no replica ever gets assigned to it. The previous revision is never removed, as per below:
It's worth noting that we haven't made any changes to the images for these services, and we haven't seen any issues in these environments in the last 6 months.
We have also tried destroying and recreating a test environment, which hasn't resolved the issue.
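When diagnosing states like the above, revision and replica status can also be inspected from the CLI to confirm whether a replica was ever assigned; a sketch, where MYAPP, MYGROUP, and MYREVISION are placeholder names:

```shell
# List revisions and their state for the app; MYAPP/MYGROUP are placeholders.
az containerapp revision list \
  --name MYAPP \
  --resource-group MYGROUP \
  -o table

# Then list the replicas behind the stuck revision (MYREVISION is a placeholder);
# an empty list matches the "no replica ever gets assigned" symptom above.
az containerapp replica list \
  --name MYAPP \
  --resource-group MYGROUP \
  --revision MYREVISION \
  -o table
```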
Steps to reproduce
Expected behavior
We expect that a container instance/revision should start.
Actual behavior
No container apps are able to start.
Screenshots
Additional context
Azure Environment: UK South
Workload Profile: Consumption (4 vCPU / 8 GB memory)