microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps
MIT License

Container App is not provisioned / status hangs on processing. #783

Open mortengjesing opened 1 year ago

mortengjesing commented 1 year ago

Issue description

We have a similar development environment scripted with Terraform that doesn't have this issue. We have also verified that the container works with this configuration.

We are seeing these errors in the system log: Startup probe failed: connection refused and Startup probe failed: Get "http://100.100.0.17:80/_health": dial tcp 100.100.0.17:80: connect: connection refused, but we don't know how to resolve them.

We have tried a lot of different solutions without luck.

torosent commented 1 year ago

Hi, can you send us the subscription ID, environment name, and app name to acasupport at microsoft dot com?

mortengjesing commented 1 year ago

Sorry but we had to recreate the environment to get it up and running. Are you able to see anything after we deleted the setup?

vturecek commented 1 year ago

@mortengjesing

limoniapps commented 1 year ago

Running into the same issue. All containers in one specific container app environment are down. The log in the container app environment reads the following when attempting to access a container: "Msg":"startup probe failed: connection refused","Reason":"ReplicaUnhealthy". Recreating the container app environment and containers resolves the issue.

limoniapps commented 1 year ago

Any attempts to make changes and deploy a revision also fail. Error message: Failed to deploy new revision: Internal server error occurred. correlation ID: 967df002-1219-428f-ab91-ed39cfa11e31

noahyaagoubi commented 1 year ago

Same issue here. When creating the container app through Terraform, the host value for the startup probe defaults to the IP of the pod, yet the connection is refused despite the logs saying the application is listening on the specified port.

Khanplex commented 1 year ago

I am facing the same issue. I have recreated all resources, still to no avail.

torosent commented 1 year ago

I am facing the same issue. I have recreated all resources, still to no avail.

Can you send your correlation id to acasupport at microsoft dot com?

ravick4u commented 1 year ago

I am also having the same issue. Not sure what is going on here. Recreated the Azure Container App but no luck.

torosent commented 1 year ago

I am also having the same issue. Not sure what is going on here. Recreated the Azure Container App but no luck.

Please send your subscription ID and correlation ID to acasupport at microsoft dot com and we will investigate.

rphoon commented 1 year ago

Hi, I'm having the same issue. I'm trying to switch the container image from the demo image to a new one and getting the same error. The function I'm deploying uses the PowerShell runtime and a timer trigger.

Dockerfile was generated from Core Tools using VS Code:

FROM mcr.microsoft.com/azure-functions/powershell:4-powershell7.2-core-tools

ENV AzureWebJobsScriptRoot=/home/site/wwwroot \
    AzureFunctionsJobHost__Logging__Console__IsEnabled=true

COPY . /home/site/wwwroot

Errors from the Container App Environment streaming logs below:

{"TimeStamp":"2023-09-28 11:20:46 \u002B0000 UTC","Type":"Normal","ContainerAppName":"mcstest-ops-cap-managed-lz-alert","RevisionName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp","ReplicaName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp-998f65d4c-lr2cz","Msg":"Successfully pulled image \u0022mcstestaemgmtcr01.azurecr.io/mcstest-ops-ca-managed-lz-alerts:v1.1\u0022 in 67.969635ms (67.980124ms including waiting)","Reason":"ImagePulled","EventSource":"ContainerAppController","Count":1} {"TimeStamp":"2023-09-28 11:20:46 \u002B0000 UTC","Type":"Normal","ContainerAppName":"mcstest-ops-cap-managed-lz-alert","RevisionName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp","ReplicaName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp-998f65d4c-lr2cz","Msg":"Created container functions-container","Reason":"ContainerCreated","EventSource":"ContainerAppController","Count":3} {"TimeStamp":"2023-09-28 11:20:46 \u002B0000 UTC","Type":"Normal","ContainerAppName":"mcstest-ops-cap-managed-lz-alert","RevisionName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp","ReplicaName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp-998f65d4c-lr2cz","Msg":"Started container functions-container","Reason":"ContainerStarted","EventSource":"ContainerAppController","Count":3} {"TimeStamp":"2023-09-28 11:20:47 \u002B0000 UTC","Type":"Warning","ContainerAppName":"mcstest-ops-cap-managed-lz-alert","RevisionName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp","ReplicaName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp-998f65d4c-lr2cz","Msg":"startup probe failed: connection refused","Reason":"ReplicaUnhealthy","EventSource":"ContainerAppController","Count":1} {"TimeStamp":"2023-09-28 11:20:48 \u002B0000 UTC","Type":"Warning","ContainerAppName":"mcstest-ops-cap-managed-lz-alert","RevisionName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp","ReplicaName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp-998f65d4c-lr2cz","Msg":"Persistent Failiure to start container","Reason":"ContainerBackOff","EventSource":"ContainerAppController","Count":5} {"TimeStamp":"2023-09-28 11:20:49 \u002B0000 UTC","Type":"Warning","ContainerAppName":"mcstest-ops-cap-managed-lz-alert","RevisionName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp","ReplicaName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp-998f65d4c-lr2cz","Msg":"Persistent Failiure to start container","Reason":"ContainerBackOff","EventSource":"ContainerAppController","Count":6} {"TimeStamp":"2023-09-28 11:20:51 \u002B0000 UTC","Type":"Normal","ContainerAppName":"mcstest-ops-cap-managed-lz-alert","RevisionName":"mcstest-ops-cap-managed-lz-alert--gxslkq3","ReplicaName":"","Msg":"KEDA is stopping the watch for revision \u0027mcstest-ops-cap-managed-lz-alert--gxslkq3\u0027 to monitor scale operations for this revision","Reason":"KEDAScalersStopped","EventSource":"KEDA","Count":1} {"TimeStamp":"2023-09-28 11:20:51 \u002B0000 UTC","Type":"Normal","ContainerAppName":"mcstest-ops-cap-managed-lz-alert","RevisionName":"mcstest-ops-cap-managed-lz-alert--gxslkq3","ReplicaName":"","Msg":"ScaledObject was removed from KEDA watch and would not be auto-scaled. Please check https://learn.microsoft.com/en-us/azure/container-apps/dapr-overview","Reason":"ScaledObjectDeleted","EventSource":"KEDA","Count":1}

JakeDern commented 1 year ago

+1 for experiencing this issue. Overriding the health probes with something else seems to help, but I have occasionally still been having problems. Restarting the revision 1-2 times usually fixes it.
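For anyone else wanting to try the same workaround, the override I mean is roughly the following in the container app YAML spec (a sketch only; the container name, image, port, and timings are placeholders, adjust them to your app):

properties:
  template:
    containers:
      - name: my-app                                  # placeholder container name
        image: myregistry.azurecr.io/my-app:latest    # placeholder image
        probes:
          - type: Startup
            tcpSocket:
              port: 80                                # placeholder: the port the container actually listens on
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 10                      # allow a slower start before the replica is marked unhealthy

A plain TCP check avoids depending on an HTTP path being ready; as mentioned above, it seems to help but hasn't completely eliminated the problem for me.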

Will open a support ticket the next time I get a repro.

chinadragon0515 commented 11 months ago

@rphoon

From this error, it means the pod has started but the probe check failed. Can you check whether your container responds on the probe port?

{"TimeStamp":"2023-09-28 11:20:47 \u002B0000 UTC","Type":"Warning","ContainerAppName":"mcstest-ops-cap-managed-lz-alert","RevisionName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp","ReplicaName":"mcstest-ops-cap-managed-lz-alert--hn7g0kp-998f65d4c-lr2cz","Msg":"startup probe failed: connection refused","Reason":"ReplicaUnhealthy","EventSource":"ContainerAppController","Count":1}

snipikar commented 6 months ago

Also facing the same issue; it was working fine until a week ago. I'm building a dev environment and would like to avoid rebuilding the entire ACA environment, or having to worry whether this will happen when it eventually goes to prod.

Edit: Managed to fix it by adding custom health probe properties to my ACA YAML config. Here is a link which explains why this happens and how to fix it. Also a link to the probes properties.
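For reference, the probe properties I added look roughly like this (a sketch only; the path, port, and timings are placeholders for my app, so adjust them to yours):

properties:
  template:
    containers:
      - name: my-app                                  # placeholder container name
        image: myregistry.azurecr.io/my-app:latest    # placeholder image
        probes:
          - type: Startup
            httpGet:
              path: /_health                          # placeholder health endpoint
              port: 80                                # placeholder: the port the app listens on
            initialDelaySeconds: 10                   # give the app time to boot before the first check
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30                      # don't mark the replica unhealthy on a slow start

Liveness and Readiness probes can be overridden in the same probes list if needed.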

Hope this helps someone.

koalazub commented 2 months ago

I was also receiving this error (truncated):

startup probe failed: connection refused","Reason":"ReplicaUnhealthy","EventSource":"ContainerAppController","Count":1}

I managed to pin this down to a container I was using that utilised a fileShare in a volume. I have to clear that fileShare out before each deployment, otherwise the container app just fails to deploy. There was no logging available for me to pin this down, so I just chose the nuclear option for my volume to see what would happen.

I'm not sure if this helps narrow down what's going on, but it'd be nice if we could get a bit more detail in the Reason field other than ReplicaUnhealthy.