microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps
MIT License

Container crashes in Container Apps When Running Specific Operation #860

Open jumpei-tsutsui opened 1 year ago

jumpei-tsutsui commented 1 year ago

Issue description

When running FastAPI on Container Apps, the container crashes only when performing one specific operation: using LangChain to run a vector search against a local ChromaDB. Other operations do not cause the crash.

If a new container with the same code and configuration is launched in a Container Apps environment where this operation has already been working, the operation works correctly. There are multiple Container Apps environments where it works as expected, and it also works without issue in a Docker container on a local PC. However, when a new container is launched in a newly created Container Apps environment, the crash occurs. Creating a new Container Apps environment in a different subscription also causes the crash. There are no signs of CPU or memory shortage in the containers that are working correctly. The environments where the operation works and the ones where it doesn't use the same versions of KEDA (2.10.0) and Dapr (1.10.8).

Nothing unusual appears in the application logs when the crash occurs, even with LangChain and ChromaDB debug logging enabled.

Console output when crash occurred:

ERROR: {"Error":{"Code":"ClusterExecFailure","Message":"Cluster exec API returns error: command terminated with non-zero exit code: error executing command [/bin/bash], exit code 137, code: 0.","Details":null,"Target":null,"AdditionalInfo":null}}

And here's the system log stream from when the crash occurred. The container appears to fail and then restart automatically:

{"TimeStamp":"2023-08-01 22:24:20 \u002B0000 UTC","Type":"Warning","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Persistent Failiure to start container","Reason":"ContainerBackOff","EventSource":"ContainerAppController","Count":1}
{"TimeStamp":"2023-08-01 22:24:20 \u002B0000 UTC","Type":"Warning","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Persistent Failiure to start container","Reason":"ContainerBackOff","EventSource":"ContainerAppController","Count":1}
{"TimeStamp":"2023-08-01 22:24:23 \u002B0000 UTC","Type":"Warning","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Persistent Failiure to start container","Reason":"ContainerBackOff","EventSource":"ContainerAppController","Count":2}
{"TimeStamp":"2023-08-01 22:24:23 \u002B0000 UTC","Type":"Warning","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Container \u0027dev-sample\u0027 was terminated with exit code \u0027132\u0027","Reason":"Error","EventSource":"ContainerAppController","Count":1}
{"TimeStamp":"2023-08-01 22:24:23 \u002B0000 UTC","Type":"Warning","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Container \u0027dev-sample\u0027 was terminated with exit code \u0027132\u0027","Reason":"Error","EventSource":"ContainerAppController","Count":1}
{"TimeStamp":"2023-08-01 22:24:23 \u002B0000 UTC","Type":"Warning","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Persistent Failiure to start container","Reason":"ContainerBackOff","EventSource":"ContainerAppController","Count":3}
{"TimeStamp":"2023-08-01 22:24:23 \u002B0000 UTC","Type":"Warning","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Persistent Failiure to start container","Reason":"ContainerBackOff","EventSource":"ContainerAppController","Count":3}
{"TimeStamp":"2023-08-01 22:24:36 \u002B0000 UTC","Type":"Normal","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Pulling image \u0022iechatgptdev.azurecr.io/dev-api:sample\u0022","Reason":"PullingImage","EventSource":"ContainerAppController","Count":4}
{"TimeStamp":"2023-08-01 22:24:36 \u002B0000 UTC","Type":"Normal","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Successfully pulled image \u0022iechatgptdev.azurecr.io/dev-api:sample\u0022 in 283.978756ms (283.985849ms including waiting)","Reason":"ImagePulled","EventSource":"ContainerAppController","Count":1}
{"TimeStamp":"2023-08-01 22:24:36 \u002B0000 UTC","Type":"Normal","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Successfully pulled image \u0022iechatgptdev.azurecr.io/dev-api:sample\u0022 in 283.978756ms (283.985849ms including waiting)","Reason":"ImagePulled","EventSource":"ContainerAppController","Count":1}
{"TimeStamp":"2023-08-01 22:24:36 \u002B0000 UTC","Type":"Normal","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Created container dev-sample","Reason":"ContainerCreated","EventSource":"ContainerAppController","Count":4}
{"TimeStamp":"2023-08-01 22:24:36 \u002B0000 UTC","Type":"Normal","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Started container dev-sample","Reason":"ContainerStarted","EventSource":"ContainerAppController","Count":4}
{"TimeStamp":"2023-08-01 22:24:36 \u002B0000 UTC","Type":"Normal","ContainerAppName":"dev-sample","RevisionName":"dev-sample--axebiex","ReplicaName":"dev-sample--axebiex-65884c954b-xhgxg","Msg":"Started container dev-sample","Reason":"ContainerStarted","EventSource":"ContainerAppController","Count":4}

Expected behavior

The operation works consistently in any Container Apps environment.

Actual behavior

The container crashes in specific Container Apps environments.

Has this issue been reported before? If so, I'd like to know how to deal with it. If there's anything else I should investigate, please let me know.

ahmelsayed commented 1 year ago

Exit code 132 seems to be Illegal Instruction (128 + 4, i.e. SIGILL), which probably means the process is trying to execute a CPU instruction that isn't available on the host. The top search result suggests AVX, but I'm not familiar with LangChain.
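One way to check this hypothesis from inside a running replica is sketched below. It assumes the common convention that an exit code above 128 means "killed by signal (code - 128)" and that the container runs x86 Linux, where `/proc/cpuinfo` has a `flags` line; the helper names `describe_exit_code` and `cpu_flags` are made up for illustration. Comparing the output between an environment that works and one that crashes would show whether the hosts expose different instruction sets.

```python
import signal

def describe_exit_code(code):
    """Map a container exit code to the signal that likely caused it."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return "unknown signal"
    return "not a signal exit"

def cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags on x86 Linux, or an empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

print(describe_exit_code(132))  # the code from the system log
print(describe_exit_code(137))  # the code from the console output
flags = cpu_flags()
for name in ("avx", "avx2", "avx512f"):
    print(name, name in flags)
```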

jumpei-tsutsui commented 1 year ago

I tried running two apps on Azure App Service (Web Apps) with the same container image. The first app worked without the issue, but the second app crashed in the same way. The logs are the same as described in this issue, and I have found some issues and posts about App Service describing similar situations. None of them seem to be resolved, but I hope this helps.

https://learn.microsoft.com/en-us/answers/questions/1164705/azure-app-service-goes-down-with-fail-middleware(0
https://learn.microsoft.com/en-us/answers/questions/203958/nodejs-web-app-on-azure
https://github.com/Azure/azure-functions-host/issues/7829
https://stackoverflow.com/questions/76567572/running-fastapi-from-devops-azur-and-getting-timeout-error-failed-to-forward-req
https://stackoverflow.com/questions/71249435/azure-bot-service-failed-to-forward-request-to-application

DrPass commented 11 months ago

I have the same issue under the same conditions: Container App, LangChain + ChromaDB, image built with container-apps-deploy-action. However, the issue persists when I pull the image from Azure Container Registry and run it locally, while a similar image built locally runs fine. I'm trying to find the difference between the images...

ArunJRK commented 8 months ago

Having the same issue. The image runs locally but fails in Azure.

sengiv commented 6 months ago

I can't verify this fix, but it worked for me: use a full Debian base image, such as FROM debian:latest, instead of a Python image. Basically the fault lies with the base image, so experiment to find the right one. Also don't forget "Container resource allocation": start with high specs and go down from there.