Open shavgath opened 1 year ago
Exit code 137 is an out-of-memory error. It happens when the container's memory usage exceeds the limit set on it. The default is 1 GB; you can try to increase it.
Yes and we can also see the following in the logs:
Container exceeded its local ephemeral storage limit "1Gi"
However, it looks like you cannot override the ephemeral storage value in the schema, as it fails to apply. What other possible solutions are there to increase disk space? I'm not able to find this in the Microsoft documentation for Container Apps.
Hi @shavgath could you describe the steps to reproduce this issue and the results you expected? Thanks!
Hey @SophCarp:
It's odd: we've been using ACAs as Azure DevOps agents for several months now and they've been working perfectly fine; I've never seen those errors/warnings in the logs before. Wondering if anything has changed in the backend?
@shavgath, the size limit for ephemeral storage in Container Apps is 1Gi, and it is not customizable. If your job requires more storage, the suggestion is to mount an Azure Files share into your container. https://learn.microsoft.com/en-us/azure/container-apps/storage-mounts-azure-files?tabs=bash
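A rough sketch of that setup, following the storage-mounts doc linked above (all resource names here are placeholders, and the mount path mirrors the agent layout discussed later in this thread):

```shell
# 1. Register the Azure Files share with the Container Apps environment
#    (names like my-environment, build-agent-work, etc. are placeholders)
az containerapp env storage set \
  --name my-environment \
  --resource-group my-rg \
  --storage-name build-agent-work \
  --azure-file-account-name mystorageaccount \
  --azure-file-account-key "$STORAGE_KEY" \
  --azure-file-share-name agent-work \
  --access-mode ReadWrite

# 2. Then reference that storage in the app's YAML definition: an
#    AzureFile volume plus a volumeMount at the path that needs more
#    than the 1Gi ephemeral limit, roughly:
#
#    volumes:
#      - name: work
#        storageType: AzureFile
#        storageName: build-agent-work
#    containers:
#      - name: agent
#        volumeMounts:
#          - volumeName: work
#            mountPath: /opt/devops-agent/_work
```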
@howang-ms , per the recommendation from the docs, we also started mounting an Azure Storage account File Share inside our Azure build agent container. We decided to mount the file share on the _work directory inside the agent's installation directory.
Initially we ran into a permission issue where, during the very first build step in a pipeline, the agent would crash and claim that it didn't have permission to access the very same .js file it just downloaded. Thanks to the detailed logs under _diag we were able to figure out that it was caused by a bug in the ZipFile implementation (fixed by this PR: https://github.com/dotnet/runtime/pull/56370). We then upgraded to version 3.x of the build agent, to make use of that fix. After upgrading the build agent, we continued to the next issue:
error: chmod on /opt/devops-agent/_work/2/s/.git/config.lock failed: Operation not permitted
fatal: could not set 'core.filemode' to 'false'
##[error]Unable to use git.exe init repository under /opt/devops-agent/_work/2/s, 'git init' failed with exit code: 128
This happens during the "checkout" step of the Azure Pipeline. We've set the Azure Files mountpoint to /opt/devops-agent/_work, thus the config.lock mentioned in the error is placed on the fileshare. There seems to be no way for us to prevent git init from attempting to take ownership of the config.lock, which fails because of a shortcoming in the way Azure Files shares are mounted: https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/could-not-change-permissions-azure-files
We're fairly stuck at this point, since manually mounting the fileshare by running mount -t cifs -o ... with a different uid and gid only ends up giving us: mount: /mnt/workdirectory: permission denied. We've run that command as root, and we've also tried putting it in /etc/fstab, which leads to mount -a throwing the same error.
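For reference, the manual mount we attempted looked roughly like this (the storage account and share names are placeholders; on Container Apps these options cannot currently be applied, which is what produces the permission error above):

```shell
# Manual CIFS mount with explicit ownership/permission options
# (uid/gid map the share to the non-root agent user)
sudo mount -t cifs \
  //mystorageaccount.file.core.windows.net/agent-work \
  /mnt/workdirectory \
  -o vers=3.0,username=mystorageaccount,password="$STORAGE_KEY",uid=1000,gid=1000,file_mode=0770,dir_mode=0770,serverino

# Equivalent /etc/fstab entry, picked up by `mount -a`
# (storing the account key in fstab is for illustration only):
# //mystorageaccount.file.core.windows.net/agent-work /mnt/workdirectory cifs vers=3.0,username=mystorageaccount,password=<key>,uid=1000,gid=1000,file_mode=0770,dir_mode=0770 0 0
```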
Do you have any pointers for us?
To be clear, our Azure Build agent ran perfectly fine inside the Azure Container App, without the Azure Files mount. We only switched to mounting the fileshare because we ran into the same error as mentioned by the OP.
@vincentspaa Do you know if your issues are related to https://github.com/microsoft/azure-container-apps/issues/520?
@anthonychu Being able to add "uid=1000", "gid=1000" to the mount options (as mentioned in that issue) will most likely fix the aforementioned problem. It's hard to estimate whether that will allow the build agent to then run off of the mounted _work directory without any further issues (i.e. due to further Azure Files limitations), but it would definitely help out a lot.
@vincentspaa
This happens during the "checkout" step of the Azure Pipeline. We've set the Azure Files mountpoint to /opt/devops-agent/_work, thus the config.lock mentioned in the error is placed on the fileshare. There seems to be no way for us to prevent git init from attempting to take ownership of the config.lock, which fails because of a shortcoming in the way Azure Files are mounted: ... Do you have any pointers for us?
Possibly you could work around this by having git store the .git folder on local storage while the rest of the repo is on the mounted storage. With git init you can use --separate-git-dir=<git-dir>, or set $GIT_DIR, to have git store the .git folder elsewhere. See the git documentation.
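A minimal sketch of that workaround, with temporary directories standing in for the Azure Files mount and local ephemeral storage (the real paths, e.g. /opt/devops-agent/_work, would differ):

```shell
# Keep repository metadata (.git) on local storage while the working
# tree sits on the mounted share.
WORKTREE=$(mktemp -d)          # stands in for the Azure Files mount
GITDIR=$(mktemp -d)/repo.git   # stands in for local ephemeral disk

# git init places all repository data in $GITDIR; the worktree gets a
# plain-text .git *file* containing "gitdir: <path>" instead of a folder
git init --separate-git-dir="$GITDIR" "$WORKTREE"

# Later git commands can also be redirected via environment variables
GIT_DIR="$GITDIR" GIT_WORK_TREE="$WORKTREE" git status
```

The catch, as noted in the next reply, is that the pipeline agent runs the git commands itself, so there is no obvious place to inject these flags.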
@Josverl Thank you for the suggestion. That's not really something we have control over, we only get to control which branch a given DevOps pipeline triggers on. From there DevOps makes sure to run the appropriate git commands. And even if we were to move the .git directory outside of the fileshare we would:
We are increasing the amount of ephemeral storage. More details will be shared later this month when the changes have been applied.
You can keep track of the ephemeral storage increase in issue #599
@shavgath Did you resolve this?
Seeing the same issue randomly during pipeline runs
Hi @anthonychu, we also encounter this issue randomly in a production workload. A pod has 2.5 GB of allocated RAM, yet a code 137 is produced while the pod is only using 400 MB: "Container 'prd-xxx-ca' was terminated with exit code '137'". We also believe it happened during Azure Container Apps maintenance, because all pods in our Azure Container Apps environment were restarted. There was no notification about this maintenance even though we subscribed to Azure Service Health alerts.
We're using Container Apps as Azure DevOps agents and have recently started seeing agents drop or hang midway through jobs. Looking at the logs, I can see the error below:
"Container was terminated with exit code '137'"
Not sure where this is coming from since it was working perfectly fine and no changes were made. This happens very frequently to the point that I can't run any pipelines and have to look at other solutions.