Closed: funkel1989 closed this issue 1 week ago
@funkel1989 - thanks for reaching out. There is an issue in which, when some container jobs finish execution, you see the message "Container container-name was terminated with exit code 0", but a "Successful Delete" message doesn't appear until after the timeout period has been exceeded.
What happens here is that the container job finished successfully (hence you see that GitHub stops hearing from the agent) but the container app job object is not removed. This doesn't have a functional impact, but it results in misleading logs.
We are testing a fix and will begin rolling it out in the next few weeks.
Does this address your question? Feel free to comment otherwise.
@vinisoto I don't believe this is the same issue that we are seeing. Azure is reporting messages saying that the container is no longer responding, and then the Azure DevOps pipeline task just times out.
Experiencing the same issue
@funkel1989, @CezaryKlus - can you please send an email to acasupport at microsoft dot com?
Please include your subscription Id, environment name, container app job name, and a timestamp when you saw this behavior. Please include an execution/replica name if possible, to speed up the process.
@funkel1989 @CezaryKlus - we deployed fixes related to this issue. Please feel free to open a new issue here and a support request if you continue to see similar issues.
This issue is a: (mark with an x)
Issue description
On average, 1 out of every 15 jobs stops, and Azure DevOps loses connection with the agent before the job has completed.
The system logs report "Job was active longer than specified deadline", but these jobs run for only 2-3 minutes, while others with identical configuration run for 10-plus minutes without a problem. Re-running the job solves the problem, even if the rerun takes longer than the original run did.
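One way to sanity-check the "deadline" message is to compare each execution's elapsed time against the configured timeout. The following is only a minimal sketch: the function name, the timestamps, and the 1800-second timeout are hypothetical values chosen for illustration and are not taken from the reporter's configuration.

```python
from datetime import datetime, timedelta


def exceeded_deadline(started_at, ended_at, timeout_seconds):
    """Return True if the execution ran at least as long as the timeout."""
    elapsed = ended_at - started_at
    return elapsed >= timedelta(seconds=timeout_seconds)


# Hypothetical execution: ran for 3 minutes against an assumed 30-minute timeout.
start = datetime(2024, 1, 1, 12, 0, 0)
end = start + timedelta(minutes=3)

print(exceeded_deadline(start, end, timeout_seconds=1800))  # False
```

If executions that report the deadline message come back `False` here, the message does not match the actual run time, which is a useful detail to include in a support request.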
In my run.sh I am using the --once flag so that the job closes after every task and a new job instance is created (if my understanding of how this works is correct?). I am seeing in the logs, though, that job execution is greater than 30 minutes, which would make sense if it's closing. I'm having trouble understanding how to fix this.
Here is a snippet from my run.sh
Steps to reproduce
Expected behavior
All started long-running jobs should complete.
Actual behavior
Jobs are randomly ending early.
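To narrow down when jobs end early, it can help to record the agent process's exit code and run time from inside the container. The sketch below is an assumption about how one might do that, not the reporter's script: `run_once` is a hypothetical helper, and the demo runs a harmless Python child process instead of the real agent.

```python
import subprocess
import sys
import time


def run_once(command):
    """Run command to completion and return (exit_code, elapsed_seconds)."""
    started = time.monotonic()
    completed = subprocess.run(command)
    elapsed = time.monotonic() - started
    # On POSIX, a negative return code means the child was killed by a signal.
    return completed.returncode, elapsed


# Demo with a stand-in command that exits with code 3.
code, elapsed = run_once([sys.executable, "-c", "raise SystemExit(3)"])
print(f"process exited with code {code} after {elapsed:.1f}s", flush=True)
```

In a real container the command would presumably be something like `["./run.sh", "--once"]`, matching the flag described above. A negative exit code or an unexpectedly short elapsed time would suggest that the process was stopped from outside rather than finishing on its own.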
Screenshots