Open DavidSpickett opened 8 months ago
@llvm/issue-subscribers-lldb
Author: David Spickett (DavidSpickett)
@jeffreytan81 it's definitely leaving processes behind. Do you see the same thing on your machine?
I wonder if we're following the fork and stopping only the child on exit, or following only one of the children and leaving the other children and the parent behind.
We are seeing similar issues on our Linux x64 and Mac x64 builders. We have been chasing a weird lldb test timeout for a couple of days. The LLDB test step usually finishes in 5 mins when LIT uses 60 workers, but recently (since around Mar 8) this step has sometimes taken over 1hr, until the builder was killed due to timeout. We could not identify the specific test that was stuck, because the set of unfinished tests in the log was different each time (TestConcurrentVFork is actually among the tests that PASS).
We came across this GitHub issue and decided to try disabling TestConcurrentVFork on our builders. After that, the timeout issue was gone. We suspect this test is not cleaned up properly after it runs and holds on to resources that other tests need, causing deadlocks, but we cannot verify that.
@jeffreytan81 ping!
@DavidSpickett, sorry, GitHub notifications seem to have failed me here. I was never notified about the tag, and my mailbox did not get any emails...
To answer your question: no, we haven't observed the lingering processes issue, but it is possible no one has noticed yet. Since @labath is fixing this issue, I will leave it as is.
I've just put the skips back: https://github.com/llvm/llvm-project/commit/0c8151ac809c283187e9b19d0cbe72a09c8d74e0
The test is a lot more stable thanks to Pavel's change, but it's still failing often enough to degrade the buildbot results, for example https://lab.llvm.org/buildbot/#/builders/96/builds/56699.
Saw another failure on a GitHub CI run today: https://buildkite.com/llvm-project/github-pull-requests/builds/103712#01922590-fbb5-4b6e-824f-28d2891523a1
Timed Out Tests (1):
  lldb-api :: functionalities/fork/concurrent_vfork/TestConcurrentVFork.py
Going to disable this on Linux as a whole; the noise in CI isn't worth whatever coverage this is giving us.
The test is now disabled for all Linux. I will not have the time to figure out a solution here so FYI @jeffreytan81 if this feature is important to you.
lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py sometimes fails with a timeout. I've seen this on AArch64 and Arm Linux. For example: https://lab.llvm.org/buildbot/#/builders/17/builds/50450.
Inspecting the container afterwards shows that we are using way more PIDs than you'd expect, and we have around 600 processes like:
This container persists so this could be the result of the test not cleaning up processes, and them piling up until the system complains, or someone tries to debug the wrong process and gets no response. On AArch64 I have seen it lead to system resource errors as the leftover processes pile up.
I'm going to skip all the tests on these platforms while I look into it.
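For anyone who wants to check whether their own builder is accumulating leftover processes in the way described above, here is a rough sketch of one way to do it on Linux. It is not part of the LLDB test suite. It assumes a Linux-style /proc filesystem, and the function names `count_processes_by_name` and `leftover` are made up for illustration. The idea is to snapshot process counts by command name before and after a test run and see which names grew.

```python
import os
from collections import Counter


def count_processes_by_name(proc_root="/proc"):
    """Return a Counter mapping command name -> number of live processes.

    Assumes a Linux-style procfs where each process has a numeric
    directory containing a `comm` file with its command name.
    """
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory (e.g. "self", "cpuinfo")
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                counts[f.read().strip()] += 1
        except OSError:
            # The process exited (or is unreadable) between listing and
            # opening; skip it rather than fail the whole snapshot.
            continue
    return counts


def leftover(before, after):
    """Return {name: increase} for names with more processes in `after`."""
    return {
        name: after[name] - before[name]
        for name in after
        if after[name] > before[name]
    }


# Hypothetical usage around a test run:
#   before = count_processes_by_name()
#   ... run the test ...
#   after = count_processes_by_name()
#   print(leftover(before, after))
```

A name whose count keeps growing across repeated runs would be a hint that processes are being left behind. It does not tell you which test created them, so it is only a starting point for narrowing things down.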