ilovepi opened this issue 2 weeks ago
@llvm/issue-subscribers-lldb
Author: Paul Kirth (ilovepi)
The breakpoint counting in these tests has been flaky, but for a known reason: we weren't distinguishing between "the thread executed the breakpoint trap" and "the process stopped while this thread happened to have its PC on the trap instruction but hadn't executed it yet", which could lead to miscounted breakpoints.
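To illustrate why that distinction matters, here is a minimal Python model. It is a sketch under stated assumptions, not LLDB code: `ThreadSnapshot`, `count_hits_by_pc`, `count_hits_by_reason` and the addresses are all hypothetical. A thread that is merely parked on the trap at one stop and really executes it at the next stop gets counted twice if we count by PC alone.

```python
from dataclasses import dataclass

BP_ADDR = 0x1000  # hypothetical breakpoint address


@dataclass(frozen=True)
class ThreadSnapshot:
    """Hypothetical per-thread state captured at one process stop."""
    tid: int
    pc: int
    executed_trap: bool  # True only if this thread actually executed the trap


def count_hits_by_pc(stop, bp_addr):
    """Naive count: every thread whose PC is sitting on the trap instruction."""
    return sum(1 for t in stop if t.pc == bp_addr)


def count_hits_by_reason(stop, bp_addr):
    """Count only threads that really executed the trap at bp_addr."""
    return sum(1 for t in stop if t.pc == bp_addr and t.executed_trap)


stops = [
    # Stop 1: thread 1 hit the trap; thread 2 was caught with its PC on the trap.
    [ThreadSnapshot(1, BP_ADDR, True), ThreadSnapshot(2, BP_ADDR, False)],
    # Stop 2: thread 2 now actually executes the trap; thread 1 has moved on.
    [ThreadSnapshot(1, 0x2000, False), ThreadSnapshot(2, BP_ADDR, True)],
]

naive = sum(count_hits_by_pc(s, BP_ADDR) for s in stops)
correct = sum(count_hits_by_reason(s, BP_ADDR) for s in stops)
print(naive, correct)  # 3 2
```

With two real breakpoint hits, the PC-based count reports 3 while the stop-reason-based count reports 2.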
But I'm not sure how you'd get miscounted signals. What the test actually counts is the number of stops in the debugger where some thread had a stop reason of "signal". The test sends only one SIGUSR per signal thread, and it creates only one signal thread. So either that signal is being resent (which seems unlikely, but signals are weird) and we're legitimately reporting two signal stops, or we are incorrectly preserving the signal stop reason across two stops.
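The second hypothesis can be made concrete with a small Python model. It is hypothetical and does not use LLDB's SB API: each stop maps a thread id to its reported stop reason, and the count is the number of stops in which any thread reports a signal. A single signal whose stop reason is wrongly carried into a second stop is enough to turn the expected count of 1 into 2.

```python
from enum import Enum


class StopReason(Enum):
    """Hypothetical subset of per-thread stop reasons."""
    NONE = "none"
    SIGNAL = "signal"
    BREAKPOINT = "breakpoint"


def count_signal_stops(stops):
    """Count stops in which at least one thread reports a signal stop reason.

    Each stop is a dict mapping thread id -> StopReason.
    """
    return sum(1 for stop in stops if StopReason.SIGNAL in stop.values())


# One SIGUSR delivered to thread 3 and reported exactly once.
good = [
    {1: StopReason.BREAKPOINT, 3: StopReason.SIGNAL},
    {1: StopReason.BREAKPOINT, 3: StopReason.NONE},
]

# Same run, but thread 3's signal reason was wrongly preserved into the second stop.
stale = [
    {1: StopReason.BREAKPOINT, 3: StopReason.SIGNAL},
    {1: StopReason.BREAKPOINT, 3: StopReason.SIGNAL},
]

print(count_signal_stops(good), count_signal_stops(stale))  # 1 2
```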
We clear the stop reason for a thread the next time that thread is given a chance to run. We don't know or care whether it actually ran; we clear it when we tell that thread it can run and then resume the process. However, if we don't allow the thread to run when we resume the process, we preserve the stop info, since that really is the last state of that thread...
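The clearing rule described above fits in a few lines. This is a minimal Python sketch using a hypothetical `Thread` record and `resume_process` function, not LLDB's internal C++ thread or stop-info classes: stop info is dropped for every thread that is allowed to run, whether or not it then actually runs, and kept for every suspended thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thread:
    """Hypothetical thread record: last stop reason plus a suspend flag."""
    tid: int
    stop_info: Optional[str] = None  # e.g. "signal", "breakpoint"; None once cleared
    suspended: bool = False


def resume_process(threads: List[Thread]) -> None:
    """Clear stop info for threads allowed to run; suspended threads keep theirs.

    The actual process resume is not modelled here.
    """
    for t in threads:
        if not t.suspended:
            t.stop_info = None


threads = [Thread(1, "breakpoint"), Thread(2, "signal", suspended=True)]
resume_process(threads)
print([t.stop_info for t in threads])  # [None, 'signal']
```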
But in this test, the only time we suspend threads is when stepping over breakpoints. We do that by suspending all the other threads and allowing only the breakpoint thread to run one instruction. Then we put the trap back in place and run all threads without returning control to the user. So I can't see a way that that stop, with its preserved signal stop info, could leak to the user.
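Under the clear-on-resume rule described above, the step-over sequence can be sketched in Python. The `Thread`/`Process` model, `step_over_breakpoint`, and the internal "trace" stop reason are hypothetical illustrations, not LLDB's implementation. The point is that the signal thread's preserved stop info survives only the internal single-step stop and is cleared by the final resume-all, before any stop is reported to the user.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Thread:
    tid: int
    stop_info: Optional[str] = None
    suspended: bool = False


@dataclass
class Process:
    threads: List[Thread]
    trap_inserted: bool = True
    user_visible_stops: List[Dict[int, Optional[str]]] = field(default_factory=list)

    def resume(self) -> None:
        # Clear stop info for every thread allowed to run; suspended threads keep theirs.
        for t in self.threads:
            if not t.suspended:
                t.stop_info = None

    def report_stop_to_user(self) -> None:
        self.user_visible_stops.append({t.tid: t.stop_info for t in self.threads})


def step_over_breakpoint(proc: Process, bp_thread: Thread) -> None:
    """Hypothetical step-over: nothing in here reports a stop to the user."""
    # 1. Suspend every other thread (their stop info is preserved).
    for t in proc.threads:
        t.suspended = t is not bp_thread
    # 2. Lift the trap and let only bp_thread execute one instruction.
    proc.trap_inserted = False
    proc.resume()
    bp_thread.stop_info = "trace"  # internal single-step stop, never reported
    # 3. Put the trap back in place.
    proc.trap_inserted = True
    # 4. Un-suspend everyone and resume all threads.
    for t in proc.threads:
        t.suspended = False
    proc.resume()


bp_thread = Thread(1, "breakpoint")
sig_thread = Thread(2, "signal")
proc = Process([bp_thread, sig_thread])
proc.report_stop_to_user()  # the user sees the breakpoint and the signal once
step_over_breakpoint(proc, bp_thread)
print(sig_thread.stop_info, len(proc.user_visible_stops))  # None 1
```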
If we could see the gdb-remote packet log and the lldb step logs for a run that fails this way, we should be able to see at least what the error is.
We're seeing some LLDB tests flake in our CI. Given that these are concurrent tests, I assume there is some data race or lack of synchronization.
Flaky tests:
lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py
lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalNWatchNBreak.py
Bots:
https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/lldb-linux-arm64/b8734630228996969777/infra
https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/lldb-linux-arm64/b8734618131611235377/overview
Error output:
https://github.com/llvm/llvm-project/issues/39394 seems to be a similar report. @JDevlieghere is this a known problem?