llvm / llvm-project


TUSchedulerTests.Cancellation hang #48455

Open sam-mccall opened 3 years ago

sam-mccall commented 3 years ago
Bugzilla Link 49111
Version unspecified
OS Linux
CC @nico

Extended Description

From Nico on llvm/llvm-project#48342

http://45.33.8.238/linux/38922/step_9.txt is another clangd unit test hang, this time in ClangdTests/TUSchedulerTests.Cancellation (and on linux).

Not sure if this is the same thing or not, but clangd unit test hangs used to be rare and now they're not (one every week or so now), so maybe it's the same cause.

(I don't think it's the same cause, but definitely a bug)

nico commented 3 years ago

Happened again in http://45.33.8.238/linux/50888/step_9.txt

Seems to fail about once a month, which isn't all that rare.

There's also bug 50773.

Maybe it's time for a look into the threading bits here?

nico commented 3 years ago

Happened again in this build http://45.33.8.238/linux/48398/step_9.txt

Same stacks as in comment 1.

nico commented 3 years ago

Before I ran gdb:

thakis@dotc:~/src/hack$ ps aux | grep clangd
...
thakis 2313354 0.0 0.0 218744 32540 pts/1 Sl+ 10:01 0:00 /usr/local/google/home/thakis/src/llvm-project/out/gn/obj/clang-tools-extra/clangd/unittests/./ClangdTests --gtest_filter=TUSchedulerTests.Cancellation

I then ran kill 2313354 (which is what in the end caused the test to fail, but only after the test had been running for > 2h).

nico commented 3 years ago

(this was the build http://45.33.8.238/linux/45981/summary.html )

nico commented 3 years ago

Happened again today (but not since then I think, so somewhat rare).

I attached to the hanging process in gdb before killing the process:

Attaching to process 2313354
[New LWP 2313397]
[New LWP 2313398]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
futex_wait_cancelable (private=0, expected=0, futex_word=0x7ffcd0bbf8c8) at ../sysdeps/nptl/futex-internal.h:186
186 ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7ffcd0bbf8c8) at ../sysdeps/nptl/futex-internal.h:186
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffcd0bbf8d0, cond=0x7ffcd0bbf8a0) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7ffcd0bbf8a0, mutex=0x7ffcd0bbf8d0) at pthread_cond_wait.c:638
#3 0x00007f87b4f8f90c in std::condition_variable::wait(std::unique_lock&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00000000040c4f3b in clang::clangd::Notification::wait() const ()
#5 0x0000000001a08b36 in clang::clangd::Context::TypedAnyStorage<llvm::detail::scope_exit<llvm::unique_function<void ()> > >::~TypedAnyStorage() ()
#6 0x000000000153d203 in std::_Sp_counted_ptr_inplace<clang::clangd::Context::Data, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
#7 0x0000000001596f45 in clang::clangd::WithContext::~WithContext() ()
#8 0x00000000019ed9e7 in clang::clangd::(anonymous namespace)::TUSchedulerTests::updateWithCallback(clang::clangd::TUScheduler&, llvm::StringRef, llvm::StringRef, clang::clangd::WantDiagnostics, llvm::unique_function<void ()>) ()
#9 0x00000000019ecca0 in clang::clangd::(anonymous namespace)::TUSchedulerTests_Cancellation_Test::TestBody() ()
#10 0x00000000040e45dc in testing::Test::Run() ()
#11 0x00000000040e5a10 in testing::TestInfo::Run() ()
#12 0x00000000040e61c0 in testing::TestCase::Run() ()
#13 0x00000000040ee454 in testing::internal::UnitTestImpl::RunAllTests() ()
#14 0x00000000040edfcc in testing::UnitTest::Run() ()
#15 0x0000000001ba37d6 in main ()
(gdb) threads info
Undefined command: "threads". Try "help".
(gdb) info threads
Id Target Id Frame

Let me know if there are additional things I should capture next time.

nico commented 2 years ago

This is another instance of this hang: https://reviews.llvm.org/D122251#3528961

nico commented 2 years ago

(But haven't seen clangd unit tests hang in a very long time before that -- less frequently than once / month for sure.)

nico commented 2 years ago

Saw this once locally last week, and once today here: http://45.33.8.238/macm1/37396/step_9.txt

nico commented 2 years ago

(Both times on an M1 mac.)