Open bcfriesen opened 4 years ago
I ran this unit test through valgrind and observed that, when TMPDIR is set to one of the file systems that causes the program to hang as I described originally, valgrind's helgrind threading tool reports the following:
==12182== Thread #1: pthread_cond_destroy: destruction of condition variable being waited upon
==12182== at 0x4C34882: ??? (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==12182== by 0x40B665: (anonymous namespace)::VerifyingConsumer::~VerifyingConsumer() (llvm-project/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:100)
==12182== by 0x40D629: DirectoryWatcherTest_DeleteWatchedDir_Test::TestBody() (llvm-project/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:430)
==12182== by 0x54A633: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2402)
==12182== by 0x537A81: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2455)
==12182== by 0x525A25: testing::Test::Run() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2474)
==12182== by 0x5262BA: testing::TestInfo::Run() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2656)
==12182== by 0x526853: testing::TestCase::Run() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2774)
==12182== by 0x52BE24: testing::internal::UnitTestImpl::RunAllTests() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4649)
==12182== by 0x54DC33: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2402)
==12182== by 0x5396E1: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2455)
==12182== by 0x52BB29: testing::UnitTest::Run() (llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4257)
==12182==
It does not report that error when TMPDIR is set to a file system that allows the program to complete without hanging.
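For readers unfamiliar with this helgrind diagnostic: it fires when a condition variable is destroyed while some thread is still blocked on it. The sketch below shows one common way to avoid that, by waking and joining the waiting thread in the destructor before any member is torn down. `EventWaiter` and `destroyWithoutEvent` are hypothetical names made up for this illustration; this is not the `VerifyingConsumer` code from DirectoryWatcherTest.cpp, and I am not claiming it is the right fix for that test.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical class for illustration only; this is NOT the
// VerifyingConsumer from DirectoryWatcherTest.cpp.
class EventWaiter {
public:
  // Starts a worker thread that blocks on Cv until an event or shutdown.
  EventWaiter() : Worker([this] { waitForEvent(); }) {}

  // Wakes the worker and joins it, so nobody is waiting on Cv when the
  // members are destroyed after this destructor body finishes.
  ~EventWaiter() {
    {
      std::lock_guard<std::mutex> Lock(Mtx);
      Stopping = true;
    }
    Cv.notify_all();
    assert(Worker.joinable());
    Worker.join();
  }

  // Reports that the awaited event has happened.
  void signalEvent() {
    {
      std::lock_guard<std::mutex> Lock(Mtx);
      EventSeen = true;
    }
    Cv.notify_all();
  }

private:
  // Blocks until either the event arrives or the object is shutting down.
  void waitForEvent() {
    std::unique_lock<std::mutex> Lock(Mtx);
    Cv.wait(Lock, [this] { return EventSeen || Stopping; });
  }

  std::mutex Mtx;
  std::condition_variable Cv;
  bool EventSeen = false;
  bool Stopping = false;
  // Declared last so that Mtx, Cv and the flags are initialized before
  // the worker thread starts running.
  std::thread Worker;
};

// Returns true once an EventWaiter that never received its event has been
// destroyed without leaving a waiter behind.
bool destroyWithoutEvent() {
  { EventWaiter W; }
  return true;
}
```

The point of the `Stopping` flag is that the destructor does not depend on the event ever arriving, which matters if the file system never delivers it.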
Extended Description
Greetings,
I am very close to having a working x86 flang buildbot: everything about the bot's workflow succeeds except for a single lit test, DirectoryWatcherTest. This test reads and writes some temporary files/directories in TMPDIR.
On the system where this buildbot is running (a Cray XC40), I have observed the following behaviors (all results tested on commit 724bf4ee23a of llvm-project):
On "login nodes", the test runs without any issues. TMPDIR is set to /tmp, which is a ramdisk:
However, the same test on an XC "compute node" hangs:
The program hangs on that last line and never returns. lldb shows the following:
I have been told that the /tmp ramdisk on "compute nodes" is configured slightly differently from the /tmp ramdisk on "login nodes", which may explain the difference in behavior. But I don't know what the difference in configuration actually is.
I also tried setting TMPDIR to different file systems, including Lustre and GPFS. I found that when TMPDIR is a GPFS file system, the test succeeds on any kind of node (login and compute), but when TMPDIR is a Lustre file system, it hangs on both login nodes and compute nodes, in the same way as when TMPDIR is set to the /tmp ramdisk on compute nodes.
So to summarize:
/tmp ramdisk, login nodes: test passes
/tmp ramdisk, compute nodes: test hangs
GPFS, login and compute nodes: test passes
Lustre, login and compute nodes: test hangs
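To pin down what kind of file system TMPDIR actually points at on each node type, something like the following could be run there. It is a minimal, Linux-only sketch built on statfs(2); the helper names `tmpDirOrDefault`, `fileSystemMagic` and `reportTmpDirFileSystem` are hypothetical, and the fallback to /tmp when TMPDIR is unset is my assumption. Only the tmpfs magic number (TMPFS_MAGIC from <linux/magic.h>) is recognized by name; for any other file system the raw value is printed and has to be looked up.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/vfs.h> // statfs(2); Linux-specific

// Value of TMPFS_MAGIC from <linux/magic.h>.
constexpr long TmpfsMagic = 0x01021994;

// Hypothetical helper: returns $TMPDIR, or "/tmp" if it is unset or empty
// (the fallback is an assumption made for this sketch).
std::string tmpDirOrDefault() {
  const char *Dir = std::getenv("TMPDIR");
  return (Dir && *Dir) ? std::string(Dir) : std::string("/tmp");
}

// Hypothetical helper: stores the f_type magic number of the file system
// containing Path in Magic. Returns false if statfs(2) fails.
bool fileSystemMagic(const std::string &Path, long &Magic) {
  assert(!Path.empty() && "Path must not be empty");
  struct statfs Info;
  if (statfs(Path.c_str(), &Info) != 0)
    return false;
  Magic = static_cast<long>(Info.f_type);
  return true;
}

// Prints the magic number for $TMPDIR so that the output from login and
// compute nodes can be compared side by side.
void reportTmpDirFileSystem() {
  const std::string Dir = tmpDirOrDefault();
  long Magic = 0;
  if (!fileSystemMagic(Dir, Magic)) {
    std::printf("%s: statfs failed\n", Dir.c_str());
    return;
  }
  std::printf("%s: f_type=0x%lx%s\n", Dir.c_str(),
              static_cast<unsigned long>(Magic),
              Magic == TmpfsMagic ? " (tmpfs)" : "");
}
```

Calling `reportTmpDirFileSystem()` from a small `main` on a login node and on a compute node would show whether the two /tmp ramdisks even report the same file system type.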
Any ideas how to make this test less sensitive to the kind of file system it's running on?
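One generic option, sketched here only to make the question concrete, is for a test to wait for an expected event with a deadline, so that a file system that never delivers the event produces a failure rather than a hang. `BoundedEvent` is a hypothetical helper; this is not a description of what DirectoryWatcherTest.cpp currently does, and the timeout value a real test would need is unknown to me.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper for illustration; not part of DirectoryWatcherTest.cpp.
class BoundedEvent {
public:
  // Marks the event as having happened and wakes any waiter.
  void set() {
    {
      std::lock_guard<std::mutex> Lock(Mtx);
      Ready = true;
    }
    Cv.notify_all();
  }

  // Returns true if set() was called before Timeout elapsed, false if the
  // deadline passed first. Never blocks longer than Timeout.
  bool waitFor(std::chrono::milliseconds Timeout) {
    assert(Timeout.count() >= 0 && "Timeout must not be negative");
    std::unique_lock<std::mutex> Lock(Mtx);
    return Cv.wait_for(Lock, Timeout, [this] { return Ready; });
  }

private:
  std::mutex Mtx;
  std::condition_variable Cv;
  bool Ready = false;
};
```

A test using this would treat a `false` return from `waitFor` as a failure and report it, instead of blocking forever.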
Thanks.