Closed nv-rliu closed 1 week ago
Our nightly MG triangle_count hangs on 4+ GPUs with the C++ test below.
INSTANTIATE_TEST_SUITE_P(
  file_tests,
  Tests_MGTriangleCount_File,
  ::testing::Combine(
    // enable correctness checks
    ::testing::Values(TriangleCount_Usecase{0.1, false}),
    ::testing::Values(cugraph::test::File_Usecase("test/datasets/karate.mtx"))));
The hang occurs when vertex_subset_ratio is small relative to the number of ranks, which leaves some ranks with no vertices to process.
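To make the condition concrete, here is a rough sketch of how a small ratio can leave ranks with an empty subset. The helper subset_sizes_per_rank is hypothetical and is not cuGraph code. It assumes vertices are split as evenly as possible across ranks and that each rank keeps floor(local_count * vertex_subset_ratio) of them; the real partitioning and selection logic in the test may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuGraph): estimates how many vertices each
// rank would select for its subset.
// Assumption 1: vertices are split as evenly as possible across ranks.
// Assumption 2: each rank keeps floor(local_count * vertex_subset_ratio).
std::vector<std::size_t> subset_sizes_per_rank(std::size_t num_vertices,
                                               std::size_t num_ranks,
                                               double vertex_subset_ratio)
{
  assert(num_ranks > 0);
  std::vector<std::size_t> sizes(num_ranks);
  for (std::size_t rank = 0; rank < num_ranks; ++rank) {
    // The first (num_vertices % num_ranks) ranks receive one extra vertex.
    std::size_t local =
      num_vertices / num_ranks + (rank < num_vertices % num_ranks ? 1 : 0);
    // The cast truncates toward zero, so a small local count yields 0.
    sizes[rank] = static_cast<std::size_t>(local * vertex_subset_ratio);
  }
  return sizes;
}
```

For the 34-vertex karate graph with a ratio of 0.1, this sketch gives a subset of 3 vertices on one rank and 1 vertex per rank on two ranks, but 0 vertices on every rank once there are four ranks, which matches the 4+ GPU threshold reported above under these assumptions.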
Version
24.08
Which installation method(s) does this occur on?
Docker, Conda, Pip, Source
Describe the bug.
If I run the entire unit test with
pytest --import-mode=append test_triangle_count_mg.py
the test hangs once it reaches test_triangles.
However, if I just run that test
pytest --import-mode=append test_triangle_count_mg.py::test_triangles
all 4 tests pass.
I have been able to reproduce this on dgx18 with 2 GPUs and 8 GPUs.
Minimum reproducible example
Relevant log output
Environment details
Other/Misc.
No response
Code of Conduct