nv-rliu closed 7 months ago
What's the minimum scale (i.e. # GPUs) to reproduce this? Can you reproduce this with 2 GPUs?
It's reproducible with 2 GPUs. This was the result from running on a lab machine:
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/karate.csv-directed:True-seeds:[0, 2]-radius:1] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/karate.csv-directed:True-seeds:[0, 2]-radius:2] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/karate.csv-directed:False-seeds:[0, 2]-radius:1] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/karate.csv-directed:False-seeds:[0, 2]-radius:2] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/dolphins.csv-directed:True-seeds:[0, 2]-radius:1] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/dolphins.csv-directed:True-seeds:[0, 2]-radius:2] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/dolphins.csv-directed:True-seeds:[0, 2]-radius:3] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/dolphins.csv-directed:False-seeds:[0, 2]-radius:1] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/dolphins.csv-directed:False-seeds:[0, 2]-radius:2] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/dolphins.csv-directed:False-seeds:[0, 2]-radius:3] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/email-Eu-core.csv-directed:True-seeds:[0, 2]-radius:1] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/email-Eu-core.csv-directed:True-seeds:[0, 2]-radius:2] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/email-Eu-core.csv-directed:True-seeds:[0, 2]-radius:3] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/email-Eu-core.csv-directed:False-seeds:[0, 2]-radius:1] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/email-Eu-core.csv-directed:False-seeds:[0, 2]-radius:2] - AssertionError: ColumnBase are different
FAILED cugraph/cugraph/tests/sampling/test_egonet_mg.py::test_dask_mg_ego_graphs[graph_file:/home/nfs/ralphl/fix-tests/nv-rliu/datasets/email-Eu-core.csv-directed:False-seeds:[0, 2]-radius:3] - AssertionError: ColumnBase are different
=============================================== 16 failed, 56 passed, 363 warnings in 159.28s (0:02:39) ================================================
After looking at a trend of 1 Node 8-GPU runs across multiple days, it appears that the failure is transient.
You mean this is a heisenbug? Sounds worse but let me try to reproduce this first.
How often can you reproduce this? (Say run test_egonet_mg.py 10 times, how many times you see at least one failure?)
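One way to answer that question is to script the repeated runs. The following is a minimal sketch, not something posted in the thread: `count_failing_runs` is a hypothetical helper, and it assumes that pytest exits with a non-zero status whenever at least one test fails. The test path in the comment is the one shown in the log above.

```python
import subprocess
import sys


def count_failing_runs(cmd, runs=10):
    """Run `cmd` `runs` times and count how many runs exit non-zero.

    Hypothetical helper for estimating how often a flaky test file fails.
    For pytest, a non-zero exit status means at least one test failed
    (or the run errored out).
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Example invocation (assumes a working multi-GPU test environment):
# cmd = [sys.executable, "-m", "pytest", "-q",
#        "cugraph/cugraph/tests/sampling/test_egonet_mg.py"]
# print(count_failing_runs(cmd, runs=10), "of 10 runs had at least one failure")
```

Counting whole runs rather than individual test cases matches the question as asked: how many of the 10 runs show at least one failure.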
I am running this on my local system with 2 GPUs, and I can't reproduce the test failure. Let me try this on a DGX node as well.
Never mind, I reproduced this.
Version
24.04
Which installation method(s) does this occur on?
Source
Describe the bug.
Currently, the MG implementation of `ego_graph` returns a value that differs from the SG implementation when passed multiple `n` values, aka seeds.
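To make concrete what "the same result for multiple seeds" could mean, here is a CPU-only sketch in plain Python. It is not cuGraph code and does not diagnose this bug: `ego_edges`, `batched_ego_edges`, and `same_ego_nets` are hypothetical names, and the layout of one concatenated edge list plus per-seed offsets is an assumption made for illustration. The comparison is deliberately insensitive to row order within each seed's slice, which a column-by-column equality check is not.

```python
from collections import deque


def ego_edges(edges, seed, radius, directed=True):
    """Edges of the subgraph induced by vertices within `radius` hops of `seed`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        if not directed:
            adj.setdefault(v, set()).add(u)

    # Breadth-first search, stopping expansion at `radius` hops.
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    # Keep every input edge whose endpoints were both reached.
    return [(u, v) for u, v in edges if u in dist and v in dist]


def batched_ego_edges(edges, seeds, radius, directed=True):
    """Concatenate per-seed ego nets; offsets[i]:offsets[i+1] is seed i's slice."""
    all_edges, offsets = [], [0]
    for s in seeds:
        all_edges.extend(ego_edges(edges, s, radius, directed))
        offsets.append(len(all_edges))
    return all_edges, offsets


def same_ego_nets(a, b):
    """Compare two (edges, offsets) results, ignoring edge order per seed."""
    (edges_a, off_a), (edges_b, off_b) = a, b
    if off_a != off_b:
        return False
    return all(
        sorted(edges_a[off_a[i]:off_a[i + 1]]) == sorted(edges_b[off_b[i]:off_b[i + 1]])
        for i in range(len(off_a) - 1)
    )
```

For example, on the path graph `[(0, 1), (1, 2), (2, 3), (3, 4)]` with seeds `[0, 2]`, radius 1, and `directed=False`, the first slice holds one edge and the second holds two, so the offsets are `[0, 1, 3]`.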
Minimum reproducible example
Relevant log output
Environment details
Other/Misc.
No response