PLC sometimes handles isolated nodes inconsistently, and we are forced to handle the edge cases in Python (if possible).
I will illustrate this with plc.weakly_connected_components, but the same behavior has been seen elsewhere such as with Louvain.
Let's consider an undirected graph with edge indices [1, 2] and [2, 1] in COO format. There is currently no way to tell PLC how many nodes are in this graph, and it assumes the number of nodes is max(indices) + 1. Here's the first example:
This is almost perfect, right? Because we gave renumber=False when creating G, it assumes nodes are 0, 1, and 2 (b/c of max indices). But, what if we had four (or more) nodes in the graph and node 3 (and above) have no edges? Then we have no results for them, and we have no way to tell PLC to create results for them.
What if instead of COO we used CSR? One might expect CSR to work since the number of nodes is implied by the length of the offsets array. With WCC, we have two ways to use CSR: when creating the graph, or when calling WCC.
# Change creation of G to use this
src_or_offset_array=cp.array([0, 0, 1, 2, 2], np.int32),
dst_or_index_array=cp.array([2, 1], np.int32),
input_array_format="CSR",
# Or call plc.weakly_connected_components with this
graph=None,
offsets=cp.array([0, 0, 1, 2, 2], np.int32),
indices=cp.array([2, 1], np.int32),
Both approaches give the same result as before (only nodes 0, 1, 2).
It seems the fundamental issue is that COO contains incomplete information about the graph. If we are not renumbering, then I want to tell PLC the number of nodes in the graph. If we are renumbering, then I want to give it an array of nodes. Then I want results for each node.
Is this something that should be handled in the PLC/C layers?
Minimum reproducible example
See above
Relevant log output
No response
Environment details
No response
Other/Misc.
No response
Code of Conduct
[X] I agree to follow cuGraph's Code of Conduct
[X] I have searched the open bugs and have found no duplicates for this bug report
Version
23.12 dev
Which installation method(s) does this occur on?
Source
Describe the bug.
PLC sometimes handles isolated nodes inconsistently, and we are forced to handle the edge cases in Python (if possible).
I will illustrate this with
plc.weakly_connected_components
, but the same behavior has been seen elsewhere such as with Louvain.Let's consider an undirected graph with edge indices
[1, 2]
and[2, 1]
in COO format. There is currently no way to tell PLC how many nodes are in this graph, and it assumes the number of nodes ismax(indices) + 1
. Here's the first example:which outputs
This is almost perfect, right? Because we gave
renumber=False
when creatingG
, it assumes nodes are 0, 1, and 2 (b/c of max indices). But, what if we had four (or more) nodes in the graph and node 3 (and above) have no edges? Then we have no results for them, and we have no way to tell PLC to create results for them.What if instead of COO we used CSR? One might expect CSR to work since the number of nodes is implied by the length of the offsets array. With WCC, we have two ways to use CSR: when creating the graph, or when calling WCC.
Both approaches give the same result as before (only nodes 0, 1, 2).
It seems the fundamental issue is that COO contains incomplete information about the graph. If we are not renumbering, then I want to tell PLC the number of nodes in the graph. If we are renumbering, then I want to give it an array of nodes. Then I want results for each node.
Is this something that should be handled in the PLC/C layers?
Minimum reproducible example
Relevant log output
No response
Environment details
No response
Other/Misc.
No response
Code of Conduct