rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library
https://docs.rapids.ai/api/cugraph/stable/
Apache License 2.0
1.75k stars 303 forks source link

isolated vertices are not being captured properly when initializing a Graph with an adjacency matrix #1478

Open rlratzel opened 3 years ago

rlratzel commented 3 years ago

Isolated vertices are lost when we initialize a Graph using an adj list:

>>> nxG = nx.from_numpy_array     (np.array([[1,1,0], [1,0,0], [0,0,0]]), create_using=nx.DiGraph)
>>> G   = cugraph.from_numpy_array(np.array([[1,1,0], [1,0,0], [0,0,0]]), create_using=cugraph.DiGraph)
>>>
>>> nxG.number_of_nodes()
3
>>> G.number_of_vertices()
2
>>> G = cugraph.from_numpy_array(np.array([[0]]), create_using=cugraph.DiGraph)
/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/join/casting_logic.py:132: UserWarning: can't safely cast column from right with type float64 to int64, upcasting to float64
  warnings.warn(
>>> G.number_of_vertices()
nan
>>> nxG = nx.from_numpy_array(np.array([[0]]), create_using=nx.DiGraph)
>>> nxG.number_of_nodes()
1

This is likely due to internally converting the numpy array to an edgelist, which is then used to initialize the Graph. We may be able to at least capture the correct number of vertices by looking at the incoming array's shape (which would also allow us to assert it's square).

ChuckHastings commented 3 years ago

The new graph object construction code can handle this properly as long as the python code calls the construction properly.

The legacy graph object construction code in C++ should handle this properly except in the case where there are isolated vertices at the end of the vertex list.

We need to prioritize/schedule this accordingly. If we can wait until we migrate to the new graph objects (C++) then this problem will be corrected at that point. If we need to fix it sooner then we will need to update the legacy implementation to properly handle isolated vertices at the end of the vertex list.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

ChuckHastings commented 1 year ago

Note that #3982 adds parameters to the C API that can address this. The python layer would need to be updated to pick up these changes in the case of an adjacency matrix.