metagraph-dev / metagraph

Multi-target API for graph analytics with Dask
https://metagraph.readthedocs.io/en/latest/
Apache License 2.0
27 stars 7 forks source link

What is a node? #49

Closed jim22k closed 4 years ago

jim22k commented 4 years ago

Before we dig deeply into the base types, let's define what a node is.

networkx lets a node be any hashable object. Each node must be unique within a graph. Let's call this Hashable. graphblas requires nodes to be sequential uints starting from 0. Let's call this Sequential.

These seem to be the most common definitions of nodes used by various libraries. Sometimes there will be a concept of a node label. igraph uses Sequential nodes, but lets you add a "label" attribute to each node, which needs to be a string. Most methods and algorithms which expect a node let you pass in either the uint or the string and it figures things out.

I see two approaches to handling the difference in node expectations as we convert between different library representations of Graph and Nodes.

Approach 1

Require all nodes to be Sequential. When this is not a natural constraint in a library (like networkx), we create a copy and assign each node a sequential number in init. We still keep the mapping around for reference, but that mapping is not passed around.

Approach 2

Give every node a label. For graphs with Sequential nodes, the uint number is the label. Maintain a mapping between node label and node position and pass this mapping around. This allows each backend library to use either the label or position as appropriate for their implementation. Note: this is our current approach with node_index

seibert commented 4 years ago

A variant of Approach 2 would be to require a node to have a numeric ID (like a label) which does not need to be consecutive. This would be more efficient to implement on accelerators, and could still be auto-assigned when a graph is created.

As an example, suppose we create a Graph (using fake graphviz notation here) from this:

a->b
b->c
b->d

we would want to have a graph with the following node properties:

node_offset node_id node_label
0 0 "a"
1 1 "b"
2 2 "c"
3 3 "d"
Then if we run a filter operation and select all nodes but "c", we would get the new graph: node_offset node_id node_label
0 0 "a"
1 1 "b"
2 3 "d"

Which would ensure we could connect node "d" between the two graphs without having to do a string comparison.