Discrepancy between "contains_isolated_nodes" and "edge_index_undirected"

FeynmanDNA commented 2 years ago

Strangely even though print(f'Contains isolated nodes: {data.contains_isolated_nodes()}') returns True, the networkx converted graph reports no sub-graphs: nx.number_connected_components(G) == 1

Then I looked into the difference between edge_index and edge_index_undirected and noticed some discrepancy.

Python code to reproduce my observation:

selected_dataset = 'synthetic_graph_1'
splitting_strategy = 'random'

dataset = LinkVesselGraph(
    root='data', 
    name=selected_dataset,
    splitting_strategy=splitting_strategy,
    use_edge_attr=True,
    use_atlas=True,
)

data = dataset[0]  # Get the first graph object.
print(data)

# Gather some statistics about the graph.
print(f'Number of nodes in graph: {data.num_nodes}')
print(f'Number of edges in graph: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Contains isolated nodes: {data.contains_isolated_nodes()}')
print(f'Contains self-loops: {data.contains_self_loops()}')
print(f'Is Undirected: {data.is_undirected()}')

print(f'Number of undirected edges', data.edge_index_undirected.size(dim=1))
print(f'Number of training edges', data.train_pos_edge_index.size(dim=1))
print(f'Number of validation edges', data.val_pos_edge_index.size(dim=1))
print(f'Number of test edges', data.test_pos_edge_index.size(dim=1))

print(data.edge_index[:, 10:20])
"""
tensor([[  10,   11,   12,   13,   15,   17,   19,   20,   21,   22],
        [1642, 1679, 1658, 1553, 1559, 1564, 1593, 1656, 1565, 1549]])
"""

print(data.edge_index_undirected[:, 26:36])
"""
tensor([[  13, 1553,   14, 1629,   15, 1559,   16, 1580,   17, 1564],
        [1553,   13, 1629,   14, 1559,   15, 1580,   16, 1564,   17]])
"""

Note that edge_index reports no edge for node 14 and 16, but edge_index_undirected somehow returns edges that involves node 14 and 16

In fact, after the networkx conversion, if like data.contains_isolated_nodes() said, it has isolated nodes, then we should see subgraphs of connected components, but strangely we get a singe graph:

data_undirected = Data(x=data.x, edge_index = data.edge_index_undirected,
                    edge_attr = data.edge_attr_undirected)

G = to_networkx(
    data_undirected,
    to_undirected=True,  # to undirected for is_connected()
)
print("Networkx: #nodes, #edges", G.number_of_nodes(), G.number_of_edges())

# G connected?
print(f"nx.is_connected(G): {nx.is_connected(G)}")
print(f"nx.number_connected_components(G): {nx.number_connected_components(G)}")
"""
Networkx: #nodes, #edges 3197 3249
nx.is_connected(G): True
nx.number_connected_components(G): 1
"""

I tried remove_isolated_nodes and it reports similar observations.

Looks like edge_index_undirected is custom-made, maybe there is some logic that connects isolated nodes somehow that users should be be aware of?

Thanks!

FeynmanDNA commented 2 years ago

Hmm, I just realized I was confused by the edge_index's relationship to training set again... which @jqmcginnis you have already kindly explained here (https://github.com/jocpae/VesselGraph/issues/12#issuecomment-1148524602)

So it is not surprising that the training set contains isolated nodes (like node 14 and 16 in the above example), since they are randomly chosen to be part of training or not.

jqmcginnis commented 2 years ago

@FeynmanDNA sorry once again for the late response!

Thank you very much for the observation! Do you see any intuitive way of handling this to a better extent? I am still not very happy with the terminology of edge_index, edge_index_undirected and the training, validation and test masks. Do you see anyway of handling this in a better way, or should we stick to it for now?

FeynmanDNA commented 2 years ago

@jqmcginnis like you mentioned in your code comment that the link dataset adheres to the convention that only training edges are present in the data.edge_index. I recommend sticking to the community convention :)

jocpae / VesselGraph

Discrepancy between "contains_isolated_nodes" and "edge_index_undirected" #13