pathpy / pathpyG

GPU-accelerated Next-Generation Network Analytics and Graph Learning for Time Series Data on Complex Networks.
https://www.pathpy.net
GNU Affero General Public License v3.0
33 stars 3 forks source link

Bug in `Graph.is_edge` #131

Closed IngoScholtes closed 8 months ago

IngoScholtes commented 8 months ago

The current implementation raises an IndexError when testing for the presence of a directed edge, where the source node has outdegree zero in the underlying network.

Example:

g = pp.Graph.from_edge_list([['a','b'], ['a','c'], ['a','a']])
g.is_edge('b', 'a')
IngoScholtes commented 8 months ago

This bug seems to be due to a deviation of the compressed sparse row matrix format used by the pyG.EdgeIndex format, compared to the common approach used, e.g., in scipy.

In essence, in pyG zero rows are not represented by repeated values in the index pointer array, which also leads to the fact that the reconstructed sparse representation misses all zero rows.

Example comparing pyG and scipy CSR:

from scipy.sparse import csr_matrix
import numpy as np
import torch
from torch_geometric import EdgeIndex

e = EdgeIndex(torch.tensor([[0,0,0],[0,1,2]]), sort_order='row')
((row_ptr, col), perm) = e.get_csr()
print(row_ptr)
print(e.to_dense())

tensor([0, 3])
tensor([[1., 1., 1.]])
m = csr_matrix((3, 3), dtype=np.int8)
m[0,1] = 1
m[0,0] = 1
m[0,2] = 1
print(m.indptr)
print(m.todense())

[0 3 3 3]
[[1 1 1]
 [0 0 0]
 [0 0 0]]
IngoScholtes commented 8 months ago

I could trace this to the call to torch_convert_indices_from_coo_to_csr in pyG:

torch._convert_indices_from_coo_to_csr(
            self[dim],
            self.get_num_rows(),
            out_int32=self.dtype != torch.int64,
        )

The function get_num_rows in pyG.EdgeIndex calculates the number of rows as the number of non-zero rows, which leads to the above behavior.

I will raise this to the pyG developers ...

IngoScholtes commented 8 months ago

Juspt reported this issue to the pyG developers. I also added a pull request that would fix it.