Closed IngoScholtes closed 8 months ago
This bug seems to be due to a deviation between the compressed sparse row (CSR) matrix format used by PyG's `EdgeIndex` class and the common convention used, e.g., in scipy.
In essence, PyG does not represent empty rows by repeated values in the index pointer array, which also means that the reconstructed sparse representation misses all zero rows.
Example comparing PyG and scipy CSR:

```python
from scipy.sparse import csr_matrix
import numpy as np
import torch
from torch_geometric import EdgeIndex

e = EdgeIndex(torch.tensor([[0, 0, 0], [0, 1, 2]]), sort_order='row')
((row_ptr, col), perm) = e.get_csr()
print(row_ptr)       # tensor([0, 3])
print(e.to_dense())  # tensor([[1., 1., 1.]])

m = csr_matrix((3, 3), dtype=np.int8)
m[0, 1] = 1
m[0, 0] = 1
m[0, 2] = 1
print(m.indptr)      # [0 3 3 3]
print(m.todense())   # [[1 1 1]
                     #  [0 0 0]
                     #  [0 0 0]]
```
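To illustrate why the repeated index pointer entries matter, here is a small scipy-only sketch (independent of PyG) showing that the shape recovered from a CSR triple depends directly on the length of the index pointer array:

```python
import numpy as np
from scipy.sparse import csr_matrix

col = np.array([0, 1, 2])
data = np.ones(3)

# scipy-style index pointer: one entry per row plus one, so empty rows
# appear as repeated values ([0, 3, 3, 3] -> rows 1 and 2 are empty)
full = csr_matrix((data, col, np.array([0, 3, 3, 3])))
print(full.shape)   # (3, 3)

# Truncated pointer as produced by PyG: the shape inferred from [0, 3]
# has a single row, so the zero rows are lost on reconstruction
trunc = csr_matrix((data, col, np.array([0, 3])))
print(trunc.shape)  # (1, 3)
```

This reproduces, in pure scipy, the same loss of rows seen in the `e.to_dense()` output above.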
I could trace this to the call to `torch._convert_indices_from_coo_to_csr` in PyG:

```python
torch._convert_indices_from_coo_to_csr(
    self[dim],
    self.get_num_rows(),
    out_int32=self.dtype != torch.int64,
)
```
The function `get_num_rows` in `PyG.EdgeIndex` calculates the number of rows as the number of non-zero rows, which leads to the behavior above.
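The effect of `get_num_rows` on the index pointer can be reproduced with a minimal COO-to-CSR conversion (the helper `coo_to_csr_indptr` below is my own sketch, not PyG code):

```python
import numpy as np

def coo_to_csr_indptr(row, num_rows):
    # Count the entries in each row, then take the prefix sum; empty
    # rows contribute a zero count and hence a repeated pointer value.
    counts = np.bincount(row, minlength=num_rows)
    return np.concatenate(([0], np.cumsum(counts)))

row = np.array([0, 0, 0])  # sorted COO row indices from the example

# Using the declared number of rows (3), as scipy does:
print(coo_to_csr_indptr(row, 3))                    # [0 3 3 3]

# Using the number of non-zero rows (1), as described above:
print(coo_to_csr_indptr(row, len(np.unique(row))))  # [0 3]
```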
I will raise this with the PyG developers ...
Just reported this issue to the PyG developers. I also added a pull request that would fix it.
The current implementation raises an `IndexError` when testing for the presence of a directed edge whose source node has out-degree zero in the underlying network. Example: