rusty1s / pytorch_sparse

PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations
MIT License

AssertionError in matmul (assert int(col.max()) < N) #191

Closed: minsikseo-cdl closed this issue 2 years ago

minsikseo-cdl commented 2 years ago

Hi, I just ran into an AssertionError while using matmul. Here is the error message:

File "/workspace/networks.py", line 37, in spspmm
    C = matmul(A, B)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/matmul.py", line 139, in matmul
    return spspmm(src, other, reduce)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/matmul.py", line 116, in spspmm
    return spspmm_sum(src, other)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/matmul.py", line 106, in spspmm_sum
    sparse_sizes=(M, K), is_sorted=True)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/tensor.py", line 26, in __init__
    is_sorted=is_sorted)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/storage.py", line 76, in __init__
    assert int(col.max()) < N
AssertionError

And my SparseTensors are:

A
> SparseTensor(row=tensor([     0,      0,      0,  ..., 493398, 493398, 493398], device='cuda:0'),
             col=tensor([     0,   4946,   4947,  ..., 493396, 493397, 493398], device='cuda:0'),
             val=tensor([1., 1., 1.,  ..., 1., 1., 1.], device='cuda:0'),
             size=(493399, 493399), nnz=3576315, density=0.00%)
B
> SparseTensor(row=tensor([     0,      0,      0,  ..., 493398, 493398, 493398], device='cuda:0'),
             col=tensor([     0,   4946,   4947,  ..., 493396, 493397, 493398], device='cuda:0'),
             val=tensor([1., 1., 1.,  ..., 1., 1., 1.], device='cuda:0'),
             size=(493399, 493399), nnz=3576315, density=0.00%)

In fact, A and B are identical, so what I want to compute is simply a sparse matrix power of A (here A @ A).

When I check the row and column indices and the sparse_sizes of A, nothing seems wrong. Even when I perform the same operation using torch.sparse.mm on a torch.sparse_coo_tensor, it gives the correct result. (However, torch.sparse.mm seems to need more memory than torch_sparse.matmul, so I can't run it on GPUs.)
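One rough, library-independent way to gauge how much work A @ A involves is to count the scalar products a sparse-sparse multiplication forms: every nonzero A[i, k] is combined with every nonzero in row k of B. The plain-Python sketch below does this on COO index lists. `count_intermediate_products` is a hypothetical helper, not a torch_sparse or PyTorch function, and it says nothing about how either library actually allocates memory.

```python
from collections import Counter


def count_intermediate_products(a_col, b_row):
    """Number of scalar products formed when computing C = A @ B.

    a_col: column index of each nonzero of A (COO form).
    b_row: row index of each nonzero of B (COO form).
    Each nonzero A[i, k] meets every nonzero in row k of B, so the total is
    the sum over A's nonzeros of nnz(B[k, :]). Products that land on the same
    (i, j) are summed, so this is an upper bound on nnz(C).
    """
    nnz_per_row_of_b = Counter(b_row)  # k -> number of nonzeros in B[k, :]
    return sum(nnz_per_row_of_b[k] for k in a_col)


# A = B = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]; A @ A has 4 nonzeros.
print(count_intermediate_products(a_col=[0, 1, 1, 2], b_row=[0, 0, 1, 2]))  # 5
```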

The problem might be that torch.ops.torch_sparse.spspmm_sum, called at line 101 in torch_sparse.spspmm_sum, returns incorrect output.
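To cross-check what spspmm_sum should return, a small pure-Python reference product can be compared against torch_sparse.matmul on a small input. The sketch below is Gustavson's row-by-row algorithm with sum reduction, not the library's CUDA kernel; the function name and the CSR argument layout are assumptions for illustration. It also shows why the assertion in the traceback should never fire for valid inputs: every output column index is copied from B's column indices.

```python
def spgemm_sum_csr(a_rowptr, a_col, a_val, b_rowptr, b_col, b_val, M, K):
    """Reference C = A @ B (sum reduction) for CSR inputs, A is M x N, B is N x K.

    Returns (c_rowptr, c_col, c_val) with the columns inside each row sorted,
    matching what is_sorted=True promises downstream.
    """
    c_rowptr, c_col, c_val = [0], [], []
    for i in range(M):
        acc = {}  # column j -> partial sum of C[i, j]
        for p in range(a_rowptr[i], a_rowptr[i + 1]):
            k, a_ik = a_col[p], a_val[p]
            for q in range(b_rowptr[k], b_rowptr[k + 1]):
                j = b_col[q]
                acc[j] = acc.get(j, 0) + a_ik * b_val[q]
        for j in sorted(acc):
            c_col.append(j)
            c_val.append(acc[j])
        c_rowptr.append(len(c_col))
    # Every j comes from b_col, so it is < K whenever B is valid. The storage
    # assertion in the traceback checks col < the second sparse size (K here).
    assert all(0 <= j < K for j in c_col)
    return c_rowptr, c_col, c_val


# A = [[1, 1, 0], [0, 1, 0], [0, 0, 1]] in CSR form; A @ A = [[1, 2, 0], [0, 1, 0], [0, 0, 1]]
rowptr, col, val = [0, 2, 3, 4], [0, 1, 1, 2], [1, 1, 1, 1]
print(spgemm_sum_csr(rowptr, col, val, rowptr, col, val, M=3, K=3))
# ([0, 2, 3, 4], [0, 1, 1, 2], [1, 2, 1, 1])
```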

Any comments would be helpful.

Best,

rusty1s commented 2 years ago

This looks related to https://github.com/rusty1s/pytorch_sparse/issues/174.

Sadly, I cannot reproduce this issue on my machine, so it would be great to have your help finding its cause. Could you debug https://github.com/rusty1s/pytorch_sparse/blob/master/csrc/cuda/spspmm_cuda.cu to see which output produces row or col tensors with unreasonably high values? Let me know if you need any guidance in doing so.
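When inspecting the kernel's output, the thing to look for is any row or column index outside the sparse size. A minimal sketch of such a check on index lists pulled out of the returned tensors (for example with `.tolist()`); `find_out_of_range` is a hypothetical helper, and checking rows alongside columns is an addition for completeness, since the traceback only reports the column bound.

```python
def find_out_of_range(row, col, M, N, limit=10):
    """Return up to `limit` (position, row, col) triples whose indices fall
    outside an M x N matrix.

    col >= N is the condition the storage assertion in the traceback reports;
    row >= M is checked as well. Negative values are flagged too, since they
    are never valid indices.
    """
    bad = []
    for p, (r, c) in enumerate(zip(row, col)):
        if not (0 <= r < M and 0 <= c < N):
            bad.append((p, r, c))
            if len(bad) == limit:
                break
    return bad


print(find_out_of_range(row=[0, 1, 2], col=[0, 5, 1], M=3, N=3))  # [(1, 1, 5)]
```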

dgm2 commented 2 years ago

Hello, is there any workaround for this issue? I get this assertion error as well, on CUDA 10.2:

edge_index, edge_weight = spspmm(edge_index, edge_weight, edge_index, edge_weight, num_nodes, num_nodes, num_nodes)

File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/spspmm.py", line 30, in spspmm
    C = matmul(A, B)
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/matmul.py", line 140, in matmul
    return spspmm(src, other, reduce)
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/matmul.py", line 117, in spspmm
    return spspmm_sum(src, other)
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/matmul.py", line 107, in spspmm_sum
    sparse_sizes=(M, K), is_sorted=True)
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/tensor.py", line 38, in __init__
    trust_data=trust_data,
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/storage.py", line 77, in __init__
    assert trust_data or int(col.max()) < N
AssertionError

Any comments are helpful! Thank you,

rusty1s commented 2 years ago

A possible workaround for now is to check whether the newly added sparse matrix multiplication for torch.sparse_csr_tensor, available directly inside PyTorch, works for you (see here). Let me know.

dgm2 commented 2 years ago

The torch version gives an error about the size of crow_indices (it reports 8629 entries). Any idea how to get this to work with the torch version?

The torch-sparse version does not have this issue on this setup.


crow_indices.numel() must be size(0) + 1, but got: 8629
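The condition in this error is the basic CSR invariant: the row-pointer array has one entry per row plus one, with crow[0] == 0 and crow[-1] == nnz. A minimal plain-Python sketch of building it from COO row indices; `coo_to_crow` is a hypothetical name, and the nonzeros are assumed to be already grouped by row, as CSR requires.

```python
def coo_to_crow(row, n_rows):
    """Build CSR row pointers (crow_indices) from COO row indices.

    The result always has n_rows + 1 entries: crow[i]:crow[i + 1] spans the
    nonzeros of row i, crow[0] == 0 and crow[-1] == nnz. The matching column
    and value arrays must list the nonzeros grouped by row for this to be
    a valid CSR layout.
    """
    crow = [0] * (n_rows + 1)
    for r in row:
        crow[r + 1] += 1  # count nonzeros in each row
    for i in range(n_rows):
        crow[i + 1] += crow[i]  # prefix sum turns counts into pointers
    return crow


print(coo_to_crow([0, 0, 1, 3], n_rows=4))  # [0, 2, 3, 3, 4] (row 2 is empty)
```

By the same arithmetic, a crow_indices with 8629 entries is consistent only with a matrix of 8628 rows; if the size given for the tensor says otherwise, this check fails.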

dgm2 commented 2 years ago

I tried converting the SparseTensors to PyTorch CSR tensors:

row, col, value = torch.sparse.mm(A.to_torch_sparse_csr_tensor(), B.to_torch_sparse_csr_tensor())

which gives:

    return torch._sparse_mm(mat1, mat2)
RuntimeError: torch.empty: Only 2D sparse CSR tensors are supported.

rusty1s commented 2 years ago

What do A and B look like? Aren't they two-dimensional? What shape do the value tensors of A and B have? This might also explain the error inside torch-sparse, since our sparse-matrix multiplication also requires two-dimensional matrices.
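The question about value shapes matters because a sparse (row, col, value) triplet whose value tensor has shape [nnz, F] describes a dense tensor of shape (M, N, F), not an M x N matrix. A small plain-Python sketch of that bookkeeping; both helper names are hypothetical, and that multi-dimensional values caused the failures in this thread is only the hypothesis raised above, not something confirmed here.

```python
def implied_dense_shape(sparse_sizes, value_shape):
    """Shape of the dense tensor described by a sparse (row, col, value) triplet.

    value_shape is (nnz,) for a plain matrix, or (nnz, *extra) when each
    nonzero stores a whole vector; the extra dimensions are appended.
    """
    return tuple(sparse_sizes) + tuple(value_shape[1:])


def is_plain_matrix(sparse_sizes, value_shape):
    # A sparse-sparse matrix product expects 2-D operands: one scalar per nonzero.
    return len(implied_dense_shape(sparse_sizes, value_shape)) == 2


print(implied_dense_shape((493399, 493399), (3576315,)))    # (493399, 493399)
print(implied_dense_shape((493399, 493399), (3576315, 1)))  # (493399, 493399, 1)
print(is_plain_matrix((493399, 493399), (3576315, 1)))      # False
```

If the values turn out to have shape [nnz, 1], flattening them (for example with `value.view(-1)` in PyTorch) gives one scalar per nonzero; with F > 1 the triplet really holds F separate matrices, each of which would need its own product.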

github-actions[bot] commented 2 years ago

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?