rusty1s / pytorch_sparse

PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations
MIT License

Slow sparse-sparse multiplication with SparseTensor format #270

Closed: realCrush closed this issue 1 year ago

realCrush commented 2 years ago

Hi, I've found that sparse-sparse multiplication with the SparseTensor format is extremely slow, even compared with doing the same computation in dense format.

The variables s and adj_t are both SparseTensors; here is their info:

[screenshot: SparseTensor info for s and adj_t]

Then I run the sparse-sparse multiplication 10,000 times (to average out timing noise), which takes ~110 s:

[screenshot: timing of the 10,000-iteration sparse @ sparse loop, ~110 s]

The same computation with a simple .to_dense() brings the runtime down to ~5 s:

[screenshot: same loop with .to_dense(), ~5 s]

Moving the .to_dense() out of the for-loop makes it even faster (~3 s):

[screenshot: .to_dense() moved out of the loop, ~3 s]

I'm confused, because SparseTensor should be faster than the dense format for such a very sparse tensor. This bug makes my program really slow 🥲; I hope you can fix it.

(All of the above experiments were run on an otherwise idle 3090 GPU, with no other programs interfering.)
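
For reference, the benchmark roughly looks like the sketch below; the shapes and densities are placeholders rather than my actual tensors, which are shown in the screenshots above:

import time
import torch
from torch_sparse import SparseTensor

device = torch.device('cuda')
# Placeholder inputs, just to make the sketch runnable; the real s and
# adj_t come from my model and are both very sparse.
s = SparseTensor.from_dense((torch.rand(3000, 3000, device=device) < 0.005).float())
adj_t = SparseTensor.from_dense((torch.rand(3000, 3000, device=device) < 0.005).float())

def timed(fn, iters=10000):
    # CUDA kernels launch asynchronously, so synchronize around the timed region.
    torch.cuda.synchronize()
    t = time.time()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return time.time() - t

print('sparse @ sparse        :', timed(lambda: adj_t @ s))             # ~110 s for me
print('.to_dense() in the loop:', timed(lambda: adj_t @ s.to_dense()))  # ~5 s
s_dense = s.to_dense()
print('.to_dense() outside    :', timed(lambda: adj_t @ s_dense))       # ~3 s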

rusty1s commented 2 years ago

Thanks for reporting! What is the timing when only running a single sparse-sparse matrix multiplication (e.g., adj_t @ s)? My assumption is that the intermediate sparse matrix is pretty dense, which might explain the overall slowdown. You can also try to densify the intermediate representation:

s_t @ (adj_t @ s).to_dense()

We are utilizing cusparse internally for this, so there is sadly not much we can do to improve its efficiency :(
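
For a meaningful GPU timing of a single multiplication, remember to synchronize; a minimal sketch, reusing s and adj_t from above:

import time
import torch

# CUDA kernels run asynchronously, so synchronize before and after the
# operation being measured.
torch.cuda.synchronize()
t = time.time()
out = adj_t @ s  # single sparse-sparse product
torch.cuda.synchronize()
print(f'single adj_t @ s: {time.time() - t:.4f} s')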

realCrush commented 2 years ago

Thanks for reporting! What is the timing when only running a single sparse-sparse matrix multiplication (e.g., adj_t @ s)? My assumption is that the intermediate sparse matrix is pretty dense, which might explain the overall slowdown. You can also try to densify the intermediate representation:

s_t @ (adj_t @ s).to_dense()

We are utilizing cusparse internally for this, so there is sadly not much we can do to improve its efficiency :(

It does become faster, but it still costs ~50 s, and the result of adj_t @ s is also sparse (density = 0.67%). Actually, I found that SparseTensor @ dense is really fast, and even faster than SparseTensor.to_dense() @ dense. I also tried the same operations with torch.sparse; it is slow as well and costs ~25 s. This is weird, since SparseTensor and torch.sparse are both designed for sparse tensors.

So I want to ask: is there a way to do sparse @ sparse efficiently? I can't believe that both SparseTensor @ SparseTensor and torch.sparse @ torch.sparse are so slow, even compared with the dense format 🤪

[screenshot: timing comparison of the SparseTensor, dense, and torch.sparse variants]
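
Roughly what I compared, as a sketch (I'm assuming here that the torch.sparse path goes through COO tensors and torch.sparse.mm):

import torch

# SparseTensor @ dense (SpMM) is fast for me, even faster than densifying
# the left operand first:
out1 = adj_t @ s.to_dense()
out2 = adj_t.to_dense() @ s.to_dense()

# torch.sparse sparse-sparse product for comparison (~25 s over the loop):
adj_coo = adj_t.to_torch_sparse_coo_tensor()
s_coo = s.to_torch_sparse_coo_tensor()
out3 = torch.sparse.mm(adj_coo, s_coo)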

realCrush commented 2 years ago

I've saved the two variables as .pkl files here so you can try it yourself:

test.zip

rusty1s commented 2 years ago

How dense is the final output? I think doing sparse-sparse matrix multiplication on GPUs efficiently is still an open problem. This is not really a GPU-friendly algorithm, as it involves non-contiguous access patterns and requires computing the number of NNZ entries in advance.
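
A quick way to check, assuming the full computation is s_t @ adj_t @ s as above (a sketch using the SparseTensor accessors):

out = s_t @ (adj_t @ s)           # sparse-sparse result
rows, cols = out.sparse_sizes()
print('nnz:', out.nnz(), 'density:', out.nnz() / (rows * cols))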

realCrush commented 2 years ago

How dense is the final output? I think doing sparse-sparse matrix multiplication on GPUs efficiently is still an open problem. This is not really a GPU-friendly algorithm, as it involves non-contiguous access patterns and requires computing the number of NNZ entries in advance.

The final output is also sparse, with density = 0.68%:

[screenshot: final output, density = 0.68%]

realCrush commented 2 years ago

This is not really a GPU-friendly algorithm

I also tried running it on the CPU, which reduces the time from ~110 s to ~25 s:

[screenshot: the same loop on CPU, ~25 s]

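Roughly, as a sketch (I'm assuming SparseTensor's .cpu() conversion here):

# Move both operands to the CPU and repeat the same loop there.
s_cpu = s.cpu()
adj_t_cpu = adj_t.cpu()
for _ in range(10000):
    out = adj_t_cpu @ s_cpu   # ~25 s total on CPU vs. ~110 s on the GPU
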
rusty1s commented 2 years ago

If the CPU time is much faster than the GPU time, I assume that your sparse matrices are too small to take advantage of the GPU here.

github-actions[bot] commented 1 year ago

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?

rusty1s commented 1 year ago

This should be resolved.