Closed realCrush closed 1 year ago
Thanks for reporting! What is the timing when only running a single sparse-sparse matrix multiplication (e.g., `adj_t @ s`)? My assumption is that the intermediate sparse matrix is pretty dense, which might explain the overall slowdown. You can also try to densify the intermediate representation:

`s_t @ (adj_t @ s).to_dense()`

We are utilizing `cusparse` internally for this, so there is sadly not much we can do to improve its efficiency :(
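One way to decide whether densifying the intermediate result is worthwhile is to estimate its fill-in from the sparsity patterns alone, before computing any values. This is a minimal pure-Python sketch using a toy dict-of-sets representation (not the `torch_sparse` API; the helper name `product_nnz` is made up for illustration):

```python
def product_nnz(a_rows, b_rows):
    """Count the nonzeros of C = A @ B from sparsity patterns only.

    a_rows / b_rows: dict mapping a row index to the set of column
    indices holding nonzeros in that row of A / B.
    """
    nnz = 0
    for cols in a_rows.values():
        out = set()
        # Row i of C is nonzero wherever any B-row selected by A's row i is.
        for k in cols:
            out |= b_rows.get(k, set())
        nnz += len(out)
    return nnz

# Toy 3x3 example: the product pattern has 6 nonzeros out of 9 slots.
A = {0: {0, 2}, 1: {1}, 2: {2}}
B = {0: {1}, 1: {0, 2}, 2: {1, 2}}
nnz = product_nnz(A, B)
density = nnz / 9
```

If the estimated density comes out high (say above a few percent), `.to_dense()` on the intermediate product is likely the faster path on GPU.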
It does become faster, but it still costs ~50s, and the result of `adj_t @ s` is also sparse (density=0.67%). Actually, I found that `SparseTensor @ dense` is really fast, even faster than `SparseTensor.to_dense() @ dense`. I also tried the operations with `torch.sparse`; it is also slow and costs ~25s. This is weird, since `SparseTensor` and `torch.sparse` are both designed for sparse tensors.

I want to ask: is there a way to do sparse@sparse efficiently? I can't believe both `SparseTensor @ SparseTensor` and `torch.sparse @ torch.sparse` are so slow even compared with the dense format🤪
How dense is the final output? I think doing sparse-sparse matrix multiplication on GPUs efficiently is still an open problem. This is not really a GPU-friendly algorithm, as it involves non-contiguous access patterns and requires computing the number of NNZ entries in advance.
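The "number of NNZ entries in advance" point can be made concrete: classic CSR SpGEMM (Gustavson's algorithm) needs a symbolic pass over the sparsity patterns just to size the output arrays, before a second numeric pass fills them in. A simplified pure-Python sketch (real implementations like cuSPARSE do this with far more sophistication):

```python
def csr_spgemm(Ap, Aj, Ax, Bp, Bj, Bx, n_rows):
    """Two-pass CSR sparse-sparse matmul: C = A @ B.

    Ap/Bp: row pointers, Aj/Bj: column indices, Ax/Bx: values.
    """
    # Pass 1 (symbolic): count nonzeros per output row so that the
    # output index/value arrays can be allocated exactly.
    Cp = [0] * (n_rows + 1)
    for i in range(n_rows):
        cols = set()
        for jj in range(Ap[i], Ap[i + 1]):
            k = Aj[jj]
            cols.update(Bj[Bp[k]:Bp[k + 1]])
        Cp[i + 1] = Cp[i] + len(cols)

    # Pass 2 (numeric): accumulate row products into the
    # pre-allocated storage (non-contiguous access into B's rows).
    Cj = [0] * Cp[n_rows]
    Cx = [0.0] * Cp[n_rows]
    for i in range(n_rows):
        acc = {}
        for jj in range(Ap[i], Ap[i + 1]):
            k, a = Aj[jj], Ax[jj]
            for kk in range(Bp[k], Bp[k + 1]):
                j = Bj[kk]
                acc[j] = acc.get(j, 0.0) + a * Bx[kk]
        for out, j in enumerate(sorted(acc), start=Cp[i]):
            Cj[out], Cx[out] = j, acc[j]
    return Cp, Cj, Cx

# Toy check: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
# so C = A @ B = [[0, 3], [8, 0]].
Cp, Cj, Cx = csr_spgemm([0, 1, 2], [0, 1], [1.0, 2.0],
                        [0, 1, 2], [1, 0], [3.0, 4.0], 2)
```

The per-row hash-map accumulation and the scattered reads of B's rows are exactly the irregular memory pattern that makes this hard to parallelize on a GPU.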
The final output is also sparse, with density=0.68%
This is not really a GPU-friendly algorithm
I also tried to do it on CPU, which reduced the time from ~110s to ~25s:
If the CPU time is much faster than the GPU time, I assume that your sparse matrices are too small to take advantage of the GPU here.
This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?
This should be resolved.
Hi, I find that when doing sparse-sparse multiplication with the `SparseTensor` format, the speed is super slow even compared with the dense format.

The variables `s` and `adj_t` are both `SparseTensor`, and below is the info:

Then I just do 10000 iterations of the sparse-sparse multiplication (to avoid noise) and get ~110s time usage:

The same computation with a simple `.to_dense()` makes the time usage decrease to ~5s:

Moving the `to_dense()` out of the for-loop, it becomes even faster (~3s):

I'm confused because `SparseTensor` should be faster than the dense format for such a very sparse tensor. This bug makes my program really slow🥲, hope you can fix it!

(All the above experiments are on an empty 3090 GPU card, without other programs interfering.)