sparsemat / sprs

sparse linear algebra library for rust
Apache License 2.0
386 stars 45 forks source link

Parallel smmp #201

Closed vbarrielle closed 4 years ago

vbarrielle commented 4 years ago

This pull request improves the performance of the sparse matrix product by taking advantage of multiple cores, thanks to rayon. Here are benchmark results demonstrating the improvements (benchmarked on a AMD Ryzen 7 3700X 8-Core Processor this time):

sparse_mult_perf_by_shape_no_old

Here's a description of the entries:

As before, with no multithreading, it takes a very large matrix to beat scipy's implementation (mostly because we need to sort the indices whereas scipy does not guarantee sorted indices). However, as far as I know scipy's matrix product is not parallelized, so taking advantage of multiple cores gives sprs an advantage.

On smaller shapes it takes a lot of threads to beat scipy (here the shape is 15000 x 25000):

sparse_mult_perf_150000_25000

I'm not entirely satisfied with the current multithreading implementation, dividing the work into independent chunks has been a bit clunky, but I'm sure it could be done in a cleaner way. That can wait though, the current state is correct and is a nice milestone.

It's quite possible some performance can be gained in the future, as currently the heuristic to divide the workload for the symbolic part has not been extensively tested, so some threads could be starving. A simple improvement could be to delay the sorting of indices, to know the actual number of nonzeros of the result and be able to divide the sorting workload evenly.