Open aujxn opened 2 years ago
This would be perfect as a PR! I see no problem adding the Send+Sync bounds, this is for a matrix element which would be quite surprising to see as anything else. We will just have to add the trait bound where necessary, maybe add these bounds to CsMatBase
?
This implementation isn't ideal but should be better in many sparse by dense products. In certain cases like when the right hand side is a dense vector this is actually still sequential. To optimize it might be necessary to make a parallel iterator of the CsMatBase::outer_iterator
, but I don't know if it is possible to zip parallel iterators from sprs
with parallel iterators from ndarray
using their zip combinator. I don't have any experience actually implementing parallel iterators but I'll look into what it will take.
It will probably also be worth building some more benchmarks to compare the difference but I'm not sure if Criterion
can do benchmark comparisons for different conditional compilation directives.
I had need of parallel {CSR matrix} x { dense vector} mult, so I coded it using rayon. It was .... a nice speedup.
Um, I'm [ETA: NOT] sufficiently experienced with Rust to know how to go about integrating this into sprs, but then again, the code is so tiny that I think it wouldn't be difficult to just rewrite it in the right way to integrate with what you've already done.
Disclaimer: I really don't have a good idea of how to work with and extend traits, so I didn't even try to make this compatible with the existing traits, but I know that that is something that needs doing.
Disc#2: the code isn't as efficient as it could be: viz. it could assign the results to the final locations, but instead concats a bunch of fragment-vectors.
Currently only sparse by sparse products are parallel in the
smmp
module. Converting the current sparse by dense products usingndarray::parallel
should be straight forward. Here is an implementation forpar_csr_mulacc_dense_colmaj
that gives a significant speedup on my machine:The only changes here are the parallel iterator, adding the
rayon
feature forndarray
, and adding theSync
andSend
trait bounds to the data types inside the matrices. My concern is that addingSend + Sync
will result in these trait requirements to be unnecessarily added in many places.Looking at the
impl Mul
forCsMatBase
andCsMatBase
I see thatSync + Send
is required no matter ifmulti_thread
is enabled or not. Is it okay to propagate these trait requirements all the way up to many of the traitimpl
s forCsMatBase
and then use the conditional feature compilation on the lowest level functions found in theprod
module? Conditionally compiling at all the higher level implementations sounds like it would get nasty very quickly, especially as more parallel features get added.