wo80 / CSparse.NET

A concise library for solving sparse linear systems with direct methods.
GNU Lesser General Public License v2.1

Parallel multiply for dense matrices #21

Closed andreasg123 closed 5 years ago

andreasg123 commented 5 years ago

For completeness, and because it is easier to benchmark, here is the parallel multiply for dense matrices. Using random matrices whose dimensions are powers of 3, and limiting the number of threads to 2, 4, and 48, the threshold for switching to the parallel version is about 1 million multiplications. That corresponds to multiplying two 100x100 matrices.

For different numbers of threads and maximum dimensions, the total speedup factors are shown in this table:

| Max dimension | 2 threads | 4 threads | 48 threads |
|---------------|-----------|-----------|------------|
| 243           | 0.93      | 1.23      | 2.59       |
| 729           | 1.46      | 2.19      | 4.22       |
| 2187          | 1.86      | 3.44      | 10.34      |

As one can see, the main benefit is for larger matrices.
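
For illustration, here is a minimal sketch of the threshold dispatch described above. The names, storage layout, and threshold constant are assumptions for this sketch and do not reflect the actual CSparse.NET code; the point is only that the parallel path is taken once the work reaches roughly one million scalar multiplications.

```csharp
using System.Threading.Tasks;

public static class DenseMultiply
{
    // Roughly one million scalar multiplications, i.e. two 100x100 matrices.
    private const long ParallelThreshold = 1_000_000;

    public static void Multiply(double[,] a, double[,] b, double[,] result)
    {
        int rows = a.GetLength(0);
        int inner = a.GetLength(1);
        int cols = b.GetLength(1);

        if ((long)rows * inner * cols < ParallelThreshold)
        {
            // Small problem: thread start-up would outweigh any gain.
            for (int i = 0; i < rows; i++)
            {
                MultiplyRow(a, b, result, i, inner, cols);
            }
        }
        else
        {
            // Rows of the result are independent, so they can be split across threads.
            Parallel.For(0, rows, i => MultiplyRow(a, b, result, i, inner, cols));
        }
    }

    private static void MultiplyRow(double[,] a, double[,] b, double[,] result,
        int i, int inner, int cols)
    {
        for (int j = 0; j < cols; j++)
        {
            double sum = 0.0;
            for (int k = 0; k < inner; k++)
            {
                sum += a[i, k] * b[k, j];
            }
            result[i, j] = sum;
        }
    }
}
```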

wo80 commented 5 years ago

Thanks again, and sorry for the delay. I will try to publish a new package on NuGet next week.

I've opened another issue #22 regarding the parallel algorithms. What do you think?

epsi1on commented 5 years ago

Thanks for the parallel implementation. For column-major storage, I think ParallelTransposeMultiply() is also worth implementing, as it is more cache friendly than the usual ParallelMultiply() and should give a much better performance gain.
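
For reference, a hedged sketch of what such a method could look like for a dense, column-major matrix (the names and signature here are my assumptions, not the library's API): every entry of y = Aᵀx reads one contiguous column of A and writes a single output element, so the outer loop parallelizes without write conflicts.

```csharp
using System.Threading.Tasks;

public static class DenseTranspose
{
    // y = A^T * x for a dense matrix A stored column by column in a flat array.
    public static void ParallelTransposeMultiply(int rows, int columns,
        double[] a, double[] x, double[] y)
    {
        Parallel.For(0, columns, j =>
        {
            int offset = j * rows; // start of column j in column-major storage
            double sum = 0.0;

            for (int i = 0; i < rows; i++)
            {
                sum += a[offset + i] * x[i]; // contiguous reads of column j
            }

            y[j] = sum; // each task writes only its own y[j]
        });
    }
}
```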

andreasg123 commented 5 years ago

I added ParallelOptions as an optional parameter. This could be expanded along the lines of #22. I also cleaned up the line endings.
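
As a usage sketch (the exact method signature is an assumption on my part, not the merged code), the optional parameter would allow capping the number of worker threads like this:

```csharp
using System.Threading.Tasks;

// Cap the multiplication at four worker threads.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

// A, B and result are assumed to be dense matrices of compatible dimensions.
A.ParallelMultiply(B, result, options);
```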

@epsi1on, my benchmarks seem to indicate that for ParallelTransposeMultiply, one would need a matrix with about 1 million elements before the parallel version becomes worthwhile. For now, I'm not inclined to implement it unless there is a real need for it.

@wo80, is there a reason why TransposeMultiply and similar methods don't check whether the vector dimension matches the matrix dimension?

epsi1on commented 5 years ago

@andreasg123 Yes, you are exactly right. If ParallelMultiply() is A times faster than Multiply(), then ParallelTransposeMultiply() is much less than A times faster than TransposeMultiply().

It was my mistake. Please just disregard my previous comment.

Thanks

wo80 commented 5 years ago

> is there a reason why TransposeMultiply and similar methods don't check whether the vector dimension matches the matrix dimension?

The reason was performance: for iterative solvers, doing the null and length checks on every call would have an impact, and I wanted it to be as fast as possible. I wasn't sure about that decision back then, and I'm still not sure.
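
One possible compromise, sketched below under the assumption that no such helper already exists in the library: make the checks debug-only, so release builds keep the check-free hot path.

```csharp
using System;
using System.Diagnostics;

public static class Check
{
    // Calls to this method are removed by the compiler when DEBUG is not defined,
    // so the hot path used by iterative solvers stays free of argument checks
    // in release builds.
    [Conditional("DEBUG")]
    public static void VectorLength(double[] x, int expected, string name)
    {
        if (x == null || x.Length < expected)
        {
            throw new ArgumentException("Dimension mismatch.", name);
        }
    }
}

// Usage at the top of TransposeMultiply and similar methods:
// Check.VectorLength(x, columnCount, nameof(x));
```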