madeleineudell / ParallelSparseMatMul.jl

A Julia library for parallel sparse matrix multiplication using shared memory

centralize imports, fix deprecation warnings and an error, bring up to 0.4 #7

Closed · sbromberger closed this 9 years ago

sbromberger commented 9 years ago

This brings the package up to Julia 0.4.

madeleineudell commented 9 years ago

Thanks for the PR! Looks like you missed a few exports, though.

Did you test it in parallel on 0.4 and observe any parallel speedup? I haven't been able to install 0.4 yet, and I noticed that the amount of speedup depended quite heavily on Julia's (changing) internals during the development of 0.3.

sbromberger commented 9 years ago

So I needed parallel sparse matrices for a new branch of LightGraphs.jl, and I found that parallelizing betweenness centrality across 3 cores cut the runtime by almost 60% (560 seconds vs. ~1350 in serial). I'm not using any of the multiplication routines, but I did notice that issue #6 is causing some problems. I'm trying to fix that now.

Another issue I found in my own code is that since floating-point arithmetic is not associative (that is, 0.6 + 0.1 + 0.1 != 0.6 + 0.2), lots of small errors tend to accumulate. I've found that in my FP accumulation, after about 300k accumulations, I'm getting accuracy to about 5 decimal places (that is, the results of the parallel FP operations agree with single-threaded execution to about 5 decimal places). The accuracy seems to decrease as a function of graph size.
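
A quick REPL check illustrates the non-associativity (this is standard IEEE 754 double behavior):

```julia
julia> 0.6 + 0.1 + 0.1 == 0.6 + 0.2
false

julia> (0.6 + 0.1) + 0.1   # left-to-right accumulation
0.7999999999999999

julia> 0.6 + 0.2
0.8
```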

In any case, 5 decimal places is good enough for centrality measures on large graphs, so I'm happy with it. #6 is a puzzler. I'm wondering whether the shared arrays are really as parallel-write-safe as we assume. Might have to explore DArrays as an alternative.

PS: my work is solely on 0.4.

madeleineudell commented 9 years ago

I just updated the discussion of issue #6; the issue is (probably) that the += operation is not thread safe. This means multiplication (but not transpose multiplication) of shared sparse matrices will not be thread safe.
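
For illustration, the unsafe pattern is roughly this (a toy sketch on 0.4, not the package's actual kernel):

```julia
# Toy sketch of the race (run addprocs(n) with n > 1 first).
# Each += is a read-modify-write on shared memory, so two workers can
# read the same old value and one increment gets lost.
y = SharedArray(Float64, 1)
@sync @parallel for i in 1:1_000_000
    y[1] += 1.0
end
y[1]   # typically < 1.0e6 once more than one worker runs the loop
```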

The way to fix it is to introduce a separate result vector for each core and to add them up in serial at the end.
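
A sketch of that fix for y = A*x (illustrative code, not what's in the package; col_chunk_product and the chunking are made up for the example, and addprocs must run first):

```julia
# Each worker accumulates its columns into a private dense vector, so
# there are no concurrent writes; the partials are summed in serial.
@everywhere function col_chunk_product(A, x, cols)
    y = zeros(size(A, 1))              # private to this worker
    rows = rowvals(A); vals = nonzeros(A)
    for j in cols, k in nzrange(A, j)
        y[rows[k]] += vals[k] * x[j]   # safe: only this worker writes y
    end
    y
end

A = sprand(10_000, 10_000, 1e-3); x = rand(10_000)
# one contiguous column range per worker
nc = size(A, 2); nw = nworkers()
bounds = [div(nc * i, nw) for i in 0:nw]
chunks = [bounds[i]+1:bounds[i+1] for i in 1:nw]
partials = pmap(c -> col_chunk_product(A, x, c), chunks)
y = reduce(+, partials)                # serial sum of per-worker results
```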

sbromberger commented 9 years ago

I'm pretty sure that was the approach I took last night as I was mucking about. I'll have a look when I get back home.

sbromberger commented 9 years ago

I think my latest change to src/parallel_matmul.jl fixes #6; at least it consistently gives a maximum error on the order of

```julia
julia> max_error = findmax(abs(v1-u1))
(1.4210854715202004e-14,12586)
```

which is close enough for government work :)

madeleineudell commented 9 years ago

That looks great. Thanks!