madeleineudell / ParallelSparseMatMul.jl

A Julia library for parallel sparse matrix multiplication using shared memory

setindex! failing with shared matrices #10

Closed sbromberger closed 9 years ago

sbromberger commented 9 years ago
julia> a = share(spzeros(Int, 10,10))
10x10 sparse matrix with 0 Int64 entries:

julia> a[1,1] = 5
ERROR: indexing not defined for ParallelSparseMatMul.SharedSparseMatrixCSC{Int64,Int64}
 in setindex! at abstractarray.jl:572
sbromberger commented 9 years ago

Ah -

## setindex! not yet implemented, because can't splice a shared array
sbromberger commented 9 years ago

Which I guess leads to this question: what are the use cases here if the fields of the sparse matrices can't be changed? I guess you could overwrite already-existing values, but you couldn't do anything that would change rowval or colptr (for instance, adding or removing stored elements).
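The distinction above can be sketched with a plain SparseMatrixCSC (modern SparseArrays syntax, not the shared wrapper from this package): overwriting a stored entry only touches nzval, while writing to an unstored position would require splicing into rowval/nzval and shifting colptr.

```julia
using SparseArrays  # stdlib in modern Julia; in 0.4-era Julia this lived in Base

# 3x3 matrix with stored entries at (1,1) and (2,2)
A = sparse([1, 2], [1, 2], [5, 7], 3, 3)

# Overwriting an existing stored entry only mutates nzval; the
# rowval/colptr structure is unchanged, so it is safe on shared storage.
A[1, 1] = 42

# A[1, 2] = 9 would have to splice a new entry into rowval and nzval
# and shift colptr -- exactly the operation a shared array can't support.
```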

cc @madeleineudell for some inspiration here :)

sbromberger commented 9 years ago

Ah, ok, I see the benefit now. It's so that we're not passing large data structures around. (They're still effectively read only, but we're not copying them to remote workers.)

madeleineudell commented 9 years ago

The original use case I was interested in was multiplying repeatedly by a fixed matrix inside the inner loop of some algorithm, which is a common trope in optimization methods.

Building sparse matrices incrementally in CSC or CSR form is generally a bad idea, and even more so in parallel: the overhead of reindexing (splicing a new value into a sorted list) is just too high. You could do it fast enough using heaps and such, but not with the straight sparse matrix structure. It's usually better to collect all the (rowindex, columnindex, value) tuples you want to put in the matrix first, and then sort them into CSC or CSR form in one pass.
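The collect-then-sort approach is exactly what the `sparse(I, J, V, m, n)` constructor does (shown here in modern SparseArrays syntax): gather triplets in any order, then build the CSC structure in one shot.

```julia
using SparseArrays

# Collect (row, col, value) triplets first, in any order --
# e.g. accumulated independently by many workers.
I = [1, 3, 2, 1]
J = [1, 2, 2, 1]
V = [1.0, 2.0, 3.0, 4.0]

# Then sort them into CSC form in one pass. Duplicate (row, col)
# pairs are combined (summed by default), so the two (1,1) entries
# above become a single stored value of 5.0.
A = sparse(I, J, V, 3, 3)
```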

Why do you want to build the matrix in parallel?


Madeleine Udell Postdoctoral Fellow at the Center for the Mathematics of Information California Institute of Technology www.stanford.edu/~udell (415) 729-4115

sbromberger commented 9 years ago

> Why do you want to build the matrix in parallel?

Because I was approaching the problem wrong, as it turns out. I was thinking I needed shared storage for any side effects of the parallel work; in fact, `@sync @parallel myfunc for i = 1:n; foobar(i, shareddata, otherstuff); end` is perfectly cromulent code, and lets me access the results of `foobar()` from the dispatch process via `myfunc()`.
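For context, that parallel-reduction pattern looks like this in modern Julia, where 0.4's `@parallel` became `@distributed` in the Distributed stdlib; `foobar` here is a hypothetical stand-in for the per-index work.

```julia
using Distributed

addprocs(2)  # spin up two worker processes (count is illustrative)

# foobar is a placeholder for the real per-iteration computation.
@everywhere foobar(i) = i^2

# The reducer (+) combines each iteration's result and returns the
# total to the dispatching process -- no shared output storage needed.
total = @distributed (+) for i = 1:10
    foobar(i)
end
```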

The really wonderful thing about this package, btw, is that I can pass my internal graph structure unmodified (well, after a `share()`) and it will be available to worker processes. This gets us to the point where LightGraphs.jl's 3-worker betweenness centrality calculation beats graph-tool's 4-core effort (see https://graph-tool.skewed.de/performance vs http://dpaste.com/2X94RC4).

Any way to get this into METADATA?

madeleineudell commented 9 years ago

Yes, I can certainly put it on METADATA...


sbromberger commented 9 years ago

@madeleineudell could you let me know when it's tagged and available? I'd like to include it as a LightGraphs.jl dependency. Thanks!

madeleineudell commented 9 years ago

ok, just made a PR on METADATA.
