Creating sparse matrices is rather slow

natverse / nat.nblast

R package implementing the NBLAST neuron search algorithm, as an add-on for the NeuroAnatomy Toolbox (nat) R package.

http://natverse.org/nat.nblast/


Creating sparse matrices is rather slow #19

Status: Closed (jdmanton closed this issue 10 years ago)

jdmanton commented 10 years ago

Creating a sparse matrix for 1,000 neurons from the 16,000-neuron full score matrix has been running for more than 90 minutes and still hasn't finished. This is with the full score matrix loaded into memory, so the slowness is not caused by disk access issues.

jefferis commented 10 years ago

Hmm. Is this a case where some kind of pre-allocation might help after all?

Gregory Jefferis

(Sent by email in reply to James Manton's report of 5 Sep 2014, 21:09.)

jdmanton commented 10 years ago

Is this a case where some kind of pre-allocation might help after all?

Apparently not...

No pre-allocation:

> system.time(foo <- sparse_score_mat(names(kcs20), allbyallmem))
   user  system elapsed 
198.828   2.110 202.860

Pre-allocation:

> system.time(foo <- sparse_score_mat(names(kcs20), allbyallmem))
   user  system elapsed 
220.063   3.425 229.289

I'll try some other implementations of sparse matrices and, if they're not much better, write one myself that's perhaps not as good for linear algebra but is faster for our use cases.
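One way to test whether the cost lies in repeated sub-assignment rather than in the data itself is to compare filling a Matrix sparse matrix row by row against building it in one call from (i, j, x) triplets. The sketch below is not how `sparse_score_mat` is implemented. It assumes that the sparse result keeps the full rows and columns of the selected neurons, and `scores`, `keep`, `fill_by_assignment` and `fill_by_triplets` are hypothetical names on toy data.

```r
library(Matrix)

set.seed(1)
n <- 300
ids <- paste0("neuron", seq_len(n))
# Toy stand-in for the dense all-by-all score matrix (hypothetical data)
scores <- matrix(runif(n * n), n, n, dimnames = list(ids, ids))
keep <- ids[1:20]

# Approach 1: start from an empty sparse matrix and fill the selected
# rows and columns one at a time by sub-assignment
fill_by_assignment <- function(keep, dense) {
  idx <- match(keep, rownames(dense))
  sp <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                     dims = dim(dense), dimnames = dimnames(dense))
  for (k in idx) {
    sp[k, ] <- dense[k, ]
    sp[, k] <- dense[, k]
  }
  sp
}

# Approach 2: collect every (row, column) pair up front and construct
# the sparse matrix in a single call
fill_by_triplets <- function(keep, dense) {
  idx <- match(keep, rownames(dense))
  n <- nrow(dense)
  rest <- setdiff(seq_len(n), idx)
  # Selected rows across all columns, then selected columns for the
  # remaining rows, so that no pair appears twice
  i <- c(rep(idx, times = n), rep(rest, times = length(idx)))
  j <- c(rep(seq_len(n), each = length(idx)), rep(idx, each = length(rest)))
  sparseMatrix(i = i, j = j, x = dense[cbind(i, j)],
               dims = dim(dense), dimnames = dimnames(dense))
}

print(system.time(a <- fill_by_assignment(keep, scores)))
print(system.time(b <- fill_by_triplets(keep, scores)))

# Both approaches should produce the same matrix
stopifnot(isTRUE(all.equal(as.matrix(a), as.matrix(b))))
```

The pairs are kept distinct because `sparseMatrix` sums the values of duplicated (i, j) entries. Timings will depend on the machine and on the Matrix version, so the comparison is only meaningful when run locally.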

jdmanton commented 10 years ago

This is now much improved in 452b5c2, by switching from the Matrix package to spam for the sparse matrices.

jefferis commented 10 years ago

Just happened to notice this:

https://stat.ethz.ch/pipermail/r-help/2010-December/262365.html

but it doesn't offer much explanation.

jdmanton commented 10 years ago

At least it means the slowness is due to the library and not because I've done something silly. The spam package seems to be uniformly faster, if somewhat harder to work with.