greenelab / connectivity-search-analyses

hetnet connectivity search research notebooks (previously hetmech)
BSD 3-Clause "New" or "Revised" License

Scipy sparse vector with memory map #90

Closed: zietzm closed this 6 years ago

zietzm commented 6 years ago

Addresses #76

Updates explore/matrix-io/02.sparse-and-dense.ipynb to include a comparison of multiplying memory-mapped numpy arrays (stored on disk in .npy format) by various types of vectors. Specifically, it compares several scipy.sparse matrix formats, used as sparse vectors, for this task.
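
For illustration, here is a minimal sketch of the kind of setup being compared (the file name, matrix shape, and indicator vector below are made up for the example, not the actual hetnet matrices used in the notebook):

```python
import numpy as np
import scipy.sparse

# Write a dense matrix to disk in .npy format, then reopen it as a memory map
# so data is only read from disk when accessed.
matrix = np.random.rand(1000, 1000)
np.save('example-matrix.npy', matrix)
mmap_matrix = np.load('example-matrix.npy', mmap_mode='r')

# Indicator vector selecting a handful of columns, as a dense array and as a
# column vector (shape (1000, 1)) in several scipy.sparse formats.
dense_vec = np.zeros(1000)
dense_vec[:50] = 1
column = dense_vec.reshape(-1, 1)
coo_vec = scipy.sparse.coo_matrix(column)
csr_vec = scipy.sparse.csr_matrix(column)
csc_vec = scipy.sparse.csc_matrix(column)

# The kinds of products whose timings are being compared.
dense_product = mmap_matrix @ dense_vec   # shape (1000,)
sparse_product = mmap_matrix @ csr_vec    # shape (1000, 1)
```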

I found these results fairly surprising, so I reran the notebook several times and got similar results each time. Different sparse formats perform best in different situations, and the breakdown is not entirely intuitive. For example, scipy.sparse.coo_matrix appears to be the fastest sparse format for multiplication by the largest matrices, while CSR and CSC are essentially tied for fastest among the sparse formats on small matrices.
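
The per-format comparison can be reproduced along these lines; this is a rough sketch using timeit rather than the notebook's %timeit cells, it assumes the example-matrix.npy file from the sketch above, and absolute numbers will vary by machine and matrix size:

```python
import timeit

import numpy as np
import scipy.sparse

mmap_matrix = np.load('example-matrix.npy', mmap_mode='r')

# Same indicator vector in dense form and in each sparse format.
dense_vec = np.zeros(mmap_matrix.shape[1])
dense_vec[:50] = 1
column = dense_vec.reshape(-1, 1)
candidates = {
    'dense': dense_vec,
    'coo': scipy.sparse.coo_matrix(column),
    'csr': scipy.sparse.csr_matrix(column),
    'csc': scipy.sparse.csc_matrix(column),
}

# Time the memory-mapped matrix times each vector representation.
for name, vec in candidates.items():
    seconds = timeit.timeit(lambda: mmap_matrix @ vec, number=10)
    print(f'{name}: {seconds / 10:.5f} s per multiplication')
```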

Overall, it appears that subsetting memory-mapped matrices and multiplying by a numpy.ones vector is still the fastest method in most cases, and subsetting shows the biggest improvement over the other methods for the largest, slowest-to-load matrices. See 02.sparse-and-dense.ipynb.
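
For reference, the subsetting approach looks roughly like this (again assuming the hypothetical example-matrix.npy and column indices from the sketches above):

```python
import numpy as np

mmap_matrix = np.load('example-matrix.npy', mmap_mode='r')
columns = np.arange(50)  # indices where the indicator vector is nonzero

# Fancy indexing copies only the selected columns into memory; multiplying the
# subset by a ones vector then gives the same result as mmap_matrix @ dense_vec.
subset = mmap_matrix[:, columns]
result = subset @ np.ones(len(columns))
```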