Updates `explore/matrix-io/02.sparse-and-dense.ipynb` to include a comparison of multiplying memory-mapped numpy arrays (stored on disk in `.npy` format) by various types of vectors. Specifically, adds a comparison of various `scipy.sparse` matrix formats for this task.
I found these results fairly surprising, so I reran the notebook several times and saw similar results each time. Different sparse matrix formats seem to be better for different tasks, though the breakdown is not entirely clean. For example, `scipy.sparse.coo_matrix` appears to be the fastest sparse format for multiplication by the largest matrices, while CSR and CSC are essentially tied for fastest among the sparse vectors for small matrices.
Overall, it appears that subsetting the memory-mapped matrix and multiplying by a `numpy.ones` vector is still the fastest method in most cases, and subsetting shows the biggest improvement over the other methods for the largest, slowest-to-load matrices. See `02.sparse-and-dense.ipynb` for details.
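For context, a minimal sketch of the kind of comparison involved (file name, matrix sizes, and the row-selection setup here are hypothetical; the notebook benchmarks much larger matrices):

```python
import os
import tempfile

import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)

# Hypothetical small matrix saved in .npy format, then memory-mapped
# from disk instead of loaded fully into memory.
dense = rng.random((1000, 200))
path = os.path.join(tempfile.mkdtemp(), "example.npy")
np.save(path, dense)
mmap = np.load(path, mmap_mode="r")

# A subset of rows to aggregate, encoded as a 0/1 indicator vector.
rows = np.sort(rng.choice(1000, size=100, replace=False))
indicator = np.zeros(1000)
indicator[rows] = 1.0

# Sparse-vector approaches: multiply the memmap by the indicator
# stored in different scipy.sparse formats (COO vs CSR here).
coo = scipy.sparse.coo_matrix(indicator)  # shape (1, 1000)
csr = scipy.sparse.csr_matrix(indicator)
result_coo = coo @ mmap  # (1, 200) column sums over the selected rows
result_csr = csr @ mmap

# Subsetting approach: fancy-index the memmap (reading only the
# selected rows), then multiply by a dense numpy.ones vector.
result_subset = np.ones(len(rows)) @ mmap[rows]

# All three methods compute the same quantity.
assert np.allclose(result_coo.ravel(), result_subset)
```

In the actual notebook each of these is timed (e.g. with `%timeit`) across matrix sizes, which is where the format-dependent differences described above show up.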
Addresses #76