Closed by jcapriot 6 months ago
Thank you for the changes! Would you be willing to write benchmarks? (Examples are in the benchmarks directory; see https://asv.readthedocs.io/en/stable/writing_benchmarks.html.)
Sure!
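An asv benchmark for this kind of operation could be sketched roughly as below. This is a minimal illustration only, not taken from the repo's benchmarks directory: the class name, shapes, and the use of a dense numpy matmul stand in for the real GCXS-vs-ndarray benchmark, since asv only cares about the `setup`/`time_*` method convention.

```python
import numpy as np


class MatmulSuite:
    """asv-style benchmark suite: asv calls setup() before timing,
    then times each method whose name starts with time_."""

    def setup(self):
        # Illustrative shapes; a real benchmark would build a GCXS array here.
        rng = np.random.default_rng(42)
        self.A = rng.random((500, 500))
        self.x = rng.random((500, 100))

    def time_dot_ndarray(self):
        # The statement being timed: matrix @ dense-ndarray.
        self.A @ self.x
```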
Merging #643 (c06b832) into main (a1d2081) will decrease coverage by 0.02%. The diff coverage is 100.00%.
Improves the memory access pattern for GCXS array dot ndarray operations.
I was running a few benchmarks of the GCXS class against scipy's csr/csc arrays and noticed that the GCXS versions were significantly slower, so I was curious what the biggest difference was, since their data structures shouldn't be too fundamentally different. Looking at the csr_dot_ndarray and csc_dot_ndarray code, I realized that the memory access pattern of the arrays was inefficient.
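For context, this is not the PR's actual kernel, but a pure-Python sketch of the cache-friendly access pattern for a CSR-layout dot dense-ndarray (the kind of loop numba would compile). The point is that each inner update sweeps contiguous rows of both the output and the dense operand, rather than striding across columns.

```python
import numpy as np


def csr_dot_ndarray(indptr, indices, data, B):
    """Sketch of CSR @ dense: for each nonzero A[i, j], accumulate
    data[k] * B[j, :] into out[i, :].

    With C-ordered arrays, out[i, :] and B[j, :] are contiguous in
    memory, so every inner update is a sequential sweep instead of a
    strided walk over columns.
    """
    n_rows = len(indptr) - 1
    out = np.zeros((n_rows, B.shape[1]), dtype=np.result_type(data, B))
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            out[i, :] += data[k] * B[indices[k], :]
    return out
```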
Here are some of the relevant timings of things:
Scipy timing:
Sparse timing:
Timing on main:
Timing on PR:
The left-ndarray dot GCXS case is still a little slower than I'd like, but I couldn't quite see where to dive into that, as it looks like it's doing what it should for the matvec operation (basically just computing (GCXS.T @ x.T).T).
One other thing: for 1D ndarrays, numba doesn't seem to be able to optimize these kernels as well as it could (judging by benchmarks against scipy csr/csc times a 1D array). There should likely be a branch for 1D ndarrays that uses a version without the loop over the second dimension.
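The suggested 1D specialization might look roughly like the sketch below. The function name and signature are mine, not sparse's; the idea is that with no inner loop over a second dimension, a JIT such as numba can keep the scalar accumulator in a register instead of updating an output row.

```python
import numpy as np


def csr_matvec_1d(indptr, indices, data, x):
    """Sketch of a 1D-specialized CSR matvec: one scalar accumulator
    per row, no loop over columns of a (nonexistent) second dimension."""
    n_rows = len(indptr) - 1
    out = np.zeros(n_rows, dtype=np.result_type(data, x))
    for i in range(n_rows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        out[i] = acc
    return out
```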