Closed huitseeker closed 3 months ago
Data from #370 suggests that there are not much improvements using a16z's Spartan2 implementation, at least at first glance. Some of the verify tests seem to be performing better, but I'm not sure why. Otherwise, it seems that using a chunked multiply and specializing Scalar::ONE has minimal impact on performance. I guess, since we already have a parallelized implementation, the most low-hanging fruit has already been addressed
I agree, thanks for ruling that out!
What?
Nova requires committing to a cross-term which involves a sparse matrix-vector multiplication, which is currently a folding bottleneck.
Context
We changed our matrix representation from the original COO (still used in the Spartan2 repo mentioned below) to CSR and parallelized the matrix-vector multiplication in #38. Ref on this data representation formats: https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)
We have an issue open for accelerating this on the GPU at #75
This issue
We'd like to cherry-pick the improvements from https://github.com/a16z/Spartan2/pulls?q=is%3Apr+matrix+is%3Aclosed to see if e.g. a specialization to the one-scalar would help the SpMVM on the CPU