lurk-lab / arecibo

An advanced fork of Nova
https://lurk-lang.org/
MIT License
63 stars 27 forks source link

SpMVM acceleration on the CPU #368

Closed huitseeker closed 3 months ago

huitseeker commented 3 months ago

What?

Nova requires committing to a cross-term which involves a sparse matrix-vector multiplication, which is currently a folding bottleneck.

Context

We changed our matrix representation from the original COO (still used in the Spartan2 repo mentioned below) to CSR and parallelized the matrix-vector multiplication in #38. Ref on this data representation formats: https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)

We have an issue open for accelerating this on the GPU at #75

This issue

We'd like to cherry-pick the improvements from https://github.com/a16z/Spartan2/pulls?q=is%3Apr+matrix+is%3Aclosed to see if e.g. a specialization to the one-scalar would help the SpMVM on the CPU

winston-h-zhang commented 3 months ago

Data from #370 suggests that there are not much improvements using a16z's Spartan2 implementation, at least at first glance. Some of the verify tests seem to be performing better, but I'm not sure why. Otherwise, it seems that using a chunked multiply and specializing Scalar::ONE has minimal impact on performance. I guess, since we already have a parallelized implementation, the most low-hanging fruit has already been addressed

huitseeker commented 3 months ago

I agree, thanks for ruling that out!