pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem
https://sparse.pydata.org
BSD 3-Clause "New" or "Revised" License

Finch+PyData benchmarks #652

Closed mtsokol closed 5 months ago

mtsokol commented 5 months ago

Issue: https://github.com/willow-ahrens/finch-tensor/issues/24

Finch and PyData asv benchmarks.

Hi @willow-ahrens! I started working on a list of specific benchmarks that we want to implement and use for measuring improvements for Lazy API and Finch in general:

github-actions[bot] commented 5 months ago

Test Results

5 905 tests ±0 | 1 suites ±0 | 1 files ±0 | 6m 21s :stopwatch: +4s
5 875 :white_check_mark: +1 | 30 :zzz: −1 | 0 :x: ±0

Results for commit 525149d3. ± Comparison against base commit 9a8b31aa.


willow-ahrens commented 5 months ago

Tensordot is a fine way to benchmark, though I would almost prefer the A[:, None] syntax because it's more granular, so it feels like there are more opportunities to make scheduling mistakes. Also, since finch_tensor and finch are the repos that we want to change, wouldn't we want this suite to live there?
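To make the contrast concrete, here is a minimal sketch of the two formulations using plain NumPy arrays as a stand-in (the thread later notes that `None` indexing isn't supported in finch-tensor, so this only illustrates the semantics, not the finch-tensor API):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 5))
B = rng.random((5, 3))

# Fused formulation: contract the shared axis in a single call.
C_tensordot = np.tensordot(A, B, axes=1)

# Granular formulation: insert singleton axes with None, broadcast the
# elementwise product, then reduce. Each step is a separate operation,
# giving a scheduler more opportunities to fuse (or mis-schedule) work.
C_broadcast = (A[:, :, None] * B[None, :, :]).sum(axis=1)

assert np.allclose(C_tensordot, C_broadcast)
```

Both produce the same matrix product; the second exposes the intermediate broadcasted product, which is exactly where a lazy scheduler can go wrong.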

willow-ahrens commented 5 months ago

also, I think that large should be 1_000_000 by 1_000_000 with 10_000_000 nnz.
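For reference, those numbers work out to a density of 1e-5. A scaled-down sketch of generating COO-style coordinates at that density (sampling all 10_000_000 entries works the same way, and a real generator such as `sparse.random` would also handle duplicate coordinates, which this sketch does not):

```python
import numpy as np

# Proposed "large" benchmark case: 1_000_000 x 1_000_000 with 10_000_000 nnz.
rows, cols, nnz = 1_000_000, 1_000_000, 10_000_000
density = nnz / (rows * cols)  # 1e-5: one stored value per 100_000 entries

# Scaled-down coordinate sampling for illustration only.
demo_nnz = 1_000
rng = np.random.default_rng(0)
coords = rng.integers(0, rows, size=(2, demo_nnz))  # duplicates possible
data = rng.random(demo_nnz)
```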

willow-ahrens commented 5 months ago

it's fine to leave small as 10 x 10, as a test case to measure calling overhead, though we might want to make it even smaller to be more accurate. the medium size you've chosen is also very small. The two most meaningful sizes are the extremes, as these show the latency and throughput of the kernel.
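An asv-style parameterized suite covering both extremes might look like the sketch below. NumPy arrays stand in for the sparse / finch-tensor arrays an actual suite would build in `setup()`, and the large parameter here is kept modest so the sketch runs quickly; the sizes are placeholders, not the thread's final choices:

```python
import numpy as np

class TensordotSuite:
    # asv benchmark sketch: the small size measures per-call overhead
    # (latency); the larger size measures kernel throughput.
    params = [10, 1000]
    param_names = ["side"]

    def setup(self, side):
        rng = np.random.default_rng(42)
        self.A = rng.random((side, side))
        self.B = rng.random((side, side))

    def time_tensordot(self, side):
        np.tensordot(self.A, self.B, axes=1)
```

asv calls `setup` once per parameter value and times each `time_*` method repeatedly, so the two extremes show up as separate entries in the results.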

mtsokol commented 5 months ago

Tensordot is a fine way to benchmark, though I would almost prefer the A[:, None] syntax because it's more granular

By None you mean adding a new dimension at that position? If yes, then this index type isn't supported in finch-tensor.

Also, since finch_tensor and finch are the repos that we want to change, wouldn't we want this suite to live there?

Hmm, we plan to add another backend to pydata/sparse so we need benchmarks also there. This would mean there are going to be similar benchmarks in finch-tensor, sparse, and Finch.jl.

I'm OK with copying these to finch-tensor as well; it's just something to remember that they are there.

hameerabbasi commented 5 months ago

I'm OK with copying these to finch-tensor as well; it's just something to remember that they are there.

I'd suggest setting up a new repo that can test both libraries instead of copying code, which can get increasingly out of sync. In addition, it might be a nice start to a "sparse extension" to the Array API benchmarks.

willow-ahrens commented 5 months ago

I don't think we need to copy these over to finch-tensor, and I see why we would want them here. I'll likely add some similar benchmarks in Finch.jl to test the scheduling heuristics in-tree as I make changes.