pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem
https://sparse.pydata.org
BSD 3-Clause "New" or "Revised" License

Add `SDDMM` example #674

Closed · mtsokol closed 2 months ago

mtsokol commented 2 months ago

Hi @hameerabbasi,

This PR adds an SDDMM example and upgrades Finch to the latest version.

[UPDATED 14.05.2024] On my machine, running:

python examples/sddmm_example.py

gives:

Finch
Took 8.787564675013224 s.

Numba
Took 22.904020706812542 s.

SciPy
Took 22.59452811876933 s.
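
For context, SDDMM (sampled dense-dense matrix multiplication) computes `S * (A @ B)` only at the positions where the sparse sampling matrix `S` has stored values, so the full dense product is never materialized. Below is a minimal SciPy/NumPy sketch of the operation itself — illustrative only, with assumed shapes and density, and not the code added in `examples/sddmm_example.py`:

```python
import numpy as np
import scipy.sparse as sp

def sddmm(s, a, b):
    """Sampled dense-dense matrix multiplication: s * (a @ b), evaluated
    only at the stored (row, col) positions of the sparse matrix `s`."""
    s = s.tocoo()
    # Row-wise dot products of the sampled rows of `a` with the sampled
    # columns of `b`, scaled by the stored values of `s`.
    vals = s.data * np.einsum("ij,ij->i", a[s.row, :], b[:, s.col].T)
    return sp.coo_matrix((vals, (s.row, s.col)), shape=s.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Shapes and density are assumptions for illustration only.
    s = sp.random(1000, 1000, density=1e-4, format="coo", random_state=0)
    a = rng.standard_normal((1000, 50))
    b = rng.standard_normal((50, 1000))
    print(sddmm(s, a, b).nnz)
```

With these illustrative sizes, a density of 1e-4 on a 1000 × 1000 sampling matrix gives roughly 100 sampled entries, so only about 100 dot products are ever computed.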
github-actions[bot] commented 2 months ago

Test Results

- 5 923 tests ±0: 5 892 passed ±0, 31 skipped ±0, 0 failed ±0
- 1 suite ±0, 1 file ±0
- Runtime: 9m 24s (+2m 33s)

Results for commit 0f52367e. ± Comparison against base commit 79b9d71d.

This pull request skips 1 test and un-skips 1 test.

```
sparse.numba_backend.tests.test_compressed ‑ test_reductions_float16[i8-None-sum-kwargs0]
```

```
sparse.numba_backend.tests.test_compressed ‑ test_reductions_float16[f8-None-sum-kwargs0]
```

This comment has been updated with the latest results.

mtsokol commented 2 months ago

I think the density could be increased to 0.0001 so we have 100 non-zeros (more realistic?); I get the same performance.

hameerabbasi commented 2 months ago

I'd actually like to test the examples as well, to make sure they always work. Can we add something like the following to CI:

# test_examples.sh
# Run every example script; fail the step if any of them errors out.
for example in $(find ./examples/ -iname '*.py'); do
  python "$example" || exit 1
done

# in CI
source test_examples.sh

Alternatively (and preferably) let's move this to the benchmarks.
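
If the CI route is taken, the same check could also live in the test suite. Here is a rough pytest sketch — hypothetical, not part of this PR, and it assumes every script under `examples/` runs without arguments:

```python
# test_examples.py -- hypothetical sketch, not what this PR adds.
import pathlib
import subprocess
import sys

import pytest

EXAMPLES = sorted(pathlib.Path("examples").glob("*.py"))

@pytest.mark.parametrize("example", EXAMPLES, ids=lambda p: p.name)
def test_example_runs(example):
    # A non-zero exit status raises CalledProcessError and fails the test.
    subprocess.run([sys.executable, str(example)], check=True)
```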

mtsokol commented 2 months ago

I added a CI stage for running it.

I can also add SDDMM to the benchmarks, but I'd prefer to keep standalone examples that can be quickly shared with others and run in a REPL, without unwrapping asv-specific benchmark code.

mtsokol commented 2 months ago

Blocked by https://github.com/willow-ahrens/Finch.jl/issues/534

mtsokol commented 2 months ago

Here's the debug output for the Finch lazy-mode plan:

```julia
Executing: :(function var"##compute#410"(prgm)
    begin
        V = (((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{PlusOneVector{Int32}, PlusOneVector{Int32}}, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}
        V_2 = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
        V_3 = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
        A0 = V::Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{PlusOneVector{Int32}, PlusOneVector{Int32}}, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}
        A0_2 = Tensor(Dense(SparseDict(Element{0.0, Float64}())))::Tensor{DenseLevel{Int64, SparseLevel{Int64, Finch.DictTable{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
        @finch mode = :fast begin
            A0_2 .= 0.0
            for i1 = _
                for i0 = _
                    A0_2[i1, i0] = A0[i0, i1]
                end
            end
            return A0_2
        end
        A2 = V_2::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
        A4 = V_3::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
        A8 = Tensor(Dense(SparseDict(Element{0.0, Float64}())))::Tensor{DenseLevel{Int64, SparseLevel{Int64, Finch.DictTable{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
        @finch mode = :fast begin
            A8 .= 0.0
            for i52 = _
                for i51 = _
                    for i50 = _
                        A8[i50, i51] << + >>= (*)(A0_2[i50, i51], (*)(A2[1, i52], A4[1, i52]))
                    end
                end
            end
            return A8
        end
        return (A8,)
    end
end)
```
willow-ahrens commented 2 months ago

Let's keep working on this until we see a speedup from fusion. I believe a fusion-based speedup should be achievable here, so it's a good goal to work towards.
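
For intuition about what fusion is expected to buy here, a rough NumPy illustration with made-up sizes (not Finch code): an unfused plan materializes the dense intermediate `A @ B` before applying the sparsity mask, while a fused SDDMM kernel only evaluates the sampled entries.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, nnz = 2_000, 32, 1_000  # illustrative sizes only
a = rng.standard_normal((n, k))
b = rng.standard_normal((k, n))
rows = rng.integers(0, n, size=nnz)
cols = rng.integers(0, n, size=nnz)

# Unfused: builds the full n x n dense product just to read nnz entries.
unfused = (a @ b)[rows, cols]

# Fused: only the nnz sampled dot products are ever computed.
fused = np.einsum("ij,ij->i", a[rows], b[:, cols].T)

assert np.allclose(unfused, fused)
```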

mtsokol commented 2 months ago

The latest Finch version precompiles a few kernels, which makes the first benchmark run time out. Let me fix it.

willow-ahrens commented 2 months ago

Thanks @mtsokol!