willow-ahrens / Finch.jl

Sparse tensors in Julia and more! Datastructure-driven array programing language.
http://willowahrens.io/Finch.jl/
MIT License
158 stars 15 forks source link

Improve Compiler Runtime by 25x #432

Closed willow-ahrens closed 6 months ago

willow-ahrens commented 6 months ago

fixes #338 by reducing the cost of unresolve, mark_dead, and propagate_copies. This is a rewrite of our copy propagation algorithm based on https://www.csd.uwo.ca/~mmorenom/CS447/Lectures/CodeOptimization.html/node8.html and http://infolab.stanford.edu/~ullman/dragon/slides4.pdf. This results in an end-to-end compilation speedup of 25x for @finch_kernel

Both copy propagation and mark_dead can now bound the runtime of visiting a node of size n as O(n^2) with a constant number of recursive calls, assuming constant # of iters till convergence. this could be better but I think it's good enough for now.

  Benchmark Report for /Users/willow/Projects/Finch.jl
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  Job Properties
  ==============

    •  Time of benchmarks:
       • Target: 23 Feb 2024 - 17:03
       • Baseline: 23 Feb 2024 - 17:09

    •  Package commits:
       • Target: 7bc05e
       • Baseline: 9779bc

    •  Julia commits:
       • Target: 312098
       • Baseline: 312098

    •  Julia command flags:
       • Target: None
       • Baseline: None

    •  Environment variables:
       • Target: JULIA_NUM_THREADS => 1
       • Baseline: JULIA_NUM_THREADS => 1

  Results
  =======

  A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an
  empty table means that all benchmark results remained invariant between builds).

                                      ID   time ratio memory ratio
  –––––––––––––––––––––––––––––––––––––– –––––––––––– ––––––––––––
  ["compile", "compile_pretty_triangle"] 0.04 (5%) ✅ 0.14 (1%) ✅

  Benchmark Group List
  ====================

  Here's a list of all the benchmark groups executed by this job:

    •  ["compile"]

    •  ["graphs", "bellmanford"]

    •  ["graphs", "bfs"]

    •  ["graphs", "pagerank"]

    •  ["indices", "SpMV_32"]

    •  ["indices", "SpMV_64"]

    •  ["matrices", "ATA_spgemm_gustavson"]

    •  ["matrices", "ATA_spgemm_outer"]

  Julia versioninfo
  =================

  Target
  ––––––

  Julia Version 1.10.0
  Commit 3120989f39b (2023-12-25 18:01 UTC)
  Build Info:
    Official https://julialang.org/ release
  Platform Info:
    OS: macOS (arm64-apple-darwin22.4.0)
    uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:55:06 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6020 arm64 arm
    CPU: Apple M2 Pro: 
                   speed         user         nice          sys         idle          irq
         #1-12  2400 MHz    2289355 s          0 s     943750 s   59080078 s          0 s
    Memory: 32.0 GB (92.0625 MB free)
    Uptime: 521514.0 sec
    Load Avg:  2.9638671875  2.2724609375  1.9140625
    WORD_SIZE: 64
    LIBM: libopenlibm
    LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
    Threads: 1 on 8 virtual cores

  Baseline
  ––––––––

  Julia Version 1.10.0
  Commit 3120989f39b (2023-12-25 18:01 UTC)
  Build Info:
    Official https://julialang.org/ release
  Platform Info:
    OS: macOS (arm64-apple-darwin22.4.0)
    uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:55:06 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6020 arm64 arm
    CPU: Apple M2 Pro: 
                   speed         user         nice          sys         idle          irq
         #1-12  2400 MHz    2295245 s          0 s     944750 s   59113655 s          0 s
    Memory: 32.0 GB (87.796875 MB free)
    Uptime: 521854.0 sec
    Load Avg:  2.6083984375  2.64697265625  2.22509765625
    WORD_SIZE: 64
    LIBM: libopenlibm
    LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
    Threads: 1 on 8 virtual cores

▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
Target result

  Benchmark Report for /Users/willow/Projects/Finch.jl
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  Job Properties
  ==============

    •  Time of benchmark: 23 Feb 2024 - 17:3

    •  Package commit: 7bc05e

    •  Julia commit: 312098

    •  Julia command flags: None

    •  Environment variables: JULIA_NUM_THREADS => 1

  Results
  =======

  Below is a table of this job's results, obtained by running the benchmarks. The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
  The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

                                                          ID            time    GC time          memory allocations
  –––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––––––– –––––––––– ––––––––––––––– –––––––––––
                               ["compile", "compile_SpGeMM"]  75.650 ms (5%)             87.00 MiB (1%)     2048552
                      ["compile", "compile_pretty_triangle"]    3.521 s (5%) 104.490 ms   2.12 GiB (1%)    57950996
                         ["compile", "time_to_first_SpGeMM"]    4.584 s (5%)             66.66 KiB (1%)          70
              ["graphs", "bellmanford", "Newman/netscience"]   2.844 μs (5%)             54.25 KiB (1%)          24
                ["graphs", "bellmanford", "SNAP/roadNet-CA"] 134.301 ms (5%)             94.36 MiB (1%)        2709
                     ["graphs", "bfs", "SNAP/soc-Epinions1"]   2.824 ms (5%)              4.83 MiB (1%)          43
                  ["graphs", "bfs", "SNAP/soc-LiveJournal1"] 391.633 ms (5%)            190.80 MiB (1%)          60
                ["graphs", "pagerank", "SNAP/soc-Epinions1"]  16.513 ms (5%)             12.77 MiB (1%)          34
             ["graphs", "pagerank", "SNAP/soc-LiveJournal1"]    3.694 s (5%)  21.806 ms   2.15 GiB (1%)          50
                ["indices", "SpMV_32", "SNAP/soc-Epinions1"]   1.511 ms (5%)            593.00 KiB (1%)           2
                ["indices", "SpMV_64", "SNAP/soc-Epinions1"] 740.916 μs (5%)            593.00 KiB (1%)           2
  ["matrices", "ATA_spgemm_gustavson", "SNAP/soc-Epinions1"]    1.341 s (5%)            812.63 MiB (1%)       27485
      ["matrices", "ATA_spgemm_outer", "SNAP/soc-Epinions1"]    4.679 s (5%) 287.031 ms   3.17 GiB (1%)         248

  Benchmark Group List
  ====================

  Here's a list of all the benchmark groups executed by this job:

    •  ["compile"]

    •  ["graphs", "bellmanford"]

    •  ["graphs", "bfs"]

    •  ["graphs", "pagerank"]

    •  ["indices", "SpMV_32"]

    •  ["indices", "SpMV_64"]

    •  ["matrices", "ATA_spgemm_gustavson"]

    •  ["matrices", "ATA_spgemm_outer"]

  Julia versioninfo
  =================

  Julia Version 1.10.0
  Commit 3120989f39b (2023-12-25 18:01 UTC)
  Build Info:
    Official https://julialang.org/ release
  Platform Info:
    OS: macOS (arm64-apple-darwin22.4.0)
    uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:55:06 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6020 arm64 arm
    CPU: Apple M2 Pro: 
                   speed         user         nice          sys         idle          irq
         #1-12  2400 MHz    2289355 s          0 s     943750 s   59080078 s          0 s
    Memory: 32.0 GB (92.0625 MB free)
    Uptime: 521514.0 sec
    Load Avg:  2.9638671875  2.2724609375  1.9140625
    WORD_SIZE: 64
    LIBM: libopenlibm
    LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
    Threads: 1 on 8 virtual cores

▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
Baseline result

  Benchmark Report for /Users/willow/Projects/Finch.jl
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  Job Properties
  ==============

    •  Time of benchmark: 23 Feb 2024 - 17:9

    •  Package commit: 9779bc

    •  Julia commit: 312098

    •  Julia command flags: None

    •  Environment variables: JULIA_NUM_THREADS => 1

  Results
  =======

  Below is a table of this job's results, obtained by running the benchmarks. The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
  The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

                                                          ID            time    GC time          memory allocations
  –––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––––––– –––––––––– ––––––––––––––– –––––––––––
                               ["compile", "compile_SpGeMM"]  74.378 ms (5%)             87.01 MiB (1%)     2049679
                      ["compile", "compile_pretty_triangle"]   84.001 s (5%) 807.599 ms  14.64 GiB (1%)   504601524
                         ["compile", "time_to_first_SpGeMM"]    4.566 s (5%)             66.66 KiB (1%)          70
              ["graphs", "bellmanford", "Newman/netscience"]   2.844 μs (5%)             54.25 KiB (1%)          24
                ["graphs", "bellmanford", "SNAP/roadNet-CA"] 136.492 ms (5%)             94.36 MiB (1%)        2709
                     ["graphs", "bfs", "SNAP/soc-Epinions1"]   2.881 ms (5%)              4.83 MiB (1%)          43
                  ["graphs", "bfs", "SNAP/soc-LiveJournal1"] 391.382 ms (5%)            190.80 MiB (1%)          60
                ["graphs", "pagerank", "SNAP/soc-Epinions1"]  16.870 ms (5%)             12.77 MiB (1%)          34
             ["graphs", "pagerank", "SNAP/soc-LiveJournal1"]    3.587 s (5%)  23.657 ms   2.15 GiB (1%)          50
                ["indices", "SpMV_32", "SNAP/soc-Epinions1"]   1.516 ms (5%)            593.00 KiB (1%)           2
                ["indices", "SpMV_64", "SNAP/soc-Epinions1"] 743.833 μs (5%)            593.00 KiB (1%)           2
  ["matrices", "ATA_spgemm_gustavson", "SNAP/soc-Epinions1"]    1.341 s (5%)            812.63 MiB (1%)       27485
      ["matrices", "ATA_spgemm_outer", "SNAP/soc-Epinions1"]    4.676 s (5%) 284.434 ms   3.17 GiB (1%)         248

  Benchmark Group List
  ====================

  Here's a list of all the benchmark groups executed by this job:

    •  ["compile"]

    •  ["graphs", "bellmanford"]

    •  ["graphs", "bfs"]

    •  ["graphs", "pagerank"]

    •  ["indices", "SpMV_32"]

    •  ["indices", "SpMV_64"]

    •  ["matrices", "ATA_spgemm_gustavson"]

    •  ["matrices", "ATA_spgemm_outer"]

  Julia versioninfo
  =================

  Julia Version 1.10.0
  Commit 3120989f39b (2023-12-25 18:01 UTC)
  Build Info:
    Official https://julialang.org/ release
  Platform Info:
    OS: macOS (arm64-apple-darwin22.4.0)
    uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:55:06 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6020 arm64 arm
    CPU: Apple M2 Pro: 
                   speed         user         nice          sys         idle          irq
         #1-12  2400 MHz    2295245 s          0 s     944750 s   59113655 s          0 s
    Memory: 32.0 GB (87.796875 MB free)
    Uptime: 521854.0 sec
    Load Avg:  2.6083984375  2.64697265625  2.22509765625
    WORD_SIZE: 64
    LIBM: libopenlibm
    LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
    Threads: 1 on 8 virtual cores
codecov[bot] commented 6 months ago

Codecov Report

Attention: Patch coverage is 83.78378% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 75.80%. Comparing base (94ce74f) to head (550f566).

Files Patch % Lines
src/util/util.jl 83.78% 6 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #432 +/- ## ========================================== - Coverage 75.84% 75.80% -0.04% ========================================== Files 87 87 Lines 7712 7718 +6 ========================================== + Hits 5849 5851 +2 - Misses 1863 1867 +4 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

willow-ahrens commented 6 months ago

@nullplay @radha-patel I'd be curious to know whether this improves the runtime of either of your workflows. I recall both of you experienced long compile times and used @finch_kernel which is what this should affect, in theory.

nullplay commented 6 months ago

This is amazing! I'll check my large kernel (which previously takes 20mins to compile) with this PR.

radha-patel commented 6 months ago

Thanks Willow!! Some of the kernels I wrote that had previously not been compiling (e.g. SYMSYM) are now compiling. And others like MTTKRP are definitely compiling faster.