jstolarek / slicer

Companion code for paper "Imperative Functional Programs that Explain their Work", Wilmer Ricciotti, Jan Stolarek, Roly Perera and James Cheney, ICFP 2017, Oxford, UK
http://dl.acm.org/citation.cfm?id=3110258
GNU General Public License v3.0
6 stars 0 forks source link

Unbelievable speed-ups from Resugarable type class #50

Open jstolarek opened 7 years ago

jstolarek commented 7 years ago

While working on performance measurements for the paper I noticed an enormous jump in performance between commits 06ca72df2af054fcd09404868c45970dc9163c59 (introduction of a benchmark suite, Feb 8, 2017) and d74087b302527e5db6364a2e9c155d16a93e52e0 (HEAD of references branch, Feb 23, 2017). By "enormous" I mean many of the tests being about several times faster, with some being several hundred times faster (sic!). In an abuse of git bisect I have identified the reponsible commit to be 4c25ab8cf91b63f497e6f2ff40649710fada9ea2:

commit 4c25ab8cf91b63f497e6f2ff40649710fada9ea2
Author: Jan Stolarek <jan.stolarek@ed.ac.uk>
Date:   Wed Feb 15 10:38:06 2017 +0000

    Create Resugarable type class

What this commit does is that instead of having separate functions to resugar expressions and values it introduces a Resugarable type class and existing functions become part of instance definitions of that class. This is something that was not expected to affect performance in any way and yet it turns out to have a profound impact. Tests with most visible speed-ups are: copy-list, curried-componentwise-sum, curried-pointwise-sum, curried-pointwise-sum-two, exceptions (300x faster!), foo (100x faster), map-increment, mergesort, merge, operators, refs (250x faster), reverse-eval, reverse-trace-profile2, reverse-trace-profile, reverse-trace, simple-closure, sort, T13 (670x faster - WTF!?), T2 (320x faster), uncurried-componentwise-sum (170x faster). I find these speed-ups absolutely unbelievable so a good first step would be for someone to reproduce these results. Run:

$ git checkout 4c25ab8cf91b63f497e6f2ff40649710fada9ea2
$ cabal clean && cabal configure --enable-benchmarks && cabal build --ghc-options="-O2" && cabal bench 1>bench-report-4c25ab8cf91b63f497e6f2ff40649710fada9ea2-org.txt
$ cat bench-report-4c25ab8cf91b63f497e6f2ff40649710fada9ea2-org.txt | grep -v "unProfile" > bench-report-4c25ab8cf91b63f497e6f2ff40649710fada9ea2.txt

$ git checkout de0ee392629a40f26837f40899e3e5bd9a24423e
$ cabal clean && cabal configure --enable-benchmarks && cabal build --ghc-options="-O2" && cabal bench 1>bench-report-de0ee392629a40f26837f40899e3e5bd9a24423e-org.txt
$ cat bench-report-de0ee392629a40f26837f40899e3e5bd9a24423e-org.txt | grep -v "unProfile" > bench-report-de0ee392629a40f26837f40899e3e5bd9a24423e.txt

This is not related to our research work in any way, but investigating the exact cause behind this change will definitely yield valuable knowledge of Haskell. Is this exposing a GHC bug? Is this a problem in our benchmarking code?

Here's a bisect log for the mergesort test:

d74087b302527e5db6364a2e9c155d16a93e52e0  Thu Feb 23 20:09:58 2017
benchmarking Slicer/examples/mergesort.tml
time                 248.0 μs   (247.8 μs .. 248.2 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 248.0 μs   (247.9 μs .. 248.1 μs)
std dev              389.4 ns   (294.3 ns .. 531.4 ns)

50b4a6403c0722ee7cc0d91e81360c956eeb8299  Tue Feb 21 16:46:47 2017
benchmarking Slicer/examples/mergesort.tml
time                 216.3 μs   (215.9 μs .. 216.8 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 216.7 μs   (216.1 μs .. 218.4 μs)
std dev              2.877 μs   (1.021 μs .. 5.737 μs)

d2d62e627a555a235b65c682e20866b878193ca7  Wed Feb 15 17:49:24 2017
benchmarking Slicer/examples/mergesort.tml
time                 204.6 μs   (204.2 μs .. 205.0 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 204.9 μs   (204.7 μs .. 205.4 μs)
std dev              962.2 ns   (477.7 ns .. 1.826 μs)

0b7d5d066fd31da36e14960809ab00cc7c3412b1  Wed Feb 15 12:35:45 2017
benchmarking Slicer/examples/mergesort.tml
time                 211.6 μs   (211.5 μs .. 211.8 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 211.6 μs   (211.5 μs .. 211.7 μs)
std dev              399.1 ns   (277.9 ns .. 633.7 ns

fcc42d1e6a2638a458636f067810f486d9102f14  Wed Feb 15 11:15:32 2017
benchmarking Slicer/examples/mergesort.tml
time                 208.2 μs   (208.0 μs .. 208.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 208.1 μs   (207.8 μs .. 208.8 μs)
std dev              1.503 μs   (674.9 ns .. 2.496 μs)

4c25ab8cf91b63f497e6f2ff40649710fada9ea2  Wed Feb 15 10:38:06 2017
benchmarking Slicer/examples/mergesort.tml
time                 226.5 μs   (226.1 μs .. 226.9 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 227.2 μs   (227.0 μs .. 227.6 μs)
std dev              1.042 μs   (764.2 ns .. 1.460 μs)

de0ee392629a40f26837f40899e3e5bd9a24423e  Tue Feb 14 12:57:38 2017
benchmarking Slicer/examples/mergesort.tml
time                 6.376 ms   (6.320 ms .. 6.425 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 6.596 ms   (6.509 ms .. 6.765 ms)
std dev              356.0 μs   (193.0 μs .. 597.6 μs)
variance introduced by outliers: 28% (moderately inflated)

06ca72df2af054fcd09404868c45970dc9163c59  Wed Feb 8 13:39:23 2017
benchmarking Slicer/examples/mergesort.tml
time                 6.245 ms   (6.186 ms .. 6.303 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 6.454 ms   (6.377 ms .. 6.624 ms)
std dev              344.7 μs   (169.1 μs .. 636.9 μs)
variance introduced by outliers: 28% (moderately inflated)

Attaching full benchmark logs. bench-report-4c25ab8cf91b63f497e6f2ff40649710fada9ea2.txt bench-report-de0ee392629a40f26837f40899e3e5bd9a24423e.txt