ndmitchell / weeder

Detect dead exports or package imports
BSD 3-Clause "New" or "Revised" License

Reduce allocation by switching from Foundation to Text #27

Closed: barrucadu closed this 7 years ago

barrucadu commented 7 years ago

I saw #25 and decided to try profiling the alternative string implementations in the str directory while running weeder on https://github.com/barrucadu/dejafu (four packages with 51 modules between them). Here's what I found:

Implementation            Time (real)  Time (user)  Time (sys)  Total allocation (bytes)
Str-ByteString.hs         0m34.217s    0m33.771s    0m2.497s    1,668,614,968
Str-Foundation.hs         0m34.499s    0m34.130s    0m2.373s    3,839,080,344
Str-Foundation-Unsafe.hs  0m34.783s    0m34.369s    0m2.379s    3,015,613,200
Str-String.hs             0m39.284s    0m38.590s    0m2.702s    5,528,129,584
Str-Text.hs               0m34.344s    0m33.880s    0m2.462s    1,767,357,752

Timing results are from binaries compiled without profiling; allocation results are from binaries compiled with -prof -fprof-auto, so some optimisations may well have been defeated.

Text appears to allocate a little over half as much as Foundation-Unsafe (about 1.77 GB vs 3.02 GB), with a very similar runtime.

The benchmark scripts are:

time:

#! /usr/bin/env nix-shell
#! nix-shell -i bash --pure  -p 'pkgs.haskellPackages.ghcWithPackages (pkgs: with pkgs; [ hashable unordered-containers yaml cmdargs extra text foundation bytestring ])' -p haskellPackages.stack

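# Build one weeder binary per string implementation and check it passes its tests.
for str in str/*.hs; do
  mkdir -p "dist/$str"
  ghc --make src/Paths.hs "$str" Main -isrc -outputdir "dist/$str" -o "dist/$str/weeder"
  "dist/$str/weeder" --test || exit 1
done

# Build dejafu before running weeder on it.
pushd ../dejafu || exit 1
stack build
popd

# Time each binary on the dejafu packages.
for str in str/*.hs; do
  echo
  echo "$str"
  time "./dist/$str/weeder" ../dejafu/concurrency ../dejafu/dejafu ../dejafu/hunit-dejafu ../dejafu/tasty-dejafu ../dejafu/dejafu-tests >/dev/null
done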

allocation:

#! /usr/bin/env nix-shell
#! nix-shell -i bash --pure  -p 'pkgs.profiledHaskellPackages.ghcWithPackages (pkgs: with pkgs; [ hashable unordered-containers yaml cmdargs extra text foundation bytestring ])' -p haskellPackages.stack

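# Build one profiled weeder binary per string implementation and check it passes its tests.
for str in str/*.hs; do
  mkdir -p "dist/$str"
  ghc --make src/Paths.hs "$str" Main -isrc -outputdir "dist/$str" -o "dist/$str/weeder" -prof -fprof-auto
  "dist/$str/weeder" --test || exit 1
done

# Build dejafu before running weeder on it.
pushd ../dejafu || exit 1
stack build
popd

# Run each binary with +RTS -p; the .prof report is written to the current
# directory, so move it next to the binary it came from.
for str in str/*.hs; do
  "./dist/$str/weeder" ../dejafu/concurrency ../dejafu/dejafu ../dejafu/hunit-dejafu ../dejafu/tasty-dejafu ../dejafu/dejafu-tests +RTS -p >/dev/null
  mv ./*.prof "dist/$str/"
done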

(where pkgs.profiledHaskellPackages is the Nix Haskell package set with libraries compiled with profiling turned on)

ndmitchell commented 7 years ago

Thanks for the investigation!

Profiling can have a huge impact on optimised code, affecting both performance and memory use. Can you rebenchmark with "normal" flags (e.g. a Cabal build)?

barrucadu commented 7 years ago

Compiled with just -rtsopts, Foundation does better than it did, but Text still beats it in terms of total allocation (and maximum residency):

str/Str-ByteString.hs                                              
   1,666,312,648 bytes allocated in the heap
     327,385,856 bytes copied during GC
       4,698,224 bytes maximum residency (98 sample(s))
          69,240 bytes maximum slop
              12 MB total memory in use (0 MB lost due to fragmentation)

str/Str-Foundation.hs            
   2,495,373,664 bytes allocated in the heap
     316,563,464 bytes copied during GC
       2,875,192 bytes maximum residency (102 sample(s))
          66,584 bytes maximum slop
               9 MB total memory in use (0 MB lost due to fragmentation)

str/Str-Foundation-Unsafe.hs
   2,495,311,344 bytes allocated in the heap
     314,517,328 bytes copied during GC
       2,862,040 bytes maximum residency (102 sample(s))
          68,616 bytes maximum slop
               9 MB total memory in use (0 MB lost due to fragmentation)

str/Str-String.hs
   5,525,893,584 bytes allocated in the heap
   2,666,256,056 bytes copied during GC
      24,358,400 bytes maximum residency (285 sample(s))
         701,024 bytes maximum slop
              66 MB total memory in use (0 MB lost due to fragmentation)

str/Str-Text.hs
   1,764,079,248 bytes allocated in the heap
     308,773,880 bytes copied during GC
       4,973,528 bytes maximum residency (101 sample(s))
       1,224,600 bytes maximum slop
              13 MB total memory in use (0 MB lost due to fragmentation)

The timing results were already from the unprofiled binaries.
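The script for this second round isn't included above. Below is a minimal sketch of how it might be reproduced, assuming the same str/ and dist/ layout as the earlier scripts. It builds without profiling but with -rtsopts, which lets the binary accept +RTS -s. It then runs each binary with +RTS -s, which writes the GC report shown above to stderr, and extracts the headline numbers. The function names collect_rts_stats and summarise_rts_stats and the rts-stats.txt file name are hypothetical, not part of weeder.

```shell
#!/usr/bin/env bash
# Sketch only: function and file names here are hypothetical, not part of weeder.

# Print "allocated_bytes max_residency_bytes total_MB" from a GHC `+RTS -s`
# report read on stdin (commas are stripped from the byte counts).
summarise_rts_stats() {
  awk '
    /bytes allocated in the heap/ { gsub(",", "", $1); alloc = $1 }
    /bytes maximum residency/     { gsub(",", "", $1); resid = $1 }
    /MB total memory in use/      { total = $1 }
    END { print alloc, resid, total }
  '
}

# Build each str/*.hs variant without profiling (-rtsopts lets the binary
# accept +RTS -s), run it on the dejafu packages and summarise the GC report.
collect_rts_stats() {
  for str in str/*.hs; do
    mkdir -p "dist/$str"
    ghc --make src/Paths.hs "$str" Main -isrc -outputdir "dist/$str" \
      -o "dist/$str/weeder" -rtsopts || return 1
    # +RTS -s writes its report to stderr; keep it alongside the binary.
    "./dist/$str/weeder" ../dejafu/concurrency ../dejafu/dejafu \
      ../dejafu/hunit-dejafu ../dejafu/tasty-dejafu ../dejafu/dejafu-tests \
      +RTS -s -RTS >/dev/null 2>"dist/$str/rts-stats.txt"
    printf '%s ' "$str"
    summarise_rts_stats <"dist/$str/rts-stats.txt"
  done
}
```

Running collect_rts_stats from the weeder checkout (after stack build in ../dejafu) should print one line per variant. For example, the Str-Text report above summarises to `str/Str-Text.hs 1764079248 4973528 13`, putting maximum residency and total memory in use, the numbers users actually feel, next to total allocation.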

vincenthz commented 7 years ago

@barrucadu I don't think maximum residency means what you think it means: less is better and foundation "wins" there. This is reinforced by "total memory in use", which represents how many megablocks (mblocks) have been allocated (less is better there too). Can you also add which GHC version you used for the benchmarks?

Overall, while it would be nice to push down the number of bytes allocated, it's hard to draw any meaningful comparison between two different codebases (e.g. text vs foundation) from this number alone.

That being said, I'm sure it would be interesting to drive a new round of optimisation in foundation for weeder in the near future.

barrucadu commented 7 years ago

I have no idea how I misread the maximum residency to think Text won there, whoops.

I'm using GHC 8.0.2.

ndmitchell commented 7 years ago

The only metrics users care about are total time and total memory in use - all the other details are really there to help developers optimise things. That said, I'd be surprised if decreasing the allocation rate didn't help foundation somewhere.

My guess, given the timings, is that String is vastly slower than the other approaches, but most of the time is not spent in String handling.

I think the above benchmarks suggest foundation is the right approach - does everyone concur? I had planned to write a blog post comparing the options once the analysis had been done, so thanks @barrucadu.

barrucadu commented 7 years ago

Yes, given the lower residency, Foundation does look like the right choice.

ndmitchell commented 7 years ago

Agreed, awesome - always bet on @vincenthz! (Although if he could reduce the alloc rate it would almost certainly improve performance even more - tearing away from the rest of the pack).