haskellfoundation / hs-opt-handbook.github.io

The Haskell Optimization Handbook
https://haskell.foundation/hs-opt-handbook.github.io/
Creative Commons Attribution 4.0 International

GHC Flags for faster runtime chapter #38

Open doyougnu opened 2 years ago

doyougnu commented 2 years ago

mostly tuning the garbage collector.

doyougnu commented 1 year ago

An example from Arnaud Bailey:

Playing with [@michaelpj](https://input-output-rnd.slack.com/team/UBR973VL4)'s ring buffer "kata" after Friday's dojo, I ended up writing some benchmarks with criterion. This morning, I tidied up the code and cabal file, adding "standard" ghc options where I thought they were needed, and I was puzzled to see a consistent and significant (~x2) performance degradation. I thought changing the types of the indices used could be the culprit, e.g. different behaviour depending on whether it's a Word64 or an Int, but of course not.
It turns out the performance hit came from the use of -with-rtsopts=-N in the ghc-options field of the benchmark's cabal section.
This is probably not surprising to anyone here but a good reminder to self that parallelism for code is just like antibiotics for diseases: It's not automatic :laughing:
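
For reference, a benchmark stanza of the kind described above might look roughly like the sketch below. The stanza name, file name, and dependency list are hypothetical (the original cabal file is not shown in this thread); only the `-with-rtsopts=-N` option comes from the message. Note that `-N` only takes effect when the executable is linked with `-threaded`.

```cabal
-- Hypothetical stanza: names and dependencies are assumptions for illustration.
benchmark ring-buffer-bench
  type:             exitcode-stdio-1.0
  main-is:          Bench.hs
  build-depends:    base, criterion
  default-language: Haskell2010
  -- -threaded        link the threaded RTS (required for -N)
  -- -rtsopts         allow overriding RTS options on the command line
  -- -with-rtsopts=-N bake "-N" in as the default (one capability per core)
  ghc-options:      -O2 -threaded -rtsopts -with-rtsopts=-N
```

Dropping `-with-rtsopts=-N` from the last line and passing `+RTS -N` only on the runs that need it keeps the single-threaded case as the default.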

and some responses.

Is this a performance hit for the individual benchmarks or for the set as a whole, though? If parallelization makes you run 4 (or however many) benchmarks at the same time, then each of them will become slower, but the set as a whole may (or may not, particularly when there's high memory consumption involved and GC is triggered all the time) finish significantly earlier.
BTW, it might be something more specific than -N, like parallel GC being enabled for the nursery or something.

and my comments:

Yeah, I would try disabling the parallel GC with -qg, or restricting it to the older generations with -qg1. If either produces a speedup, I would start tinkering with the GC nursery size and the number of generations. If you really want to dig in, you could then use -s to see how many sparks were created, along with timing stats. That would at least confirm whether adding CPUs was even beneficial.
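
A minimal sketch of how one might run that experiment, assuming a stand-alone program rather than the original criterion benchmark. The `workload` function is a hypothetical stand-in for the ring buffer code, and the `-A64m` and `-G3` values in the comments are arbitrary examples, not recommendations. The program prints the number of capabilities and a few GC counters from `GHC.Stats`, so the same binary can be compared under different RTS flags.

```haskell
module Main (main) where

import Control.Concurrent (getNumCapabilities)
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Build:  ghc -O2 -threaded -rtsopts Main.hs
-- Compare runs such as:
--   ./Main +RTS -s              (single capability, baseline)
--   ./Main +RTS -N -s           (all cores, parallel GC on)
--   ./Main +RTS -N -qg -s       (all cores, parallel GC off)
--   ./Main +RTS -N -qg1 -s      (parallel GC only for generation 1 and older)
--   ./Main +RTS -N -A64m -s     (larger nursery; 64m is an arbitrary example)
--   ./Main +RTS -N -G3 -s       (three generations; 3 is an arbitrary example)

-- Hypothetical allocation-heavy workload standing in for the real benchmark:
-- sums the number of decimal digits of every integer in [1 .. n].
workload :: Int -> Int
workload n = foldl' (+) 0 (map (length . show) [1 .. n])

main :: IO ()
main = do
  caps <- getNumCapabilities
  putStrLn ("capabilities:     " ++ show caps)
  print (workload 2000000)
  -- Stats are only collected when the RTS was started with -T, -t, -s or -S.
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats not enabled; rerun with +RTS -T or +RTS -s"
    else do
      s <- getRTSStats
      putStrLn ("GCs:              " ++ show (gcs s))
      putStrLn ("major GCs:        " ++ show (major_gcs s))
      putStrLn ("par copied bytes: " ++ show (par_copied_bytes s))
      putStrLn ("mutator elapsed:  " ++ show (mutator_elapsed_ns s) ++ " ns")
      putStrLn ("GC elapsed:       " ++ show (gc_elapsed_ns s) ++ " ns")
```

With `-s`, the RTS also prints its own summary at exit, which in the threaded runtime includes a SPARKS line. If GC elapsed time grows noticeably when `-N` is added and shrinks again with `-qg`, that points at the parallel GC rather than at the code under test.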