haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers

Enable -fworker-wrapper-cbv on ghc-9.4 #1003

Open AndreasPK opened 1 month ago

AndreasPK commented 1 month ago

This flag allows for some significant performance improvements (roughly 20-25% on many of the insert, delete, alter, and update benchmarks below), and its downsides don't apply to containers.

In particular, this flag can cause rewrite rules to not fire if the relevant functions don't have NOINLINE pragmas. However, the relevant functions in containers seem to have such pragmas, so there should be no downside.
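For illustration, here is a minimal sketch of how the flag could be switched on only for compilers that accept it. The placement in a library stanza of a .cabal file is an assumption, not the actual change proposed here, and the `impl(ghc >= 9.4)` guard simply mirrors the GHC version named in the issue title.

```cabal
library
  -- Hypothetical placement: the real package description may
  -- organise its compiler options differently.
  -- The guard keeps the flag away from GHC versions older than 9.4,
  -- which would not recognise it.
  if impl(ghc >= 9.4)
    ghc-options: -fworker-wrapper-cbv
```

Guarding on the compiler version means older compilers never see the option, so builds with them are unaffected.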

Here are the results for map-benchmarks:

Warning: Unknown/unsupported 'ghc' version detected (Cabal 3.8.1.0 supports
'ghc' version < 9.6): /opt/ghc-9.10.1/bin/ghc is version 9.10.1
Resolving dependencies...
Up to date
All
  lookup absent:                  OK
    89.4 μs ± 6.4 μs,       same as baseline
  lookup present:                 OK
    75.3 μs ± 5.7 μs,       same as baseline
  map:                            OK
    35.8 μs ± 3.0 μs,       same as baseline
  map really:                     OK
    84.9 μs ± 5.3 μs,       same as baseline
  <$:                             OK
    23.9 μs ± 1.4 μs, 11% less than baseline
  <$ really:                      OK
    57.1 μs ± 2.6 μs,       same as baseline
  alterF lookup absent:           OK
    89.6 μs ± 5.8 μs,       same as baseline
  alterF lookup present:          OK
    75.0 μs ± 5.3 μs,       same as baseline
  alterF no rules lookup absent:  OK
    91.6 μs ± 6.1 μs,       same as baseline
  alterF no rules lookup present: OK
    80.9 μs ± 5.5 μs,       same as baseline
  insert absent:                  OK
    209  μs ±  14 μs, 20% less than baseline
  insert present:                 OK
    159  μs ±  12 μs, 23% less than baseline
  alterF insert absent:           OK
    253  μs ±  22 μs, 20% less than baseline
  alterF insert present:          OK
    180  μs ±  11 μs,       same as baseline
  alterF no rules insert absent:  OK
    296  μs ±  23 μs, 16% less than baseline
  alterF no rules insert present: OK
    223  μs ±  11 μs,       same as baseline
  delete absent:                  OK
    143  μs ±  12 μs,       same as baseline
  delete present:                 OK
    199  μs ±  11 μs, 25% less than baseline
  alterF delete absent:           OK
    163  μs ±  14 μs,       same as baseline
  alterF delete present:          OK
    234  μs ±  22 μs, 22% less than baseline
  alterF no rules delete absent:  OK
    97.8 μs ± 6.0 μs,       same as baseline
  alterF no rules delete present: OK
    272  μs ±  26 μs, 19% less than baseline
  alter absent:                   OK
    211  μs ±  11 μs, 23% less than baseline
  alter insert:                   OK
    215  μs ±  11 μs, 24% less than baseline
  alter update:                   OK
    170  μs ±  11 μs, 24% less than baseline
  alter delete:                   OK
    213  μs ±  13 μs, 25% less than baseline
  alterF alter absent:            OK
    169  μs ±  11 μs,       same as baseline
  alterF alter insert:            OK
    245  μs ±  23 μs, 19% less than baseline
  alterF alter update:            OK
    177  μs ±  11 μs,  9% less than baseline
  alterF alter delete:            OK
    240  μs ±  22 μs, 21% less than baseline
  alterF no rules alter absent:   OK
    98.5 μs ± 5.5 μs,       same as baseline
  alterF no rules alter insert:   OK
    293  μs ±  22 μs, 16% less than baseline
  alterF no rules alter update:   OK
    222  μs ±  11 μs,       same as baseline
  alterF no rules alter delete:   OK
    272  μs ±  22 μs, 19% less than baseline
  insertWith absent:              OK
    209  μs ±  11 μs, 20% less than baseline
  insertWith present:             OK
    164  μs ±  11 μs, 22% less than baseline
  insertWith' absent:             OK
    201  μs ±  11 μs, 20% less than baseline
  insertWith' present:            OK
    171  μs ±  13 μs, 20% less than baseline
  insertWithKey absent:           OK
    215  μs ±  11 μs, 21% less than baseline
  insertWithKey present:          OK
    164  μs ±  11 μs, 23% less than baseline
  insertWithKey' absent:          OK
    201  μs ±  11 μs, 21% less than baseline
  insertWithKey' present:         OK
    159  μs ±  12 μs, 23% less than baseline
  insertLookupWithKey absent:     OK
    216  μs ±  12 μs, 23% less than baseline
  insertLookupWithKey present:    OK
    172  μs ±  11 μs, 23% less than baseline
  insertLookupWithKey' absent:    OK
    207  μs ±  11 μs, 24% less than baseline
  insertLookupWithKey' present:   OK
    174  μs ±  13 μs, 22% less than baseline
  mapWithKey:                     OK
    39.4 μs ± 3.5 μs,       same as baseline
  foldlWithKey:                   OK
    351  μs ±  23 μs, 24% less than baseline
  foldlWithKey':                  OK
    15.0 μs ± 816 ns,       same as baseline
  foldrWithKey:                   OK
    61.5 ns ± 5.3 ns,       same as baseline
  foldrWithKey':                  OK
    32.2 μs ± 1.4 μs,       same as baseline
  update absent:                  OK
    189  μs ±  12 μs, 21% less than baseline
  update present:                 OK
    146  μs ±  11 μs, 24% less than baseline
  update delete:                  OK
    200  μs ±  13 μs, 25% less than baseline
  updateLookupWithKey absent:     OK
    199  μs ±  11 μs, 21% less than baseline
  updateLookupWithKey present:    OK
    161  μs ±  11 μs, 23% less than baseline
  updateLookupWithKey delete:     OK
    207  μs ±  12 μs, 24% less than baseline
  mapMaybe:                       OK
    95.0 μs ± 3.4 μs, 14% less than baseline
  mapMaybeWithKey:                OK
    94.4 μs ± 5.7 μs, 10% less than baseline
  lookupIndex:                    OK
    167  μs ±  11 μs,       same as baseline
  union:                          OK
    72.7 μs ± 5.9 μs, 26% less than baseline
  difference:                     OK
    63.3 μs ± 6.0 μs, 17% less than baseline
  intersection:                   OK
    28.9 μs ± 2.7 μs, 11% less than baseline
  split:                          OK
    7.58 ns ± 644 ps,       same as baseline
  fromList:                       OK
    59.5 μs ± 2.8 μs,  4% less than baseline
  fromList-desc:                  OK
    363  μs ±  14 μs, 30% less than baseline
  fromAscList:                    OK
    78.6 μs ± 5.3 μs,       same as baseline
  fromDistinctAscList:            OK
    34.5 μs ± 2.9 μs,       same as baseline
  fromDistinctAscList:fusion:     OK
    30.9 μs ± 2.7 μs,       same as baseline
  fromDistinctDescList:           OK
    33.2 μs ± 1.6 μs,       same as baseline
  fromDistinctDescList:fusion:    OK
    30.3 μs ± 2.7 μs,       same as baseline
  minView:                        OK
    18.8 ns ± 1.3 ns, 22% less than baseline

All 72 tests passed (13.93s)

AndreasPK commented 1 month ago

For reference, I benchmarked this using GHC 9.10 on a Skylake machine. If others could try to reproduce these results, I would be grateful.

AndreasPK commented 1 month ago

There is a segfault on GHC 9.4.8. Weirdly enough, I'm not yet sure it has anything to do with this flag. In particular, GHC segfaults when building the Main.hs of the seq-properties tests:

Loading unit process-1.6.18.0 ... linking ... done.
Loading unit transformers-compat-0.7.2 ... linking ... done.
Loading unit optparse-applicative-0.18.1.0 ... linking ... done.
Loading unit tagged-0.8.8 ... linking ... done.
Loading unit stm-2.5.1.0 ... linking ... done.
Loading unit tasty-1.5 ... linking ... done.
Loading unit tasty-quickcheck-0.10.3 ... linking ... done.
Loading unit call-stack-0.4.0 ... linking ... done.
Loading unit tasty-hunit-0.10.1 ... linking ... done.
Loading unit ghc-heap-9.10.1 ... linking ... done.
Loading unit primitive-0.9.0.0 ... linking ... done.
Loading unit vector-stream-0.1.0.1 ... linking ... done.
Loading unit vector-0.13.1.0 ... linking ... done.
Loading unit wherefrom-compat-0.1.1.0 ... linking ... done.
Loading unit nothunks-0.2.1.0 ... linking ... done.
Loading unit containers-tests-0 ... linking ... done.
Search directories (user):
Search directories (gcc):
Segmentation fault (core dumped)

I opened a GHC ticket.

AndreasPK commented 1 month ago

Sadly, it seems there is a GHC bug in 9.4+ that causes the segfault, so I will let this rest until point releases containing a fix are out.