haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers

Flag adjustments for GHC-9.4 and tag inference. #827

Open AndreasPK opened 2 years ago

AndreasPK commented 2 years ago

In GHC 9.4 the tag inference optimization finally landed. This is great news for containers, since containers was one of the main motivations for this optimization. Simply building with the new GHC should improve containers' performance.
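To make this concrete, here is a minimal sketch of the kind of code tag inference helps. It uses a simplified, hypothetical `Tree` type that mimics the strict-field layout of `Data.Map` (strict size, strict key, lazy value, two strict subtrees). `Tree`, `lookupT` and `singleton` are illustrative names, not part of the containers API. Roughly speaking, because the subtrees sit in strict fields, GHC 9.4 can treat them as already-evaluated, properly tagged pointers while it walks the spine.

```haskell
{-# LANGUAGE BangPatterns #-}
module TagSketch where

-- Hypothetical, simplified clone of the Data.Map layout:
-- every field except the value is strict.
data Tree k a
  = Bin !Int !k a !(Tree k a) !(Tree k a)
  | Tip

-- Lookup walks the spine. Since the subtrees live in strict fields,
-- tag inference (GHC >= 9.4) lets GHC assume they are evaluated and
-- tagged, so each step can branch on the constructor tag directly
-- instead of first entering the closure.
lookupT :: Ord k => k -> Tree k a -> Maybe a
lookupT = go
  where
    -- The bang forces the key once, before the first comparison.
    go !_ Tip = Nothing
    go k (Bin _ kx x l r) = case compare k kx of
      LT -> go k l
      GT -> go k r
      EQ -> Just x

-- Helper for building small trees by hand.
singleton :: k -> a -> Tree k a
singleton k x = Bin 1 k x Tip Tip
```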

However, to get the most out of it, containers should enable -fworker-wrapper-cbv for most or all of its modules.
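A sketch of what enabling the flag for a single module could look like, assuming GHC 9.4 or later (older compilers reject the unknown flag). The hypothetical, deliberately unbalanced `insertT` shows the shape of function the flag targets: a recursive worker with a strict, lifted key argument. Whether GHC actually performs the split is still up to its own heuristics.

```haskell
{-# OPTIONS_GHC -fworker-wrapper-cbv #-}  -- requires GHC >= 9.4
{-# LANGUAGE BangPatterns #-}
module CbvSketch where

-- Hypothetical tree with the same strict-field layout as Data.Map.
data Tree k a
  = Bin !Int !k a !(Tree k a) !(Tree k a)
  | Tip

size :: Tree k a -> Int
size Tip = 0
size (Bin n _ _ _ _) = n

-- Unbalanced insert, for illustration only (Data.Map rebalances).
-- The bang makes the key strict; with -fworker-wrapper-cbv GHC may
-- split `go` into a wrapper that evaluates the key and a worker that
-- receives it as an already-evaluated, tagged pointer.
insertT :: Ord k => k -> a -> Tree k a -> Tree k a
insertT = go
  where
    go !kx x Tip = Bin 1 kx x Tip Tip
    go kx x (Bin n ky y l r) = case compare kx ky of
      LT -> let l' = go kx x l in Bin (1 + size l' + size r) ky y l' r
      GT -> let r' = go kx x r in Bin (1 + size l + size r') ky y l r'
      EQ -> Bin n kx x l r

-- In-order traversal, used to check the result.
toListT :: Tree k a -> [(k, a)]
toListT Tip = []
toListT (Bin _ k x l r) = toListT l ++ [(k, x)] ++ toListT r
```

Package-wide, the same flag would instead go into the `ghc-options` of the .cabal file, guarded by a compiler-version conditional so that older GHCs still build the package.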

There are some edge cases involving worker/wrapper (W/W) and RULES that prevent -fworker-wrapper-cbv from being on by default in 9.4. But I think these are not very relevant for containers, and even when they do occur, downstream users can fix any resulting RULE breakage through proper use of INLINE[ABLE].
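A minimal sketch of the phase-control idiom that protects a RULE's left-hand side. It assumes a hypothetical `double` function and `"double/double"` rule (neither comes from containers). The phased INLINE pragma keeps `double` visible until the rule has had its chance to fire. The rule only fires when compiling with optimization, and the results are the same either way.

```haskell
module RulesSketch where

-- Hypothetical function that a RULE wants to match on.
double :: Int -> Int
double x = x * 2
-- Phase control: `double` is not inlined before phase 1, so calls to it
-- stay visible to the rule below. GHC also does not apply worker/wrapper
-- to functions carrying an INLINE pragma, so the rule's left-hand side
-- cannot end up hidden behind a worker.
{-# INLINE [1] double #-}

-- Active only before phase 1 ([~1]), i.e. while `double` is still intact.
{-# RULES
"double/double" [~1] forall x. double (double x) = x * 4
  #-}

-- A call site the rule can rewrite when compiled with -O.
quadruple :: Int -> Int
quadruple x = double (double x)
```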

9.4 isn't out yet, but to keep this from falling by the wayside I'm opening this issue now.

AndreasPK commented 3 months ago

For reference, these are the gains I see when enabling -fworker-wrapper-cbv for containers and running map-benchmarks. I haven't run any of the other benchmarks:

Warning: Unknown/unsupported 'ghc' version detected (Cabal 3.8.1.0 supports
'ghc' version < 9.6): /opt/ghc-9.10.1/bin/ghc is version 9.10.1
Resolving dependencies...
Up to date
All
  lookup absent:                  OK
    89.4 μs ± 6.4 μs,       same as baseline
  lookup present:                 OK
    75.3 μs ± 5.7 μs,       same as baseline
  map:                            OK
    35.8 μs ± 3.0 μs,       same as baseline
  map really:                     OK
    84.9 μs ± 5.3 μs,       same as baseline
  <$:                             OK
    23.9 μs ± 1.4 μs, 11% less than baseline
  <$ really:                      OK
    57.1 μs ± 2.6 μs,       same as baseline
  alterF lookup absent:           OK
    89.6 μs ± 5.8 μs,       same as baseline
  alterF lookup present:          OK
    75.0 μs ± 5.3 μs,       same as baseline
  alterF no rules lookup absent:  OK
    91.6 μs ± 6.1 μs,       same as baseline
  alterF no rules lookup present: OK
    80.9 μs ± 5.5 μs,       same as baseline
  insert absent:                  OK
    209  μs ±  14 μs, 20% less than baseline
  insert present:                 OK
    159  μs ±  12 μs, 23% less than baseline
  alterF insert absent:           OK
    253  μs ±  22 μs, 20% less than baseline
  alterF insert present:          OK
    180  μs ±  11 μs,       same as baseline
  alterF no rules insert absent:  OK
    296  μs ±  23 μs, 16% less than baseline
  alterF no rules insert present: OK
    223  μs ±  11 μs,       same as baseline
  delete absent:                  OK
    143  μs ±  12 μs,       same as baseline
  delete present:                 OK
    199  μs ±  11 μs, 25% less than baseline
  alterF delete absent:           OK
    163  μs ±  14 μs,       same as baseline
  alterF delete present:          OK
    234  μs ±  22 μs, 22% less than baseline
  alterF no rules delete absent:  OK
    97.8 μs ± 6.0 μs,       same as baseline
  alterF no rules delete present: OK
    272  μs ±  26 μs, 19% less than baseline
  alter absent:                   OK
    211  μs ±  11 μs, 23% less than baseline
  alter insert:                   OK
    215  μs ±  11 μs, 24% less than baseline
  alter update:                   OK
    170  μs ±  11 μs, 24% less than baseline
  alter delete:                   OK
    213  μs ±  13 μs, 25% less than baseline
  alterF alter absent:            OK
    169  μs ±  11 μs,       same as baseline
  alterF alter insert:            OK
    245  μs ±  23 μs, 19% less than baseline
  alterF alter update:            OK
    177  μs ±  11 μs,  9% less than baseline
  alterF alter delete:            OK
    240  μs ±  22 μs, 21% less than baseline
  alterF no rules alter absent:   OK
    98.5 μs ± 5.5 μs,       same as baseline
  alterF no rules alter insert:   OK
    293  μs ±  22 μs, 16% less than baseline
  alterF no rules alter update:   OK
    222  μs ±  11 μs,       same as baseline
  alterF no rules alter delete:   OK
    272  μs ±  22 μs, 19% less than baseline
  insertWith absent:              OK
    209  μs ±  11 μs, 20% less than baseline
  insertWith present:             OK
    164  μs ±  11 μs, 22% less than baseline
  insertWith' absent:             OK
    201  μs ±  11 μs, 20% less than baseline
  insertWith' present:            OK
    171  μs ±  13 μs, 20% less than baseline
  insertWithKey absent:           OK
    215  μs ±  11 μs, 21% less than baseline
  insertWithKey present:          OK
    164  μs ±  11 μs, 23% less than baseline
  insertWithKey' absent:          OK
    201  μs ±  11 μs, 21% less than baseline
  insertWithKey' present:         OK
    159  μs ±  12 μs, 23% less than baseline
  insertLookupWithKey absent:     OK
    216  μs ±  12 μs, 23% less than baseline
  insertLookupWithKey present:    OK
    172  μs ±  11 μs, 23% less than baseline
  insertLookupWithKey' absent:    OK
    207  μs ±  11 μs, 24% less than baseline
  insertLookupWithKey' present:   OK
    174  μs ±  13 μs, 22% less than baseline
  mapWithKey:                     OK
    39.4 μs ± 3.5 μs,       same as baseline
  foldlWithKey:                   OK
    351  μs ±  23 μs, 24% less than baseline
  foldlWithKey':                  OK
    15.0 μs ± 816 ns,       same as baseline
  foldrWithKey:                   OK
    61.5 ns ± 5.3 ns,       same as baseline
  foldrWithKey':                  OK
    32.2 μs ± 1.4 μs,       same as baseline
  update absent:                  OK
    189  μs ±  12 μs, 21% less than baseline
  update present:                 OK
    146  μs ±  11 μs, 24% less than baseline
  update delete:                  OK
    200  μs ±  13 μs, 25% less than baseline
  updateLookupWithKey absent:     OK
    199  μs ±  11 μs, 21% less than baseline
  updateLookupWithKey present:    OK
    161  μs ±  11 μs, 23% less than baseline
  updateLookupWithKey delete:     OK
    207  μs ±  12 μs, 24% less than baseline
  mapMaybe:                       OK
    95.0 μs ± 3.4 μs, 14% less than baseline
  mapMaybeWithKey:                OK
    94.4 μs ± 5.7 μs, 10% less than baseline
  lookupIndex:                    OK
    167  μs ±  11 μs,       same as baseline
  union:                          OK
    72.7 μs ± 5.9 μs, 26% less than baseline
  difference:                     OK
    63.3 μs ± 6.0 μs, 17% less than baseline
  intersection:                   OK
    28.9 μs ± 2.7 μs, 11% less than baseline
  split:                          OK
    7.58 ns ± 644 ps,       same as baseline
  fromList:                       OK
    59.5 μs ± 2.8 μs,  4% less than baseline
  fromList-desc:                  OK
    363  μs ±  14 μs, 30% less than baseline
  fromAscList:                    OK
    78.6 μs ± 5.3 μs,       same as baseline
  fromDistinctAscList:            OK
    34.5 μs ± 2.9 μs,       same as baseline
  fromDistinctAscList:fusion:     OK
    30.9 μs ± 2.7 μs,       same as baseline
  fromDistinctDescList:           OK
    33.2 μs ± 1.6 μs,       same as baseline
  fromDistinctDescList:fusion:    OK
    30.3 μs ± 2.7 μs,       same as baseline
  minView:                        OK
    18.8 ns ± 1.3 ns, 22% less than baseline

All 72 tests passed (13.93s)
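A small helper for reading these numbers. It assumes that the tasty-bench style "N% less than baseline" means the new mean is N% below the baseline mean. Under that reading, "20% less" corresponds to a 1.25x speedup, and `insert absent` at 209 μs implies a baseline of about 209 / 0.8 ≈ 261 μs. `reductionPercent`, `impliedBaseline` and `speedup` are hypothetical names used only for this sketch.

```haskell
module BaselineMath where

import Data.List (tails)
import Text.Read (readMaybe)

-- Extract N from a line such as "209  μs ±  14 μs, 20% less than baseline".
-- Returns Nothing for "same as baseline" or any other wording.
reductionPercent :: String -> Maybe Double
reductionPercent line =
  case [p | (p : "less" : "than" : "baseline" : _) <- tails (words line)] of
    (p : _) | last p == '%' -> readMaybe (init p)
    _ -> Nothing

-- Baseline mean implied by a new mean and an N% reduction
-- (assumes the percentage is relative to the baseline).
impliedBaseline :: Double -> Double -> Double
impliedBaseline new pct = new / (1 - pct / 100)

-- Speedup factor (baseline / new) implied by an N% reduction.
speedup :: Double -> Double
speedup pct = 1 / (1 - pct / 100)
```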