Open AndreasPK opened 2 years ago
For reference, these are the gains I see from enabling -fworker-wrapper-cbv
for containers when running map-benchmarks.
I haven't run any of the other benchmarks:
Warning: Unknown/unsupported 'ghc' version detected (Cabal 3.8.1.0 supports
'ghc' version < 9.6): /opt/ghc-9.10.1/bin/ghc is version 9.10.1
Resolving dependencies...
Up to date
All
lookup absent: OK
89.4 μs ± 6.4 μs, same as baseline
lookup present: OK
75.3 μs ± 5.7 μs, same as baseline
map: OK
35.8 μs ± 3.0 μs, same as baseline
map really: OK
84.9 μs ± 5.3 μs, same as baseline
<$: OK
23.9 μs ± 1.4 μs, 11% less than baseline
<$ really: OK
57.1 μs ± 2.6 μs, same as baseline
alterF lookup absent: OK
89.6 μs ± 5.8 μs, same as baseline
alterF lookup present: OK
75.0 μs ± 5.3 μs, same as baseline
alterF no rules lookup absent: OK
91.6 μs ± 6.1 μs, same as baseline
alterF no rules lookup present: OK
80.9 μs ± 5.5 μs, same as baseline
insert absent: OK
209 μs ± 14 μs, 20% less than baseline
insert present: OK
159 μs ± 12 μs, 23% less than baseline
alterF insert absent: OK
253 μs ± 22 μs, 20% less than baseline
alterF insert present: OK
180 μs ± 11 μs, same as baseline
alterF no rules insert absent: OK
296 μs ± 23 μs, 16% less than baseline
alterF no rules insert present: OK
223 μs ± 11 μs, same as baseline
delete absent: OK
143 μs ± 12 μs, same as baseline
delete present: OK
199 μs ± 11 μs, 25% less than baseline
alterF delete absent: OK
163 μs ± 14 μs, same as baseline
alterF delete present: OK
234 μs ± 22 μs, 22% less than baseline
alterF no rules delete absent: OK
97.8 μs ± 6.0 μs, same as baseline
alterF no rules delete present: OK
272 μs ± 26 μs, 19% less than baseline
alter absent: OK
211 μs ± 11 μs, 23% less than baseline
alter insert: OK
215 μs ± 11 μs, 24% less than baseline
alter update: OK
170 μs ± 11 μs, 24% less than baseline
alter delete: OK
213 μs ± 13 μs, 25% less than baseline
alterF alter absent: OK
169 μs ± 11 μs, same as baseline
alterF alter insert: OK
245 μs ± 23 μs, 19% less than baseline
alterF alter update: OK
177 μs ± 11 μs, 9% less than baseline
alterF alter delete: OK
240 μs ± 22 μs, 21% less than baseline
alterF no rules alter absent: OK
98.5 μs ± 5.5 μs, same as baseline
alterF no rules alter insert: OK
293 μs ± 22 μs, 16% less than baseline
alterF no rules alter update: OK
222 μs ± 11 μs, same as baseline
alterF no rules alter delete: OK
272 μs ± 22 μs, 19% less than baseline
insertWith absent: OK
209 μs ± 11 μs, 20% less than baseline
insertWith present: OK
164 μs ± 11 μs, 22% less than baseline
insertWith' absent: OK
201 μs ± 11 μs, 20% less than baseline
insertWith' present: OK
171 μs ± 13 μs, 20% less than baseline
insertWithKey absent: OK
215 μs ± 11 μs, 21% less than baseline
insertWithKey present: OK
164 μs ± 11 μs, 23% less than baseline
insertWithKey' absent: OK
201 μs ± 11 μs, 21% less than baseline
insertWithKey' present: OK
159 μs ± 12 μs, 23% less than baseline
insertLookupWithKey absent: OK
216 μs ± 12 μs, 23% less than baseline
insertLookupWithKey present: OK
172 μs ± 11 μs, 23% less than baseline
insertLookupWithKey' absent: OK
207 μs ± 11 μs, 24% less than baseline
insertLookupWithKey' present: OK
174 μs ± 13 μs, 22% less than baseline
mapWithKey: OK
39.4 μs ± 3.5 μs, same as baseline
foldlWithKey: OK
351 μs ± 23 μs, 24% less than baseline
foldlWithKey': OK
15.0 μs ± 816 ns, same as baseline
foldrWithKey: OK
61.5 ns ± 5.3 ns, same as baseline
foldrWithKey': OK
32.2 μs ± 1.4 μs, same as baseline
update absent: OK
189 μs ± 12 μs, 21% less than baseline
update present: OK
146 μs ± 11 μs, 24% less than baseline
update delete: OK
200 μs ± 13 μs, 25% less than baseline
updateLookupWithKey absent: OK
199 μs ± 11 μs, 21% less than baseline
updateLookupWithKey present: OK
161 μs ± 11 μs, 23% less than baseline
updateLookupWithKey delete: OK
207 μs ± 12 μs, 24% less than baseline
mapMaybe: OK
95.0 μs ± 3.4 μs, 14% less than baseline
mapMaybeWithKey: OK
94.4 μs ± 5.7 μs, 10% less than baseline
lookupIndex: OK
167 μs ± 11 μs, same as baseline
union: OK
72.7 μs ± 5.9 μs, 26% less than baseline
difference: OK
63.3 μs ± 6.0 μs, 17% less than baseline
intersection: OK
28.9 μs ± 2.7 μs, 11% less than baseline
split: OK
7.58 ns ± 644 ps, same as baseline
fromList: OK
59.5 μs ± 2.8 μs, 4% less than baseline
fromList-desc: OK
363 μs ± 14 μs, 30% less than baseline
fromAscList: OK
78.6 μs ± 5.3 μs, same as baseline
fromDistinctAscList: OK
34.5 μs ± 2.9 μs, same as baseline
fromDistinctAscList:fusion: OK
30.9 μs ± 2.7 μs, same as baseline
fromDistinctDescList: OK
33.2 μs ± 1.6 μs, same as baseline
fromDistinctDescList:fusion: OK
30.3 μs ± 2.7 μs, same as baseline
minView: OK
18.8 ns ± 1.3 ns, 22% less than baseline
All 72 tests passed (13.93s)
In ghc-9.4 the tag inference optimization finally landed. This is great news for containers, as it was one of the main motivations for this optimization. Just by using the new GHC, containers performance should improve.
However, to get the most out of this, containers should enable -fworker-wrapper-cbv for most, or all, of its modules. There are some edge cases around W/W + RULES that explain why -fworker-wrapper-cbv can't be on by default in 9.4, but I think these are not very relevant for containers, and even when they occur, downstream users can fix any resulting RULE breakage by proper use of INLINE[ABLE].
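As a rough sketch of what enabling the flag per module could look like: the module below is hypothetical (it is not taken from containers), and the CPP guard assumes the flag should only be passed to GHC 9.4 or later, since older compilers reject unknown flags in an OPTIONS_GHC pragma.

```haskell
{-# LANGUAGE CPP #-}
-- Only pass the flag to compilers that know it (GHC 9.4 and later).
-- Older compilers would fail on an unknown flag in OPTIONS_GHC.
#if __GLASGOW_HASKELL__ >= 904
{-# OPTIONS_GHC -fworker-wrapper-cbv #-}
#endif

-- Hypothetical example module, not part of containers.
module Main (main) where

-- A small binary search tree with strict fields, loosely in the
-- style of the trees used by containers.
data Tree = Tip | Bin !Tree !Int !Tree

-- Insert a key, leaving the tree unchanged if the key is present.
insert :: Int -> Tree -> Tree
insert x Tip = Bin Tip x Tip
insert x t@(Bin l y r) = case compare x y of
  LT -> Bin (insert x l) y r
  GT -> Bin l y (insert x r)
  EQ -> t

-- In-order traversal, yielding the keys in ascending order.
toList :: Tree -> [Int]
toList Tip = []
toList (Bin l x r) = toList l ++ [x] ++ toList r

main :: IO ()
main = print (toList (foldr insert Tip [3, 1, 2, 3]))
```

A package-wide alternative would be a conditional ghc-options entry in the .cabal file guarded by impl(ghc >= 9.4). Which of the two fits containers better is a judgment call for the maintainers.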
9.4 isn't out yet. But to avoid this falling by the wayside I'm opening this issue now.