Closed mpark closed 5 years ago
The current C++11 implementation using a chain of ternary operators gets optimized nicely under Clang >= 4.0, but does not work well for any GCC.
Clang 3.9: https://travis-ci.org/mpark/variant/jobs/474776453
5: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark "--benchmark_color=true"
5: Test timeout computed to be: 10000000
5: 2019-01-03 10:15:54
5: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark
5: Run on (2 X 2300 MHz CPU s)
5: CPU Caches:
5: L1 Data 32K (x1)
5: L1 Instruction 32K (x1)
5: L2 Unified 256K (x1)
5: L3 Unified 46080K (x1)
5: Load Average: 1.30, 1.00, 0.46
5: ---------------------------------------------------
5: Benchmark Time CPU Iterations
5: ---------------------------------------------------
5: visit1<1> 2801 ns 2801 ns 249965
5: visit1<2> 9519 ns 9519 ns 73370
5: visit1<3> 11382 ns 11382 ns 61655
5: visit1<5> 11409 ns 11409 ns 61178
5: visit1<8> 10320 ns 10320 ns 67497
5: visit1<15> 10483 ns 10483 ns 66962
5: visit1<32> 12492 ns 12492 ns 55995
5: visit1<65> 11254 ns 11253 ns 62465
5: visit1<128> 10673 ns 10673 ns 65601
5:
1/2 Test #5: execute.visit.1.mpark ............ Passed 7.60 sec
test 6
Start 6: execute.visit.1.mpark-dev
6: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev "--benchmark_color=true"
6: Test timeout computed to be: 10000000
6: 2019-01-03 10:16:02
6: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev
6: Run on (2 X 2300 MHz CPU s)
6: CPU Caches:
6: L1 Data 32K (x1)
6: L1 Instruction 32K (x1)
6: L2 Unified 256K (x1)
6: L3 Unified 46080K (x1)
6: Load Average: 1.25, 1.00, 0.47
6: ---------------------------------------------------
6: Benchmark Time CPU Iterations
6: ---------------------------------------------------
6: visit1<1> 2843 ns 2842 ns 248350
6: visit1<2> 4347 ns 4347 ns 161527
6: visit1<3> 5755 ns 5755 ns 121301
6: visit1<5> 7205 ns 7205 ns 96040
6: visit1<8> 7921 ns 7921 ns 89186
6: visit1<15> 11592 ns 11592 ns 60326
6: visit1<32> 19671 ns 19609 ns 36296
6: visit1<65> 40742 ns 40741 ns 17328
6: visit1<128> 72147 ns 72147 ns 9872
Clang 4.0: https://travis-ci.org/mpark/variant/jobs/474776455
5: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark "--benchmark_color=true"
5: Test timeout computed to be: 10000000
5: 2019-01-03 10:18:31
5: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark
5: Run on (2 X 2300 MHz CPU s)
5: CPU Caches:
5: L1 Data 32K (x1)
5: L1 Instruction 32K (x1)
5: L2 Unified 256K (x1)
5: L3 Unified 46080K (x1)
5: Load Average: 1.22, 1.08, 0.52
5: ---------------------------------------------------
5: Benchmark Time CPU Iterations
5: ---------------------------------------------------
5: visit1<1> 2868 ns 2868 ns 241164
5: visit1<2> 10869 ns 10869 ns 64441
5: visit1<3> 10247 ns 10247 ns 69007
5: visit1<5> 11963 ns 11963 ns 58924
5: visit1<8> 11442 ns 11442 ns 60372
5: visit1<15> 10584 ns 10583 ns 64373
5: visit1<32> 10750 ns 10750 ns 66075
5: visit1<65> 12604 ns 12604 ns 56448
5: visit1<128> 19759 ns 19759 ns 34593
5:
1/2 Test #5: execute.visit.1.mpark ............ Passed 7.72 sec
test 6
Start 6: execute.visit.1.mpark-dev
6: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev "--benchmark_color=true"
6: Test timeout computed to be: 10000000
6: 2019-01-03 10:18:38
6: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev
6: Run on (2 X 2300 MHz CPU s)
6: CPU Caches:
6: L1 Data 32K (x1)
6: L1 Instruction 32K (x1)
6: L2 Unified 256K (x1)
6: L3 Unified 46080K (x1)
6: Load Average: 1.21, 1.07, 0.52
6: ---------------------------------------------------
6: Benchmark Time CPU Iterations
6: ---------------------------------------------------
6: visit1<1> 2896 ns 2896 ns 242707
6: visit1<2> 4413 ns 4412 ns 159670
6: visit1<3> 3852 ns 3851 ns 179790
6: visit1<5> 2977 ns 2977 ns 235361
6: visit1<8> 2954 ns 2954 ns 234164
6: visit1<15> 3022 ns 3021 ns 232849
6: visit1<32> 3833 ns 3833 ns 181819
6: visit1<65> 4925 ns 4924 ns 143037
6: visit1<128> 3833 ns 3833 ns 180677
GCC 4.9: https://travis-ci.org/mpark/variant/jobs/474776437
5: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark "--benchmark_color=true"
5: Test timeout computed to be: 10000000
5: 2019-01-03 09:56:50
5: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark
5: Run on (2 X 2300 MHz CPU s)
5: CPU Caches:
5: L1 Data 32K (x1)
5: L1 Instruction 32K (x1)
5: L2 Unified 256K (x1)
5: L3 Unified 46080K (x1)
5: Load Average: 1.09, 0.87, 0.41
5: ---------------------------------------------------
5: Benchmark Time CPU Iterations
5: ---------------------------------------------------
5: visit1<1> 3790 ns 3789 ns 183728
5: visit1<2> 11373 ns 11373 ns 61747
5: visit1<3> 12018 ns 12018 ns 58302
5: visit1<5> 10370 ns 10370 ns 68172
5: visit1<8> 10569 ns 10569 ns 65518
5: visit1<15> 11400 ns 11400 ns 61299
5: visit1<32> 12453 ns 12436 ns 55540
5: visit1<65> 10548 ns 10545 ns 66160
5: visit1<128> 10768 ns 10767 ns 65310
5:
1/2 Test #5: execute.visit.1.mpark ............ Passed 7.73 sec
test 6
Start 6: execute.visit.1.mpark-dev
6: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev "--benchmark_color=true"
6: Test timeout computed to be: 10000000
6: 2019-01-03 09:56:58
6: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev
6: Run on (2 X 2300 MHz CPU s)
6: CPU Caches:
6: L1 Data 32K (x1)
6: L1 Instruction 32K (x1)
6: L2 Unified 256K (x1)
6: L3 Unified 46080K (x1)
6: Load Average: 1.08, 0.87, 0.42
6: ---------------------------------------------------
6: Benchmark Time CPU Iterations
6: ---------------------------------------------------
6: visit1<1> 6873 ns 6873 ns 101490
6: visit1<2> 4174 ns 4174 ns 168032
6: visit1<3> 6520 ns 6520 ns 107322
6: visit1<5> 8068 ns 8068 ns 86855
6: visit1<8> 9532 ns 9532 ns 74410
6: visit1<15> 11307 ns 11307 ns 61789
6: visit1<32> 20581 ns 20581 ns 34392
6: visit1<65> 38014 ns 38015 ns 18733
6: visit1<128> 69736 ns 69735 ns 9869
GCC 8: https://travis-ci.org/mpark/variant/jobs/474776448
5: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark "--benchmark_color=true"
5: Test timeout computed to be: 10000000
5: 2019-01-03 10:09:28
5: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark
5: Run on (2 X 2300 MHz CPU s)
5: CPU Caches:
5: L1 Data 32K (x1)
5: L1 Instruction 32K (x1)
5: L2 Unified 256K (x1)
5: L3 Unified 46080K (x1)
5: Load Average: 1.15, 1.04, 0.50
5: ---------------------------------------------------
5: Benchmark Time CPU Iterations
5: ---------------------------------------------------
5: visit1<1> 3805 ns 3805 ns 182961
5: visit1<2> 10565 ns 10565 ns 66086
5: visit1<3> 11357 ns 11357 ns 61588
5: visit1<5> 11736 ns 11736 ns 59659
5: visit1<8> 12278 ns 12278 ns 57865
5: visit1<15> 12271 ns 12270 ns 57095
5: visit1<32> 11354 ns 11354 ns 61476
5: visit1<65> 12317 ns 12316 ns 56952
5: visit1<128> 10754 ns 10754 ns 64382
5:
1/2 Test #5: execute.visit.1.mpark ............ Passed 7.76 sec
test 6
Start 6: execute.visit.1.mpark-dev
6: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev "--benchmark_color=true"
6: Test timeout computed to be: 10000000
6: 2019-01-03 10:09:36
6: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev
6: Run on (2 X 2300 MHz CPU s)
6: CPU Caches:
6: L1 Data 32K (x1)
6: L1 Instruction 32K (x1)
6: L2 Unified 256K (x1)
6: L3 Unified 46080K (x1)
6: Load Average: 1.14, 1.04, 0.50
6: ---------------------------------------------------
6: Benchmark Time CPU Iterations
6: ---------------------------------------------------
6: visit1<1> 5757 ns 5757 ns 121563
6: visit1<2> 4699 ns 4699 ns 149398
6: visit1<3> 6353 ns 6353 ns 109737
6: visit1<5> 7424 ns 7424 ns 93428
6: visit1<8> 8623 ns 8623 ns 81464
6: visit1<15> 13569 ns 13568 ns 50929
6: visit1<32> 20769 ns 20769 ns 33469
6: visit1<65> 38117 ns 38116 ns 18302
6: visit1<128> 69499 ns 69499 ns 9904
The current C++14/17 implementation of recursive switch
works well for all of Clang, but has some issues under GCC.
Clang 7: https://travis-ci.org/mpark/variant/jobs/474776465
5: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark "--benchmark_color=true"
5: Test timeout computed to be: 10000000
5: 2019-01-03 10:31:18
5: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark
5: Run on (2 X 2500 MHz CPU s)
5: CPU Caches:
5: L1 Data 32K (x1)
5: L1 Instruction 32K (x1)
5: L2 Unified 256K (x1)
5: L3 Unified 30720K (x1)
5: Load Average: 1.32, 1.07, 0.51
5: ---------------------------------------------------
5: Benchmark Time CPU Iterations
5: ---------------------------------------------------
5: visit1<1> 2864 ns 2862 ns 239638
5: visit1<2> 11464 ns 11455 ns 61048
5: visit1<3> 10244 ns 10239 ns 68836
5: visit1<5> 9989 ns 9981 ns 70500
5: visit1<8> 11452 ns 11444 ns 60595
5: visit1<15> 12330 ns 12321 ns 56714
5: visit1<32> 10617 ns 10605 ns 66692
5: visit1<65> 10673 ns 10667 ns 65805
5: visit1<128> 14185 ns 14174 ns 48735
5:
1/2 Test #5: execute.visit.1.mpark ............ Passed 7.64 sec
test 6
Start 6: execute.visit.1.mpark-dev
6: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev "--benchmark_color=true"
6: Test timeout computed to be: 10000000
6: 2019-01-03 10:31:25
6: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev
6: Run on (2 X 2500 MHz CPU s)
6: CPU Caches:
6: L1 Data 32K (x1)
6: L1 Instruction 32K (x1)
6: L2 Unified 256K (x1)
6: L3 Unified 30720K (x1)
6: Load Average: 1.30, 1.07, 0.52
6: ---------------------------------------------------
6: Benchmark Time CPU Iterations
6: ---------------------------------------------------
6: visit1<1> 2850 ns 2846 ns 245817
6: visit1<2> 3436 ns 3434 ns 201585
6: visit1<3> 2940 ns 2939 ns 236889
6: visit1<5> 3838 ns 3836 ns 182712
6: visit1<8> 2973 ns 2972 ns 236992
6: visit1<15> 2957 ns 2956 ns 237050
6: visit1<32> 2951 ns 2949 ns 237763
6: visit1<65> 2981 ns 2980 ns 235619
6: visit1<128> 9646 ns 9641 ns 72540
GCC 8: https://travis-ci.org/mpark/variant/jobs/474776450
5: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark "--benchmark_color=true"
5: Test timeout computed to be: 10000000
5: 2019-01-03 10:11:32
5: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark
5: Run on (2 X 2500 MHz CPU s)
5: CPU Caches:
5: L1 Data 32K (x1)
5: L1 Instruction 32K (x1)
5: L2 Unified 256K (x1)
5: L3 Unified 30720K (x1)
5: Load Average: 1.14, 0.89, 0.41
5: ---------------------------------------------------
5: Benchmark Time CPU Iterations
5: ---------------------------------------------------
5: visit1<1> 2799 ns 2798 ns 250365
5: visit1<2> 9468 ns 9463 ns 73890
5: visit1<3> 11481 ns 11475 ns 61645
5: visit1<5> 11572 ns 11566 ns 60407
5: visit1<8> 11592 ns 11586 ns 60486
5: visit1<15> 11585 ns 11579 ns 58922
5: visit1<32> 10668 ns 10663 ns 65947
5: visit1<65> 10632 ns 10626 ns 64057
5: visit1<128> 11657 ns 11651 ns 54142
5:
1/2 Test #5: execute.visit.1.mpark ............ Passed 7.54 sec
test 6
Start 6: execute.visit.1.mpark-dev
6: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev "--benchmark_color=true"
6: Test timeout computed to be: 10000000
6: 2019-01-03 10:11:40
6: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev
6: Run on (2 X 2500 MHz CPU s)
6: CPU Caches:
6: L1 Data 32K (x1)
6: L1 Instruction 32K (x1)
6: L2 Unified 256K (x1)
6: L3 Unified 30720K (x1)
6: Load Average: 1.11, 0.90, 0.41
6: ---------------------------------------------------
6: Benchmark Time CPU Iterations
6: ---------------------------------------------------
6: visit1<1> 2827 ns 2825 ns 248203
6: visit1<2> 4179 ns 4176 ns 166883
6: visit1<3> 2858 ns 2856 ns 241306
6: visit1<5> 2858 ns 2856 ns 241945
6: visit1<8> 3826 ns 3823 ns 181854
6: visit1<15> 2850 ns 2848 ns 244742
6: visit1<32> 3830 ns 3827 ns 182111
6: visit1<65> 6870 ns 6865 ns 102572
6: visit1<128> 12199 ns 12164 ns 58194
GCC >= 7 perform well compared to the manual jump table, but it does not
inline/collapse the recursive switch
. This is reflected in our numbers in
the progression from visit<32>
, visit<65>
to visit<128>
. It can also be
seen concretely in the generated assembly: https://godbolt.org/z/Rfcv9C
GCC 6: https://travis-ci.org/mpark/variant/jobs/474776444
5: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark "--benchmark_color=true"
5: Test timeout computed to be: 10000000
5: 2019-01-03 10:04:26
5: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark
5: Run on (2 X 2500 MHz CPU s)
5: CPU Caches:
5: L1 Data 32K (x1)
5: L1 Instruction 32K (x1)
5: L2 Unified 256K (x1)
5: L3 Unified 30720K (x1)
5: Load Average: 1.25, 0.94, 0.43
5: ---------------------------------------------------
5: Benchmark Time CPU Iterations
5: ---------------------------------------------------
5: visit1<1> 2807 ns 2805 ns 249207
5: visit1<2> 9514 ns 9514 ns 72992
5: visit1<3> 11995 ns 11995 ns 58128
5: visit1<5> 10296 ns 10296 ns 68163
5: visit1<8> 10375 ns 10375 ns 67930
5: visit1<15> 10541 ns 10541 ns 64657
5: visit1<32> 10666 ns 10666 ns 63728
5: visit1<65> 10516 ns 10516 ns 66144
5: visit1<128> 12532 ns 12532 ns 57244
5:
1/2 Test #5: execute.visit.1.mpark ............ Passed 7.58 sec
test 6
Start 6: execute.visit.1.mpark-dev
6: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev "--benchmark_color=true"
6: Test timeout computed to be: 10000000
6: 2019-01-03 10:04:34
6: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev
6: Run on (2 X 2500 MHz CPU s)
6: CPU Caches:
6: L1 Data 32K (x1)
6: L1 Instruction 32K (x1)
6: L2 Unified 256K (x1)
6: L3 Unified 30720K (x1)
6: Load Average: 1.21, 0.95, 0.44
6: ---------------------------------------------------
6: Benchmark Time CPU Iterations
6: ---------------------------------------------------
6: visit1<1> 7626 ns 7615 ns 90840
6: visit1<2> 7466 ns 7462 ns 94088
6: visit1<3> 10818 ns 10818 ns 64757
6: visit1<5> 11162 ns 11160 ns 63228
6: visit1<8> 10955 ns 10953 ns 63846
6: visit1<15> 11100 ns 11098 ns 63626
6: visit1<32> 11341 ns 11338 ns 61223
6: visit1<65> 12561 ns 12561 ns 55673
6: visit1<128> 16291 ns 16290 ns 42286
GCC < 7 produces a jump table of size 32 (our switch
size) even when unneeded.
This can be seen for example, in the generated assembly for visit<8>
: https://godbolt.org/z/8XKQGD. Although it seems like it's not actually worse than
the manual jump table aside from the visit<1>
case 🤔
Follow-up on the current C++14/17 implementation with regards to GCC >= 7.
I was concerned that the non-inlined recursive switch
would grow faster than
the manual jump table approach, but having tested for visit<200>
and visit<250>
,
the manual jump table performance also grows, at approximately the same rate it seems.
GCC 8: https://travis-ci.org/mpark/variant/jobs/475164795
5: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark "--benchmark_color=true"
5: Test timeout computed to be: 10000000
5: 2019-01-04 05:49:01
5: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark
5: Run on (2 X 2600 MHz CPU s)
5: CPU Caches:
5: L1 Data 32K (x1)
5: L1 Instruction 32K (x1)
5: L2 Unified 256K (x1)
5: L3 Unified 20480K (x1)
5: Load Average: 1.00, 1.00, 0.70
5: ---------------------------------------------------
5: Benchmark Time CPU Iterations
5: ---------------------------------------------------
5: visit1<200> 17584 ns 17574 ns 40196
5: visit1<250> 20115 ns 20105 ns 34640
5:
1/2 Test #5: execute.visit.1.mpark ............ Passed 1.84 sec
test 6
Start 6: execute.visit.1.mpark-dev
6: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev "--benchmark_color=true"
6: Test timeout computed to be: 10000000
6: 2019-01-04 05:49:03
6: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev
6: Run on (2 X 2600 MHz CPU s)
6: CPU Caches:
6: L1 Data 32K (x1)
6: L1 Instruction 32K (x1)
6: L2 Unified 256K (x1)
6: L3 Unified 20480K (x1)
6: Load Average: 1.00, 1.00, 0.70
6: ---------------------------------------------------
6: Benchmark Time CPU Iterations
6: ---------------------------------------------------
6: visit1<200> 16236 ns 16227 ns 43488
6: visit1<250> 18135 ns 18126 ns 38061
This is also roughly the case for GCC < 7.
GCC 6: https://travis-ci.org/mpark/variant/jobs/475164789
5: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark "--benchmark_color=true"
5: Test timeout computed to be: 10000000
5: 2019-01-04 05:34:19
5: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark
5: Run on (2 X 2600 MHz CPU s)
5: CPU Caches:
5: L1 Data 32K (x1)
5: L1 Instruction 32K (x1)
5: L2 Unified 256K (x1)
5: L3 Unified 20480K (x1)
5: Load Average: 1.40, 1.13, 0.78
5: ---------------------------------------------------
5: Benchmark Time CPU Iterations
5: ---------------------------------------------------
5: visit1<200> 15825 ns 15825 ns 43707
5: visit1<250> 20641 ns 20641 ns 32384
5:
1/2 Test #5: execute.visit.1.mpark ............ Passed 1.80 sec
test 6
Start 6: execute.visit.1.mpark-dev
6: Test command: /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev "--benchmark_color=true"
6: Test timeout computed to be: 10000000
6: 2019-01-04 05:34:20
6: Running /home/travis/build/mpark/variant/build/visit.1/execute.visit.1.mpark-dev
6: Run on (2 X 2600 MHz CPU s)
6: CPU Caches:
6: L1 Data 32K (x1)
6: L1 Instruction 32K (x1)
6: L2 Unified 256K (x1)
6: L3 Unified 20480K (x1)
6: Load Average: 1.40, 1.13, 0.78
6: ---------------------------------------------------
6: Benchmark Time CPU Iterations
6: ---------------------------------------------------
6: visit1<200> 18900 ns 18889 ns 37390
6: visit1<250> 20114 ns 20104 ns 34828
It seems safe to me to proceed with the current implementation, for C++14/17.
I've reverted the C++11 implementation to be the manual jump table approach,
and only updated C++14/17 implementation to be recursive switch
.
The following is the full matrix of the benchmarks for visit.1
https://travis-ci.org/mpark/variant/builds/476107115
Follow-up to #52 CC @quicknir