The functions called when reading and writing through encase's wrappers can be very hot. #38 shows that they are not being inlined as aggressively as they could be. This PR attempts to coax the compiler into doing so. Resolves #38.
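For readers unfamiliar with the technique, here is a minimal sketch of one common way to nudge the compiler toward inlining: marking small, hot methods with `#[inline]`. The `Writer` type and its methods are hypothetical stand-ins for illustration, not encase's actual API, and this is not necessarily the exact set of changes in this PR.

```rust
/// Hypothetical buffer wrapper used only for illustration (not an encase type).
pub struct Writer<'a> {
    bytes: &'a mut [u8],
    offset: usize,
}

impl<'a> Writer<'a> {
    pub fn new(bytes: &'a mut [u8]) -> Self {
        Self { bytes, offset: 0 }
    }

    /// Small method called once per field. `#[inline]` is a hint, not a
    /// guarantee; it also makes the body available for inlining into other
    /// crates, which non-generic functions typically are not by default.
    #[inline]
    pub fn write_u32(&mut self, value: u32) {
        let end = self.offset + 4;
        self.bytes[self.offset..end].copy_from_slice(&value.to_le_bytes());
        self.offset = end;
    }

    /// Hot loop: the per-element call overhead disappears if `write_u32`
    /// is inlined here.
    #[inline]
    pub fn write_all(&mut self, values: &[u32]) {
        for &value in values {
            self.write_u32(value);
        }
    }
}

fn main() {
    let mut buf = [0u8; 8];
    let mut writer = Writer::new(&mut buf);
    writer.write_all(&[1, 0x0102_0304]);
    // Values are stored little-endian, back to back.
    assert_eq!(buf, [1, 0, 0, 0, 4, 3, 2, 1]);
}
```

`#[inline(always)]` is a stronger form of the same hint; whether either attribute helps depends on the call site, so changes like this are best checked against benchmarks such as the ones below.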
Full benchmark results:
```
Gnuplot not found, using plotters backend
Benchmarking Troughput/16KiB_write
Benchmarking Troughput/16KiB_write: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_write: Collecting 100 samples in estimated 5.0075 s (1.9M iterations)
Benchmarking Troughput/16KiB_write: Analyzing
Troughput/16KiB_write time: [1.5729 µs 1.5746 µs 1.5763 µs]
thrpt: [9.6799 GiB/s 9.6903 GiB/s 9.7013 GiB/s]
change:
time: [-8.0639% -7.8170% -7.5969%] (p = 0.00 < 0.05)
thrpt: [+8.2215% +8.4799% +8.7713%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low severe
3 (3.00%) high mild
Benchmarking Troughput/16KiB_read
Benchmarking Troughput/16KiB_read: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_read: Collecting 100 samples in estimated 5.0047 s (2.3M iterations)
Benchmarking Troughput/16KiB_read: Analyzing
Troughput/16KiB_read time: [1.0712 µs 1.0730 µs 1.0748 µs]
thrpt: [14.197 GiB/s 14.221 GiB/s 14.244 GiB/s]
change:
time: [-11.805% -11.044% -10.376%] (p = 0.00 < 0.05)
thrpt: [+11.578% +12.415% +13.386%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) low mild
Benchmarking Troughput/16KiB_create
Benchmarking Troughput/16KiB_create: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_create: Collecting 100 samples in estimated 5.0085 s (1.5M iterations)
Benchmarking Troughput/16KiB_create: Analyzing
Troughput/16KiB_create time: [2.4438 µs 2.4495 µs 2.4545 µs]
thrpt: [6.2166 GiB/s 6.2293 GiB/s 6.2439 GiB/s]
change:
time: [-63.717% -63.466% -63.239%] (p = 0.00 < 0.05)
thrpt: [+172.03% +173.72% +175.61%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low severe
4 (4.00%) low mild
Benchmarking Troughput/16KiB_manual
Benchmarking Troughput/16KiB_manual: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_manual: Collecting 100 samples in estimated 5.0078 s (3.0M iterations)
Benchmarking Troughput/16KiB_manual: Analyzing
Troughput/16KiB_manual time: [356.64 ns 357.51 ns 358.17 ns]
thrpt: [42.602 GiB/s 42.681 GiB/s 42.785 GiB/s]
change:
time: [-2.8350% -1.1837% +0.5988%] (p = 0.19 > 0.05)
thrpt: [-0.5953% +1.1979% +2.9177%]
No change in performance detected.
Benchmarking Troughput/16KiB_stdlib
Benchmarking Troughput/16KiB_stdlib: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_stdlib: Collecting 100 samples in estimated 5.0018 s (3.1M iterations)
Benchmarking Troughput/16KiB_stdlib: Analyzing
Troughput/16KiB_stdlib time: [379.93 ns 381.21 ns 382.13 ns]
thrpt: [39.931 GiB/s 40.028 GiB/s 40.163 GiB/s]
change:
time: [-2.7601% -0.1978% +2.5277%] (p = 0.89 > 0.05)
thrpt: [-2.4654% +0.1982% +2.8384%]
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
Benchmarking Troughput/128KiB_write
Benchmarking Troughput/128KiB_write: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_write: Collecting 100 samples in estimated 5.0964 s (252k iterations)
Benchmarking Troughput/128KiB_write: Analyzing
Troughput/128KiB_write time: [12.469 µs 12.475 µs 12.482 µs]
thrpt: [9.7800 GiB/s 9.7854 GiB/s 9.7902 GiB/s]
change:
time: [-8.5845% -8.4567% -8.3377%] (p = 0.00 < 0.05)
thrpt: [+9.0960% +9.2379% +9.3906%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
Benchmarking Troughput/128KiB_read
Benchmarking Troughput/128KiB_read: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_read: Collecting 100 samples in estimated 5.0326 s (323k iterations)
Benchmarking Troughput/128KiB_read: Analyzing
Troughput/128KiB_read time: [8.4088 µs 8.4427 µs 8.4746 µs]
thrpt: [14.404 GiB/s 14.459 GiB/s 14.517 GiB/s]
change:
time: [-14.693% -12.591% -10.795%] (p = 0.00 < 0.05)
thrpt: [+12.102% +14.404% +17.223%]
Performance has improved.
Benchmarking Troughput/128KiB_create
Benchmarking Troughput/128KiB_create: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_create: Collecting 100 samples in estimated 5.0483 s (207k iterations)
Benchmarking Troughput/128KiB_create: Analyzing
Troughput/128KiB_create time: [16.998 µs 17.016 µs 17.034 µs]
thrpt: [7.1665 GiB/s 7.1740 GiB/s 7.1815 GiB/s]
change:
time: [-67.741% -67.671% -67.614%] (p = 0.00 < 0.05)
thrpt: [+208.77% +209.32% +209.99%]
Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
12 (12.00%) low mild
4 (4.00%) high mild
Benchmarking Troughput/128KiB_manual
Benchmarking Troughput/128KiB_manual: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_manual: Collecting 100 samples in estimated 5.0282 s (439k iterations)
Benchmarking Troughput/128KiB_manual: Analyzing
Troughput/128KiB_manual time: [2.6861 µs 2.6882 µs 2.6903 µs]
thrpt: [45.374 GiB/s 45.410 GiB/s 45.446 GiB/s]
change:
time: [-0.9956% +0.0743% +1.1614%] (p = 0.89 > 0.05)
thrpt: [-1.1481% -0.0743% +1.0056%]
No change in performance detected.
Found 22 outliers among 100 measurements (22.00%)
9 (9.00%) low severe
13 (13.00%) low mild
Benchmarking Troughput/128KiB_stdlib
Benchmarking Troughput/128KiB_stdlib: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_stdlib: Collecting 100 samples in estimated 5.0218 s (303k iterations)
Benchmarking Troughput/128KiB_stdlib: Analyzing
Troughput/128KiB_stdlib time: [2.9594 µs 2.9727 µs 2.9827 µs]
thrpt: [40.926 GiB/s 41.064 GiB/s 41.248 GiB/s]
change:
time: [-3.8721% -1.7726% +0.2982%] (p = 0.10 > 0.05)
thrpt: [-0.2973% +1.8046% +4.0281%]
No change in performance detected.
Benchmarking Troughput/1MiB_write
Benchmarking Troughput/1MiB_write: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_write: Collecting 100 samples in estimated 5.2833 s (30k iterations)
Benchmarking Troughput/1MiB_write: Analyzing
Troughput/1MiB_write time: [116.57 µs 116.63 µs 116.70 µs]
thrpt: [8.3683 GiB/s 8.3731 GiB/s 8.3774 GiB/s]
change:
time: [-12.534% -12.325% -12.157%] (p = 0.00 < 0.05)
thrpt: [+13.840% +14.058% +14.330%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
Benchmarking Troughput/1MiB_read
Benchmarking Troughput/1MiB_read: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_read: Collecting 100 samples in estimated 5.2757 s (35k iterations)
Benchmarking Troughput/1MiB_read: Analyzing
Troughput/1MiB_read time: [94.258 µs 94.435 µs 94.643 µs]
thrpt: [10.318 GiB/s 10.341 GiB/s 10.360 GiB/s]
change:
time: [-6.8999% -6.5813% -6.2845%] (p = 0.00 < 0.05)
thrpt: [+6.7059% +7.0450% +7.4113%]
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low mild
3 (3.00%) high mild
5 (5.00%) high severe
Benchmarking Troughput/1MiB_create
Benchmarking Troughput/1MiB_create: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_create: Collecting 100 samples in estimated 5.1894 s (25k iterations)
Benchmarking Troughput/1MiB_create: Analyzing
Troughput/1MiB_create time: [154.30 µs 154.51 µs 154.76 µs]
thrpt: [6.3103 GiB/s 6.3202 GiB/s 6.3291 GiB/s]
change:
time: [-64.810% -64.757% -64.706%] (p = 0.00 < 0.05)
thrpt: [+183.33% +183.74% +184.17%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
Benchmarking Troughput/1MiB_manual
Benchmarking Troughput/1MiB_manual: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_manual: Collecting 100 samples in estimated 5.4942 s (40k iterations)
Benchmarking Troughput/1MiB_manual: Analyzing
Troughput/1MiB_manual time: [55.898 µs 56.160 µs 56.522 µs]
thrpt: [17.278 GiB/s 17.389 GiB/s 17.470 GiB/s]
change:
time: [-4.0751% -3.1067% -2.2785%] (p = 0.00 < 0.05)
thrpt: [+2.3316% +3.2063% +4.2482%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
Benchmarking Troughput/1MiB_stdlib
Benchmarking Troughput/1MiB_stdlib: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_stdlib: Collecting 100 samples in estimated 5.3567 s (35k iterations)
Benchmarking Troughput/1MiB_stdlib: Analyzing
Troughput/1MiB_stdlib time: [78.908 µs 79.002 µs 79.100 µs]
thrpt: [12.346 GiB/s 12.361 GiB/s 12.376 GiB/s]
change:
time: [+8.2706% +8.7065% +9.0880%] (p = 0.00 < 0.05)
thrpt: [-8.3309% -8.0092% -7.6388%]
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
Benchmarking Troughput/16MiB_write
Benchmarking Troughput/16MiB_write: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_write: Collecting 100 samples in estimated 5.1046 s (1700 iterations)
Benchmarking Troughput/16MiB_write: Analyzing
Troughput/16MiB_write time: [1.9566 ms 1.9629 ms 1.9704 ms]
thrpt: [7.9299 GiB/s 7.9602 GiB/s 7.9857 GiB/s]
change:
time: [-9.5856% -9.1602% -8.7316%] (p = 0.00 < 0.05)
thrpt: [+9.5670% +10.084% +10.602%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) high mild
9 (9.00%) high severe
Benchmarking Troughput/16MiB_read
Benchmarking Troughput/16MiB_read: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_read: Collecting 100 samples in estimated 5.0517 s (2000 iterations)
Benchmarking Troughput/16MiB_read: Analyzing
Troughput/16MiB_read time: [1.5998 ms 1.6033 ms 1.6074 ms]
thrpt: [9.7208 GiB/s 9.7455 GiB/s 9.7666 GiB/s]
change:
time: [-5.7922% -4.1732% -3.0456%] (p = 0.00 < 0.05)
thrpt: [+3.1412% +4.3549% +6.1484%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high severe
Benchmarking Troughput/16MiB_create
Benchmarking Troughput/16MiB_create: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_create: Collecting 100 samples in estimated 5.1876 s (1500 iterations)
Benchmarking Troughput/16MiB_create: Analyzing
Troughput/16MiB_create time: [2.5976 ms 2.6039 ms 2.6114 ms]
thrpt: [5.9833 GiB/s 6.0006 GiB/s 6.0151 GiB/s]
change:
time: [-63.189% -63.099% -62.987%] (p = 0.00 < 0.05)
thrpt: [+170.17% +170.99% +171.66%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
Benchmarking Troughput/16MiB_manual
Benchmarking Troughput/16MiB_manual: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_manual: Collecting 100 samples in estimated 5.0462 s (2100 iterations)
Benchmarking Troughput/16MiB_manual: Analyzing
Troughput/16MiB_manual time: [1.1353 ms 1.1425 ms 1.1508 ms]
thrpt: [13.577 GiB/s 13.677 GiB/s 13.763 GiB/s]
change:
time: [+15.028% +16.322% +17.589%] (p = 0.00 < 0.05)
thrpt: [-14.958% -14.032% -13.065%]
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
Benchmarking Troughput/16MiB_stdlib
Benchmarking Troughput/16MiB_stdlib: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_stdlib: Collecting 100 samples in estimated 5.0448 s (1900 iterations)
Benchmarking Troughput/16MiB_stdlib: Analyzing
Troughput/16MiB_stdlib time: [1.3901 ms 1.3965 ms 1.4041 ms]
thrpt: [11.128 GiB/s 11.189 GiB/s 11.240 GiB/s]
change:
time: [-2.3783% -1.1699% +0.0554%] (p = 0.06 > 0.05)
thrpt: [-0.0554% +1.1838% +2.4363%]
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
Benchmarking Troughput/512MiB_write
Benchmarking Troughput/512MiB_write: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.1s, or reduce sample count to 20.
Benchmarking Troughput/512MiB_write: Collecting 100 samples in estimated 19.137 s (100 iterations)
Benchmarking Troughput/512MiB_write: Analyzing
Troughput/512MiB_write time: [105.70 ms 106.25 ms 107.17 ms]
thrpt: [4.6656 GiB/s 4.7059 GiB/s 4.7305 GiB/s]
change:
time: [-3.4867% -2.8708% -1.9544%] (p = 0.00 < 0.05)
thrpt: [+1.9933% +2.9557% +3.6127%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
Benchmarking Troughput/512MiB_read
Benchmarking Troughput/512MiB_read: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 15.9s, or reduce sample count to 30.
Benchmarking Troughput/512MiB_read: Collecting 100 samples in estimated 15.917 s (100 iterations)
Benchmarking Troughput/512MiB_read: Analyzing
Troughput/512MiB_read time: [74.217 ms 74.329 ms 74.451 ms]
thrpt: [6.7158 GiB/s 6.7268 GiB/s 6.7370 GiB/s]
change:
time: [-6.1588% -5.7873% -5.4313%] (p = 0.00 < 0.05)
thrpt: [+5.7432% +6.1428% +6.5630%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking Troughput/512MiB_create
Benchmarking Troughput/512MiB_create: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.0s, or reduce sample count to 20.
Benchmarking Troughput/512MiB_create: Collecting 100 samples in estimated 18.960 s (100 iterations)
Benchmarking Troughput/512MiB_create: Analyzing
Troughput/512MiB_create time: [100.19 ms 102.03 ms 104.11 ms]
thrpt: [4.8028 GiB/s 4.9007 GiB/s 4.9906 GiB/s]
change:
time: [-60.347% -59.363% -58.254%] (p = 0.00 < 0.05)
thrpt: [+139.54% +146.08% +152.19%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
Benchmarking Troughput/512MiB_manual
Benchmarking Troughput/512MiB_manual: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.0s, or reduce sample count to 30.
Benchmarking Troughput/512MiB_manual: Collecting 100 samples in estimated 15.993 s (100 iterations)
Benchmarking Troughput/512MiB_manual: Analyzing
Troughput/512MiB_manual time: [44.600 ms 44.820 ms 45.094 ms]
thrpt: [11.088 GiB/s 11.156 GiB/s 11.211 GiB/s]
change:
time: [-3.9187% -2.6812% -1.4457%] (p = 0.00 < 0.05)
thrpt: [+1.4669% +2.7551% +4.0785%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
7 (7.00%) high mild
5 (5.00%) high severe
Benchmarking Troughput/512MiB_stdlib
Benchmarking Troughput/512MiB_stdlib: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 14.1s, or reduce sample count to 30.
Benchmarking Troughput/512MiB_stdlib: Collecting 100 samples in estimated 14.084 s (100 iterations)
Benchmarking Troughput/512MiB_stdlib: Analyzing
Troughput/512MiB_stdlib time: [28.370 ms 28.449 ms 28.539 ms]
thrpt: [17.520 GiB/s 17.575 GiB/s 17.624 GiB/s]
change:
time: [-4.1694% -3.0445% -2.0873%] (p = 0.00 < 0.05)
thrpt: [+2.1318% +3.1401% +4.3509%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
```
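The `create` cases benefit the most: they take roughly 59% to 68% less time, which is more than twice as fast at every size. The `write` and `read` cases improve by about 3% to 13%.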
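As a sanity check on the "over two times faster" reading, the speedup implied by a relative time change is `1 / (1 + change)`. The sketch below applies that to the median `create` time changes copied from the output above; the function name is made up for illustration.

```rust
/// Speedup factor implied by a relative change in time,
/// e.g. -0.5 (time halved) gives 2.0.
fn speedup_from_time_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change)
}

fn main() {
    // Median time changes for the `create` cases, taken from the
    // Criterion output above (percent values divided by 100).
    let create = [
        ("16KiB", -0.63466),
        ("128KiB", -0.67671),
        ("1MiB", -0.64757),
        ("16MiB", -0.63099),
        ("512MiB", -0.59363),
    ];
    for (size, change) in create {
        println!("{size}_create: {:.2}x", speedup_from_time_change(change));
    }
}
```

Every size comes out above 2x, which agrees with the throughput changes Criterion reports for the same cases.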