Inline potentially hot functions more aggressively

The functions involved in reading and writing through encase's wrappers can be very hot. #38 shows that these functions are not being inlined as aggressively as they could be. This PR attempts to coax the compiler into doing so. Resolves #38.
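For readers unfamiliar with the technique, here is a minimal sketch of what an inlining hint on a small, frequently called wrapper method can look like. `ByteWriter`, `write_u32`, and `write_all` are hypothetical names made up for illustration; they are not encase's types and this is not the diff in this PR. Note that `#[inline]` is only a hint to the compiler, not a guarantee.

```rust
/// Hypothetical stand-in for a buffer wrapper; NOT encase's actual type.
pub struct ByteWriter {
    bytes: Vec<u8>,
}

impl ByteWriter {
    pub fn new() -> Self {
        Self { bytes: Vec::new() }
    }

    /// Tiny and called once per value, so we hint that it should be inlined
    /// at call sites, including call sites in other crates.
    #[inline]
    pub fn write_u32(&mut self, value: u32) {
        self.bytes.extend_from_slice(&value.to_le_bytes());
    }

    /// Trivial accessor; same reasoning as above.
    #[inline]
    pub fn as_slice(&self) -> &[u8] {
        &self.bytes
    }
}

/// Writes every value through the wrapper. The per-element call to
/// `write_u32` is the hot path that benefits from being inlined.
pub fn write_all(values: &[u32]) -> Vec<u8> {
    let mut writer = ByteWriter::new();
    for &v in values {
        writer.write_u32(v);
    }
    writer.as_slice().to_vec()
}
```

`#[inline(always)]` is a stronger hint, but it is still not guaranteed and can increase code size, so it is worth checking the benchmarks before reaching for it.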

Full benchmark results:

```
Gnuplot not found, using plotters backend
Benchmarking Troughput/16KiB_write
Benchmarking Troughput/16KiB_write: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_write: Collecting 100 samples in estimated 5.0075 s (1.9M iterations)
Benchmarking Troughput/16KiB_write: Analyzing
Troughput/16KiB_write   time:   [1.5729 µs 1.5746 µs 1.5763 µs]
                        thrpt:  [9.6799 GiB/s 9.6903 GiB/s 9.7013 GiB/s]
                 change:
                        time:   [-8.0639% -7.8170% -7.5969%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2215% +8.4799% +8.7713%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
Benchmarking Troughput/16KiB_read
Benchmarking Troughput/16KiB_read: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_read: Collecting 100 samples in estimated 5.0047 s (2.3M iterations)
Benchmarking Troughput/16KiB_read: Analyzing
Troughput/16KiB_read    time:   [1.0712 µs 1.0730 µs 1.0748 µs]
                        thrpt:  [14.197 GiB/s 14.221 GiB/s 14.244 GiB/s]
                 change:
                        time:   [-11.805% -11.044% -10.376%] (p = 0.00 < 0.05)
                        thrpt:  [+11.578% +12.415% +13.386%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) low mild
Benchmarking Troughput/16KiB_create
Benchmarking Troughput/16KiB_create: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_create: Collecting 100 samples in estimated 5.0085 s (1.5M iterations)
Benchmarking Troughput/16KiB_create: Analyzing
Troughput/16KiB_create  time:   [2.4438 µs 2.4495 µs 2.4545 µs]
                        thrpt:  [6.2166 GiB/s 6.2293 GiB/s 6.2439 GiB/s]
                 change:
                        time:   [-63.717% -63.466% -63.239%] (p = 0.00 < 0.05)
                        thrpt:  [+172.03% +173.72% +175.61%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low severe
  4 (4.00%) low mild
Benchmarking Troughput/16KiB_manual
Benchmarking Troughput/16KiB_manual: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_manual: Collecting 100 samples in estimated 5.0078 s (3.0M iterations)
Benchmarking Troughput/16KiB_manual: Analyzing
Troughput/16KiB_manual  time:   [356.64 ns 357.51 ns 358.17 ns]
                        thrpt:  [42.602 GiB/s 42.681 GiB/s 42.785 GiB/s]
                 change:
                        time:   [-2.8350% -1.1837% +0.5988%] (p = 0.19 > 0.05)
                        thrpt:  [-0.5953% +1.1979% +2.9177%]
                        No change in performance detected.
Benchmarking Troughput/16KiB_stdlib
Benchmarking Troughput/16KiB_stdlib: Warming up for 3.0000 s
Benchmarking Troughput/16KiB_stdlib: Collecting 100 samples in estimated 5.0018 s (3.1M iterations)
Benchmarking Troughput/16KiB_stdlib: Analyzing
Troughput/16KiB_stdlib  time:   [379.93 ns 381.21 ns 382.13 ns]
                        thrpt:  [39.931 GiB/s 40.028 GiB/s 40.163 GiB/s]
                 change:
                        time:   [-2.7601% -0.1978% +2.5277%] (p = 0.89 > 0.05)
                        thrpt:  [-2.4654% +0.1982% +2.8384%]
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
Benchmarking Troughput/128KiB_write
Benchmarking Troughput/128KiB_write: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_write: Collecting 100 samples in estimated 5.0964 s (252k iterations)
Benchmarking Troughput/128KiB_write: Analyzing
Troughput/128KiB_write  time:   [12.469 µs 12.475 µs 12.482 µs]
                        thrpt:  [9.7800 GiB/s 9.7854 GiB/s 9.7902 GiB/s]
                 change:
                        time:   [-8.5845% -8.4567% -8.3377%] (p = 0.00 < 0.05)
                        thrpt:  [+9.0960% +9.2379% +9.3906%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
Benchmarking Troughput/128KiB_read
Benchmarking Troughput/128KiB_read: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_read: Collecting 100 samples in estimated 5.0326 s (323k iterations)
Benchmarking Troughput/128KiB_read: Analyzing
Troughput/128KiB_read   time:   [8.4088 µs 8.4427 µs 8.4746 µs]
                        thrpt:  [14.404 GiB/s 14.459 GiB/s 14.517 GiB/s]
                 change:
                        time:   [-14.693% -12.591% -10.795%] (p = 0.00 < 0.05)
                        thrpt:  [+12.102% +14.404% +17.223%]
                        Performance has improved.
Benchmarking Troughput/128KiB_create
Benchmarking Troughput/128KiB_create: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_create: Collecting 100 samples in estimated 5.0483 s (207k iterations)
Benchmarking Troughput/128KiB_create: Analyzing
Troughput/128KiB_create time:   [16.998 µs 17.016 µs 17.034 µs]
                        thrpt:  [7.1665 GiB/s 7.1740 GiB/s 7.1815 GiB/s]
                 change:
                        time:   [-67.741% -67.671% -67.614%] (p = 0.00 < 0.05)
                        thrpt:  [+208.77% +209.32% +209.99%]
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  12 (12.00%) low mild
  4 (4.00%) high mild
Benchmarking Troughput/128KiB_manual
Benchmarking Troughput/128KiB_manual: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_manual: Collecting 100 samples in estimated 5.0282 s (439k iterations)
Benchmarking Troughput/128KiB_manual: Analyzing
Troughput/128KiB_manual time:   [2.6861 µs 2.6882 µs 2.6903 µs]
                        thrpt:  [45.374 GiB/s 45.410 GiB/s 45.446 GiB/s]
                 change:
                        time:   [-0.9956% +0.0743% +1.1614%] (p = 0.89 > 0.05)
                        thrpt:  [-1.1481% -0.0743% +1.0056%]
                        No change in performance detected.
Found 22 outliers among 100 measurements (22.00%)
  9 (9.00%) low severe
  13 (13.00%) low mild
Benchmarking Troughput/128KiB_stdlib
Benchmarking Troughput/128KiB_stdlib: Warming up for 3.0000 s
Benchmarking Troughput/128KiB_stdlib: Collecting 100 samples in estimated 5.0218 s (303k iterations)
Benchmarking Troughput/128KiB_stdlib: Analyzing
Troughput/128KiB_stdlib time:   [2.9594 µs 2.9727 µs 2.9827 µs]
                        thrpt:  [40.926 GiB/s 41.064 GiB/s 41.248 GiB/s]
                 change:
                        time:   [-3.8721% -1.7726% +0.2982%] (p = 0.10 > 0.05)
                        thrpt:  [-0.2973% +1.8046% +4.0281%]
                        No change in performance detected.
Benchmarking Troughput/1MiB_write
Benchmarking Troughput/1MiB_write: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_write: Collecting 100 samples in estimated 5.2833 s (30k iterations)
Benchmarking Troughput/1MiB_write: Analyzing
Troughput/1MiB_write    time:   [116.57 µs 116.63 µs 116.70 µs]
                        thrpt:  [8.3683 GiB/s 8.3731 GiB/s 8.3774 GiB/s]
                 change:
                        time:   [-12.534% -12.325% -12.157%] (p = 0.00 < 0.05)
                        thrpt:  [+13.840% +14.058% +14.330%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
Benchmarking Troughput/1MiB_read
Benchmarking Troughput/1MiB_read: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_read: Collecting 100 samples in estimated 5.2757 s (35k iterations)
Benchmarking Troughput/1MiB_read: Analyzing
Troughput/1MiB_read     time:   [94.258 µs 94.435 µs 94.643 µs]
                        thrpt:  [10.318 GiB/s 10.341 GiB/s 10.360 GiB/s]
                 change:
                        time:   [-6.8999% -6.5813% -6.2845%] (p = 0.00 < 0.05)
                        thrpt:  [+6.7059% +7.0450% +7.4113%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe
Benchmarking Troughput/1MiB_create
Benchmarking Troughput/1MiB_create: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_create: Collecting 100 samples in estimated 5.1894 s (25k iterations)
Benchmarking Troughput/1MiB_create: Analyzing
Troughput/1MiB_create   time:   [154.30 µs 154.51 µs 154.76 µs]
                        thrpt:  [6.3103 GiB/s 6.3202 GiB/s 6.3291 GiB/s]
                 change:
                        time:   [-64.810% -64.757% -64.706%] (p = 0.00 < 0.05)
                        thrpt:  [+183.33% +183.74% +184.17%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
Benchmarking Troughput/1MiB_manual
Benchmarking Troughput/1MiB_manual: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_manual: Collecting 100 samples in estimated 5.4942 s (40k iterations)
Benchmarking Troughput/1MiB_manual: Analyzing
Troughput/1MiB_manual   time:   [55.898 µs 56.160 µs 56.522 µs]
                        thrpt:  [17.278 GiB/s 17.389 GiB/s 17.470 GiB/s]
                 change:
                        time:   [-4.0751% -3.1067% -2.2785%] (p = 0.00 < 0.05)
                        thrpt:  [+2.3316% +3.2063% +4.2482%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Benchmarking Troughput/1MiB_stdlib
Benchmarking Troughput/1MiB_stdlib: Warming up for 3.0000 s
Benchmarking Troughput/1MiB_stdlib: Collecting 100 samples in estimated 5.3567 s (35k iterations)
Benchmarking Troughput/1MiB_stdlib: Analyzing
Troughput/1MiB_stdlib   time:   [78.908 µs 79.002 µs 79.100 µs]
                        thrpt:  [12.346 GiB/s 12.361 GiB/s 12.376 GiB/s]
                 change:
                        time:   [+8.2706% +8.7065% +9.0880%] (p = 0.00 < 0.05)
                        thrpt:  [-8.3309% -8.0092% -7.6388%]
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe
Benchmarking Troughput/16MiB_write
Benchmarking Troughput/16MiB_write: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_write: Collecting 100 samples in estimated 5.1046 s (1700 iterations)
Benchmarking Troughput/16MiB_write: Analyzing
Troughput/16MiB_write   time:   [1.9566 ms 1.9629 ms 1.9704 ms]
                        thrpt:  [7.9299 GiB/s 7.9602 GiB/s 7.9857 GiB/s]
                 change:
                        time:   [-9.5856% -9.1602% -8.7316%] (p = 0.00 < 0.05)
                        thrpt:  [+9.5670% +10.084% +10.602%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe
Benchmarking Troughput/16MiB_read
Benchmarking Troughput/16MiB_read: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_read: Collecting 100 samples in estimated 5.0517 s (2000 iterations)
Benchmarking Troughput/16MiB_read: Analyzing
Troughput/16MiB_read    time:   [1.5998 ms 1.6033 ms 1.6074 ms]
                        thrpt:  [9.7208 GiB/s 9.7455 GiB/s 9.7666 GiB/s]
                 change:
                        time:   [-5.7922% -4.1732% -3.0456%] (p = 0.00 < 0.05)
                        thrpt:  [+3.1412% +4.3549% +6.1484%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe
Benchmarking Troughput/16MiB_create
Benchmarking Troughput/16MiB_create: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_create: Collecting 100 samples in estimated 5.1876 s (1500 iterations)
Benchmarking Troughput/16MiB_create: Analyzing
Troughput/16MiB_create  time:   [2.5976 ms 2.6039 ms 2.6114 ms]
                        thrpt:  [5.9833 GiB/s 6.0006 GiB/s 6.0151 GiB/s]
                 change:
                        time:   [-63.189% -63.099% -62.987%] (p = 0.00 < 0.05)
                        thrpt:  [+170.17% +170.99% +171.66%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
Benchmarking Troughput/16MiB_manual
Benchmarking Troughput/16MiB_manual: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_manual: Collecting 100 samples in estimated 5.0462 s (2100 iterations)
Benchmarking Troughput/16MiB_manual: Analyzing
Troughput/16MiB_manual  time:   [1.1353 ms 1.1425 ms 1.1508 ms]
                        thrpt:  [13.577 GiB/s 13.677 GiB/s 13.763 GiB/s]
                 change:
                        time:   [+15.028% +16.322% +17.589%] (p = 0.00 < 0.05)
                        thrpt:  [-14.958% -14.032% -13.065%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking Troughput/16MiB_stdlib
Benchmarking Troughput/16MiB_stdlib: Warming up for 3.0000 s
Benchmarking Troughput/16MiB_stdlib: Collecting 100 samples in estimated 5.0448 s (1900 iterations)
Benchmarking Troughput/16MiB_stdlib: Analyzing
Troughput/16MiB_stdlib  time:   [1.3901 ms 1.3965 ms 1.4041 ms]
                        thrpt:  [11.128 GiB/s 11.189 GiB/s 11.240 GiB/s]
                 change:
                        time:   [-2.3783% -1.1699% +0.0554%] (p = 0.06 > 0.05)
                        thrpt:  [-0.0554% +1.1838% +2.4363%]
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
Benchmarking Troughput/512MiB_write
Benchmarking Troughput/512MiB_write: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.1s, or reduce sample count to 20.
Benchmarking Troughput/512MiB_write: Collecting 100 samples in estimated 19.137 s (100 iterations)
Benchmarking Troughput/512MiB_write: Analyzing
Troughput/512MiB_write  time:   [105.70 ms 106.25 ms 107.17 ms]
                        thrpt:  [4.6656 GiB/s 4.7059 GiB/s 4.7305 GiB/s]
                 change:
                        time:   [-3.4867% -2.8708% -1.9544%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9933% +2.9557% +3.6127%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking Troughput/512MiB_read
Benchmarking Troughput/512MiB_read: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 15.9s, or reduce sample count to 30.
Benchmarking Troughput/512MiB_read: Collecting 100 samples in estimated 15.917 s (100 iterations)
Benchmarking Troughput/512MiB_read: Analyzing
Troughput/512MiB_read   time:   [74.217 ms 74.329 ms 74.451 ms]
                        thrpt:  [6.7158 GiB/s 6.7268 GiB/s 6.7370 GiB/s]
                 change:
                        time:   [-6.1588% -5.7873% -5.4313%] (p = 0.00 < 0.05)
                        thrpt:  [+5.7432% +6.1428% +6.5630%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking Troughput/512MiB_create
Benchmarking Troughput/512MiB_create: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.0s, or reduce sample count to 20.
Benchmarking Troughput/512MiB_create: Collecting 100 samples in estimated 18.960 s (100 iterations)
Benchmarking Troughput/512MiB_create: Analyzing
Troughput/512MiB_create time:   [100.19 ms 102.03 ms 104.11 ms]
                        thrpt:  [4.8028 GiB/s 4.9007 GiB/s 4.9906 GiB/s]
                 change:
                        time:   [-60.347% -59.363% -58.254%] (p = 0.00 < 0.05)
                        thrpt:  [+139.54% +146.08% +152.19%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) high mild
  10 (10.00%) high severe
Benchmarking Troughput/512MiB_manual
Benchmarking Troughput/512MiB_manual: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.0s, or reduce sample count to 30.
Benchmarking Troughput/512MiB_manual: Collecting 100 samples in estimated 15.993 s (100 iterations)
Benchmarking Troughput/512MiB_manual: Analyzing
Troughput/512MiB_manual time:   [44.600 ms 44.820 ms 45.094 ms]
                        thrpt:  [11.088 GiB/s 11.156 GiB/s 11.211 GiB/s]
                 change:
                        time:   [-3.9187% -2.6812% -1.4457%] (p = 0.00 < 0.05)
                        thrpt:  [+1.4669% +2.7551% +4.0785%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe
Benchmarking Troughput/512MiB_stdlib
Benchmarking Troughput/512MiB_stdlib: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 14.1s, or reduce sample count to 30.
Benchmarking Troughput/512MiB_stdlib: Collecting 100 samples in estimated 14.084 s (100 iterations)
Benchmarking Troughput/512MiB_stdlib: Analyzing
Troughput/512MiB_stdlib time:   [28.370 ms 28.449 ms 28.539 ms]
                        thrpt:  [17.520 GiB/s 17.575 GiB/s 17.624 GiB/s]
                 change:
                        time:   [-4.1694% -3.0445% -2.0873%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1318% +3.1401% +4.3509%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
```

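The create cases come out well over twice as fast: their throughput is up by about +146% to +209% depending on buffer size, which is roughly 2.5x to 3.1x the previous throughput. The write and read cases improve more modestly, by about +3% to +14%.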

teoxoy / encase

Inline potentially hot functions more aggressively #39