reactivemarkets / toolbox-cpp

The Reactive C++ Toolbox is an open source library of C++20 components designed for efficient, asynchronous network applications on the Linux platform.
https://reactivemarkets.github.io/toolbox-cpp/
Other
21 stars 20 forks source link

Improve dec_digits performance and benchmarking methodology #250

Closed bnbajwa closed 5 days ago

bnbajwa commented 6 days ago

Note to reviewers: This pull request consists of 2 commits, so view separately for easier reviewing.

The first commit changes how dec_digits is benchmarked (i.e. using randomly generated numbers).

In this commit, you may notice K=7 for the benchmarks. This means that 1 to 7 digit random numbers will be used for benchmarking i.e. [1, 10^7). The reasoning for that is because I think very large numbers (>7 digits) are much less common in production.

BEFORE commit (baseline results):

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib                                    3516840000         0         0         0         0         0         0
dec_digits_div                                    1205850000     0.002     0.002     0.002     0.003     0.004     0.005
dec_digits_math                                    152790000     0.018     0.019      0.02     0.021     0.027     0.034
dec_digits_stdio                                    90960000     0.028     0.032     0.033     0.034     0.046     0.062
dec_digits_less                                   3694740000         0         0         0         0         0         0

Note that the results indicate dec_digits_less is faster than than the dec_digits_lib by 1.05x (i.e. 5%)

AFTER commit (K=7):

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib                                     478000000     0.005     0.006     0.006     0.006     0.007     0.007
dec_digits_div                                     266000000     0.010     0.011     0.011     0.012     0.012     0.012
dec_digits_math                                    170000000     0.017     0.017     0.017     0.018     0.018     0.018
dec_digits_stdio                                    73000000     0.040     0.041     0.041     0.043     0.043     0.043
dec_digits_less                                    367000000     0.008     0.008     0.008     0.008     0.009     0.009

Now, the benchmark shows dec_digits_lib is faster than dec_digits_less by 1.3x when dealing with integers up to 10^7.

AFTER commit (K=19).

Adding K=19 benchmark results here if anyone is curious how dec_digits performs when dealing with integers of all digit sizes possible (for int64).

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib                                     291000000     0.010     0.010     0.010     0.010     0.010     0.010
dec_digits_div                                     137000000     0.020     0.021     0.023     0.024     0.024     0.024
dec_digits_math                                    150000000     0.018     0.019     0.022     0.028     0.032     0.032
dec_digits_stdio                                    60000000     0.049     0.050     0.052     0.055     0.055     0.055
dec_digits_less                                    322000000     0.008     0.009     0.009     0.009     0.010     0.010

But when dealing all integers up to 10^19, dec_digits_less takes the lead again -- 10% faster than it's rival.

The second commit provides a faster implementation of dec_digits along with support for negative integers.

Results with new implementation (K=7):

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib_unsigned                           4258000000         0         0         0         0         0         0
dec_digits_lib_signed                             3370000000         0         0         0         0         0         0
dec_digits_branch                                  441000000     0.006     0.006     0.006     0.006     0.007     0.007
dec_digits_div                                     257000000     0.011     0.011     0.011     0.012     0.012     0.012
dec_digits_math                                    164000000     0.018     0.018     0.018     0.018     0.018     0.018
dec_digits_stdio                                    70000000     0.042     0.043     0.044     0.045     0.045     0.045
dec_digits_less                                    386000000     0.007     0.007     0.007     0.008     0.008     0.008

The previous dec_digits_lib is now dec_digits_branch.

New dec_digits unsigned overload is about 9.6x faster than previous. And the signed overload is about 7.6x faster.

Results with new implementation (K=19):

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib_unsigned                           4286000000         0         0         0         0         0         0
dec_digits_lib_signed                             3364000000         0         0         0         0         0         0
dec_digits_branch                                  263000000      0.01     0.011     0.011     0.012     0.013     0.013
dec_digits_div                                     130000000     0.022     0.023     0.023     0.023     0.024     0.024
dec_digits_math                                    150000000     0.018      0.02      0.02      0.02      0.02      0.02
dec_digits_stdio                                    57000000     0.052     0.053     0.055     0.055     0.055     0.055
dec_digits_less                                    308000000     0.009     0.009      0.01      0.01      0.01      0.01

Unsigned overload about 16x faster. Signed overload about 12.8x faster