Open qnikst opened 4 years ago
Please add some performance numbers to the pull request.
before the patch:
./Primitives
benchmarking toField/int8
time 160.4 ns (159.7 ns .. 161.6 ns)
0.999 R² (0.996 R² .. 1.000 R²)
mean 162.5 ns (160.9 ns .. 167.4 ns)
std dev 8.235 ns (2.792 ns .. 16.53 ns)
variance introduced by outliers: 71% (severely inflated)
benchmarking toField/int16
time 216.2 ns (213.2 ns .. 219.5 ns)
0.999 R² (0.998 R² .. 1.000 R²)
mean 214.2 ns (213.2 ns .. 215.6 ns)
std dev 3.857 ns (2.628 ns .. 6.190 ns)
variance introduced by outliers: 22% (moderately inflated)
benchmarking toField/int32
time 356.3 ns (354.8 ns .. 358.8 ns)
0.999 R² (0.998 R² .. 1.000 R²)
mean 363.8 ns (359.5 ns .. 371.9 ns)
std dev 19.08 ns (10.28 ns .. 34.14 ns)
variance introduced by outliers: 70% (severely inflated)
benchmarking toField/int64
time 587.2 ns (584.2 ns .. 591.0 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 585.3 ns (582.4 ns .. 589.1 ns)
std dev 10.65 ns (8.311 ns .. 13.65 ns)
variance introduced by outliers: 21% (moderately inflated)
benchmarking toField/word8
time 160.7 ns (160.0 ns .. 161.1 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 158.7 ns (157.6 ns .. 159.8 ns)
std dev 3.762 ns (3.226 ns .. 4.635 ns)
variance introduced by outliers: 34% (moderately inflated)
benchmarking toField/word16
time 203.6 ns (202.5 ns .. 204.8 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 202.8 ns (202.0 ns .. 204.1 ns)
std dev 3.682 ns (2.742 ns .. 5.898 ns)
variance introduced by outliers: 23% (moderately inflated)
benchmarking toField/word32
time 341.9 ns (337.1 ns .. 348.3 ns)
0.998 R² (0.996 R² .. 1.000 R²)
mean 337.4 ns (334.7 ns .. 342.0 ns)
std dev 11.82 ns (7.062 ns .. 20.90 ns)
variance introduced by outliers: 51% (severely inflated)
benchmarking toField/word64
time 577.0 ns (570.7 ns .. 584.1 ns)
0.999 R² (0.999 R² .. 1.000 R²)
mean 573.5 ns (570.2 ns .. 577.6 ns)
std dev 12.44 ns (8.991 ns .. 17.87 ns)
variance introduced by outliers: 27% (moderately inflated)
benchmarking toField/float
time 1.000 μs (994.8 ns .. 1.008 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 999.1 ns (995.7 ns .. 1.003 μs)
std dev 12.16 ns (9.843 ns .. 16.90 ns)
variance introduced by outliers: 10% (moderately inflated)
benchmarking toField/double
time 1.072 μs (1.068 μs .. 1.078 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.076 μs (1.072 μs .. 1.081 μs)
std dev 16.04 ns (12.37 ns .. 24.26 ns)
variance introduced by outliers: 14% (moderately inflated)
After the patch:
benchmarking toField/int8
time 110.4 ns (109.4 ns .. 111.3 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 110.0 ns (109.5 ns .. 110.7 ns)
std dev 2.123 ns (1.675 ns .. 2.956 ns)
variance introduced by outliers: 26% (moderately inflated)
benchmarking toField/int16
time 119.3 ns (116.6 ns .. 124.0 ns)
0.992 R² (0.987 R² .. 0.996 R²)
mean 119.7 ns (117.0 ns .. 123.1 ns)
std dev 10.61 ns (7.894 ns .. 13.13 ns)
variance introduced by outliers: 88% (severely inflated)
benchmarking toField/int32
time 120.5 ns (119.9 ns .. 121.1 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 120.3 ns (119.7 ns .. 121.0 ns)
std dev 2.147 ns (1.741 ns .. 2.653 ns)
variance introduced by outliers: 23% (moderately inflated)
benchmarking toField/int64
time 139.2 ns (137.4 ns .. 142.1 ns)
0.999 R² (0.998 R² .. 1.000 R²)
mean 138.6 ns (137.7 ns .. 140.2 ns)
std dev 4.053 ns (2.336 ns .. 6.299 ns)
variance introduced by outliers: 44% (moderately inflated)
benchmarking toField/word8
time 109.2 ns (108.7 ns .. 109.6 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 108.9 ns (108.5 ns .. 110.0 ns)
std dev 2.172 ns (1.238 ns .. 3.950 ns)
variance introduced by outliers: 27% (moderately inflated)
benchmarking toField/word16
time 117.2 ns (115.5 ns .. 119.8 ns)
0.996 R² (0.993 R² .. 0.998 R²)
mean 120.7 ns (118.3 ns .. 124.3 ns)
std dev 9.617 ns (7.068 ns .. 13.47 ns)
variance introduced by outliers: 86% (severely inflated)
benchmarking toField/word32
time 117.8 ns (116.9 ns .. 118.9 ns)
0.999 R² (0.999 R² .. 1.000 R²)
mean 117.8 ns (117.2 ns .. 118.6 ns)
std dev 2.343 ns (1.847 ns .. 3.179 ns)
variance introduced by outliers: 27% (moderately inflated)
benchmarking toField/word64
time 135.8 ns (135.1 ns .. 136.8 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 137.0 ns (136.3 ns .. 137.7 ns)
std dev 2.329 ns (1.909 ns .. 2.920 ns)
variance introduced by outliers: 21% (moderately inflated)
benchmarking toField/float
time 1.143 μs (1.135 μs .. 1.155 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 1.152 μs (1.145 μs .. 1.160 μs)
std dev 24.15 ns (18.47 ns .. 31.17 ns)
variance introduced by outliers: 25% (moderately inflated)
benchmarking toField/double
time 1.221 μs (1.208 μs .. 1.242 μs)
0.999 R² (0.997 R² .. 1.000 R²)
mean 1.213 μs (1.206 μs .. 1.224 μs)
std dev 29.44 ns (18.03 ns .. 51.01 ns)
variance introduced by outliers: 31% (moderately inflated)
should I include that in the patch content itself?
Actually for float, and double it gives worse result. So I've exclude it from the patch for now.
I'll investigate and see if builders in the bytestring
should be improved.
I'll let @hvr decide. For me just having the numbers somewhere is good (in particular to make sure someone looked at them).
@qnikst
I'll investigate and see if builders in the
bytestring
should be improved.
What was the conclusion of this investigation?
I think it would make sense to add QuickCheck tests that demonstrate that the new implementation matches the behavior of the old. Do you think you could add such tests?
What was the conclusion of this investigation?
I think I didn't come with a concrete solution, but I can't recall since long time passed.
I think it would make sense to add QuickCheck tests that demonstrate that the new implementation matches the behavior of the old. Do you think you could add such tests?
Yes! I'll do will make them on the weekend in the worst case.
@qnikst : QuickCheck tests would be great here!
Instead of keeping internal function for decoding we can reuse much more efficient builders from the bytestring library. It increases speed and improves memory usage and allows more code sharing between the libraries.
Specialization pragmas were removed, they didn't provide much benefit anyway as the code is recursive and optimizer does not inline it. However internal functions were not removed so the user of cassava can use them for semi-efficient numeric types encoding in case if there are no efficient builders available directly for those types.