Use specific builder functions for encoding.

qnikst commented 4 years ago

Instead of keeping internal function for decoding we can reuse much more efficient builders from the bytestring library. It increases speed and improves memory usage and allows more code sharing between the libraries.

Specialization pragmas were removed, they didn't provide much benefit anyway as the code is recursive and optimizer does not inline it. However internal functions were not removed so the user of cassava can use them for semi-efficient numeric types encoding in case if there are no efficient builders available directly for those types.

tibbe commented 4 years ago

Please add some performance numbers to the pull request.

qnikst commented 4 years ago

before the patch:

./Primitives
benchmarking toField/int8
time                 160.4 ns   (159.7 ns .. 161.6 ns)
                     0.999 R²   (0.996 R² .. 1.000 R²)
mean                 162.5 ns   (160.9 ns .. 167.4 ns)
std dev              8.235 ns   (2.792 ns .. 16.53 ns)
variance introduced by outliers: 71% (severely inflated)

benchmarking toField/int16
time                 216.2 ns   (213.2 ns .. 219.5 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 214.2 ns   (213.2 ns .. 215.6 ns)
std dev              3.857 ns   (2.628 ns .. 6.190 ns)
variance introduced by outliers: 22% (moderately inflated)

benchmarking toField/int32
time                 356.3 ns   (354.8 ns .. 358.8 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 363.8 ns   (359.5 ns .. 371.9 ns)
std dev              19.08 ns   (10.28 ns .. 34.14 ns)
variance introduced by outliers: 70% (severely inflated)

benchmarking toField/int64
time                 587.2 ns   (584.2 ns .. 591.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 585.3 ns   (582.4 ns .. 589.1 ns)
std dev              10.65 ns   (8.311 ns .. 13.65 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking toField/word8
time                 160.7 ns   (160.0 ns .. 161.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 158.7 ns   (157.6 ns .. 159.8 ns)
std dev              3.762 ns   (3.226 ns .. 4.635 ns)
variance introduced by outliers: 34% (moderately inflated)

benchmarking toField/word16
time                 203.6 ns   (202.5 ns .. 204.8 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 202.8 ns   (202.0 ns .. 204.1 ns)
std dev              3.682 ns   (2.742 ns .. 5.898 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking toField/word32
time                 341.9 ns   (337.1 ns .. 348.3 ns)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 337.4 ns   (334.7 ns .. 342.0 ns)
std dev              11.82 ns   (7.062 ns .. 20.90 ns)
variance introduced by outliers: 51% (severely inflated)

benchmarking toField/word64
time                 577.0 ns   (570.7 ns .. 584.1 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 573.5 ns   (570.2 ns .. 577.6 ns)
std dev              12.44 ns   (8.991 ns .. 17.87 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking toField/float
time                 1.000 μs   (994.8 ns .. 1.008 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 999.1 ns   (995.7 ns .. 1.003 μs)
std dev              12.16 ns   (9.843 ns .. 16.90 ns)
variance introduced by outliers: 10% (moderately inflated)

benchmarking toField/double
time                 1.072 μs   (1.068 μs .. 1.078 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.076 μs   (1.072 μs .. 1.081 μs)
std dev              16.04 ns   (12.37 ns .. 24.26 ns)
variance introduced by outliers: 14% (moderately inflated)

After the patch:

benchmarking toField/int8
time                 110.4 ns   (109.4 ns .. 111.3 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 110.0 ns   (109.5 ns .. 110.7 ns)
std dev              2.123 ns   (1.675 ns .. 2.956 ns)
variance introduced by outliers: 26% (moderately inflated)

benchmarking toField/int16
time                 119.3 ns   (116.6 ns .. 124.0 ns)
                     0.992 R²   (0.987 R² .. 0.996 R²)
mean                 119.7 ns   (117.0 ns .. 123.1 ns)
std dev              10.61 ns   (7.894 ns .. 13.13 ns)
variance introduced by outliers: 88% (severely inflated)

benchmarking toField/int32
time                 120.5 ns   (119.9 ns .. 121.1 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 120.3 ns   (119.7 ns .. 121.0 ns)
std dev              2.147 ns   (1.741 ns .. 2.653 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking toField/int64
time                 139.2 ns   (137.4 ns .. 142.1 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 138.6 ns   (137.7 ns .. 140.2 ns)
std dev              4.053 ns   (2.336 ns .. 6.299 ns)
variance introduced by outliers: 44% (moderately inflated)

benchmarking toField/word8
time                 109.2 ns   (108.7 ns .. 109.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 108.9 ns   (108.5 ns .. 110.0 ns)
std dev              2.172 ns   (1.238 ns .. 3.950 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking toField/word16
time                 117.2 ns   (115.5 ns .. 119.8 ns)
                     0.996 R²   (0.993 R² .. 0.998 R²)
mean                 120.7 ns   (118.3 ns .. 124.3 ns)
std dev              9.617 ns   (7.068 ns .. 13.47 ns)
variance introduced by outliers: 86% (severely inflated)

benchmarking toField/word32
time                 117.8 ns   (116.9 ns .. 118.9 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 117.8 ns   (117.2 ns .. 118.6 ns)
std dev              2.343 ns   (1.847 ns .. 3.179 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking toField/word64
time                 135.8 ns   (135.1 ns .. 136.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 137.0 ns   (136.3 ns .. 137.7 ns)
std dev              2.329 ns   (1.909 ns .. 2.920 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking toField/float
time                 1.143 μs   (1.135 μs .. 1.155 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.152 μs   (1.145 μs .. 1.160 μs)
std dev              24.15 ns   (18.47 ns .. 31.17 ns)
variance introduced by outliers: 25% (moderately inflated)

benchmarking toField/double
time                 1.221 μs   (1.208 μs .. 1.242 μs)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 1.213 μs   (1.206 μs .. 1.224 μs)
std dev              29.44 ns   (18.03 ns .. 51.01 ns)
variance introduced by outliers: 31% (moderately inflated)

should I include that in the patch content itself?

Actually for float, and double it gives worse result. So I've exclude it from the patch for now. I'll investigate and see if builders in the bytestring should be improved.

tibbe commented 4 years ago

I'll let @hvr decide. For me just having the numbers somewhere is good (in particular to make sure someone looked at them).

andreasabel commented 2 years ago

@qnikst

I'll investigate and see if builders in the bytestring should be improved.

What was the conclusion of this investigation?

I think it would make sense to add QuickCheck tests that demonstrate that the new implementation matches the behavior of the old. Do you think you could add such tests?

qnikst commented 2 years ago

What was the conclusion of this investigation?

I think I didn't come with a concrete solution, but I can't recall since long time passed.

I think it would make sense to add QuickCheck tests that demonstrate that the new implementation matches the behavior of the old. Do you think you could add such tests?

Yes! I'll do will make them on the weekend in the worst case.

andreasabel commented 2 years ago

@qnikst : QuickCheck tests would be great here!

haskell-hvr / cassava

Use specific builder functions for encoding. #187