mgsloan / store

Fast binary serialization in Haskell
MIT License

Changes (Ptr Word8, a) to PeekResult {-# UNPACK #-} (Ptr Word8) !a #98

Closed: VyacheslavHashov closed this 7 years ago

VyacheslavHashov commented 7 years ago

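I am writing encoders and decoders for the PostgreSQL binary protocol using store-core, and I have found that parsers returning a strict result structure run about 20% faster in my case. The store benchmarks below show gains too. Decoding SmallProduct, SmallSum, SmallSumManual and SomeData is roughly 13 to 18% faster, SmallProductManual about 5% faster, and the remaining cases change by under 4% in either direction (a few are marginally slower).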
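
For readers unfamiliar with the change, here is a minimal, self-contained sketch of a Peek-style parser whose steps return a strict PeekResult instead of a lazy (Ptr Word8, a) pair. The Peek newtype, peekWord8, peekWord32le and decodeWith are hypothetical simplifications written for illustration, not store-core's actual definitions. The sketch writes the pointer field as {-# UNPACK #-} !(Ptr Word8), because GHC only honours an UNPACK pragma on a strict field. With both fields strict, building a result forces the decoded value to weak head normal form rather than storing a thunk, which is one plausible source of the speedup; the benchmark numbers are the actual evidence.

```haskell
module Main (main) where

import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word32, Word8)
import Foreign.Ptr (Ptr, castPtr, minusPtr, plusPtr)
import Foreign.Storable (peek)

-- Strict result: the position pointer is unpacked into the constructor and
-- the decoded value is forced to WHNF when the result is built.
-- (Before the change, a step returned a lazy pair, roughly IO (Ptr Word8, a),
-- whose components are not forced when the pair is constructed.)
data PeekResult a = PeekResult {-# UNPACK #-} !(Ptr Word8) !a

-- Hypothetical, simplified parser type: end-of-buffer pointer, then the
-- current read position. Not store-core's real definition.
newtype Peek a = Peek { runPeek :: Ptr Word8 -> Ptr Word8 -> IO (PeekResult a) }

instance Functor Peek where
  fmap f (Peek g) = Peek $ \end p -> do
    PeekResult p' x <- g end p
    return (PeekResult p' (f x))

instance Applicative Peek where
  pure x = Peek $ \_ p -> return (PeekResult p x)
  Peek pf <*> Peek px = Peek $ \end p -> do
    PeekResult p1 f <- pf end p
    PeekResult p2 x <- px end p1
    return (PeekResult p2 (f x))

instance Monad Peek where
  Peek m >>= k = Peek $ \end p -> do
    PeekResult p1 x <- m end p
    runPeek (k x) end p1

-- Read one byte, failing with an IOError if the buffer is exhausted.
peekWord8 :: Peek Word8
peekWord8 = Peek $ \end p ->
  if p >= end
    then ioError (userError "peekWord8: not enough bytes")
    else do
      w <- peek p
      return (PeekResult (p `plusPtr` 1) w)

-- Read a little-endian 32-bit word, one byte at a time.
peekWord32le :: Peek Word32
peekWord32le = do
  b0 <- peekWord8
  b1 <- peekWord8
  b2 <- peekWord8
  b3 <- peekWord8
  pure $ fromIntegral b0
     .|. (fromIntegral b1 `shiftL` 8)
     .|. (fromIntegral b2 `shiftL` 16)
     .|. (fromIntegral b3 `shiftL` 24)

-- Run a parser over a ByteString, returning the value and bytes consumed.
decodeWith :: Peek a -> BS.ByteString -> IO (a, Int)
decodeWith (Peek f) bs =
  unsafeUseAsCStringLen bs $ \(cstr, len) -> do
    let start = castPtr cstr :: Ptr Word8
    PeekResult end' x <- f (start `plusPtr` len) start
    return (x, end' `minusPtr` start)

main :: IO ()
main = do
  result <- decodeWith peekWord32le (BS.pack [1, 0, 0, 0, 0xff])
  print result
```

Running it prints (1,4): the little-endian bytes 1 0 0 0 decode to 1 after consuming 4 bytes, and the trailing 0xff is left unread.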

Current version, where peeks return the lazy tuple (Ptr Word8, a):

benchmarking decode/ (Vector Int)
time                 498.3 ns   (498.3 ns .. 498.4 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 498.3 ns   (498.3 ns .. 498.4 ns)
std dev              166.8 ps   (135.5 ps .. 223.5 ps)

benchmarking decode/1kb storable (Vector Int32)
time                 52.96 ns   (52.92 ns .. 53.01 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 53.04 ns   (53.01 ns .. 53.09 ns)
std dev              141.4 ps   (110.5 ps .. 188.8 ps)

benchmarking decode/10kb storable (Vector Int32)
time                 282.7 ns   (282.4 ns .. 282.9 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 282.7 ns   (282.5 ns .. 282.9 ns)
std dev              628.2 ps   (502.4 ps .. 807.2 ps)

benchmarking decode/1kb normal (Vector Int32)
time                 1.225 μs   (1.223 μs .. 1.229 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.230 μs   (1.225 μs .. 1.236 μs)
std dev              15.87 ns   (10.02 ns .. 22.36 ns)
variance introduced by outliers: 11% (moderately inflated)

benchmarking decode/10kb normal (Vector Int32)
time                 12.38 μs   (12.36 μs .. 12.43 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 12.39 μs   (12.36 μs .. 12.52 μs)
std dev              165.3 ns   (4.813 ns .. 379.8 ns)

benchmarking decode/ (Vector SmallProduct)
time                 2.349 μs   (2.347 μs .. 2.350 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.350 μs   (2.349 μs .. 2.351 μs)
std dev              4.135 ns   (3.487 ns .. 5.236 ns)

benchmarking decode/ (Vector SmallProductManual)
time                 1.508 μs   (1.507 μs .. 1.509 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.508 μs   (1.507 μs .. 1.510 μs)
std dev              4.189 ns   (2.095 ns .. 8.388 ns)

benchmarking decode/ (Vector SmallSum)
time                 1.629 μs   (1.627 μs .. 1.632 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.632 μs   (1.629 μs .. 1.641 μs)
std dev              17.24 ns   (4.673 ns .. 31.10 ns)

benchmarking decode/ (Vector SmallSumManual)
time                 990.8 ns   (990.2 ns .. 991.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 991.3 ns   (990.6 ns .. 994.1 ns)
std dev              3.658 ns   (1.048 ns .. 7.950 ns)

benchmarking decode/ (Vector ((Int,Int),(Int,Int)))
time                 1.303 μs   (1.302 μs .. 1.304 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.303 μs   (1.302 μs .. 1.303 μs)
std dev              1.769 ns   (1.612 ns .. 2.052 ns)

benchmarking decode/ (Vector SomeData)
time                 2.069 μs   (2.068 μs .. 2.070 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.070 μs   (2.069 μs .. 2.070 μs)
std dev              2.211 ns   (1.636 ns .. 3.016 ns)

With the strict PeekResult structure:

benchmarking decode/ (Vector Int)
time                 491.6 ns   (491.5 ns .. 491.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 491.6 ns   (491.6 ns .. 491.6 ns)
std dev              73.37 ps   (59.71 ps .. 90.25 ps)

benchmarking decode/1kb storable (Vector Int32)
time                 51.07 ns   (51.07 ns .. 51.09 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 51.14 ns   (51.12 ns .. 51.17 ns)
std dev              96.97 ps   (77.08 ps .. 113.0 ps)

benchmarking decode/10kb storable (Vector Int32)
time                 279.3 ns   (279.0 ns .. 279.5 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 279.0 ns   (278.8 ns .. 279.2 ns)
std dev              725.8 ps   (609.2 ps .. 895.4 ps)

benchmarking decode/1kb normal (Vector Int32)
time                 1.233 μs   (1.228 μs .. 1.243 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.230 μs   (1.228 μs .. 1.236 μs)
std dev              8.810 ns   (208.5 ps .. 20.13 ns)

benchmarking decode/10kb normal (Vector Int32)
time                 12.50 μs   (12.50 μs .. 12.51 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 12.52 μs   (12.52 μs .. 12.52 μs)
std dev              6.185 ns   (4.934 ns .. 8.016 ns)

benchmarking decode/ (Vector SmallProduct)
time                 1.986 μs   (1.978 μs .. 1.990 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.983 μs   (1.979 μs .. 1.986 μs)
std dev              11.67 ns   (9.556 ns .. 15.60 ns)

benchmarking decode/ (Vector SmallProductManual)
time                 1.433 μs   (1.433 μs .. 1.434 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.434 μs   (1.433 μs .. 1.434 μs)
std dev              814.6 ps   (555.6 ps .. 1.341 ns)

benchmarking decode/ (Vector SmallSum)
time                 1.424 μs   (1.423 μs .. 1.424 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.424 μs   (1.424 μs .. 1.424 μs)
std dev              477.9 ps   (386.2 ps .. 575.3 ps)

benchmarking decode/ (Vector SmallSumManual)
time                 832.8 ns   (832.5 ns .. 833.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 833.0 ns   (832.9 ns .. 833.2 ns)
std dev              344.6 ps   (172.9 ps .. 655.0 ps)

benchmarking decode/ (Vector ((Int,Int),(Int,Int)))
time                 1.331 μs   (1.331 μs .. 1.332 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.332 μs   (1.332 μs .. 1.333 μs)
std dev              1.451 ns   (1.158 ns .. 1.743 ns)

benchmarking decode/ (Vector SomeData)
time                 1.691 μs   (1.689 μs .. 1.693 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.690 μs   (1.688 μs .. 1.692 μs)
std dev              5.669 ns   (5.147 ns .. 6.221 ns)
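
To make the two runs easier to compare, here is a small sketch that computes the relative change for each benchmark from the point estimates on the time lines above (before = current version, after = strict PeekResult). The figures are copied by hand and converted to nanoseconds; the program is illustrative arithmetic only, not part of store's benchmark suite.

```haskell
module Main (main) where

import Text.Printf (printf)

-- (benchmark, time before, time after) in nanoseconds, copied from the
-- "time" lines of the two criterion runs above.
results :: [(String, Double, Double)]
results =
  [ ("decode/ (Vector Int)",                    498.3, 491.6)
  , ("decode/1kb storable (Vector Int32)",      52.96, 51.07)
  , ("decode/10kb storable (Vector Int32)",     282.7, 279.3)
  , ("decode/1kb normal (Vector Int32)",        1225,  1233)
  , ("decode/10kb normal (Vector Int32)",       12380, 12500)
  , ("decode/ (Vector SmallProduct)",           2349,  1986)
  , ("decode/ (Vector SmallProductManual)",     1508,  1433)
  , ("decode/ (Vector SmallSum)",               1629,  1424)
  , ("decode/ (Vector SmallSumManual)",         990.8, 832.8)
  , ("decode/ (Vector ((Int,Int),(Int,Int)))",  1303,  1331)
  , ("decode/ (Vector SomeData)",               2069,  1691)
  ]

-- Relative change in percent; negative means the strict version is faster.
percentChange :: Double -> Double -> Double
percentChange before after = (after - before) / before * 100

main :: IO ()
main = mapM_ report results
  where
    report :: (String, Double, Double) -> IO ()
    report (name, before, after) =
      printf "%-42s %+6.1f%%\n" name (percentChange before after)
```

A negative value means the strict version decoded faster; for example, SmallProduct comes out at about -15.5%.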

mgsloan commented 7 years ago

Makes sense, thanks!