Hi, the following patchset replaces some computations with lookup tables and improves performance a bit more. It uses GHC primitives directly, but the portability line already says GHC, so hopefully this is not a problem. If it is a problem, it might be possible to use bytestring with OverloadedStrings and unsafe indexing (unsafeIndex) to get almost the same performance increase (though that approach was consistently slower on some other benchmarks of mine).
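For readers who have not seen the technique, here is a minimal sketch of a lookup table read through a GHC primitive. It is not the code from the patchset: hex encoding is assumed purely for illustration, and the names tableLookup, hexDigit and encodeHex are hypothetical. B.concatMap is used only to keep the example short; it is not how a tuned encoder would build its output.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString as B
import GHC.Exts (Addr#, Int (I#), indexWord8OffAddr#)
import GHC.Word (Word8 (W8#))

-- | Hypothetical helper: read byte number i from a static table.
-- There is no bounds check, so the caller must keep i inside the table.
tableLookup :: Addr# -> Int -> Word8
tableLookup table (I# i) = W8# (indexWord8OffAddr# table i)

-- | Map a nibble to its ASCII hex digit with one table read
-- instead of a comparison plus an addition.
-- The mask keeps the index within the 16-byte table.
hexDigit :: Word8 -> Word8
hexDigit n = tableLookup "0123456789abcdef"# (fromIntegral (n .&. 0x0f))

-- | Illustrative encoder (assumed, not the patched function):
-- each input byte becomes two hex digits, high nibble first.
encodeHex :: B.ByteString -> B.ByteString
encodeHex = B.concatMap (\w -> B.pack [hexDigit (w `shiftR` 4), hexDigit w])

main :: IO ()
main = print (encodeHex (B.pack [0xde, 0xad, 0xbe, 0xef]))  -- prints "deadbeef"
```

The table is a string literal of type Addr#, so it lives in static memory and each lookup is a single unchecked byte read. The portable alternative mentioned above would store the table as a ByteString and index it with unsafeIndex from Data.ByteString.Unsafe.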
The benchmarks below show the mean, lower bound (lb) and upper bound (ub), before and after the optimization, on my machine.
benchmarking encode/8
  before: mean: 100.5066 ns, lb 100.4542 ns, ub 100.6346 ns, ci 0.950
  after:  mean: 89.37258 ns, lb 89.34455 ns, ub 89.42334 ns, ci 0.950
benchmarking encode/32
  before: mean: 292.5857 ns, lb 292.5121 ns, ub 292.7101 ns, ci 0.950
  after:  mean: 254.6374 ns, lb 249.8842 ns, ub 261.5605 ns, ci 0.950
benchmarking encode/128
  before: mean: 1.038159 us, lb 1.037985 us, ub 1.038537 us, ci 0.950
  after:  mean: 850.7781 ns, lb 850.6118 ns, ub 851.1164 ns, ci 0.950
benchmarking encode/1024
  before: mean: 7.768683 us, lb 7.766970 us, ub 7.772629 us, ci 0.950
  after:  mean: 6.838571 us, lb 6.693303 us, ub 7.040833 us, ci 0.950
benchmarking encode/65536
  before: mean: 511.2344 us, lb 510.7413 us, ub 511.9317 us, ci 0.950
  after:  mean: 426.9267 us, lb 426.2879 us, ub 427.5551 us, ci 0.950
benchmarking decode/8
  before: mean: 491.4554 ns, lb 491.3104 ns, ub 491.7289 ns, ci 0.950
  after:  mean: 420.7300 ns, lb 419.1382 ns, ub 427.0422 ns, ci 0.950
benchmarking decode/32
  before: mean: 1.146035 us, lb 1.145754 us, ub 1.146530 us, ci 0.950
  after:  mean: 815.8817 ns, lb 815.1795 ns, ub 817.5643 ns, ci 0.950
benchmarking decode/128
  before: mean: 3.738534 us, lb 3.737629 us, ub 3.740522 us, ci 0.950
  after:  mean: 2.379186 us, lb 2.376243 us, ub 2.382636 us, ci 0.950
benchmarking decode/1024
  before: mean: 29.88323 us, lb 29.87664 us, ub 29.89754 us, ci 0.950
  after:  mean: 16.65282 us, lb 16.65004 us, ub 16.65840 us, ci 0.950
benchmarking decode/65536
  before: mean: 1.915516 ms, lb 1.914163 ms, ub 1.919997 ms, ci 0.950
  after:  mean: 1.056637 ms, lb 1.055106 ms, ub 1.057994 ms, ci 0.950