Open Bodigrim opened 4 years ago
For what it's worth, I've been doing matching perf shoot-outs between text
and text-ascii
(which uses bytestring
under the hood). Originally, text-ascii
used NSN, but this turned out to be severely non-competitive. However, having rewritten it based on bytestring
's Word8
finding implementation (basically a heuristic optimization of the naive algorithm, suggested by Crochemore et al), I got much better results, even at -O0:
All
Matching
Match, Text: OK (0.15s)
5.8 ms ± 434 μs, 0 B allocated, 0 B copied
Match, AsciiText: OK (0.77s)
402 μs ± 17 μs, 619 KB allocated, 199 B copied
Match 1, Text: OK (0.39s)
14 ms ± 1.2 ms, 0 B allocated, 0 B copied
Match 1, AsciiText: OK (0.72s)
3.1 ms ± 64 μs, 0 B allocated, 0 B copied
No match, Text: OK (0.15s)
5.1 ms ± 381 μs, 0 B allocated, 0 B copied
No match, AsciiText: OK (0.23s)
3.9 ms ± 238 μs, 9.5 MB allocated, 1.7 KB copied
At -O2, it's not even close:
All
Matching
Match, Text: OK (1.46s)
105 ms ± 3.1 ms, 754 MB allocated, 154 KB copied
Match, AsciiText: OK (0.77s)
403 μs ± 26 μs, 619 KB allocated, 199 B copied
Match 1, Text: OK (0.19s)
29 ms ± 1.4 ms, 105 MB allocated, 19 KB copied
Match 1, AsciiText: OK (0.28s)
5.1 ms ± 330 μs, 0 B allocated, 0 B copied
No match, Text: OK (0.60s)
92 ms ± 2.0 ms, 675 MB allocated, 116 KB copied
No match, AsciiText: OK (0.44s)
3.7 ms ± 84 μs, 9.5 MB allocated, 1.7 KB copied
The allocations (in my implementation) are still something I'm ironing out. For reference, this is working over a 6ish megabyte text; the 'Match' case tries a non-singleton needle with 96 occurrences, the 'Match 1' case tries a singleton needle, and 'No match' tries a non-matching needle. The implementation (of the index-finding function for matching) is below - BS
is ByteString
, P
is Prelude
:
indices :: ByteString -> ByteString -> [Int]
indices needle haystack = case BS.uncons needle of
Nothing -> []
Just (h, t) -> case BS.elemIndices h haystack of
[] -> []
whole@(i : is) -> if BS.null t
then whole
else go i is
where
go :: Int -> [Int] -> [Int]
go i is = let fragment = BS.take needleLen . BS.drop i $ haystack in
if fragment == needle
then i : case P.dropWhile (\j -> j - i < needleLen) is of
[] -> []
(j : js) -> go j js
else case is of
[] -> []
(j : js) -> go j js
needleLen :: Int
needleLen = BS.length needle
I would argue that this is an excellent choice.
@kozross am I reading your benchmarks right? Does text
become 20x slower when compiled with -O2
?
@Bodigrim - you are. It gets better - it's about a factor of 4 worse than even this on GHC 9 (these results are GHC 8.6).
@kozross Has it been reported to text
issue tracker?
@Bodigrim Not presently. I'm trying to get my own house in order, since I'm not in charge of (or involved in) text
, but text-ascii
is mine and mine alone. I shared this because I figured you might wanna re-use what I end up with, since it'll work for bytestring
with zero changes.
We are kick-starting a new maintainers team for text
, so it may be worth discussing there.
I just run text
benchmarks (https://github.com/haskell/text/pull/315) - and there is indeed a huge regression between GHC 8.6 and GHC 9.0 for many functions, like 2-3 times slower. This is simply terrible on its own, saying nothing about -O2
slow down.
I shared this because I figured you might wanna re-use what I end up with, since it'll work for bytestring with zero changes.
Sure, it is very nice of you, thanks for sharing.
@Bodigrim Once I've figured out my weird alloc-related issue, I'll definitely post a repro on text
's issue tracker, with more specifics.
I've been thinking about
splitOn
andreplace
forByteString
. They can be expressed in terms ofbreakSubstring
, but the more I look at the latter the more doubts I get about its implementation.https://github.com/haskell/bytestring/blob/bd5412c1b7fac3f63cc1d2ea4e75bdf45c04b541/Data/ByteString.hs#L1596-L1613
What's the reason for Karp-Rabin here? It is great to search for multiple patterns at once, but this is not our case. I suspect that for non-pathological cases even a naive loop with
memcmp
could very well be faster. And for pathological inputs Karp-Rabin is O(mn) anyways. If we want to fix the worst case scenario, we should employ Knuth-Moris-Pratt or Boyer-Moore.