haskell-hvr / regex-tdfa

Pure Haskell Tagged DFA Backend for "Text.Regex" (regex-base)
http://hackage.haskell.org/package/regex-tdfa

Large character classes combined with {m,n} is very slow and memory-hungry #3

Open neongreen opened 4 years ago

neongreen commented 4 years ago

Originally reported by @jaspervdj at https://github.com/ChrisKuklewicz/regex-tdfa/issues/14:


module Main where
import qualified Text.Regex.TDFA as Tdfa

main :: IO ()
main = do
    -- A large character class (U+0020..U+D7FF) under a bounded repetition.
    let pattern = "^[\x0020-\xD7FF]{1,255}$"
        -- 100 characters of plain ASCII input.
        input   = take 100 $ cycle "abcd"

        regex :: Tdfa.Regex
        regex = Tdfa.makeRegexOpts Tdfa.defaultCompOpt Tdfa.defaultExecOpt pattern

        matches :: [Tdfa.MatchArray]
        matches = Tdfa.match regex input

    print matches

On my machine this takes over 6 seconds and uses around 3 GB(!) of memory. A workaround is to drop the {m,n} part, use "^[\x0020-\xD7FF]+$" instead, and check the length explicitly in Haskell.
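
A minimal sketch of that workaround, for illustration only. The helper names `validInput` and `lengthBetween` are hypothetical and not part of regex-tdfa. It also assumes the input is a single line: with the default options `^` and `$` can match at line boundaries, so a whole-string length check is only equivalent to the original pattern when the input contains no newlines.

```haskell
module Main where

import Text.Regex.TDFA ((=~))

-- Hypothetical helper (not part of regex-tdfa): True when the input has
-- between 1 and 255 characters, all in the range U+0020..U+D7FF.
-- The bounded repetition {1,255} is replaced by '+' plus a length check.
validInput :: String -> Bool
validInput s = lengthBetween 1 255 s && (s =~ "^[\x0020-\xD7FF]+$")

-- Hypothetical helper: checks lo <= length xs <= hi while inspecting
-- at most hi + 1 elements, so very long inputs are not fully traversed.
lengthBetween :: Int -> Int -> [a] -> Bool
lengthBetween lo hi xs = n >= lo && n <= hi
  where
    n = length (take (hi + 1) xs)

main :: IO ()
main = do
    print (validInput (take 100 (cycle "abcd")))  -- same input as the report
    print (validInput (replicate 256 'a'))        -- one character too long
    print (validInput "")                         -- too short
```

The length check runs first, so out-of-range inputs are rejected before the regex is compiled or run.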