purescript-contrib / purescript-parsing

A parser combinator library based on Parsec
BSD 2-Clause "Simplified" License
149 stars 50 forks source link

Parsing.Combinators.Array #199

Closed jamesdbrock closed 2 years ago

jamesdbrock commented 2 years ago

New module Parsing.Combinators.Array for fast combinator variations which return Array.

See benchmarks in comment below.


Checklist:

jamesdbrock commented 2 years ago

digit 10000

runParser many digit 10000
mean   = 6.62 ms
stddev = 7.42 ms
min    = 4.53 ms
max    = 53.91 ms
runParser Array.many digit 10000
mean   = 65.46 ms
stddev = 6.48 ms
min    = 59.97 ms
max    = 83.47 ms
runParser manyArray digit 10000
mean   = 5.57 ms
stddev = 1.85 ms
min    = 4.97 ms
max    = 17.70 ms
StringParser manyRec CodePoints.anyDigit 10000
mean   = 6.96 ms
stddev = 4.69 ms
min    = 4.57 ms
max    = 21.94 ms
StringParser manyRec CodeUnits.anyDigit 10000
mean   = 5.05 ms
stddev = 389.23 μs
min    = 4.52 ms
max    = 8.14 ms
Regex.match \d* 10000
mean   = 369.43 μs
stddev = 140.75 μs
min    = 329.89 μs
max    = 1.77 ms

many anyChar 10000

runParser many anyChar 10000
mean   = 6.73 ms
stddev = 558.41 μs
min    = 6.15 ms
max    = 10.17 ms
runParser Array.many anyChar 10000
mean   = 67.00 ms
stddev = 7.13 ms
min    = 59.64 ms
max    = 89.04 ms
runParser manyArray anyChar 10000
mean   = 5.78 ms
stddev = 1.41 ms
min    = 5.35 ms
max    = 14.63 ms

skipMany anyChar 10000

runParser skipMany anyChar 10000
mean   = 3.53 ms
stddev = 468.27 μs
min    = 3.33 ms
max    = 6.28 ms
jamesdbrock commented 2 years ago

Maybe a Combinators.Array module would be good. We should probably have Array variations of all of the List combinators.

natefaubion commented 2 years ago

In my experience, using unsafe ST is not necessary, but regardless I wouldn’t recommend it. I originally did that in the language-cst parser, but it was “safe” because the internals of the parser were not exported. I switched to building a reversed List, then using Array.fromUnfoldable and Array.reverse. Both of those array operations are extremely cheap. Array.fromUnfoldable specifically builds an array with mutation and does not need to copy the final result. Array.reverse is much cheaper than List.reverse.

jamesdbrock commented 2 years ago

Interesting, thanks @natefaubion

natefaubion commented 2 years ago

To clarify, I did not notice any difference in performance when I tested the change in the CST parser. ST still has to allocate effect closures and binds at each step, which will be equivalent to allocating Cons cells.

jamesdbrock commented 2 years ago

Here is the benchmark for a manyFFI function which completely cheats. This is a reference for a practical lower bound for the runtime for a many combinator which exploits the fast V8 array push.

manyFFI :: forall s m a. ParserT s m a -> ParserT s m (Array a)
manyFFI p = do
  let result = [] :: Array a
  flip tailRecM unit $ \_ -> alt
    do
      x <- try p
      pure $ runFn2 unsafePush x result
      pure (Loop unit)
    do
      pure (Done unit)
  pure result

foreign import unsafePush :: forall a. Fn2 a (Array a) Unit
export const unsafePush = (x, xs) => { xs.push(x);}

many anyChar 10000

runParser many anyChar 10000
mean   = 6.58 ms
stddev = 524.99 μs
min    = 6.09 ms
max    = 9.90 ms
runParser Array.many anyChar 10000
mean   = 64.95 ms
stddev = 5.84 ms
min    = 60.42 ms
max    = 83.06 ms
runParser manyArray anyChar 10000
mean   = 5.99 ms
stddev = 914.83 μs
min    = 5.33 ms
max    = 10.48 ms
runParser manyFFI anyChar 10000
mean   = 5.27 ms
stddev = 538.36 μs
min    = 4.90 ms
max    = 8.01 ms
jamesdbrock commented 2 years ago

Here is manyArray implemented with Array.fromFoldable and Array.reverse as @natefaubion suggested. Really good speed. Maybe better than the unsafeLiftST version.

manyArray :: forall s m a. ParserT s m a -> ParserT s m (Array a)
manyArray p = do
  rlist <- flip tailRecM Nil $ \xs -> alt
    do
      x <- try p
      pure (Loop (x:xs))
    do
      pure (Done xs)
  pure $ Array.reverse $ Array.fromFoldable rlist

digit 10000

many anyChar 10000

runParser many anyChar 10000
mean   = 6.65 ms
stddev = 574.12 μs
min    = 6.10 ms
max    = 10.26 ms
runParser Array.many anyChar 10000
mean   = 65.44 ms
stddev = 7.05 ms
min    = 59.04 ms
max    = 84.52 ms
runParser manyArray anyChar 10000
mean   = 5.72 ms
stddev = 1.75 ms
min    = 4.67 ms
max    = 15.57 ms
runParser manyFFI anyChar 10000
mean   = 5.33 ms
stddev = 665.11 μs
min    = 4.89 ms
max    = 8.93 ms