Parsing.Combinators.Array

jamesdbrock commented 2 years ago

New module Parsing.Combinators.Array for fast combinator variations which return Array.

See benchmarks in comment below.

Checklist:

[x] Added the change to the changelog's "Unreleased" section with a link to this PR and your username
[x] Linked any existing issues or proposals that this pull request should close
[x] Updated or added relevant documentation in the README and/or documentation directory
[x] Added a test for the contribution (if applicable)

jamesdbrock commented 2 years ago

digit 10000
runParser many digit 10000 mean = 6.62 ms stddev = 7.42 ms min = 4.53 ms max = 53.91 ms
runParser Array.many digit 10000 mean = 65.46 ms stddev = 6.48 ms min = 59.97 ms max = 83.47 ms
runParser manyArray digit 10000 mean = 5.57 ms stddev = 1.85 ms min = 4.97 ms max = 17.70 ms
StringParser manyRec CodePoints.anyDigit 10000 mean = 6.96 ms stddev = 4.69 ms min = 4.57 ms max = 21.94 ms
StringParser manyRec CodeUnits.anyDigit 10000 mean = 5.05 ms stddev = 389.23 μs min = 4.52 ms max = 8.14 ms
*Regex.match \d 10000** mean = 369.43 μs stddev = 140.75 μs min = 329.89 μs max = 1.77 ms
many anyChar 10000
runParser many anyChar 10000 mean = 6.73 ms stddev = 558.41 μs min = 6.15 ms max = 10.17 ms
runParser Array.many anyChar 10000 mean = 67.00 ms stddev = 7.13 ms min = 59.64 ms max = 89.04 ms
runParser manyArray anyChar 10000 mean = 5.78 ms stddev = 1.41 ms min = 5.35 ms max = 14.63 ms
skipMany anyChar 10000
runParser skipMany anyChar 10000 mean = 3.53 ms stddev = 468.27 μs min = 3.33 ms max = 6.28 ms

jamesdbrock commented 2 years ago

Maybe a Combinators.Array module would be good. We should probably have Array variations of all of the List combinators.

natefaubion commented 2 years ago

In my experience, using unsafe ST is not necessary, but regardless I wouldn’t recommend it. I originally did that in the language-cst parser, but it was “safe” because the internals of the parser were not exported. I switched to building a reversed List, then using Array.fromUnfoldable and Array.reverse. Both of those array operations are extremely cheap. Array.fromUnfoldable specifically builds an array with mutation and does not need to copy the final result. Array.reverse is much cheaper than List.reverse.

jamesdbrock commented 2 years ago

Interesting, thanks @natefaubion

natefaubion commented 2 years ago

To clarify, I did not notice any difference in performance when I tested the change in the CST parser. ST still has to allocate effect closures and binds at each step, which will be equivalent to allocating Cons cells.

jamesdbrock commented 2 years ago

Here is the benchmark for a manyFFI function which completely cheats. This is a reference for a practical lower bound for the runtime for a many combinator which exploits the fast V8 array push.

manyFFI :: forall s m a. ParserT s m a -> ParserT s m (Array a)
manyFFI p = do
  let result = [] :: Array a
  flip tailRecM unit $ \_ -> alt
    do
      x <- try p
      pure $ runFn2 unsafePush x result
      pure (Loop unit)
    do
      pure (Done unit)
  pure result

foreign import unsafePush :: forall a. Fn2 a (Array a) Unit

export const unsafePush = (x, xs) => { xs.push(x);}

many anyChar 10000
runParser many anyChar 10000 mean = 6.58 ms stddev = 524.99 μs min = 6.09 ms max = 9.90 ms
runParser Array.many anyChar 10000 mean = 64.95 ms stddev = 5.84 ms min = 60.42 ms max = 83.06 ms
runParser manyArray anyChar 10000 mean = 5.99 ms stddev = 914.83 μs min = 5.33 ms max = 10.48 ms
runParser manyFFI anyChar 10000 mean = 5.27 ms stddev = 538.36 μs min = 4.90 ms max = 8.01 ms

jamesdbrock commented 2 years ago

Here is manyArray implemented with Array.fromFoldable and Array.reverse as @natefaubion suggested. Really good speed. Maybe better than the unsafeLiftST version.

manyArray :: forall s m a. ParserT s m a -> ParserT s m (Array a)
manyArray p = do
  rlist <- flip tailRecM Nil $ \xs -> alt
    do
      x <- try p
      pure (Loop (x:xs))
    do
      pure (Done xs)
  pure $ Array.reverse $ Array.fromFoldable rlist

digit 10000
many anyChar 10000
runParser many anyChar 10000 mean = 6.65 ms stddev = 574.12 μs min = 6.10 ms max = 10.26 ms
runParser Array.many anyChar 10000 mean = 65.44 ms stddev = 7.05 ms min = 59.04 ms max = 84.52 ms
runParser manyArray anyChar 10000 mean = 5.72 ms stddev = 1.75 ms min = 4.67 ms max = 15.57 ms
runParser manyFFI anyChar 10000 mean = 5.33 ms stddev = 665.11 μs min = 4.89 ms max = 8.93 ms

purescript-contrib / purescript-parsing

Parsing.Combinators.Array #199

digit 10000

many anyChar 10000

skipMany anyChar 10000

many anyChar 10000

digit 10000

many anyChar 10000