Closed phadej closed 9 years ago
Cool! This will be immensely useful to have. I have some comments:
```haskell
sepBy1 :: Prod r e t a -> Prod r e t op -> Grammar r e (Prod r e t [a])
sepBy1 p op = mdo
  ops <- rule $ pure [] <|> (:) <$ op <*> p <*> ops
  return $ (:) <$> p <*> ops
```
```haskell
expr :: Grammar r String (Prod r String Token Expr)
expr = mdo
  let var = Var <$> satisfy isIdent
  mul <- fmap (foldl1 Mul) <$> (add `sepBy1` symbol "*")
  add <- fmap (foldl1 Add) <$> (var `sepBy1` symbol "+")
  return mul
```
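The `fmap (foldl1 Mul)`/`fmap (foldl1 Add)` post-processing above just left-folds the list that `sepBy1` produces into a nested expression. A minimal stdlib sketch of that fold, with a toy `Expr` type standing in for the real one (constructor names assumed from the grammar):

```haskell
-- Toy stand-in for the Expr type used in the grammar above.
data Expr = Var String | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

main :: IO ()
main = do
  -- `var `sepBy1` symbol "+"` applied to the tokens of "x+x+x" would
  -- yield the list [Var "x", Var "x", Var "x"]; foldl1 Add then
  -- left-nests it, which is what `fmap (foldl1 Add)` does to the Prod.
  let vars = [Var "x", Var "x", Var "x"]
  print (foldl1 Add vars)
  -- → Add (Add (Var "x") (Var "x")) (Var "x")
```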
Though this is arguably what we are trying to avoid by using Earley. Maybe we should have both versions? I'm not sure.
```haskell
earleyBench n = parseEarley $ "x" : concat (replicate n ["+", "x"])
```
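For reference, the benchmark input is just an alternating token list of length 2n+1; a quick stdlib sketch of what the expression above expands to:

```haskell
main :: IO ()
main = do
  -- Same construction as in earleyBench, pulled out for inspection.
  let tokens n = "x" : concat (replicate n ["+", "x"])
  print (tokens 2)          -- → ["x","+","x","+","x"]
  print (length (tokens 2)) -- 2n+1 tokens, here 5
```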
Overall the results match what I have seen before (just from running `time` on some parsers). It would be interesting to see if there's any low-hanging fruit for improving the timings. Though we also have to remember that a library like Earley, which gives all possible parses, does more work than libraries like Parsec, because it has to try every possible next symbol at every position in the input.
I changed the benchmarks a bit so that the token list is generated outside the benchmarked code. For some reason Parsec is now much faster?
I also added tree-branching expression benchmarks.
What's the difference between `earleyTree` and `parsecTree`? I suspect something went wrong with this commit, or you forgot something. :)
I think the better performance is expected if most of the time Parsec spent before went into constructing the input (though Earley should also get the same absolute time improvement).
Will push an updated version of the benchmark soon.
The latest version:
Both seem to be O(n), where n is the number of tokens; yet the constants are a bit different :)
Yeah, there's still some work to be done on shaving some of that constant. You can regain a bit of performance by rewriting the grammar as I did above, though ideally you shouldn't have to do that. I'm thinking there might be a way to have a special-cased "fast path" version of the code for when the parse is unambiguous, but I'll have to investigate.
Improving performance should be much easier now that we have proper benchmarks and a test suite. :)
:+1:
I was interested in how this compares to Parsec, so I added a benchmark suite:
I guess it's good to have so one can catch performance regressions too, though I don't know how to automate that part.