kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License
809 stars 95 forks source link

multiple alternate matching performance #184

Open olbrich opened 6 years ago

olbrich commented 6 years ago

I have a use case where I end up creating a custom rule like..

def words
  %w[longestword longword word w].map {|w| str(w)}.reduce(:|)
end

Where the list of words can be quite long. The performance of this is pretty bad, as it creates and checks a regular expression for each of the words in turn until it finds one that works.

There are a couple of optimizations that can be done to speed this up. First, would be to fail the str match if the remaining string is shorter than the word that is trying to be matched. Using a single regex that looks like /(longestword|longword|word|w)/ would probably help match in one try instead of several.

Any advice on how to approach making this performant given that the list of words is somewhat dynamic?

kschiess commented 6 years ago

Is this solved by PR #185?

Parslet performance is an issue, I know. I tried a lot of things; in the end, I find it hard to keep the reflective nature of the code intact and improve performance at the same time. Since there are a lot of parsers that don't reflect on themselves and are plenty fast, I won't duplicate that effort. I hope that makes sense to you.

olbrich commented 6 years ago

If you look at the parser I linked to in that PR you can see the trick I use to work around this... Basically I figure out how many characters are left and immediately reject any of those patterns since there is no way they can match. It seems to help a fair amount. I'll take another look at this now that #185 is merged.