kschiess / parslet

A small PEG-based parser library. See the Hacking page in the wiki as well.
kschiess.github.com/parslet
MIT License

missing matches #181

Closed: pinkynrg closed this issue 7 years ago

pinkynrg commented 7 years ago

I wrote a small test like this:

```ruby
require 'parslet'

class TEST < Parslet::Parser
  rule(:first)  { second >> third.maybe }
  rule(:second) { str("A") | str("AB") }
  rule(:third)  { str('!') }
  root :first
end
```

Shouldn't this example parse all of the following inputs?

In my case it parses only:

On the other hand, if I replace the "AB" string in the "second" rule with a "C" string, I can parse the following as expected:

kschiess commented 7 years ago

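The `|` operator is an ordered choice: it always tries the left side first. If the left side matches, even with a shorter match, the right side is never tried, and the parser does not come back to try it later if the rest of the input fails (no backtracking into a choice that has already succeeded).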

In other words, your `:second` rule should probably be:

```ruby
rule(:second) { str("AB") | str("A") }
```

pinkynrg commented 7 years ago

Thanks!

pinkynrg commented 7 years ago

My real use case is trickier, partly because I've never built a parser before. Is there a way to capture a substring of the original input with a regex and then check whether that entire substring passes a certain rule? Does that approach make sense when building a parser?

I'm debugging a parser for measurement units. This is the rule I'm looking at:

```ruby
rule(:simpleton) do
  prefix.as(:prefix) >> metric_atom.as(:atom) | atom.as(:atom)
end
```

Prefixes are things like m (milli), c (centi), k (kilo), and so on. Metric atoms are things like m (meter), l (liter), and so on. Atoms are all units, metric and non-metric: m (meter), l (liter), mma (a custom unit), and so on.

Things break when I add a non-metric custom atom such as "mma". With the rule above, "mma" matches m + m = milli + meter, so the first branch succeeds, the final character "a" is left over, and the parse fails because the input was routed into the wrong branch.

I'd like to be able to say: "look, mma is the thing you need to parse; either parse it completely with the simpleton rule, or the measurement_unit doesn't exist."

I hope that was reasonably clear.

pinkynrg commented 7 years ago

I solved it by adding a rule that checks `match["..."].present? | any.absent?`, i.e. that the next character belongs to the given character class or that the input has ended.