kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

Prevent eager consuming and check all variants #191

Closed: maxivak closed this issue 6 years ago

maxivak commented 6 years ago

I have simple texts to parse, in this format:

<<long identifier with spaces>> <<freq>>

Examples:
-- intel core duo t2400 2000 ghz => should parse to
{title: "intel core duo t2400", freq: "2000 ghz"}

-- intel core 2 duo 2000 ghz => should parse to
{title: "intel core 2 duo", freq: "2000 ghz"}
In this case the identifier may include a number.

I have these rules:

root :expression

rule(:expression) {
  title >> space >> freq
}

rule(:title) {
  identifier | (identifier >> (space >> identifier).repeat)
}

rule(:identifier) {
  match['a-zA-Z0-9\-\_'].repeat
}

rule(:freq) {
  number >> space >> str('ghz')
}

rule(:number) { digit.repeat(1) >> (str('.') >> digit.repeat(1)).maybe }
rule(:digit) { match('[0-9]').repeat(1) }

But it doesn't work: it always tries to include '2000' in the title.

It seems to consume as much text as it can, and it doesn't consider all possible variants the way a regex does. For example, the regex '[a-z][a-z\d ]+ (\d+ ghz)' finds the expected result.
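The difference can be reproduced in plain Ruby. This sketch anchors the regex from the question with `\A`/`\z` (an addition, not part of the question) and contrasts it with a variant using the possessive quantifier `++`, which never gives back input once consumed. Possessive, non-backtracking repetition is how PEG repetition (and therefore Parslet's `repeat`) is usually described; the constant names are illustrative only.

```ruby
# The regex from the question, anchored. The greedy `+` first swallows the
# whole line, then backtracks one character at a time until " <digits> ghz"
# can match at the very end.
BACKTRACKING = /\A([a-z][a-z\d ]+) (\d+ ghz)\z/

# Same pattern with a possessive `++`: the character class eats the whole
# line and never gives anything back, so the trailing " (\d+ ghz)" can never
# match. This mirrors PEG-style repetition.
POSSESSIVE = /\A([a-z][a-z\d ]++) (\d+ ghz)\z/

['intel core duo t2400 2000 ghz', 'intel core 2 duo 2000 ghz'].each do |line|
  m = BACKTRACKING.match(line)
  puts "backtracking: title=#{m[1].inspect} freq=#{m[2].inspect}"
  puts "possessive:   #{POSSESSIVE.match(line).inspect}"
end
```

The backtracking version recovers both expected titles, while the possessive version returns `nil` for both lines.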

Is this Parslet's behaviour by design, so it cannot be changed, or am I using it wrong?