lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

How do you do negative lookahead? #486

Closed by enjoysmath 4 years ago

enjoysmath commented 4 years ago

above: expr "\above" (FLOAT "pt")? expr

I want the second expr to be parsed as !(FLOAT "pt") expr, where ! stands for negative lookahead.

evandrocoan commented 4 years ago

Lark grammars are not regexes; they just share some features. See the Lark style cheats in issue #195 (Design a new Lark cheatsheet).

To use a regex, you need to wrap it in slashes: /regex/

For example:

above: expr "\above" REGEX_TOKEN expr
REGEX_TOKEN: /!(FLOAT "pt")?/
enjoysmath commented 4 years ago

@evandrocoan What I want to do is:

above: expr "\above" POINT_SIZE NOT_POINT_SIZE expr
POINT_SIZE: /FLOAT pt/
NOT_POINT_SIZE: /!POINT_SIZE/

That seems kind of ridiculous, given that \above is just one command and there are hundreds!

evandrocoan commented 4 years ago

POINT_SIZE: /FLOAT pt/ will not work: you cannot reference tokens inside a regular expression.

Please create a minimal, reproducible example with a minimal grammar that reproduces the problem you are having (see How to create a Minimal, Reproducible Example).

enjoysmath commented 4 years ago

> POINT_SIZE: /FLOAT pt/ you cannot use tokens inside regular expressions.
>
> Please, create a minimal and reproducible example with a minimal grammar to reproduce the problem you are having: How to create a Minimal, Reproducible Example

I copied that from your previous post: you used FLOAT inside a regex.

evandrocoan commented 4 years ago

My bad.

ctung commented 4 years ago

Still not sure how to do a negative lookahead. Let's say I want to parse "QN not followed by O"; the regex would normally be /QN(?!O)/

from lark import Lark

test = """
FQ
FQN
FQNOM
FQNNOM
"""

grammar = r"""
start:    ("\n"|word)*
word:     (PRE1|PRE2) SUFFIX? "\n"

PRE1:     /FQ(?!N)/
PRE2:     /FQN(?!O)/
SUFFIX:   "NOM"
"""

parser = Lark(grammar, parser='lalr')
tree = parser.parse(test)
print(tree.pretty())

gives me this error:

lark.exceptions.UnexpectedCharacters: No terminal defined for 'F' at line 4 col 1

FQNOM
^

Expecting: {'PRE2', 'NEWLINE', 'PRE1'}

Previous tokens: Token(NEWLINE, '\n')

I'm trying to match

FQ -> FQ
FQN -> FQN
FQNOM -> FQ, NOM
FQNNOM -> FQN, NOM

Help?

edit: I figured it out:

grammar = r"""
start:    ("\n"|word)*
word:     (PRE1|PRE2) SUFFIX? "\n"

PRE1:     /FQ(?!NNOM)/
PRE2:     /FQN(?!O)/
SUFFIX:   "NOM"
"""

yields:

start
  word  FQ
  word  FQN
  word
    FQ
    NOM
  word
    FQN
    NOM
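
A sketch of why the first grammar fails on FQNOM and the edited one does not, using only Python's re module rather than Lark itself. The constant names and the helper prefixes are made up for illustration, and the sketch assumes a terminal can only be produced where its regex matches at the current input position.

```python
import re

PRE1_OLD = re.compile(r"FQ(?!N)")      # PRE1 from the first grammar
PRE1_NEW = re.compile(r"FQ(?!NNOM)")   # PRE1 from the edited grammar
PRE2 = re.compile(r"FQN(?!O)")         # PRE2, unchanged in both grammars


def prefixes(word, pre1):
    """Return what pre1 and PRE2 each match at the start of word (None if no match)."""
    def matched(pattern):
        m = pattern.match(word)
        return m.group(0) if m else None
    return matched(pre1), matched(PRE2)


words = ["FQ", "FQN", "FQNOM", "FQNNOM"]
old = {w: prefixes(w, PRE1_OLD) for w in words}
new = {w: prefixes(w, PRE1_NEW) for w in words}

for w in words:
    print(w, "old:", old[w], "new:", new[w])
```

With the first grammar neither prefix matches at the start of FQNOM: FQ is followed by N, and FQN is followed by O, which is consistent with the "No terminal defined for 'F'" error. With FQ(?!NNOM), FQ matches there and NOM is left for SUFFIX. For FQN both prefixes can match under the edited grammar; the tree above shows that Lark picked FQN.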