Is it possible to parse a list of terminals?

lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

MIT License


Is it possible to parse a list of terminals? #1366

Closed by Daniel63656 3 months ago

Daniel63656 commented 8 months ago

I already have my tokens as a list of terminals, like this in a toy grammar:

start: A B C
A: "a"
B: "b"
C: "c"

tokens = ["a", "b", "c"]

Can I use a parser that accepts these lists directly, to skip the unnecessary scanning step? All lexers throw TypeError: expected string or bytes-like object, got 'list'

erezsh commented 8 months ago

Yes. See this example: https://github.com/lark-parser/lark/blob/master/examples/advanced/custom_lexer.py

MegaIng commented 8 months ago

You need to add token types to your list of strings and construct lark.Token instances; otherwise lark has no idea what to do with your strings. Assigning those types is the primary job of the lexers. In your case, each token type is just the value passed through str.upper, so Token(c.upper(), c) constructs the correct token. For your actual use case, you will probably need to do something more complex.