lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

The performance of instancing a Lark object with a large grammar file #344

Closed. simplelife963 closed this issue 5 years ago

simplelife963 commented 5 years ago

I tried the Python parser from the examples and found that parsing itself is really fast. However, instantiating a Lark object with the following statement took about 2 seconds:

from lark import Lark

# grammar_file is the path to the python3.lark file
Lark.open(grammar_file, parser='lalr', **kwargs)

Is there anything I missed that would reduce the time it takes to create the parser?

Any ideas would be really appreciated.
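For anyone who wants to reproduce the measurement, a small timing helper along these lines might do. This is only a sketch: `time_call` is a hypothetical name, and the commented usage assumes lark is installed and that `grammar_file` points at python3.lark.

```python
import time


def time_call(factory, repeat=3):
    """Return the fastest wall-clock time, in seconds, of calling factory()."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        factory()  # build the object being measured; the result is discarded
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage (assumes lark is installed and grammar_file is defined):
# from lark import Lark
# print(time_call(lambda: Lark.open(grammar_file, parser='lalr')))
```

Taking the best of several runs reduces noise from disk caching and other processes, so the number reflects the grammar-loading cost rather than a one-off delay.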

erezsh commented 5 years ago

Did you try the stand-alone parser? It loads faster than regular use of Lark. Also, setting lexer='standard' makes loading a little faster.
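As a rough sketch of that suggestion: the stand-alone generator is normally run as `python -m lark.tools.standalone grammar.lark > parser.py`. The helpers below wrap that invocation from Python. The names `standalone_command` and `generate_standalone` are hypothetical, and the file names in the usage comment are assumptions.

```python
import subprocess
import sys


def standalone_command(grammar_path):
    """Build the command line that runs Lark's stand-alone generator."""
    return [sys.executable, "-m", "lark.tools.standalone", grammar_path]


def generate_standalone(grammar_path, output_path):
    """Write the generated parser module to output_path (requires lark)."""
    with open(output_path, "w", encoding="utf-8") as out:
        # The generator prints the parser module to stdout.
        subprocess.run(standalone_command(grammar_path), stdout=out, check=True)


# Hypothetical usage (file names are assumptions):
# generate_standalone("python3.lark", "python_parser.py")
```

The generated module embeds the precomputed parse tables, which is why importing it avoids most of the grammar analysis done at load time by regular Lark.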

simplelife963 commented 5 years ago

I generated the stand-alone parser by running the generator from the command line, and it worked perfectly and loads really fast! Thank you for the solution, Erezsh, you really helped me a lot!

The following example code for the Python parser might be useful for others. The parser raises an error if a PythonIndenter instance is not passed to it as the postlex argument.

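# python_parser is the module produced by the stand-alone generator.
# Assumption: the generated module exposes Indenter; if it does not,
# use "from lark.indenter import Indenter" instead.
from python_parser import Lark_StandAlone, Indenter


# Assumed definition, mirroring the PythonIndenter in Lark's Python parser example.
class PythonIndenter(Indenter):
    NL_type = '_NEWLINE'
    OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE']
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8


dsl = """
a = 1
n()
b = 2
c = a + b
if c > 4:
    if a == 1:
        c = 15
    elif b > 2:
        c = 10
    """

# The indenter must be passed as postlex so that INDENT/DEDENT tokens are produced.
args = dict(postlex=PythonIndenter())
parser = Lark_StandAlone(**args)
print(parser.parse(dsl))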

By the way, Python 3 allows non-ASCII characters in variable names and other identifiers, so the "NAME" terminal in python3.lark might need to be updated to support this.
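To illustrate the point, here is a small sketch comparing two candidate patterns against Python's own `str.isidentifier()`. Both patterns are assumptions: `ASCII_NAME` is what an ASCII-only NAME terminal might look like (not necessarily the exact pattern in python3.lark), and `UNICODE_NAME` is one possible relaxation. Python's real rule is based on the XID_Start/XID_Continue character classes, so the regex is only an approximation.

```python
import re

# Assumed ASCII-only pattern: the first character must be an ASCII letter or "_".
ASCII_NAME = re.compile(r"[a-zA-Z_]\w*")

# Possible relaxation: the first character is any word character except a digit.
UNICODE_NAME = re.compile(r"[^\W\d]\w*")


def matches(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


for name in ["total", "_tmp", "变量", "1abc"]:
    print(name, name.isidentifier(),
          matches(ASCII_NAME, name), matches(UNICODE_NAME, name))
```

For `str` patterns, `\w` already matches Unicode word characters in Python 3, so only the leading character class needs to change in this sketch.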

erezsh commented 5 years ago

Glad it worked for you. Good job figuring out the indenter.

> from Python 3.X, it supports non-English characters as variable name/identifier, and the terminal "NAME" in python3.lark might need to be updated for supporting this feature.

I'll be happy to accept a pull request for that