lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License
4.63k stars 397 forks source link

Control the creation of sppf.png with a separate option. #750

Open charles-esterbrook opened 3 years ago

charles-esterbrook commented 3 years ago

Suggestion A Lark Earley parser will generate "sppf.png" if self.debug is True, as seen in early.py:

https://github.com/lark-parser/lark/blob/master/lark/parsers/earley.py - search for "sppf.png"

However, this is an expensive operation that radically slows down the multiple test cases in my project. Lark would benefit from an option like generate_parse_graph to control this. For example,

        parser = lark.Lark.open(
            grammarPath,
            debug=True,
            generate_parse_graph=False,
            parser='earley',  # lalr, earley
            lexer='standard',
            postlex=CustomIndenter(),
            start='file_input',
            keep_all_tokens=True,
            maybe_placeholders=True,
            propagate_positions=True,
            ambiguity='resolve')
            #regex=True)

If the option will require that debug is True (in other words be dependent on it), then a better name might be debug_parse_graph.

Another idea: Make the option to control the file name of the .png file. If None, it doesn't get written.

Describe alternatives you've considered Alternatives include:

Additional context N/A

MegaIng commented 3 years ago

Why are you using debug=True? In general, we don't want to add additional options to Lark. Alternatively, you could uninstall pydot. If we really want to make this controllable, I would suggest using multiple levels, e.g. debug=2 or similar.

charles-esterbrook commented 3 years ago

In general, I'm using debug because I'm in "development mode" instead of "release mode". More specifically, when I was using LALR instead of Earley, I was benefiting from the if self.debug goodies in compute_lalr1_states() and parse_from_state.

I still recommend an sppf_file option which can be True, a filename or something falsey. Or something similar.

Not sure why an sppf_file option shouldn't be added. It's not the type of option that bloats the code or adds more levels of indirection. What I'm trying to say is that it doesn't have complexity like parser, postlexer, etc.

For now, I have uninstalled pydot and continue to read the No pydot stderr message several times a day.

MegaIng commented 3 years ago

"development mode" instead of "release mode".

I would say you are in "test mode". Note that you should activate the logger in "test mode" in case warnings are generated. If you would have done that you would have realized that we are generating a lot of messages with debug=True.

Not sure why an sppf_file option shouldn't be added.

It isn't code bloat. It is API bloat. This is option is rarely used, and doesn't really provide new features, since you can get the same effect by deactivating debug. But it is another entry to an already long list of arguments.

erezsh commented 3 years ago

It seems to me that if you're using debug=True, then you're in debug mode, which means you're trying to solve a problem with the parser. Once you've solved that problem, you are expected to disable debug mode, and run Lark normally. If any change that you make creates a new error, don't worry, Lark will let you know using an exception.

Why do you need to run Lark in debug=True?

charles-esterbrook commented 3 years ago

Maybe the next step is official docs on what can be expected during debug=True?

erezsh commented 3 years ago

Good point. Updated docs.

charles-esterbrook commented 3 years ago

Thanks. I was thinking it could say what the extra info and warnings are. Like unused rules, shift/reduce conflicts, etc.

MegaIng commented 3 years ago

That is a little bit harder, since that is often changing. But it might be worth to try and create an always up-to-date list. It also has more hidden effects (e.g. Stack state dumping/preventing optimizations in lalr)