lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License
4.86k stars 413 forks source link

Make extended grammar work for Token imports. #1060

Open shiyunon opened 2 years ago

shiyunon commented 2 years ago

Suggestion

Currently, (as an example), to extend a statement rule in a grammar called old_gram.lark, we can do %import .old_gram.statement -> old_gram__statement in the new grammar file new_gram.lark. The problem is that this does not work for Tokens and we cannot do %import .old_gram.IDENTIFIER -> old_gram__IDENTIFIER (lark will report an error because lark thinks old_gram__IDENTIFIER as a rule not a token). One workaround would be simply %import .old_gram.IDENTIFIER with the downside to be that the merged transformer of the old/new transformers would not work by default.

There is also another big issue of not being able to do %import .old_gram.IDENTIFIER -> old_gram__IDENTIFIER: if in old_gram.lark, we have the following rule:

identifier_list: IDENTIFIER ("," IDENTIFIER)*

and we import it to new_gram.lark with %import .old_gram.identifier_list -> old_gram__identifier_list. In the generated tree of programs that use identifier_list, IDENTIFIER tokens will be translated to old_gram__IDENTIFIER (under identifier_list Tree). Then, we have some old_gram__IDENTIFIER live in the Tree, but we cannot explicitly add this Token to the tree. Any workaround we try will diverge from old_gram__IDENTIFIER.

My suggestion is to make the default prefix of Token importing to be uppercase, for example, %import .old_gram.IDENTIFIER -> OLD_GRAM__IDENTIFIER.

It would be great if someone can let me know if the above makes sense, I can probably help here too.

erezsh commented 2 years ago

There is the idea of making terminals MixedCase instead of only UPPER_CASE. That way, users can name them MyTerminal if they want, and it will also solve your use-case on the way.

But I'm not sure how I feel about users importing rules and terminals into a namespace explicitly.

Why not have a matching import from the transformer instead?

i.e.

%import .old_gram.IDENTIFIER 

and

class NewTransformer(Transformer):
    from old_transformer import IDENTIFIER
shiyunon commented 2 years ago

Thanks for the reply. Yes, I agree that we can match import from the transformer, but having to do that already makes it a different user experience from importing rules. Thus, I think, this might worth being documented somewhere.

erezsh commented 2 years ago

Maybe so. This is a pretty new feature. I didn't even get a chance to use it much yet, so I can't say for sure what's the recommended usage.

shiyunon commented 2 years ago

Sorry, might have to confirm that I understood the following so that we have the same assumption:

But I'm not sure how I feel about users importing rules and terminals into a namespace explicitly.

Importing rules into a namespace is required for many use cases, right? Based on https://github.com/lark-parser/lark/blob/master/examples/composition/storage.lark and https://github.com/lark-parser/lark/pull/973#issuecomment-907287565

erezsh commented 2 years ago

Yes, importing rules into the local namespace is a common operation.

What you were doing is injecting rules into other namespaces. In this case, the original namespace, so it's not as bad, but still feels like it shouldn't be encouraged. I'd much rather add a %soft_import directive that doesn't remove the namespace.