lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License
4.83k stars 409 forks

Newbie questions #157

Closed brupelo closed 6 years ago

brupelo commented 6 years ago

Consider the below snippet:

from lark import Lark, inline_args, Transformer

grammars = [
    """
        ?start: sum | NAME "=" sum
        ?sum: product | sum "+" product | sum "-" product
        ?product: atom | product "*" atom | product "/" atom
        ?atom: NUMBER | "-" atom | NAME | "(" sum ")"
        %import common.CNAME -> NAME
        %import common.NUMBER
        %import common.WS_INLINE
        %ignore WS_INLINE
    """,
    """
        ?start: sum | NAME "=" sum
        ?sum: product | sum "+" product | sum "-" product
        ?product: atom | product "*" atom | product "/" atom
        ?atom: NUMBER | "-" atom | NAME | "(" sum ")"
        EQUAL: "="
        LPAR: "("
        RPAR: ")"
        SLASH: "/"
        STAR: "*"
        MINUS: "-"
        PLUS: "+"
        %import common.CNAME -> NAME
        %import common.NUMBER
        %import common.WS_INLINE
        %ignore WS_INLINE
    """,
    """
        ?start: sum | NAME "=" sum
        ?sum: product | sum "+" product | sum "-" product
        ?product: atom | product "*" atom | product "/" atom
        ?atom: NUMBER | "-" atom | NAME | "(" sum ")"
        OPERATOR : "=" | "(" | ")" | "/" | "*" | "-" | "+"
        %import common.CNAME -> NAME
        %import common.NUMBER
        %import common.WS_INLINE
        %ignore WS_INLINE
    """
]

def test(grammar, text):
    parser = Lark(grammar, start='start')
    # print(parser.parse(text).pretty())
    print(sorted(list(set([t.type for t in parser.lex(text)]))))
    # print([t.name for t in parser.lexer.tokens])

text = "x = 1+2 - 3-4 - 5*6 - 7/8 - (9+10-11*12/13)"
for i, grammar in enumerate(grammars):
    print('grammar {}'.format(i).center(80, '*'))
    test(grammar, text)

whose output is:

***********************************grammar 0************************************
['NAME', 'NUMBER', '__EQUAL', '__LPAR', '__MINUS', '__PLUS', '__RPAR', '__SLASH', '__STAR']
***********************************grammar 1************************************
['EQUAL', 'LPAR', 'MINUS', 'NAME', 'NUMBER', 'PLUS', 'RPAR', 'SLASH', 'STAR']
***********************************grammar 2************************************
['NAME', 'NUMBER', '__EQUAL', '__LPAR', '__MINUS', '__PLUS', '__RPAR', '__SLASH', '__STAR']

got some questions:

0) About grammar 0: the token types '__EQUAL', '__LPAR', '__MINUS', '__PLUS', '__RPAR', '__SLASH', '__STAR' are generated automagically. How does this work internally?

1) About grammar 1: with this method I can easily identify the token types, so I can use them for syntax highlighting with QScintilla. Is there any problem with this approach?

2) About grammar 2: in case I want to syntax highlight a group of similar tokens, how can I do that? In this case the token types are still generated automatically instead of becoming OPERATOR. I'd like to be able to apply one QScintilla style to a bunch of related tokens (i.e. OPERATOR: "=" | "(" | ")" | "/" | "*" | "-" | "+").

erezsh commented 6 years ago

Here is the "algorithm": https://github.com/lark-parser/lark/blob/master/lark/load_grammar.py#L260
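Roughly, the idea (a simplified, hypothetical sketch, not Lark's actual code) is that a terminal that appears only as a quoted string gets an auto-generated name from a character table, and the leading __ marks it as internal so it's filtered out of the parse tree:

```python
# Hypothetical sketch of auto-naming anonymous terminals like "+" -> "__PLUS".
# CHAR_NAMES and name_anonymous_terminal are made-up stand-ins for the real
# logic in lark/load_grammar.py.
CHAR_NAMES = {
    '=': 'EQUAL', '(': 'LPAR', ')': 'RPAR',
    '/': 'SLASH', '*': 'STAR', '-': 'MINUS', '+': 'PLUS',
}

def name_anonymous_terminal(literal):
    if literal.isalnum():
        # Keyword-like literals can simply be named after their own text.
        return '__' + literal.upper()
    # Punctuation is looked up character by character.
    return '__' + '_'.join(CHAR_NAMES.get(ch, 'ANON') for ch in literal)

print(name_anonymous_terminal('+'))  # __PLUS
print(name_anonymous_terminal('('))  # __LPAR
```

That matches the __EQUAL, __LPAR, ... names in the grammar 0 output above.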

grammar1 looks good.

grammar2 won't work. You should "classify" them after the lexing is done. Either use a lookup table, or name them appropriately. E.g.

OP_SLASH: "/"
OP_STAR: "*"
OP_MINUS: "-"
OP_PLUS: "+"

And later: if token.type.startswith('OP_'): ...
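A minimal sketch of that classify-after-lexing idea (Token here is a stand-in for lark.Token, and style_for is a hypothetical helper for the QScintilla use-case):

```python
from collections import namedtuple

# Stand-in for lark.Token: just a type name and a value.
Token = namedtuple('Token', ['type', 'value'])

def style_for(token):
    """Pick a highlighting style, grouping all OP_* terminals together."""
    if token.type.startswith('OP_'):
        return 'operator-style'
    return token.type.lower() + '-style'

# A token stream like parser.lex() would produce for "1 + 2 * 3"
tokens = [
    Token('NUMBER', '1'), Token('OP_PLUS', '+'),
    Token('NUMBER', '2'), Token('OP_STAR', '*'),
    Token('NUMBER', '3'),
]
print([style_for(t) for t in tokens])
# ['number-style', 'operator-style', 'number-style', 'operator-style', 'number-style']
```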

brupelo commented 6 years ago

Consider this snippet:

from lark import Lark, inline_args, Transformer
from lark.tree import pydot__tree_to_png

grammar_calc = """
    ?start: sum | NAME EQUAL sum
    ?sum: product | sum EQUAL product | sum MINUS product
    ?product: atom | product STAR atom | product SLASH atom
    ?atom: NUMBER | MINUS atom | NAME | LPAR sum RPAR
    EQUAL: "="
    LPAR: "("
    RPAR: ")"
    SLASH: "/"
    STAR: "*"
    MINUS: "-"
    PLUS: "+"
    %import common.CNAME -> NAME
    %import common.NUMBER
    %import common.WS_INLINE
    %ignore WS_INLINE
"""

def test(grammar, text):
    parser = Lark(grammar, start='start')
    print(sorted(list(set([t.type for t in parser.lex(text)]))))
    print(list(parser.lex(text)))
    # print([t.name for t in parser.lexer.tokens])
    parser.parse(text)

text = "x = 1+2 - 3-4 - 5*6 - 7/8 - (9+10-11*12/13)"
test(grammar_calc, text)

And here's the traceback:

['EQUAL', 'LPAR', 'MINUS', 'NAME', 'NUMBER', 'PLUS', 'RPAR', 'SLASH', 'STAR']
[Token(NAME, 'x'), Token(EQUAL, '='), Token(NUMBER, '1'), Token(PLUS, '+'), Token(NUMBER, '2'), Token(MINUS, '-'), Token(NUMBER, '3'), Token(MINUS, '-'), Token(NUMBER, '4'), Token(MINUS, '-'), Token(NUMBER, '5'), Token(STAR, '*'), Token(NUMBER, '6'), Token(MINUS, '-'), Token(NUMBER, '7'), Token(SLASH, '/'), Token(NUMBER, '8'), Token(MINUS, '-'), Token(LPAR, '('), Token(NUMBER, '9'), Token(PLUS, '+'), Token(NUMBER, '10'), Token(MINUS, '-'), Token(NUMBER, '11'), Token(STAR, '*'), Token(NUMBER, '12'), Token(SLASH, '/'), Token(NUMBER, '13'), Token(RPAR, ')')]
Traceback (most recent call last):
  File "D:\sources\personal\python\pyframework\widgets\editor\t.py", line 32, in <module>
    test(grammar_calc, text)
  File "D:\sources\personal\python\pyframework\widgets\editor\t.py", line 28, in test
    parser.parse(text)
  File "d:\virtual_envs\py364_32\lib\site-packages\lark\lark.py", line 197, in parse
    return self.parser.parse(text)
  File "d:\virtual_envs\py364_32\lib\site-packages\lark\parser_frontends.py", line 137, in parse
    return self.parser.parse(text)
  File "d:\virtual_envs\py364_32\lib\site-packages\lark\parsers\xearley.py", line 126, in parse
    column = scan(i, column)
  File "d:\virtual_envs\py364_32\lib\site-packages\lark\parsers\xearley.py", line 115, in scan
    raise UnexpectedInput(stream, i, text_line, text_column, {item.expect for item in to_scan}, set(to_scan))
lark.lexer.UnexpectedInput: No token defined for: '+' in '+2 - ' at line 1 col 5

Questions:

1) What's the reason for this error? parser.lex returns all the tokens correctly, but parser.parse crashes... I don't get it.

2) I'd like to give pydot__tree_to_png a shot. I've installed pydot on Windows 7... Do I need to install any additional package (dot or similar) or is it a ready-to-go package?

erezsh commented 6 years ago

Come on, you can figure it out

brupelo commented 6 years ago

@erezsh

Oh, I hadn't even read the grammar at all; it was completely broken. Solved. I'll write a full calculator grammar to properly practice/master the "art of writing EBNF grammars" so it becomes second nature to me. Btw, any recommended reading about writing EBNF grammars? I mean, not the syntax itself, but more the algorithm/steps you follow when you want to extract the grammar of a particular language or create a grammar from scratch?

About pydot__tree_to_png: it seems that yes, you need dot.exe on the PATH.

One piece of advice: don't take the questions in this general thread as "oh! these guys are too lazy to solve simple stuff by themselves, how annoying, what a waste of time", but rather as questions that can provide useful content to the lark package somehow (ideas, examples, FAQ, docs, new use-cases, ...). In my experience, users/communities are a cheap/free way to help improve products ;)

brupelo commented 6 years ago

Question: on Python 3.6, 000001.1 is a valid token while 01 is not; on Python 2.7, both 000001.1 and 01 are valid tokens. I don't understand why Python allows numbers (ints or floats) to have leading zeroes... disambiguation in some cases?

That said, I see that in common.g the NUMBER terminal allows leading zeroes for both INT and FLOAT. Any strong reason to allow this?

erezsh commented 6 years ago

Well, I'm not sure I see the value in solving bugs in other people's grammars. There's no way I'm going to find a solution to that problem. The only thing I can do is make the grammars as simple and clean as possible, and hope the users will make enough effort to "get it".

Anyway, I appreciate your suggestions and involvement in Lark, so I'm happy to help you out.

Unfortunately, I don't know of a good guide to writing grammars. Maybe I'll write one some day. There's definitely a method to it, but mostly I just try to describe things according to how I want the resulting AST to look, and then find a way to make it work with the limitations of the algorithm.

Well, you might find a clue here:

    >>> 010.1
    10.1
    >>> 010
    8
    >>> 08
      File "<stdin>", line 1
    SyntaxError: invalid token

In Python2, starting a number with 0 denoted an octal base (equivalent to the newer 0o10).

In Python3 they removed the old syntax. I guess they preferred to throw an error rather than suddenly interpret a different number, to make the transition from 2 to 3 less buggy.

brupelo commented 6 years ago

There's no way I'm going to find a solution to that problem

This is definitely a valid point; it makes sense. In fact, I was wondering whether my question above contained something useful to extract, and actually it didn't :). I can't even argue about Lark's error exceptions, as they're actually quite clear. So yeah, it was a bad question tbh and your answer was appropriate.

Anyway, I appreciate your suggestions and involvement in Lark, so I'm happy to help you out.

Glad to help. I tend to get quite passionate about projects that catch my interest (when I've got some spare time), and Lark is definitely a really nice, interesting one. It has clean code and, most importantly, you expose the right public interfaces to users. This last one may not seem important, but it is... I consider hiding the complexity of a library to be one of the most important keys to making a product popular. Plus, even though I never properly learned the stuff about compilers/grammars/..., it's always been a subject I've been quite interested in, so I'm learning cool stuff thanks to using lark.

Right now I was looking at your adaptations python2.g & python3.g and looking at tokenizer.c. Out of curiosity, did you use that file as a reference to come up with the tokens yourself?


About good guides to writing grammars: it'd be awesome to find some tutorial/book/article about this stuff; I find it a really interesting/useful subject. For instance, consider this more complete (still broken) grammar for parsing mathy expressions (taken from the antlr repo):

grammar_calc2 = """
    equation: expression relop expression
    expression: multiplyingExpression ((PLUS | MINUS) multiplyingExpression)*
    multiplyingExpression: powExpression ((TIMES | DIV) powExpression)*
    powExpression: signedAtom (POW signedAtom)*
    signedAtom: PLUS signedAtom
       | MINUS signedAtom
       | func
       | atom
    atom: scientific
       | variable
       | constant
       | LPAREN expression RPAREN
    scientific: SCIENTIFIC_NUMBER
    constant: PI
       | EULER
       | I
    variable: VARIABLE
    func: funcname LPAREN expression (COMMA expression)* RPAREN
    funcname: COS
       | TAN
       | SIN
       | ACOS
       | ATAN
       | ASIN
       | LOG
       | LN
       | SQRT
    relop: EQ
       | GT
       | LT
    COS: 'cos'
    SIN: 'sin'
    TAN: 'tan'
    ACOS: 'acos'
    ASIN: 'asin'
    ATAN: 'atan'
    LN: 'ln'
    LOG: 'log'
    SQRT: 'sqrt'
    LPAREN: '('
    RPAREN: ')'
    PLUS: '+'
    MINUS: '-'
    TIMES: '*'
    DIV: '/'
    GT: '>'
    LT: '<'
    EQ: '='
    COMMA: ','
    POINT: '.'
    POW: '^'
    PI: 'pi'
    EULER: E2
    I: 'i'
    VARIABLE: VALID_ID_START VALID_ID_CHAR*
    VALID_ID_START: ('a' .. 'z') | ('A' .. 'Z') | '_'
    VALID_ID_CHAR: VALID_ID_START | ('0' .. '9')
    SCIENTIFIC_NUMBER: NUMBER ((E1 | E2) SIGN? NUMBER)?
    NUMBER: ('0' .. '9') + ('.' ('0' .. '9') +)?
    E1: 'E'
    E2: 'e'
    SIGN: ('+' | '-')
    WS: [ \r\n\t] + -> skip
"""

Now, let's imagine these two possible scenarios:

Btw, I tend to edit my questions a few times; I guess GitHub doesn't notify you about that? Just asking, as I'd like to know that each time you answer, you're referring to the latest version of my question...

erezsh commented 6 years ago

you expose the right public interfaces to users. This last one may seem not important

It is the most important thing about a library.

did you use that file as a reference to come up with the tokens yourself

Honestly don't remember. I adapted the official python ebnf. I don't recall where I got the terminals from.

Precedence is just a pattern. In the case of LR parsers, the best way to write it is:

  multiplyingExpression: [multiplyingExpression (TIMES | DIV)] powExpression

But at first glance the antlr grammar should work too.

brupelo commented 6 years ago

like questions that can provide useful content to the lark package somehow (ideas, examples, faq, docs, new use-cases, ...)

I forgot to say that some guys out there use GitHub issues to elaborate on interesting topics so that afterwards they can write about them on their websites/blogs/tutorials... One example is the creator of nuitka (which btw is an amazing piece of Python tech); he does just that. One quote from him a few days ago:

While replying I got into a fair bit of output and explaining yesterday night. I will break it down into multiple replies of actual quality, i.e. explaining things in ways actually understandable, coming over the next day or two. That way it won't be a monster thread, and something I can turn into actual blog posts in the end.

So please don't hesitate to elaborate on topics you consider interesting, if you feel like it ;D. It's always nice to learn from people who are experts in certain topics/areas.

Btw, as we speak I'd like to do a little exercise... transforming antlr grammars into lark ones. This will serve me in the future, as I'd like to write a little glsl minifier but I have zero knowledge of these topics: minifiers/autoformatters/transformers... So if you have any helpful advice, just let me know. I guess it's all about having certain deterministic rules that transform Tree_n into Tree_n+1. Anyway... Lark rocks; the more I discover about this stuff, the more options I foresee to code :D

brupelo commented 6 years ago

@erezsh Related to my previous question about using lark to create minifiers/code_formatters/transformers... Let's take a trivial example; consider this snippet:

import os
from lark import Lark, inline_args, Transformer
from lark.tree import pydot__tree_to_png

grammar = r"""
    start: equation*
    equation: expression relop expression SEMI
    expression: multiplyingexpression ((PLUS | MINUS) multiplyingexpression)*
    multiplyingexpression: powexpression ((TIMES | DIV) powexpression)*
    powexpression: signedatom (POW signedatom)*
    signedatom: PLUS signedatom
       | MINUS signedatom
       | func
       | atom
    atom: scientific
       | variable
       | constant
       | LPAREN expression RPAREN
    scientific: SCIENTIFIC_NUMBER
    constant: PI
       | EULER
       | I
    variable: VARIABLE
    func: funcname LPAREN expression (COMMA expression)* RPAREN
    funcname: COS
       | TAN
       | SIN
       | ACOS
       | ATAN
       | ASIN
       | LOG
       | LN
       | SQRT
    relop: EQ
       | GT
       | LT
    COS: "cos"
    SIN: "sin"
    TAN: "tan"
    ACOS: "acos"
    ASIN: "asin"
    ATAN: "atan"
    LN: "ln"
    LOG: "log"
    SQRT: "sqrt"
    LPAREN: "("
    RPAREN: ")"
    PLUS: "+"
    MINUS: "-"
    TIMES: "*"
    DIV: "/"
    GT: ">"
    LT: "<"
    EQ: "="
    COMMA: ","
    POINT: "."
    POW: "^"
    PI: "pi"
    EULER: E2
    I: "i"
    VARIABLE: VALID_ID_START VALID_ID_CHAR*
    VALID_ID_START: "a".."z"|"A".."Z"|"_"
    VALID_ID_CHAR: VALID_ID_START | ("0".."9")
    SCIENTIFIC_NUMBER: NUMBER ((E1 | E2) SIGN? NUMBER)?
    NUMBER: "0".."9" + ("." ("0" .. "9") +)?
    E1: "E"
    E2: "e"
    SIGN: "+" | "-"
    SEMI: ";"
    WS: /[ \r\n\t]/+
    %ignore WS
"""

def test(grammar, text, display=True):
    parser = Lark(grammar, start='start')
    if display:
        filename = "1.png"
        pydot__tree_to_png(parser.parse(text), filename)
        os.system("start " + filename)
    else:
        print(sorted(list(set([t.type for t in parser.lex(text)]))))
        print(list(parser.lex(text)))

text = """
    x = cos(0); y = sin(PI)+0.5;
    result = x+y;
"""
test(grammar, text, display=True)

Let's assume the above grammar is valid (the posted code runs OK, but I haven't tested it thoroughly). Question:

How can I use lark to interpret text and evaluate result?

Help me with that so I can push a little pyqt mini-matlab as an example :)

PS: The generated tree is a really verbose one; any chance to simplify it?

erezsh commented 6 years ago

@brupelo This is all covered in the calculator example: https://github.com/lark-parser/lark/blob/master/examples/calc.py

  1. Use the ? rule modifier to remove nodes with only 1 child (where the node doesn't provide any information), like multiplyingexpression.
  2. Use the Transformer class to evaluate the tree. You can do it incrementally.
brupelo commented 6 years ago

In evaluating-the-tree you mention:

A transformer is a class with methods corresponding to branch names. For each branch, the appropriate method will be called with the children of the branch as its argument, and its return value will replace the branch in the tree.

I understand that explanation... what I don't understand is this: https://github.com/lark-parser/lark/blob/master/examples/calc.py#L44, as I don't see any rule in the grammar called assign_var. What about order & precedence? How is the tree traversed?

erezsh commented 6 years ago

assign_var is an alias.

The tree is traversed bottom-up, which means the leaves first, the root last.
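A minimal sketch of that order, with Node and transform as stand-ins (not Lark's actual implementation): each callback runs only after all of its children have been handled, leaves first, root last.

```python
# Minimal stand-in for lark.Tree: a node has a rule name and children.
class Node:
    def __init__(self, data, children):
        self.data = data
        self.children = children

def transform(node, callbacks):
    """Visit bottom-up: children are transformed before their parent,
    mimicking how a Transformer replaces each branch with a value."""
    if not isinstance(node, Node):
        return node  # a leaf token: return it unchanged
    children = [transform(c, callbacks) for c in node.children]
    return callbacks[node.data](children)

# Evaluate 2 * (3 + 4): 'add' fires before 'mul' because leaves come first.
tree = Node('mul', [2, Node('add', [3, 4])])
result = transform(tree, {
    'add': lambda c: c[0] + c[1],
    'mul': lambda c: c[0] * c[1],
})
print(result)  # 14
```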

brupelo commented 6 years ago

Is it possible to get a list of all non-terminals used by a lark grammar?

erezsh commented 6 years ago
{r.origin for r in lark_parser.rules}
brupelo commented 6 years ago

If you use that with the latest grammar I've posted, you'll get {'variable', 'start', 'relop', 'scientific', 'powexpression', '__anon_star_2', 'signedatom', 'equation', '__anon_star_1', 'constant', 'expression', 'funcname', 'func', '__anon_star_3', '__anon_star_4', 'atom', 'multiplyingexpression', '__anon_star_0'}. I assume each of the __anon_star entries corresponds to one of the (...)* repetitions... but is it correct to consider these as rules?

erezsh commented 6 years ago

What do you mean by correct?

brupelo commented 6 years ago

Let me ask it differently: does it make sense that a regular-expression operator such as * can be considered a rule? Not a big deal in any case; if the __anon_ prefix is used consistently, I can just filter the set and move on.

erezsh commented 6 years ago

Yes, it's a rule.

Just filter out anything that begins with __, that's what it's for.
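For example, with a made-up rule set standing in for lark_parser.rules:

```python
# The kind of set the earlier snippet produced, including the
# auto-generated '__anon_star_N' helpers lark creates for (...)* groups.
origins = {'start', 'equation', 'expression', '__anon_star_0', '__anon_star_1'}

# Keep only the rules the grammar author actually wrote.
user_rules = {name for name in origins if not name.startswith('__')}
print(sorted(user_rules))  # ['equation', 'expression', 'start']
```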

brupelo commented 6 years ago

Any idea why adding methods to the transformer at runtime doesn't work when done like this:

import os
import sys
import types
from lark import Lark, inline_args, Transformer, InlineTransformer
from lark.tree import pydot__tree_to_png

class Calc():

    class Transformer(InlineTransformer):
        pass

        # def func(self, *args):
        #     print(sys._getframe().f_code.co_name)
        #     # print(args)

        # def funcname(self, *args):
        #     print(sys._getframe().f_code.co_name)
        #     print(args)

    def __init__(self):
        grammar = r"""
            start: equation*
            equation: expression relop expression SEMI
            expression: multiplyingexpression ((PLUS | MINUS) multiplyingexpression)*
            ?multiplyingexpression: powexpression ((TIMES | DIV) powexpression)*
            powexpression: signedatom (POW signedatom)*
            signedatom: PLUS signedatom
               | MINUS signedatom
               | func
               | atom
            atom: scientific
               | variable
               | constant
               | LPAREN expression RPAREN
            scientific: SCIENTIFIC_NUMBER
            constant: PI
               | EULER
               | I
            variable: VARIABLE
            func: funcname LPAREN expression (COMMA expression)* RPAREN
            funcname: COS
               | TAN
               | SIN
               | ACOS
               | ATAN
               | ASIN
               | LOG
               | LN
               | SQRT
            relop: EQ
               | GT
               | LT
            COS: "cos"
            SIN: "sin"
            TAN: "tan"
            ACOS: "acos"
            ASIN: "asin"
            ATAN: "atan"
            LN: "ln"
            LOG: "log"
            SQRT: "sqrt"
            LPAREN: "("
            RPAREN: ")"
            PLUS: "+"
            MINUS: "-"
            TIMES: "*"
            DIV: "/"
            GT: ">"
            LT: "<"
            EQ: "="
            COMMA: ","
            POINT: "."
            POW: "^"
            PI: "pi"
            EULER: E2
            I: "i"
            VARIABLE: VALID_ID_START VALID_ID_CHAR*
            VALID_ID_START: "a".."z"|"A".."Z"|"_"
            VALID_ID_CHAR: VALID_ID_START | ("0".."9")
            SCIENTIFIC_NUMBER: NUMBER ((E1 | E2) SIGN? NUMBER)?
            NUMBER: "0".."9" + ("." ("0" .. "9") +)?
            E1: "E"
            E2: "e"
            SIGN: "+" | "-"
            SEMI: ";"
            WS: /[ \r\n\t]/+
            %ignore WS
        """

        transformer = Calc.Transformer()
        self.parser = Lark(
            grammar, start='start', parser='lalr', transformer=transformer
        )
        rules = {
            r.origin for r in self.parser.rules if not r.origin.startswith("__")
        }
        print(rules)

        def patch_transformer(target):
            def method(target, *args):
                print(sys._getframe().f_code.co_name, args)
            target.funcname = types.MethodType(method, target)

        patch_transformer(transformer)
        print(dir(transformer))

    def display(self, text=None):
        if text:
            self.last_text = text

        filename = "__out__.png"
        tree = self.parser.parse(self.last_text)
        pydot__tree_to_png(tree, filename)
        os.system("start " + filename)

    def run(self, text=None):
        if text:
            self.last_text = text
        return self.parser.parse(self.last_text)

if __name__ == "__main__":
    c = Calc()
    tree = c.run("""
        x = cos(0); y = sin(PI)+0.5;
        result = x+y;
    """)
    # print(tree)
    # c.display()

I know the above code is ugly, but I'm trying to figure out how to automatically add debug code for all the grammar rules to the Transformer object.

When I say debug code, I mean simple callbacks that are "nop" operations and won't modify the structure of the tree.

erezsh commented 6 years ago
            grammar, start='start', parser='lalr', transformer=transformer

This is why. Create the instance after you patch up the class.
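The underlying reason (as far as I can tell): when a transformer is passed to Lark with parser='lalr', its callback methods are collected while the parser is being built, so methods added to the instance afterwards are never seen. A simplified sketch of that snapshot behavior, with a hypothetical ParserStub standing in for Lark:

```python
import types

class Transformer:
    pass

class ParserStub:
    """Stand-in for Lark(parser='lalr', transformer=...): it looks up the
    transformer's callback methods once, at construction time."""
    def __init__(self, transformer):
        self.callbacks = {
            name: getattr(transformer, name)
            for name in dir(transformer) if not name.startswith('_')
        }

def funcname(self, *args):
    return 'patched'

# Patch AFTER construction: the snapshot was already taken, method unseen.
t1 = Transformer()
stub_late = ParserStub(t1)
t1.funcname = types.MethodType(funcname, t1)
print('funcname' in stub_late.callbacks)   # False

# Patch BEFORE construction: the callback is picked up.
t2 = Transformer()
t2.funcname = types.MethodType(funcname, t2)
stub_early = ParserStub(t2)
print('funcname' in stub_early.callbacks)  # True
```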

brupelo commented 6 years ago

How can I get the prefix/postfix/infix versions of a math expression in the previous example?

erezsh commented 6 years ago

be more clear

brupelo commented 6 years ago

Ok, here's the thing. At this point I think I've covered the very basics and I've got a rough idea about all of this. As you know, on the other still-open glsl thread, the idea is to write a glsl parser, but not only that: eventually I'd like to write a glsl minifier and a glsl autoformatter. For the autoformatting, the original idea was to use clang-format, but now that I've discovered lark I don't see any reason why I couldn't code my own autoformatter.

Before dealing with the real problem, which is the quite meaty glsl grammar, I'm training a little with dummy examples like the one posted above, trying to cover all future use-cases: evaluation, transformation, simplification/expansion/optimization, autoformatting. I think the above trivial mathy grammar is a perfect framework to play with... Usually on my real-world projects I use this very same technique: I get proficient through dummy examples, and once it's all clear I code the whole thing straight away, non-stop :)


That said (as you know, I like writing; that way I practice my English skills and kill two birds with one stone ;))... what I meant before was:

How can I get the prefix/postfix/infix versions of a math expression in the previous example?

Is it possible to use the tree generated by lark to print the prefix/postfix/infix version of a math expression the parser has been fed? Imagine I feed the parser the free-form text (A + B) * (C + D); I'd like to know how to print with lark:

(A + B) * (C + D)    (Infix)
* + A B + C D        (Prefix)
A B + C D + *        (Postfix)
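In other words, I guess each form is just a different traversal order of the same expression tree. A rough stdlib sketch of what I mean, with Node standing in for lark.Tree:

```python
# Minimal binary expression tree; leaves are plain strings.
class Node:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

def prefix(n):
    """Pre-order: operator first, then both operands."""
    if isinstance(n, str):
        return n
    return ' '.join([n.op, prefix(n.left), prefix(n.right)])

def postfix(n):
    """Post-order: both operands first, then the operator."""
    if isinstance(n, str):
        return n
    return ' '.join([postfix(n.left), postfix(n.right), n.op])

def infix(n):
    """In-order, fully parenthesized to preserve precedence."""
    if isinstance(n, str):
        return n
    return '({} {} {})'.format(infix(n.left), n.op, infix(n.right))

# (A + B) * (C + D)
tree = Node('*', Node('+', 'A', 'B'), Node('+', 'C', 'D'))
print(prefix(tree))   # * + A B + C D
print(postfix(tree))  # A B + C D + *
print(infix(tree))    # ((A + B) * (C + D))
```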

Btw, one thing I asked you in a previous message (which I've deleted... you'll see me deleting comments all the time if I consider them not useful): do you use StackOverflow very often? I've seen there is a lark-parser tag there, but it's still very young with little traffic (just 7 answers last time I checked)... Asking because SO can serve as a good source of incoming traffic to make Lark more popular.

Also, I'm assuming here that the compilers & parsing niche isn't very small... although usually, when it comes to hard-to-grasp subjects, niches tend to be quite small :)

brupelo commented 6 years ago

@erezsh Ok, let me add a little example to my question above; consider this dummy snippet:

from lark import Lark
from lark.tree import Interpreter

class FooInterpreter(Interpreter):

    def _debug(self, child):
        try:
            print(child.data)
        except Exception as e:
            print(child)

    def number(self, child):
        self._debug(child)

    def relop(self, child):
        self._debug(child)

    def semi(self, child):
        self._debug(child)

    def equation(self, child):
        self._debug(child)

    def add(self, child):
        self._debug(child)

    def sub(self, child):
        self._debug(child)

l = Lark("""
    ?start: sum
          | NAME "=" sum    -> assign_var

    ?sum: product
        | sum "+" product   -> add
        | sum "-" product   -> sub

    ?product: atom
        | product "*" atom  -> mul
        | product "/" atom  -> div

    ?atom: NUMBER           -> number
         | "-" atom         -> neg
         | NAME             -> var
         | "(" sum ")"

    %import common.CNAME -> NAME
    %import common.NUMBER
    %import common.WS_INLINE

    %ignore WS_INLINE
""", start='start', parser='lalr')

tree = l.parse("""(A + B) * (C + D)""")

# (A + B) * (C + D)    (Infix)
# * + A B + C D        (Prefix)
# A B + C D + *        (Postfix)

print(tree.pretty())
print('-' * 80)
FooInterpreter().visit_children(tree)

How can I modify it so that FooInterpreter can do preorder & postorder tree traversals? Also, how do you use Interpreter as-is to traverse the whole tree? I guess I need to loop through each node's children, but I'm not sure how :/

Right now the output is:

mul
  add
    var A
    var B
  add
    var C
    var D

--------------------------------------------------------------------------------
add
add

Right now, the above code is just visiting these 2 nodes:

[image: the parse tree with the two visited add nodes marked]

Thanks in advance.

brupelo commented 6 years ago

@erezsh Hi, I've noticed that for some reason you've stopped answering my newbie questions; any particular reason? Are they unclear? Too long? Is it difficult to understand what I'm asking? You don't know the answer? Lack of time or interest to answer? Other :)?

Please let me know; that way I can improve the type of questions I ask from now on, or just avoid certain topics and save my time by researching them myself. Although, as I said, one of the reasons for asking is practicing & improving my poor English skills :)

On the other hand, because you didn't answer my last questions, I've been forced to learn a little more about this fascinating world of parsers & compilers: tree traversal algorithms, compiler techniques like SDT/SDD, ... cool stuff. I've even started reading my archaic compiler notes to refresh the very basics :)

One piece of advice: you're the real expert on your library, so I'd recommend you use StackOverflow to promote it as much as possible. SO is a really good way to increase traffic so more devs will give it a shot.

For instance, this question... you can see it didn't contain the lark-parser tag, but just by providing a simple answer I put an outbound link to lark, which means more traffic coming here (even if only a little, as that question won't become very popular ;))

erezsh commented 6 years ago

Lack of time or interest to answer?

Yes, somewhat.

I've been forced to learn a little more about this fascinating world of parsers & compilers

Also part of the reason.

I think in the last month, you've asked more questions than all the other users combined. Also, many of the questions were just about Python or coding in general, not about Lark. And many of the questions about Lark already had the answer in the docs and tutorials. I don't mind answering, but it takes time and effort that I could better spend on other things, such as improving Lark. Please try to do your "homework" and spend some time solving the problem before asking here. And when you ask, try to focus on the specific line or concept that is the issue, instead of posting a 200-line code paste and asking "why isn't it working?" I hope you won't take offense, and that you'll keep asking questions and participating in Lark, under these guidelines.

SO is a really good way to increase the traffic so more devs will give it a shot.

Yes, but limited time, etc. If you find a question there that seems very relevant, feel free to ping me and I'll check it out.

erezsh commented 6 years ago

Sorry if my wording was harsh! I tried to be honest, and explain my point of view. But I know I can be too honest sometimes.

I hope it won't deter you from participating and asking questions in the future.

brupelo commented 6 years ago

Of course not, no worries, it's fine; that's what I was asking for in the first place. I like honesty even if the wording is harsh, as I don't like wasting time myself either; asking questions is also time-consuming, and I already have quite a few projects on my plate. The reason for so many questions lately is that Lark and the whole compilers subject caught my attention these last few days.

I do think Lark is a very promising/useful Python project, and I've seen very interesting parts/ideas in the code (others not so much, which need to be refactored). I've always wanted to catch up a little with compiler design theory but never found the time, as the main hobbies I invest time in are physics/maths/3d/pbrt/software engineering... In fact, I've got a couple of books about compiler design on my bookshelf but never started reading them because of lack of time; they've gotten all dusty over the last few years :D. I'll be following this lightweight course before reading any advanced book/paper; I really like watching these online courses and I'd like to catch up a bit more with the whole thing.

About the wording being harsh, let me elaborate a bit here. The thing is, if I'm not getting answers to the questions I'm asking, I mainly like to know the reasons why. If they're bad questions, I'll be able to improve and ask better ones next time; if they're good, interesting ones, I'd also like to know why they aren't getting any answers, so I'll know what to expect from the people/community I'm asking. I do like people making constructive criticism of my questions, I really do. Taking criticism to learn from is a good thing, and that's what I always try to do (if I think the points are good ones, logically speaking). I extrapolate this proactive attitude not only to questions/code but also in general; that's what learning is all about.

I also like being able to review stuff myself, but only if I know for sure the people I'm reviewing won't take it personally... if I know they will, I'll just keep my mouth shut. Why? I've already seen countless cases of very self-centered coders taking reviews personally; I believe that's a bad attitude and it makes projects get stuck. Of course, I won't argue about it.

But that attitude happens mostly with young, inexperienced coders. For instance, take a look at the answer from Kay here, especially the part where he says "getting a critical review like that really made my day. Very exciting. I often wished I would get more of that.". That statement, and also the type of project nuitka is, proves a few things to me: 1) he's a coder who's able to think outside the box; 2) he's not only talented but must also have quite a lot of experience in the coding world and the IT business, as that's the type of reaction of someone quite mature. Accepting constructive criticism is a good thing and proves wisdom (IMHO).

On the other hand, if I'm investing my time in something but I feel people aren't interested at all, that demotivates me. I mean, usually the main driver for me when doing and participating in something is learning cool new stuff myself; that's the main reason, but knowing I'll be helping other people with my little contributions is also a good feeling. For instance, if I ask something on StackOverflow but don't get any upvotes, comments, or reviews... and days later I've been able to come up with a good answer myself, I probably won't find any reason that motivates me to spend my time writing a good answer on that site... so I'll probably delete my question and move on, investing my time in other stuff.

I hope it won't deter you from participating and asking questions in the future.

No, it won't, but I'll be less active here on GitHub at least. I'll continue tracking how the project evolves, keep using it and learning more about the code, answer under the StackOverflow lark-parser tag, and in any case hang around on the new freenode #lark channel I've created.

About this last one, would you mind putting some sort of announcement somewhere in the main README.md saying there is a #lark channel on freenode? It'd be cool to meet people who like using IRC. For instance, #pandas has 17 people and #numpy 29 atm... That's quite a lot of people :)

Sorry if this answer got a bit too long, but as I've said a couple of times already, I like writing :), even if my English is creepy.

Yours, B

erezsh commented 6 years ago

Good to hear. I enjoy answering questions when they are well-researched or already reduced to the core of the problem. Yours were some yes, some no, and I was a little overwhelmed. I'm glad you took it positively.

Feel free to review, criticize, and make suggestions to Lark. That's always good to have.

Regarding a #lark channel, no problem. I was going to add it, I just wanted to provide a link to a web client along with it, and didn't have time to do the "research" for which is the best one. But I have to say, I think the channel will be empty. Lark isn't nearly as popular (or widely applicable) as numpy or even pandas.

my English is creepy.

Your English is pretty good, but I think you meant crappy, not creepy :laughing: