thedeepengine / full-metal-weaviate

High level wrapper for Weaviate: reduce query and load boilerplate
3 stars 0 forks source link

Suggestions for improved pyparsing parser #1

Open ptmcg opened 4 days ago

ptmcg commented 4 days ago

Happy to see that pyparsing is helping in developing your expression parsing! Here are a few suggestions.

  1. It is not necessary when using infixNotation to include the parenthesized grouping as part of the input operand, infixNotation will define this for you. So you can simplify your filter compiler thusly:
        # lpar, rpar = map(Literal, "()")
        # expr = Forward()
        # g = Group(lpar + expr + rpar)
        # atom = condition | Group(lpar + expr + rpar).setParseAction(lambda t: t[1])
        expr = infixNotation(condition, [
            ('&', 2, opAssoc.LEFT, lambda t: {'and': t[0][::2]}),
            ('|', 2, opAssoc.LEFT, lambda t: {'or': t[0][::2]})
        ])

Btw, I like your lambdas for embedded parse actions, and good catch on using the [::2] notation to get all operands in each & or | grouping. (Oh, and note that all the non-PEP8 naming like "setName" and "infixNotation" has been updated to PEP8 names like "set_name" and "infix_notation" - in a future pyparsing version the non-PEP8 names will start raising DeprecationWarnings, and later on will be dropped altogether.)

1a. I'm not sure if "not" is supported, but if so you should be able to add it using something like this (pick another unary not operator if you don't like "!" - I've used "~" for example, since it is less likely to be confused with "|")

        expr = infixNotation(condition, [
            ('!', 1, opAssoc.RIGHT, lambda t: {'not': t[0][1]}),
            ('&', 2, opAssoc.LEFT, lambda t: {'and': t[0][::2]}),
            ('|', 2, opAssoc.LEFT, lambda t: {'or': t[0][::2]})
        ])
  1. I created a railroad diagram for your code by adding some calls to set_name() and pyparsing.autoname_elements() in each of the functions that define parts of the parsers, and then at the end adding:
get_filter_compiler().create_diagram("weaviate_filter_compiler.html")

resulting in: Screenshot_10-11-2024_212045_localhost

You'll need to install a couple extra python modules to get railroad diagramming to work, jinja2 and railroad-diagrams.

I could not create a railroad diagram for the return field compiler, as there is some infinite recursion issue with this expression that the diagram generation code doesn't handle (I'll open a pyparsing issue around this, as we should be able to handle any valid pyparsing expression that is not left-recursive).

  1. Check out the run_tests() method on your generated parsers, I wrote this to make it easier to, well, run sample tests. See how it is used in some of the code in the pyparsing examples directory.

  2. Lastly, I recommend to anyone who is going to write a parser, whether they are using pyparsing or some other parsing package, to start by writing a simple BNF for your grammar. It needn't be formally rigorous, but enough so that you can mentally walk through some of your test case expressions before actually writing a lot of code, and then finding you've backed yourself into a corner. The BNF will be a sort of pseudocode for your parser, and in many cases, converting to pyparsing is then a pretty direct process. Once you've done this, add the BNF as a module level comment where you have your parser code, so that other devs (or even just yourself 6 months from now) will have that as a kind of roadmap into your code.

thedeepengine commented 4 days ago

Hey @ptmcg,

You just made my day! =)

I’m really happy you took the time to look at my work. I didn’t expect that at all.

I’ll push the changes you proposed and try Railroad.

I’m fairly new to grammar parsing and I’m starting to see how powerful it is. So I'll definitely take more time trying to master that skill. It's mind boggling when starting to notice similarities and creating categories of languages and data structures that can be thought through grammar parsing properties.

I want to be able to do that quickly. This is fun.

Thanks Paul.