kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License
805 stars 95 forks source link

Recursion help. #198

Closed xlgmokha closed 5 years ago

xlgmokha commented 5 years ago

According to https://kschiess.github.io/parslet/overview.html

Left recursion is impossible to express in these grammars.

I am trying to write a parser for the SCIM 2.0 filter syntax described here.

The parser that I have so far looks like:

      class Filter < Parslet::Parser
        root :filter

        # FILTER = attrExp / logExp / valuePath / *1"not" "(" FILTER ")"
        rule(:filter) do
          (attribute_expression | logical_expression | value_path) |
            (not_op >> lparen >> filter >> rparen)
        end

        # valuePath = attrPath "[" valFilter "]" ; FILTER uses sub-attributes of a parent attrPath
        rule(:value_path) do
          attribute_path >> lbracket >> value_filter >> rbracket
        end

        # valFilter = attrExp / logExp / *1"not" "(" valFilter ")"
        rule(:value_filter) do
          attribute_expression |
            logical_expression |
            (not_op >> lparen >> value_filter >> rparen)
        end

        # attrExp = (attrPath SP "pr") / (attrPath SP compareOp SP compValue)
        rule(:attribute_expression) do
          (attribute_path >> space >> presence) |
            (attribute_path >> space >> comparison_operator >> space >> comparison_value)
        end

        # logExp = FILTER SP ("and" / "or") SP FILTER
        rule(:logical_expression) do
          filter >> space >> (and_op | or_op) >> space >> filter
        end

        # compValue = false / null / true / number / string ; rules from JSON (RFC 7159)
        rule(:comparison_value) do
          falsey | null | truthy | number | string
        end

        # compareOp = "eq" / "ne" / "co" / "sw" / "ew" / "gt" / "lt" / "ge" / "le"
        rule(:comparison_operator) do
          equal | not_equal | contains | starts_with | ends_with |
            greater_than | less_than | less_than_equals | greater_than_equals
        end

        # attrPath = [URI ":"] ATTRNAME *1subAttr ; SCIM attribute name ; URI is SCIM "schema" URI
        rule(:attribute_path) do
          (uri >> colon).repeat(0, 1) >> attribute_name >> sub_attribute.repeat(0, 1)
        end

        # ATTRNAME  = ALPHA *(nameChar)
        rule(:attribute_name) do
          alpha >> name_character.repeat(0, nil)
        end

        # nameChar = "-" / "_" / DIGIT / ALPHA
        rule(:name_character) { hyphen | underscore | digit | alpha }

        # subAttr = "." ATTRNAME ; a sub-attribute of a complex attribute
        rule(:sub_attribute) { dot >> attribute_name }

        # uri = 1*ALPHA 1*(":" 1*ALPHA)
        rule(:uri) do
          # alpha.repeat(1, nil) >> (colon >> (alpha.repeat(1, nil) | version)).repeat(1, nil)
          str('urn:ietf:params:scim:schemas:') >> (
            str('core:2.0:User') |
            str('core:2.0:Group') | (
              str('extension') >>
              colon >>
              alpha.repeat(1) >>
              colon >>
              version >>
              colon >>
              alpha.repeat(1)
            )
          )
        end
        rule(:presence) { str('pr') }
        rule(:and_op) { str('and') }
        rule(:or_op) { str('or') }
        rule(:not_op) { str('not').repeat(0, 1) }
        rule(:falsey) { str('false') }
        rule(:truthy) { str('true') }
        rule(:null) { str('null') }
        rule(:number) do
          str('-').maybe >> (
            str('0') | (match('[1-9]') >> digit.repeat)
          ) >> (
            str('.') >> digit.repeat(1)
          ).maybe >> (
            match('[eE]') >> (str('+') | str('-')).maybe >> digit.repeat(1)
          ).maybe
        end
        rule(:equal) { str('eq') }
        rule(:not_equal) { str('ne') }
        rule(:contains) { str('co') }
        rule(:starts_with) { str('sw') }
        rule(:ends_with) { str('ew') }
        rule(:greater_than) { str('gt') }
        rule(:less_than) { str('lt') }
        rule(:greater_than_equals) { str('ge') }
        rule(:less_than_equals) { str('le') }
        rule(:string) do
          quote >> (str('\\') >> any | str('"').absent? >> any).repeat >> quote
        end
        rule(:lparen) { str('(') }
        rule(:rparen) { str(')') }
        rule(:lbracket) { str('[') }
        rule(:rbracket) { str(']') }
        rule(:digit) { match('\d') }
        rule(:quote) { str('"') }
        rule(:single_quote) { str("'") }
        rule(:space) { match('\s') }
        rule(:space?) { space.maybe }
        rule(:alpha) { match['a-zA-Z'] }
        rule(:dot) { str('.') }
        rule(:colon) { str(':') }
        rule(:hyphen) { str('-') }
        rule(:underscore) { str('_') }
        rule(:version) { digit >> dot >> digit }
      end

link

I have been able to get some very simple parsing working but I seem to be stuck on the following spec.

  specify { expect(subject.value_filter).to parse('type eq "work" and value co "@example.com"') }

The above test produces the following error:

|| Scim::Kit::V2::Filter
||   should parse "type eq \"work\" and value co \"@example.com\"" (FAILED - 1)
|| 
|| Failures:
|| 
||   1) Scim::Kit::V2::Filter should parse "type eq \"work\" and value co \"@example.com\""
||      Failure/Error: specify { expect(subject.value_filter).to parse('type eq "work" and value co "@example.com"') }
||      
||      SystemStackError:
||        stack level too deep
||      # ./spec/scim/kit/v2/filter_spec.rb:57:in `block (2 levels) in <top (required)>'
|| 
|| Finished in 0.00951 seconds (files took 0.55388 seconds to load)
|| 1 example, 1 failure
|| 
|| Failed examples:
|| 
|| rspec ./spec/scim/kit/v2/filter_spec.rb:57 # Scim::Kit::V2::Filter should parse "type eq \"work\" and value co \"@example.com\""

I'm not sure if the ABNF described above is an example of left recursion and if perhaps Parslet is the wrong tool for this problem.

I apologize in advance if this is the wrong forum for this question. Any guidance would be greatly appreciated. 🙏

alexagranov commented 5 years ago

I would go to StackOverflow for help, maintainers of open source software do not generally view questions disguised as issues logged on their software in the most favorable light.

http://kschiess.github.io/parslet/tricks.html