h0tk3y / better-parse

A nice parser combinator library for Kotlin
Apache License 2.0

How are delegates selected? #35

Open jeffzoch opened 3 years ago

jeffzoch commented 3 years ago

I'm finding that the order in which I declare my delegates in a parser grammar affects whether or not the input parses. I have a grammar like the following:

internal class Parser: Grammar<List<Command>>() {
    internal val comments by regexToken("#.*\n", true)
    internal val str by regexToken("\".*\"")
    internal val queryType by regexToken("[A-Z]+(?:_[A-Z]+)*")
    internal val word by regexToken("[A-Za-z]+")
    internal val LPAR by literalToken("(")
    internal val RPAR by literalToken(")")
    internal val COLON by literalToken(":")
    internal val LBRACE by literalToken("{")
    internal val RBRACE by literalToken("}")
    internal val equals by literalToken("=")
    internal val ws by regexToken("\\s+",true)
    internal val newline by regexToken("[\r\n]+",true)
    internal val comma by literalToken(",")
    internal val param: Parser<ValueMetadata> by (word and -COLON and word) map { (p, t) ->
        ValueMetadata(p.text, Type.valueOf(t.text))
    }
    val params by -LPAR and separatedTerms(param, comma, true) and -RPAR
    val outputs by -LPAR and separatedTerms(param, comma, true) and -RPAR
    val cmdParser by ( -LBRACE  and queryType and -equals and str and -RBRACE )
    val funcParser: Parser<Command> by (word and params and -COLON and params and cmdParser) map {
        (name, inputs, outputs, cmdFunc) ->
        val (type,cmd) = cmdFunc
        Command(name.text,
                inputs,
                outputs,
                cmd.text.subSequence(1, cmd.length - 1).toString(),
                QueryType.valueOf(type.text)
        )
    }

    override val rootParser: Parser<List<Command>> by zeroOrMore(funcParser)
}

It's meant to parse

# Documentation that should be ignored
findFoo(test:String,entity:String):(foo:String,bar:Int) {
  SQL_QUERY = "select foo,bar from baz where z = :name and y = :entity"
}
# Documentation that should be ignored
findBar(test:String,entity:String):(foo:String,bar:Int) {
  SQL_QUERY = "select foo,bar from baz where z = :name and y = :entity"
}

into a list of Commands. Just switching the order of str, queryType, and word makes the parse fail or pass on different test cases, with errors like: Could not parse input: UnparsedRemainder(startsWith=word@2 for "findFoo" at 39 (2:1))

h0tk3y commented 3 years ago

The tokens you declare with delegation are matched in the same order as declared. So if the tokenizing is ambiguous (which is often the case) then the tokens declared earlier are prioritized.

Note also this section in the README:

Note: the tokens order matters in some cases, because the tokenizer tries to match them in exactly this order. For instance, if literalToken("a") is listed before literalToken("aa"), the latter will never be matched. Be careful with keyword tokens! If you match them with regexes, a word boundary \b in the end may help against ambiguity.
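
To make the ordering rule concrete, here is a minimal sketch of a first-match tokenizer in plain Kotlin. It is a simplified model of the behaviour described above, not better-parse's actual implementation: TokenDef, Tok and tokenize are hypothetical names introduced only for this illustration. The two regexes are the queryType and word patterns from the grammar in the question, plus a variant of queryType with a trailing \b as the README suggests.

```kotlin
// Simplified model of "try the tokens in declaration order, first match wins".
// TokenDef, Tok and tokenize are hypothetical helpers, not part of better-parse.
data class TokenDef(val name: String, val pattern: Regex)
data class Tok(val name: String, val text: String)

fun tokenize(input: String, defs: List<TokenDef>): List<Tok> {
    val result = mutableListOf<Tok>()
    var pos = 0
    while (pos < input.length) {
        var matched: Tok? = null
        // Walk the definitions in the order given; stop at the first one
        // that matches a non-empty string starting exactly at `pos`.
        for (def in defs) {
            val m = def.pattern.find(input, pos)
            if (m != null && m.range.first == pos && m.value.isNotEmpty()) {
                matched = Tok(def.name, m.value)
                break
            }
        }
        if (matched == null) {
            // No definition matches here: record the unmatched rest and stop.
            result += Tok("<no match>", input.substring(pos))
            break
        }
        result += matched
        pos += matched.text.length
    }
    return result
}

// Patterns copied from the grammar in the question.
val queryType = TokenDef("queryType", Regex("[A-Z]+(?:_[A-Z]+)*"))
val word = TokenDef("word", Regex("[A-Za-z]+"))
// Variant with a trailing word boundary, as the README note suggests.
val queryTypeBounded = TokenDef("queryType", Regex("[A-Z]+(?:_[A-Z]+)*\\b"))

fun show(tokens: List<Tok>) = tokens.joinToString { "${it.name}:${it.text}" }

fun main() {
    // queryType declared first: it grabs the leading "S" of "String".
    println(show(tokenize("String", listOf(queryType, word))))
    // prints "queryType:S, word:tring"

    // word declared first: it grabs "SQL", and nothing matches the "_".
    println(show(tokenize("SQL_QUERY", listOf(word, queryType))))
    // prints "word:SQL, <no match>:_QUERY"

    // With \b, queryType no longer matches a prefix of "String".
    println(show(tokenize("String", listOf(queryTypeBounded, word))))
    // prints "word:String"
    println(show(tokenize("SQL_QUERY", listOf(queryTypeBounded, word))))
    // prints "queryType:SQL_QUERY"
}
```

Under this model, neither order of the original queryType and word patterns tokenizes both "String" and "SQL_QUERY" as intended, which would be consistent with different test cases failing depending on the order. Whether adding the word boundary is enough for the full grammar in the question is not verified here.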