sirthias / parboiled2

A macro-based PEG parser generator for Scala 2.10+
Other
717 stars 86 forks source link

Unexpected ClassCastException with "zeroOrMore" reduction rules #206

Closed ljleb closed 5 years ago

ljleb commented 5 years ago

I am trying to parse my own language with parboiled.

This line from a test for the parser implementation:

'': next.toString(in: in) in

fails to be parsed because this rule always throws a ClassCastException:

((TypeRule() ~ PartialObjectLiteral()) | ReferenceRule()) ~ (VolatileObjectLiteral() ~ AccessRule() | AccessRule()).+ ~ VolatileObjectLiteral()

when the parser successfully matches it.

Here is the log:

java.lang.ClassCastException: class com.github.louisjl.shared.syntax.volatile.LiteralVolatileExpression cannot be cast to class com.github.louisjl.shared.syntax.PartialExpression (com.github.louisjl.shared.syntax.volatile.LiteralVolatileExpression and com.github.louisjl.shared.syntax.PartialExpression are in unnamed module of loader 'app')
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$VolatileObjectLiteral$4(ExoParser.scala:81)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$VolatileObjectForString$1(ExoParser.scala:219)
    at com.github.louisjl.shared.parser.ExoParser.rec$8(ExoParser.scala:213)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$StringRule$1(ExoParser.scala:212)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$2(ExoParser.scala:103)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.$anonfun$$qmark$1(ExoParser.scala:336)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$4(ExoParser.scala:114)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$5(ExoParser.scala:123)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$6(ExoParser.scala:128)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectRule$1(ExoParser.scala:49)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$3(ExoParser.scala:109)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.rec$20(ExoParser.scala:328)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.$anonfun$$plus$1(ExoParser.scala:327)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.$anonfun$$times$1(ExoParser.scala:320)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$4(ExoParser.scala:114)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$5(ExoParser.scala:123)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$6(ExoParser.scala:128)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectRule$1(ExoParser.scala:49)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$3(ExoParser.scala:109)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.rec$20(ExoParser.scala:328)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.$anonfun$$plus$1(ExoParser.scala:327)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.$anonfun$$times$1(ExoParser.scala:320)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$4(ExoParser.scala:114)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$5(ExoParser.scala:123)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$6(ExoParser.scala:128)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectRule$1(ExoParser.scala:49)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$3(ExoParser.scala:109)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.rec$20(ExoParser.scala:328)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.$anonfun$$plus$1(ExoParser.scala:327)
    at com.github.louisjl.shared.parser.ExoParser$CustomRule.$anonfun$$times$1(ExoParser.scala:320)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$4(ExoParser.scala:114)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$5(ExoParser.scala:123)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectLiteral$6(ExoParser.scala:128)
    at com.github.louisjl.shared.parser.ExoParser.$anonfun$PartialObjectRule$1(ExoParser.scala:49)
    at com.github.louisjl.shared.parser.ExoParser.PartialMain(ExoParser.scala:43)
    at com.github.louisjl.shared.parser.ExoParser$.$anonfun$parse$1(ExoParser.scala:16)
    at org.parboiled2.Parser.runRule$1(Parser.scala:143)
    at org.parboiled2.Parser.phase0_initialRun$1(Parser.scala:151)
    at org.parboiled2.Parser.__run(Parser.scala:209)
    at com.github.louisjl.shared.parser.ExoParser$.parse(ExoParser.scala:16)
    at com.github.louisjl.shared.parser.ExoParser$.parseAsVolatile(ExoParser.scala:26)
    at com.github.louisjl.shared.syntax.PartialReferenceToVolatileSpec.$anonfun$new$2(PartialReferenceToVolatileSpec.scala:34)
        at ... [scalatest trace]

I will try to reproduce using a minimal code base soon: there is a lot of stuff in my project that is not required for the exception to show up. This might help for now.

ljleb commented 5 years ago

Okay, so I successfully reproduced on 2.1.7:

object App {
    def main(args: Array[String]): Unit = {
        println(new ABParser("ab").Main.run().get)
    }
}

import org.parboiled2._
import shapeless._

trait A

trait B

case class B2A(b: B) extends A

case class A2B(a: A) extends B

class ABParser(override val input: ParserInput) extends Parser {
    def Main: Rule1[A] = rule {
        BRule ~ BReduction ~ B2ARule
    }

    def BRule: Rule1[B] = rule {
        push(new B {})
    }

    def BReduction: Rule[B :: HNil, B :: HNil] = rule {
        ((B2ACapture | B2ARule) ~ A2BCapture).*
    }

    def B2ARule: Rule[B :: HNil, A :: HNil] = rule {
        MATCH ~> B2A
    }

    def B2ACapture: Rule[B :: HNil, A :: HNil] = rule {
        str("a") ~> B2A
    }

    def A2BCapture: Rule[A :: HNil, B :: HNil] = rule {
        str("b") ~> A2B
    }
}

Executing the above app crashes (on parboiled2-2.1.7 with SBT):

Exception in thread "main" java.lang.ClassCastException: B2A cannot be cast to B
    at ClassCastParser.B2ARule(ClassCastParser.scala:25)
    at ClassCastParser.Main(ClassCastParser.scala:14)
    at App$.$anonfun$main$1(App.scala:3)
    at org.parboiled2.Parser.runRule$1(Parser.scala:143)
    at org.parboiled2.Parser.phase0_initialRun$1(Parser.scala:151)
    at org.parboiled2.Parser.__run(Parser.scala:209)
    at App$.main(App.scala:3)
    at App.main(App.scala)

Process finished with exit code 1
ljleb commented 5 years ago

I think the problem comes from the rule BReduction: the program seems to skip the rule A2BCapture at that point (I found this using intellij debugger variables).

@sirthias Is this a known bug?

ljleb commented 5 years ago

I made a repository of the above code here to ease spotting the bug.

sirthias commented 5 years ago

This appears to be a duplicate of this bug: https://github.com/sirthias/parboiled2/issues/171 Here is the mailing list discussion for it: https://groups.google.com/forum/#!topic/parboiled-user/-d7X-LsCQqQ

ljleb commented 5 years ago

Closing as this is indeed a duplicate of #171 despite the fact that a side effect is generating the bug this time.

ljleb commented 5 years ago

@sirthias Sorry to bother again in advance.

I'm not sure if this is in fact the same issue as #171, because changing

def BReduction: Rule[B :: HNil, B :: HNil] = rule {
    ((B2ACapture | B2ARule) ~ A2BCapture).*
}

to

def BReduction: Rule[B :: HNil, B :: HNil] = rule {
    (B2ACapture ~ A2BCapture | B2ARule ~ A2BCapture).*
}

still throws the same exception.

In intellij, the execution follows this order, in debug mode, with "ab" as input:

I must be dumb or something, but I don't get in what this is related to #171. I don't get either what happens behind the scenes that invalidates the value stack in this code snippet. May you please provide a brief explanation for this?

sirthias commented 5 years ago

In the expression

B2ACapture ~ A2BCapture | B2ARule ~ A2BCapture

the | operator has higher precedence than the ~ operator. However, the sub-expression A2BCapture | B2ARule is actually illegal, since it has no legal type! You cannot have a rule that expects either and A or a B at the top of the value stack. You could have a rule that expects an Either[A, B] on top of the value stack, but that's a different thing.

With #171 fixed this shouldn't compile.

ljleb commented 5 years ago

Thanks! Everything makes a lot more sense now.

Overriding the precedence of the operators should then be solvable, in this case, by adding parentheses to the expression -- like the following:

(B2ACapture ~ A2BCapture) | (B2ARule ~ A2BCapture)

There should finally be no problem star-ing this expression, I think.

ljleb commented 5 years ago

I just tried adding parentheses and I still end up with the same exception. In fact, this:

def BReduction: Rule[B :: HNil, B :: HNil] = rule {
    SingleBReduction.*
}

def SingleBReduction: Rule[B :: HNil, B :: HNil] = rule {
    BReductionLeft | BReductionRight
}

def BReductionLeft: Rule[B :: HNil, B :: HNil] = rule {
    B2ACapture ~ A2BCapture
}

def BReductionRight: Rule[B :: HNil, B :: HNil] = rule {
    B2ARule ~ A2BCapture
}

won't work either at runtime, and I don't see the precedence of operations interfering anywhere.

Edit: I tried removing the n-arity of the operation, i.e.

def BReduction: Rule[B :: HNil, B :: HNil] = rule {
    SingleBReduction // no ".*" nor ".+" here
}

This successfully runs. Is there some rule simplification | transformation happening at compile time that I'm not aware of, like inlining?

sirthias commented 5 years ago

Well, your parser does indeed show a bug that hasn't been reported up to now, which is quite curious since it's actually quite severe.

I've just created #207 and a simplified test for it. Hopefully the ticket description is understandable.

The reason you've tripped across this problem is that your parser setup is somewhat unorthodox, with the heavy in-place mutations of the value stack. My guess would be that reduction rules are normally used quite sparingly and only in "simple", more local contexts (as shown in the README), which is why nobody came across this bug up to now.

Anyway, thank you for reporting this!

ljleb commented 5 years ago

Alright! I'm glad I could contribute to the todo value stack. 🤘