sirthias / parboiled2

A macro-based PEG parser generator for Scala 2.10+

high CPU usage on CharPredicate.Digit #518

Open yanns opened 11 months ago

yanns commented 11 months ago

When updating from JVM 19 to JVM 21, we noticed very high CPU usage.

Profiling points to a method that uses a simple rule:

def Digits = rule(oneOrMore(Digit))

[Screenshot, 2023-12-19 17:30:38]

I don't know whether anyone has seen this behavior before or has a clue where to look.

sirthias commented 11 months ago

Is it the same bytecode that runs fine on 19 but "hangs" in 21?

yanns commented 11 months ago

Is it the same bytecode that runs fine on 19 but "hangs" in 21?

Not exactly, as we also changed ThisBuild / scalacOptions ++= Seq("-release", "11") to ThisBuild / scalacOptions ++= Seq("-release", "21"). I can try changing this back to see if it has an impact.

yanns commented 11 months ago

I was able to reproduce this issue with the same bytecode (release 11) deployed on JVM 21.

[Screenshot, 2023-12-20 15:31:59]

Edit: it's very sporadic and only happens on a few instances, some of the time.

sirthias commented 11 months ago

Hmm... if it's not deterministically exhibiting this behavior on one particular machine (setup) then I wonder how two different runs might differ from each other. Are you somehow sharing the parser from different threads or are you 100% sure that it's only ever a single thread running a particular parser instance? Could it be that the parser is running against partial input and the error might be triggered by hitting package boundaries at odd locations?

yanns commented 11 months ago

Are you somehow sharing the parser from different threads or are you 100% sure that it's only ever a single thread running a particular parser instance?

There's one new instance per input, and also per thread.

Could it be that the parser is running against partial input and the error might be triggered by hitting package boundaries at odd locations?

The input comes from users, so it may well be partial. I've looked at the queries that were successfully parsed before the instance got stuck, and they all seem to be valid. But an invalid one may simply never get logged. It's a great point.

sirthias commented 11 months ago

Can you show what your WhiteSpace and Digits rules look like? Maybe they are written in a way that also tries to consume the EOI virtual char after the input? That virtual character cannot be consumed and is sometimes the cause of an infinite loop, e.g. when you try to consume all the chars that are not a digit or something like that.
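To make that failure mode concrete, here is a toy model in plain Scala. It is not parboiled2's implementation: `EoiLoopModel`, `charAt`, `skipNonDigits`, and the `maxSteps` cap are hypothetical names invented for this sketch, and the only assumption is that reading at or past the end of the input yields a sentinel character that the cursor can never move beyond.

```scala
object EoiLoopModel {
  // Sentinel handed out for every position at or past the end of the input
  // (an assumption of this toy model).
  val EOI: Char = '\uFFFF'

  def charAt(input: String, cursor: Int): Char =
    if (cursor < input.length) input.charAt(cursor) else EOI

  // Models "consume all the chars that are not a digit".
  // Returns (final cursor, loop iterations); maxSteps stands in for "forever".
  def skipNonDigits(input: String, guardEoi: Boolean, maxSteps: Int): (Int, Int) = {
    var cursor = 0
    var steps  = 0
    def matchesHere: Boolean = {
      val c = charAt(input, cursor)
      // Without the guard, the sentinel counts as "not a digit" and keeps matching.
      !c.isDigit && (!guardEoi || c != EOI)
    }
    while (matchesHere && steps < maxSteps) {
      if (cursor < input.length) cursor += 1 // the sentinel itself is never consumed
      steps += 1
    }
    (cursor, steps)
  }
}
```

With the input "ab" and no guard, the loop only stops because of the cap: the cursor stays at 2 while the step count runs up to `maxSteps`. With `guardEoi = true`, or with a digit somewhere in the input, it stops after two steps.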

yanns commented 11 months ago

def Digits = rule(oneOrMore(Digit)) // from CharPredicate.Digit
def WhiteSpace = rule(quiet(zeroOrMore(WhiteSpaceChar)))
val WhiteSpaceChar = CharPredicate(" \n\r\t\f")

sirthias commented 11 months ago

Hmm... this really is curious. I can't see how this could ever cause an infinite loop. What input are you reading from? Byte Arrays or Strings?

In order to debug further, it'd be great to get hold of a pathological input and try to reproduce the problem under the microscope.
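One possible way to catch such an input in production, sketched under assumptions: register every input before parsing starts, so that it is still visible if the parse never returns, and periodically ask which parses have been running for too long. `InFlightParses`, `tracked`, and `stuckInputs` are hypothetical helpers written for this sketch and are not part of parboiled2; the `parse` function is whatever call runs the parser on a String.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

// Hypothetical helper: keeps track of parses that have started but not finished.
final class InFlightParses(clockNanos: () => Long = () => System.nanoTime()) {
  private val nextId   = new AtomicLong(0L)
  private val inFlight = TrieMap.empty[Long, (String, Long)] // id -> (input, start time)

  // Registers the input before parsing, so it stays visible even if `parse` never returns.
  def tracked[A](input: String)(parse: String => A): A = {
    val id = nextId.incrementAndGet()
    inFlight.put(id, (input, clockNanos()))
    try parse(input)
    finally inFlight.remove(id)
  }

  // Inputs whose parse has been running for longer than `thresholdNanos`.
  def stuckInputs(thresholdNanos: Long): List[String] = {
    val now = clockNanos()
    inFlight.values.collect {
      case (input, start) if now - start > thresholdNanos => input
    }.toList
  }
}
```

A background task (for example a scheduled executor) could call `stuckInputs` every few seconds and log the result. Since the inputs come from users, they may need redacting before they are logged.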

yanns commented 11 months ago

We're parsing Strings.

I'm trying to get more data. I can only reproduce the issue in production, and only on some instances. I did find some input strings for which parsing takes a long time.

But I cannot reproduce the issue locally, even with those inputs. I'll keep you informed. Thanks for the help! If you have any other ideas about where I could look, please don't hesitate to share them with me.

yanns commented 7 months ago

An update: this issue also occurs with JVM 19 (maybe less frequently, but I cannot be sure), so it's not related to the JVM version.