10-60X slower compared to java regexes in some cases

brian-brazil commented 9 years ago

Original report: https://groups.google.com/forum/#!topic/re2j-discuss/8c3L06m6wbY

pattern is : ".d|e|cart|jinjian|kk." String is: "aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl"

Java takes: 330ms re2j takes: 4257ms

Thanks for your time to take a look at it.

This came up also on https://github.com/prometheus/jmx_exporter/issues/23 @bbaja42

alandonovan commented 9 years ago

I can replicate the problem, and spent the best part of a day trying to fix it, but struggled to make sense of the output of a number of low-level Java profiling tools. I'll dump my state here in case others want to pick it up.

Various attempts at profiling (SIQUIT, hprof, Yourkit, /profilez) show that Machine.step and Machine.add are the bottleneck, is as to be expected.

            at com.google.re2j.Machine.step(Machine.java:270)
            at com.google.re2j.Machine.match(Machine.java:235)
            at com.google.re2j.RE2.doExecute(RE2.java:244)
            at com.google.re2j.RE2.match(RE2.java:284)
            at com.google.re2j.Matcher.genMatch(Matcher.java:287)
            at com.google.re2j.Matcher.matches(Matcher.java:233)
            at com.google.re2j.Pattern.matches(Pattern.java:151)
            at com.google.re2j.A.main(A.java:29)

No particular statement seems to be to blame, but the profilers all show the effects of skewing due to unwinding the stack at GC safepoints, which typically means loop back-edges. I was unable to build the "honest-profiler" for OpenJDK: https://github.com/RichardWarburton/honest-profiler

The program allocates very little memory (80MB in 10 seconds), so I don't believe GC is a factor. It's almost pure computation on an already-allocated data structure.

  Pattern re = Pattern.compile(".*(d|e|cart|jinjian|.*)");

  // Typical times:
  // 31933ns/call (N=1e6)
  // 36231ns/call (N=1e6)
  void benchmark(int N) {
    long t0 = System.nanoTime();
    for(int i = 0 ; i < N ; i++) {
      re.matches("aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl");
    }
    long t1 = System.nanoTime();
    if (N > 0) {
      System.out.println(((t1-t0) / N) + "ns/call (N=" + N + ")");
    }
  }

and the Go code was:

// Typical times: 2.149µs/call
package main

import (
        "fmt"
        "regexp"
        "time"
)

func main() {
        re := regexp.MustCompile(".*(d|e|cart|jinjian|.*)")
        t0 := time.Now()
        const N = 1e6
        for i := 0; i < N; i++ {
                re.MatchString("aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl")
        }
        fmt.Printf("%s/call\n", time.Since(t0)/N)
}

jpaulynice commented 8 years ago

I'm wondering the same thing as well and was surprised...the Java regex is significantly faster...5x. I have 3 patterns that are compiled then cached....then search for a match on all of them:

--re2j before caching: ^.*thingToMatch.*$ Found 1 matches in 26 ms

--java before caching: ^.*thingToMatch.*$ Found 1 matches in 5 ms

--re2j with cache: ^.*thingToMatch.*$ Found 1 matches in 6 ms

shevek commented 5 years ago

Use this: http://www.brendangregg.com/perf.html#FlameGraphs

git clone https://github.com/brendangregg/FlameGraph
git clone https://github.com/jvm-profiling-tools/perf-map-agent
cd perf-map-agent
cmake && make
PID=$(jps -l | grep whatever)
FLAMEGRAPH_DIR=../FlameGraph/ PERF_RECORD_SECONDS=120 ./bin/perf-java-flames $PID

Then you will have safepoint-free performance profile.

You will probably also want to set -XX:+PreserveFramePointer -XX:MaxInlineSize=0 on the VM running the tests, or C2 will inline things and cause the flame graph to lie to you. This will lose you to 10%-20% "real" performance, but that isn't the point.

Using GetStackTraceAsync (Honest Profiler, etc) is a nice idea, but it's still more intrusive than perf.

dagnelies commented 5 years ago

I was looking for a faster regex package ...but I can confirm this one is much slower than plain java.util.regex

Also, it is not fully compatible with java's character classes like \p{Punct}.

arnaudroger commented 5 years ago

@dagnelies I did a few modification that did bring quite a big perfomance boost here https://github.com/arnaudroger/re2j/tree/jug there are quite a few changes so have not come to do a PR yet. it ends up being 5 to 8 times faster on my benchmark - if you don't use capture - the algo used in re2j will always be slower at the median, but at the 99 percentile, I had cases where java regex ran into minutes while re2j was behaving itself.

google / re2j

10-60X slower compared to java regexes in some cases #12