Open brian-brazil opened 9 years ago
I can replicate the problem, and spent the best part of a day trying to fix it, but struggled to make sense of the output of a number of low-level Java profiling tools. I'll dump my state here in case others want to pick it up.
Various attempts at profiling (SIQUIT, hprof, Yourkit, /profilez) show that Machine.step and Machine.add are the bottleneck, is as to be expected.
at com.google.re2j.Machine.step(Machine.java:270)
at com.google.re2j.Machine.match(Machine.java:235)
at com.google.re2j.RE2.doExecute(RE2.java:244)
at com.google.re2j.RE2.match(RE2.java:284)
at com.google.re2j.Matcher.genMatch(Matcher.java:287)
at com.google.re2j.Matcher.matches(Matcher.java:233)
at com.google.re2j.Pattern.matches(Pattern.java:151)
at com.google.re2j.A.main(A.java:29)
No particular statement seems to be to blame, but the profilers all show the effects of skewing due to unwinding the stack at GC safepoints, which typically means loop back-edges. I was unable to build the "honest-profiler" for OpenJDK: https://github.com/RichardWarburton/honest-profiler
The program allocates very little memory (80MB in 10 seconds), so I don't believe GC is a factor. It's almost pure computation on an already-allocated data structure.
Pattern re = Pattern.compile(".*(d|e|cart|jinjian|.*)");
// Typical times:
// 31933ns/call (N=1e6)
// 36231ns/call (N=1e6)
void benchmark(int N) {
long t0 = System.nanoTime();
for(int i = 0 ; i < N ; i++) {
re.matches("aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl");
}
long t1 = System.nanoTime();
if (N > 0) {
System.out.println(((t1-t0) / N) + "ns/call (N=" + N + ")");
}
}
and the Go code was:
// Typical times: 2.149µs/call
package main
import (
"fmt"
"regexp"
"time"
)
func main() {
re := regexp.MustCompile(".*(d|e|cart|jinjian|.*)")
t0 := time.Now()
const N = 1e6
for i := 0; i < N; i++ {
re.MatchString("aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl")
}
fmt.Printf("%s/call\n", time.Since(t0)/N)
}
I'm wondering the same thing as well and was surprised...the Java regex is significantly faster...5x. I have 3 patterns that are compiled then cached....then search for a match on all of them:
--re2j before caching:
^.*thingToMatch.*$
Found 1 matches in 26 ms
--java before caching:
^.*thingToMatch.*$
Found 1 matches in 5 ms
--re2j with cache:
^.*thingToMatch.*$
Found 1 matches in 6 ms
Use this: http://www.brendangregg.com/perf.html#FlameGraphs
git clone https://github.com/brendangregg/FlameGraph
git clone https://github.com/jvm-profiling-tools/perf-map-agent
cd perf-map-agent
cmake && make
PID=$(jps -l | grep whatever)
FLAMEGRAPH_DIR=../FlameGraph/ PERF_RECORD_SECONDS=120 ./bin/perf-java-flames $PID
Then you will have safepoint-free performance profile.
You will probably also want to set -XX:+PreserveFramePointer -XX:MaxInlineSize=0
on the VM running the tests, or C2 will inline things and cause the flame graph to lie to you. This will lose you to 10%-20% "real" performance, but that isn't the point.
Using GetStackTraceAsync (Honest Profiler, etc) is a nice idea, but it's still more intrusive than perf.
I was looking for a faster regex package ...but I can confirm this one is much slower than plain java.util.regex
Also, it is not fully compatible with java's character classes like \p{Punct}
.
@dagnelies I did a few modification that did bring quite a big perfomance boost here https://github.com/arnaudroger/re2j/tree/jug there are quite a few changes so have not come to do a PR yet. it ends up being 5 to 8 times faster on my benchmark - if you don't use capture - the algo used in re2j will always be slower at the median, but at the 99 percentile, I had cases where java regex ran into minutes while re2j was behaving itself.
Original report: https://groups.google.com/forum/#!topic/re2j-discuss/8c3L06m6wbY
pattern is : ".d|e|cart|jinjian|kk." String is: "aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl"
Java takes: 330ms re2j takes: 4257ms
Thanks for your time to take a look at it.
This came up also on https://github.com/prometheus/jmx_exporter/issues/23 @bbaja42