Open dmitshur opened 8 years ago
I wrote a quick and dirty benchmark to confirm whether this is still the case:
package main

import (
	"regexp"
	"runtime"
	"testing"
)

var sideEffect interface{}

func BenchmarkCompile(b *testing.B) {
	for i := 0; i < b.N; i++ {
		re := regexp.MustCompile(`^[0-9]{4}(-[0-9]{2}(-[0-9]{2}([ T][0-9]{2}(:[0-9]{2}){1,2}(.[0-9]{1,6})` +
			`?Z?([\+-][0-9]{2}:[0-9]{2})?)?)?)?$`)
		runtime.KeepAlive(re)
	}
}
...and the difference appears to still be about an order of magnitude:
$ gopherjs test --bench=. --benchtime=30s
goos: linux
goarch: js
BenchmarkCompile 106006 327538 ns/op
$ go test -bench=. -benchtime=30s
goos: linux
goarch: amd64
pkg: repro/016-regexp-compile-perf
cpu: Intel(R) Core(TM) i5-10600KF CPU @ 4.10GHz
BenchmarkCompile-12 2120029 16992 ns/op
I'm not entirely sure the Go compiler didn't sneak in some uninvited optimization, but in general I can believe this result.
I've looked at a couple of profiles and it seems like a lot of time is spent cloning objects and appending to slices.
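To probe both points above (a possible uninvited optimization, and time going to allocations), here is a minimal sketch of a benchmark variant. It is an illustration, not the benchmark that produced the numbers in this comment. The names `pattern`, `sink`, `benchmarkCompile` and `measure` are made up for this example. Storing the result in a package-level variable makes it harder for a compiler to discard the work, and `b.ReportAllocs()` adds allocation counts to the result.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// pattern is the expression from the benchmark above, copied verbatim.
const pattern = `^[0-9]{4}(-[0-9]{2}(-[0-9]{2}([ T][0-9]{2}(:[0-9]{2}){1,2}(.[0-9]{1,6})` +
	`?Z?([\+-][0-9]{2}:[0-9]{2})?)?)?)?$`

// sink is a package-level variable (hypothetical name). Assigning every
// compiled regexp to it keeps the result observable outside the loop.
var sink *regexp.Regexp

// benchmarkCompile compiles the pattern b.N times and records allocations.
func benchmarkCompile(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = regexp.MustCompile(pattern)
	}
}

// measure runs the benchmark outside of "go test" via testing.Benchmark.
func measure() testing.BenchmarkResult {
	return testing.Benchmark(benchmarkCompile)
}

func main() {
	res := measure()
	fmt.Printf("%d ns/op, %d allocs/op, %d B/op\n",
		res.NsPerOp(), res.AllocsPerOp(), res.AllocedBytesPerOp())
}
```

A high allocs/op figure would be consistent with the profile observation, since each allocation and slice growth tends to cost more in generated JavaScript than in native code. Whether the same program behaves identically when built with GopherJS is an assumption I have not verified.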
This is not quite as bad a performance difference as #276, but I thought it worth bringing up and seeing if this can be improved.

I've looked at the `regexp` and `regexp/syntax` packages, and it appears they use pure Go code; there is no assembly to optimize them for certain architectures.

Given the following regexp:

It takes about 200 µs to compile it using the `gc` compiler:

But it's closer to 20 ms to compile the same regexp using the `gopherjs` compiler:

I really don't like regexpes and wish they didn't exist, but unfortunately some Go packages use them (for example, `github.com/microcosm-cc/bluemonday`, /cc @buro9) quite significantly even at package level, meaning any Go package that imports `bluemonday` will take roughly an additional 200 ms just to initialize (imagine 10 regexpes compiled at package level: if each one takes 20 ms, that's 200 ms).

Since the `regexp` package is pure Go, hopefully general performance improvements can translate to better results here.

/cc @neelance @slimsag
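The initialization-cost arithmetic above (10 regexps at about 20 ms each) can be checked with a rough timing sketch like the one below. It is only an illustration. The pattern is borrowed from the benchmark comment elsewhere in this thread, since the regexp originally measured is not shown here. The names `pattern`, `last` and `compileAll` are hypothetical, and compiling one pattern ten times is just a stand-in for ten distinct package-level regexps.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// pattern is borrowed from the benchmark elsewhere in this thread; it is an
// assumption that it is representative of package-level regexps.
const pattern = `^[0-9]{4}(-[0-9]{2}(-[0-9]{2}([ T][0-9]{2}(:[0-9]{2}){1,2}(.[0-9]{1,6})` +
	`?Z?([\+-][0-9]{2}:[0-9]{2})?)?)?)?$`

// last holds the most recently compiled regexp so the work stays observable.
var last *regexp.Regexp

// compileAll compiles p n times and returns the total elapsed wall time,
// approximating what n package-level MustCompile calls cost at init.
func compileAll(p string, n int) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		last = regexp.MustCompile(p)
	}
	return time.Since(start)
}

func main() {
	const n = 10
	total := compileAll(pattern, n)
	fmt.Printf("compiled %d regexps in %v (about %v each)\n", n, total, total/n)
}
```

A single wall-clock measurement like this is noisy and includes warm-up effects, so it is closer to what a user sees at package initialization than to a steady-state benchmark figure.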