seanjensengrey / unladen-swallow

Automatically exported from code.google.com/p/unladen-swallow

Add a regex benchmark suite #14

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Python currently doesn't have a good-quality regex benchmark suite that can
be run automatically, have statistics drawn from it, etc. We need such a
thing before starting work on regex performance.

Possible resources:
- Fredrik Lundh's original benchmarks for SRE:
http://mail.python.org/pipermail/python-dev/2000-August/007797.html
- V8's JS regex benchmarks:
http://v8.googlecode.com/svn/data/benchmarks/v3/regexp.js

Ideally we would do a search of the Python regexes in Google Code Search or
similar corpus and distill some representative set from them. V8's may be
good enough, though.

Original issue reported on code.google.com by collinw on 14 Apr 2009 at 11:37

GoogleCodeExporter commented 9 years ago

Original comment by collinw on 27 May 2009 at 11:07

GoogleCodeExporter commented 9 years ago

Original comment by collinw on 29 May 2009 at 12:09

GoogleCodeExporter commented 9 years ago

Original comment by collinw on 29 May 2009 at 4:13

GoogleCodeExporter commented 9 years ago
r615 adds V8's regex benchmarks to perf.py. I still want to include the
original SRE benchmarks mentioned in
http://mail.python.org/pipermail/python-dev/2000-August/007797.html.

Original comment by collinw on 8 Jun 2009 at 8:48

GoogleCodeExporter commented 9 years ago
As of r623, "perf.py -b regex_effbot" runs the regex benchmarks listed in the
link above. Thanks to David Laing for the patch!

There's also now a regex benchmark group that runs both regex_v8 and 
regex_effbot. "perf.py -b regex" will do the trick.

The mailing list thread cited above lists some macrobenchmarks that I'd still
like to include; regex_v8 is based on popularity of JavaScript regexes (which
makes me a little nervous), and regex_effbot is pretty micro. Some real-app
macrobenchmarks would make me more confident.

Original comment by collinw on 21 Jun 2009 at 2:56

GoogleCodeExporter commented 9 years ago
I'd also like a regex_compile benchmark that stresses regex compilation time.
If we start compiling regexes to machine code, we need to know how much that
will hurt us.

This could be as simple as importing regex_v8 and regex_effbot, collecting the
regex strings from those modules, then compiling them all several times.
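
A rough sketch of that idea, not the code that eventually landed: the
SAMPLE_PATTERNS list and the time_compilation helper below are hypothetical
stand-ins for the regex strings that would be collected from regex_v8 and
regex_effbot, and the sketch assumes Python 3 (time.perf_counter). It clears
the re module's internal cache before each pass so every pass pays the full
compilation cost.

```python
import re
import time

# Hypothetical stand-ins for the (pattern, flags) pairs that would be
# collected from regex_v8 and regex_effbot; those modules are not imported.
SAMPLE_PATTERNS = [
    (r"^ba", 0),
    (r"(\w+)@(\w+)\.com", re.IGNORECASE),
    (r"[a-z]+\d{2,4}", 0),
    (r"(?:foo|bar|baz)+qux", re.MULTILINE),
]


def time_compilation(patterns, iterations=5):
    """Return one wall-clock time per pass over all patterns."""
    times = []
    for _ in range(iterations):
        # re caches compiled patterns; purge the cache so each pass
        # measures real compilation rather than a cache lookup.
        re.purge()
        start = time.perf_counter()
        for pattern, flags in patterns:
            re.compile(pattern, flags)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    timings = time_compilation(SAMPLE_PATTERNS)
    print("min: %.6fs  mean: %.6fs"
          % (min(timings), sum(timings) / len(timings)))
```

Without the re.purge() call, every pass after the first would mostly measure
cache hits, which is not what a compilation benchmark is after.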

Original comment by collinw on 15 Jul 2009 at 8:41

GoogleCodeExporter commented 9 years ago
Added regex_compile in r782, which captures the regexes used by regex_v8 and
regex_effbot.

Between regex_compile, regex_effbot and regex_v8, that should be good enough
to get started optimizing the regex engine. I'd still like some macro-level
regex benchmarks, but that's low-priority given what we already have in place.

Original comment by collinw on 28 Jul 2009 at 9:07