Open GoogleCodeExporter opened 9 years ago
You are running your benchmark under cProfile. This will enable tracing and
will
disable the use of JITted code. Could you try again and just produce one time
for
the entire run without using cProfile? Try using time.time().
If you want to give Unladen more of an advantage, try running the benchmark
once and
then timing a second run. This will give the code a chance to "warm up" but
probably
won't be directly relevant to your real-world use unless you want to write a
daemon
that never shuts down.
Original comment by abbeyj
on 6 Jan 2010 at 6:44
Is http://svn.whatwg.org/webapps/index the input data you're using?
Original comment by collinw
on 7 Jan 2010 at 12:39
I whipped up a benchmark based on html5lib that integrates with perf.py,
parsing
the full HTML 5 spec with html5lib. That benchmark indicates that Unladen is
faster
than CPython 2.6.1, but only after the first run.
http://tinyurl.com/ydomq7t is a timeline of iteration times for CPython 2.6 vs
Unladen trunk@head, each parsing the HTML 5 spec 10 times. CPython wins on the
first iteration, but after that, Unladen's performance appears to improve over
time.
Note that this benchmark intentionally isn't doing any priming; I want us to be
penalized appropriately for that first slow run, since this isn't a daemon
process
we're talking about.
I still think it's a problem that we're only a second faster than CPython. I
need to do
some profiling to see why, but the time Unladen spends blocked on compilation
is
certainly a factor.
Original comment by collinw
on 7 Jan 2010 at 7:59
The provided profiling information indicates a problem with the hotness
function:
We have a function with a total runtime of 14.6 sec which is not considered hot:
337750 14.619 0.000 62.482 0.000 html5lib/tokenizer.py:59(__iter__)
Original comment by e...@4geeks.de
on 7 Jan 2010 at 8:09
Aha, this will be an excellent test case for the while-loop hotness patch!
Original comment by collinw
on 7 Jan 2010 at 8:12
Benchmark added in r970.
Original comment by collinw
on 8 Jan 2010 at 12:35
With my patch from http://codereview.appspot.com/167061
### html5lib ###
Min: 15.609000 -> 14.156000: 1.1026x faster
Avg: 16.285900 -> 15.096900: 1.0788x faster
Not significant
Stddev: 1.24404 -> 1.64767: 1.3245x larger
Original comment by abbeyj
on 8 Jan 2010 at 12:53
From the profile:
337750 14.619 0.000 62.482 0.000 html5lib/tokenizer.py:59(__iter__)
Looking at this more closely, I think this function (and functions like it) is
a good
candidate/motivation for pursuing on-stack replacement.
- This is a while-loop heavy generator.
- For a given run of html5lib, this function is only invoked once. The high
call count
comes from counting each .next() call on the generator object.
- Even if while loops are counted as hot (as in abbeyj's patch), this function
won't be
compiled because it's only called once. Despite this, the function still racks
up a high
hotness score of 1641014, more than enough to be compiled.
I know that the JVM's primary benefit from OSR was in benchmarks (not
real-world
code), but does Java have anything like generators that promote long-lived
function
"calls" like this?
We don't strictly need OSR; special-casing for generators might be enough. If
we
checked for hotness every time we reentered the generator, that would be enough.
Original comment by collinw
on 8 Jan 2010 at 1:04
Running abbeyj's patch through the new Misc/diff_hotness.py tool I added in
r974 gives this:
### Newly-hot functions
html5lib/tokenizer.py:59 (__iter__) now has 1641024 points
### Hotter functions
html5lib/tokenizer.py:146 (consumeEntity) -> +14438 points
html5lib/inputstream.py:405 (charsUntil) -> +55 points
### Colder functions
html5lib/html5parser.py:1003 (startTagListItem) -> -9091 points
html5lib/treebuilders/_base.py:131 (elementInScope) -> -7403 points
html5lib/html5parser.py:1480 (endTagOther) -> -9091 points
html5lib/treebuilders/_base.py:201 (elementInActiveFormattingElements) -> -5916 points
Example command: PYTHONPATH=lib/html5lib/ ../hg/fc-obj-inst/python.exe
performance/bm_html5lib.py -n 2; the -n2 is necessary because of my
above message.
Looks good. I'll look over the impact on other benchmarks.
Original comment by collinw
on 8 Jan 2010 at 1:32
Misc notes:
- html5parser.py:196(normalizedTokens) is another candidate for pseudo-OSR
(same reasons as
html5lib/tokenizer.py:59(__iter__)).
- str.startswith() should be converted to METH_ARG_RANGE (494557 calls per run;
easy win).
- We should find a way to specialize isinstance() calls if a) the given object
and type are monomorphic, b) type(given_object)
is given_type.
- Inlining will help a lot; there are a number of single-line methods that show
up as very hot that should be inlined into their
callers.
- --with-instrumentation for bail sites needs to be extended to track frequency.
I'll file individual bugs for these in the morning.
Original comment by collinw
on 8 Jan 2010 at 8:44
r980 added while loops into the hotness model, meaning that
"html5lib/tokenizer.py:59 (__iter__)" is now
being compiled to machine code.
### html5lib ###
Min: 13.185609 -> 12.256459: 1.0758x faster
Avg: 13.876981 -> 12.997757: 1.0676x faster
Not significant
Stddev: 1.15007 -> 1.41818: 1.2331x larger
Timeline: http://tinyurl.com/y8aldmr
If you look at the timeline graph, r980 is about a second faster, just not on
the first run (which increases
the stddev, which lowers the significance score).
Original comment by collinw
on 8 Jan 2010 at 10:43
r982: --with-instrumentation now tracks bail site frequency.
Original comment by collinw
on 12 Jan 2010 at 12:45
Original comment by collinw
on 12 Jan 2010 at 2:49
Original comment by collinw
on 12 Jan 2010 at 2:49
r1029 added a separate no-warmup html5lib benchmark. The old html5lib benchmark
is now called html5lib_warmup.
Original comment by collinw
on 22 Jan 2010 at 10:11
Reid implemented pseudo-OSR for generators in r1032.
### html5lib_warmup ###
Min: 12.435418 -> 12.430771: 1.0004x faster
Avg: 13.130518 -> 13.066833: 1.0049x faster
Not significant
Stddev: 1.45305 -> 1.15964: 1.2530x smaller
Timeline: http://tinyurl.com/ykyvzxn
Let's look at that first run in more detail:
### html5lib ###
Min: 17.347363 -> 16.364512: 1.0601x faster
Avg: 17.382957 -> 16.393608: 1.0603x faster
Significant (t=85.867780, a=0.95)
Stddev: 0.02557 -> 0.02596: 1.0154x larger
Timeline: http://tinyurl.com/yconbz5
My testing shows this shaving a full second off the html5lib runtime. Woot!
Current benchmarking vs CPython 2.6.4:
### html5lib ###
Min: 14.016869 -> 16.396508: 1.1698x slower
Avg: 14.059663 -> 16.447700: 1.1699x slower
Significant (t=-175.046714, a=0.95)
Stddev: 0.02967 -> 0.03132: 1.0555x larger
Timeline: http://tinyurl.com/ycy927u
This was using the new html5lib, not the old html5lib (now called
html5lib_warmup), so it's not comparable to above
results. We're still slower than CPython, but we're gaining ground.
Original comment by collinw
on 22 Jan 2010 at 11:40
Original issue reported on code.google.com by
geoffers
on 6 Jan 2010 at 6:30Attachments: