Closed GoogleCodeExporter closed 9 years ago
The exception idea is a good bet. Django's variable resolution uses the same
syntax
for attribute lookup, index lookup, and key lookup, this means that each nested
resolution probably averages 1-3 exceptions, plus any other exceptions in
template
rendering.
Original comment by alex.gay...@gmail.com
on 8 Jun 2009 at 6:14
So I added code to gather performance measurements for exceptions in r620, and
we
were surprised to find that Django doesn't actually throw any exceptions while
rendering templates. It seems that they use getattr with a default and then
introspect on the result to decide how to handle it.
That was a good bet, but now we need a new one.
Original comment by reid.kle...@gmail.com
on 15 Jun 2009 at 6:58
Django will throw plenty of exceptions in nested variable resoltuion:
http://code.djangoproject.com/browser/django/trunk/django/template/__init__.py#L
701 .
However as it turns out that doesn't occur anywhere in the Unladen Swallow
benchmark. Which is helpful in that it eliminates that as a possible source of
the
regression, however it may mean that real world templates (which will often
include
nested lookups) have even worse regressions. I'll try to look into this later
today.
Original comment by alex.gay...@gmail.com
on 15 Jun 2009 at 7:03
Huh. I had no idea we had that problem with our benchmarks. However, we did
find
that exceptions aren't actually any slower in LLVM than they are in the eval
loop, so
I don't think that the nested lookup should be any slower.
Original comment by reid.kle...@gmail.com
on 15 Jun 2009 at 7:44
Latest theory: Django templates are using eval or exec during their execution,
which
would incur JIT compilation costs under "-j always -O2" (formerly "-L2"). This
turns out
not to be the case, as the benchmark doesn't use eval or exec during the timed
loop.
Original comment by collinw
on 18 Jun 2009 at 2:06
Running PyBench to find an localized regressions:
perf.py args: --verbose -r -b pybench --args ",-j always -O2" branches/release-
2009Q1-maint/python trunk/python
Interesting localized regressions that may be impacting Django:
ListSlicing: 1107ms 1800ms -38.5% 1120ms 2010ms -44.3%
SmallTuples: 899ms 1068ms -15.8% 932ms 1184ms -21.3%
Original comment by collinw
on 18 Jun 2009 at 6:10
I've manually traced the method calls and neither list slicing nor tuple
handling
appear to be particularly stressed by the most called code paths (the inner loop
obviously):
http://code.djangoproject.com/browser/django/trunk/django/template/__init__.py#L
824
http://code.djangoproject.com/browser/django/trunk/django/template/__init__.py#L
794
http://code.djangoproject.com/browser/django/trunk/django/template/__init__.py#L
543
http://code.djangoproject.com/browser/django/trunk/django/template/defaultfilter
s.py#L379
I take in there weren't any other micro benchmarks that showed similar large
performance gaps?
Original comment by alex.gay...@gmail.com
on 18 Jun 2009 at 7:34
The problem isn't list slicing or tuple handling per se: it's the
UNPACK_SEQUENCE
opcode, which is used for, e.g., "a, b, c, d, e = range(5)". It's lines like
this that are
hurting those microbenchmarks.
Looking at
http://code.djangoproject.com/browser/django/trunk/django/template/__init__.py#
L543, I see two interesting for loops, one at L562 and an inner loop at L564.
These
both use UNPACK_SEQUENCE to unpack the arguments in the "for func, args in
self.filters", etc, lines.
I'm going to experiment with compiling _PyEval_UnpackIterable() with Clang so
that
LLVM can analyze it.
Original comment by collinw
on 18 Jun 2009 at 8:12
That's probably a good bet. The UNPACK_SEQUENCE bytecode will be evaluated
22500
times for the loop at 562, and never for the loop at 564 (it's an empty list
every
single time through).
Original comment by alex.gay...@gmail.com
on 18 Jun 2009 at 8:35
I think I figured out what the issue is (again ;) ). Currently there is no LLVM
definition for the IMPORT_NAME opcode, however this function:
http://code.djangoproject.com/browser/django/trunk/django/template/defaultfilter
s.py#L379
Which is in the inner loop (get's called 22500 times) uses an IMPORT_NAME,
which I'm
guessing means it never get's JIT'd. This means that it's a) only ever
interpretted
and b) causes 22500 switches bewteeen JIT and non-JIT code which probably
causes the
performance issue (unless of course I totally misunderstodd how import is
handled ;) ).
Original comment by alex.gay...@gmail.com
on 23 Jun 2009 at 5:18
In Unladen Swallow, IMPORT_NAME is a function, not an opcode, so there
shouldn't be
any barrier to LLVM-compiling the Django function.
The string_filter() decorator that's being wrapped around escape() might be
worth taking
a look at. I don't know that *args, **kwargs functions have been optimized.
Jeffrey
exposed the stack manipulation in CALL_FUNCTION to LLVM in r665, and it would
be
worth doing something similar for the other CALL_FUNCTION_* opcodes.
Original comment by collinw
on 23 Jun 2009 at 5:26
Whoops. If no one get's to the CALL_FUNCTION_<VAR|KW|VAR_KW> thing by tonight
I'll
try taking a whack at it.
Original comment by alex.gay...@gmail.com
on 23 Jun 2009 at 5:40
Interesting:
2009Q1 vs trunk@r669 (-j always -O2; 32-bit; gcc 4.0.1)
django:
Min: 1.088 -> 1.031: 5.52% faster
Avg: 1.090 -> 1.032: 5.56% faster
Significant (t=290.883794, a=0.95)
Stddev: 0.002 -> 0.001: 156.47% smaller
I need to see if this is a 32- vs 64-bit issue, or if the original observations
were
incorrect, or if we've fixed it, or what. I haven't been able to reproduce the
original
observations, even on the original configuration (gcc 4.3.1; 64-bit), so it may
have
been measurement error, or we may have fixed things in the interim.
There're definitely still improvements to be made with UNPACK_SEQUENCE and
CALL_FUNCTION_*, though. I've sent a patch out for review to add an
unpack_sequence benchmark, and something similar should be done for
CALL_FUNCTION_*.
Original comment by collinw
on 24 Jun 2009 at 11:44
Original comment by collinw
on 13 Jul 2009 at 5:51
Original issue reported on code.google.com by
collinw
on 8 Jun 2009 at 6:08