vasanthaganeshk / unladen-swallow

Python2 Jit compiler
https://code.google.com/archive/p/unladen-swallow/

Speed up globals/builtins lookups #67

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Currently, looking up builtin functions does a lookup on the globals dict 
(which almost always fails), then a lookup on the builtins dict. We should 
version these dictionaries, avoiding the lookups if the version numbers 
haven't changed.
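
As a rough illustration of the versioning idea, here is a minimal pure-Python
sketch. The names VersionedDict and CachedGlobalLookup are hypothetical and do
not come from the Unladen Swallow source; a real implementation would live in
C inside the dict object and the eval loop. The sketch bumps a counter on every
mutation and skips both dict lookups while neither counter has moved.

```python
class VersionedDict(dict):
    """Hypothetical dict that bumps a counter whenever it is mutated.

    Only __setitem__ and __delitem__ are covered here; a complete version
    would also have to cover update(), pop(), clear(), setdefault(), etc.
    """

    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self.version += 1

    def __delitem__(self, key):
        dict.__delitem__(self, key)
        self.version += 1


class CachedGlobalLookup(object):
    """Caches one name's value until either dict's version changes."""

    def __init__(self, name, globals_, builtins_):
        self.name = name
        self.globals = globals_
        self.builtins = builtins_
        self.cached_versions = None
        self.cached_value = None
        self.slow_lookups = 0  # counts how often the dicts were consulted

    def load(self):
        versions = (self.globals.version, self.builtins.version)
        if versions == self.cached_versions:
            return self.cached_value  # fast path: no dict lookups at all
        # Slow path: globals first (usually a miss), then builtins.
        self.slow_lookups += 1
        try:
            value = self.globals[self.name]
        except KeyError:
            value = self.builtins[self.name]
        self.cached_value = value
        self.cached_versions = versions
        return value
```

Note that any write to either dict invalidates every cached name in this
sketch, even if the write touched an unrelated key.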

One approach to this is in http://bugs.python.org/issue1518, but I'm not 
satisfied with it. It imposes an intermediary object between the code and 
the globals/builtins dicts, and that intermediary is constantly polling the 
dicts to see if they've changed.

I'm thinking that I'd rather have the frame objects register with the dicts 
and have the dicts push status updates to the frames. In this scheme, you 
wouldn't need real version numbers on the dicts; instead, they'd simply set 
dirty bits on the registered frames to say "your assumptions are invalid". 
The generated code would see these bits and bail to the interpreter and 
request recompilation of the now-invalid code.
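
The push-based scheme could look roughly like the following pure-Python
sketch. WatchedDict and GuardedCode are hypothetical names used only for
illustration, with GuardedCode standing in for a frame or a piece of generated
code. The "bail to the interpreter" step is modeled as a fallback to the
ordinary two-dict lookup, and recompile() is an assumed hook that re-resolves
the name and clears the dirty bit.

```python
class WatchedDict(dict):
    """Hypothetical dict that sets a dirty bit on its registered watchers.

    Only __setitem__ and __delitem__ are covered in this sketch.
    """

    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self._watchers = []

    def register(self, watcher):
        self._watchers.append(watcher)

    def _invalidate(self):
        # Push the status update: "your assumptions are invalid".
        for watcher in self._watchers:
            watcher.dirty = True

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self._invalidate()

    def __delitem__(self, key):
        dict.__delitem__(self, key)
        self._invalidate()


def slow_lookup(name, globals_, builtins_):
    """The ordinary path: globals first, then builtins."""
    if name in globals_:
        return globals_[name]
    return builtins_[name]


class GuardedCode(object):
    """Stand-in for compiled code that baked in the result of one lookup."""

    def __init__(self, name, globals_, builtins_):
        self.name = name
        self.globals = globals_
        self.builtins = builtins_
        self.dirty = False
        self.bails = 0
        self.fast_value = slow_lookup(name, globals_, builtins_)
        globals_.register(self)
        builtins_.register(self)

    def load(self):
        if self.dirty:
            # Guard failed: fall back to the slow, always-correct path.
            self.bails += 1
            return slow_lookup(self.name, self.globals, self.builtins)
        return self.fast_value  # no dict lookups and no version compare

    def recompile(self):
        # Re-resolve under the current dict contents and clear the bit.
        self.fast_value = slow_lookup(self.name, self.globals, self.builtins)
        self.dirty = False
```

The guard on the fast path is a single flag test; the cost of noticing a
change is moved to the (rare) dict write.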

The patch in http://bugs.python.org/issue1518 resulted in an 8% speedup 
when applied to the 2009Q1 release's eval loop without much tuning. I 
believe that the benefit to the LLVM-generated machine code would be 
higher.

Original issue reported on code.google.com by collinw on 14 Jul 2009 at 9:52

GoogleCodeExporter commented 8 years ago

Original comment by collinw on 15 Jul 2009 at 8:41

GoogleCodeExporter commented 8 years ago
Added TSC instrumentation in r743.

From bm_django.py -n 1:

for transitions from LOAD_GLOBAL_ENTER_EVAL to LOAD_GLOBAL_EXIT_EVAL:
occurrences: 205222
median delta: 276.0
mean delta: 15809.3157459
min delta: 228
max delta: 521121456
stddev: 1773216.81854

for transitions from LOAD_GLOBAL_ENTER_LLVM to LOAD_GLOBAL_EXIT_LLVM:
occurrences: 1668332
median delta: 360.0
mean delta: 15994.3224634
min delta: 252
max delta: 724745844
stddev: 1998407.29119

Original comment by collinw on 16 Jul 2009 at 4:11

GoogleCodeExporter commented 8 years ago
r749 adds sys.setbailerror(), useful for making sure that we're not failing
guards where we don't expect to (i.e., for making sure we're actually using
the machine code versions where we expect to be).

Original comment by collinw on 21 Jul 2009 at 5:27

GoogleCodeExporter commented 8 years ago
Committed as r815.

(Dapper; i486-linux-gnu; gcc 4.0.3; Core 2 6600 @ 2.40GHz)
./perf.py -r -b ai,slowspitfire,call_simple,2to3,rietveld,django,slowpickle,slowunpickle --args "-j always" ../a/python ../b/python

ai:
Min: 0.466231 -> 0.457457: 1.92% faster
Avg: 0.468500 -> 0.459558: 1.95% faster
Significant (t=29.396645, a=0.95)
Stddev: 0.00171 -> 0.00252: 32.28% larger

2to3:
Min: 69.455441 -> 63.814299: 8.84% faster
Avg: 69.485237 -> 63.848094: 8.83% faster
Significant (t=179.726302, a=0.95)
Stddev: 0.04656 -> 0.05245: 11.22% larger

call_simple:
Min: 1.311656 -> 0.980650: 33.75% faster
Avg: 1.317529 -> 0.986582: 33.54% faster
Significant (t=1188.927520, a=0.95)
Stddev: 0.00229 -> 0.00158: 45.26% smaller

django:
Min: 1.003850 -> 0.922123: 8.86% faster
Avg: 1.008057 -> 0.924197: 9.07% faster
Significant (t=243.432452, a=0.95)
Stddev: 0.00277 -> 0.00205: 35.42% smaller

rietveld:
Min: 0.555269 -> 0.530369: 4.69% faster
Avg: 0.560270 -> 0.535943: 4.54% faster
Significant (t=19.701851, a=0.95)
Stddev: 0.00791 -> 0.00948: 16.58% larger

slowpickle:
Min: 0.641616 -> 0.617384: 3.92% faster
Avg: 0.642830 -> 0.617874: 4.04% faster
Significant (t=150.344545, a=0.95)
Stddev: 0.00165 -> 0.00021: 674.33% smaller

slowspitfire:
Min: 0.659548 -> 0.661412: 0.28% slower
Avg: 0.661986 -> 0.664182: 0.33% slower
Significant (t=-5.837525, a=0.95)
Stddev: 0.00226 -> 0.00301: 25.10% larger

slowunpickle:
Min: 0.275447 -> 0.268517: 2.58% faster
Avg: 0.275702 -> 0.268875: 2.54% faster
Significant (t=109.194368, a=0.95)
Stddev: 0.00014 -> 0.00061: 76.95% larger
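
For reading the percentages above: the Min and Avg lines are consistent with
the change being expressed relative to the new timing rather than the old one.
The sketch below works under that assumption; percent_change is a hypothetical
helper written for this note, not a function taken from perf.py.

```python
def percent_change(old, new):
    """Return (percent, direction) for a timing that went from old to new.

    Assumption: the difference is divided by the NEW value, which is what
    the Min/Avg figures in this comment are consistent with.
    """
    percent = abs(old - new) / float(new) * 100.0
    direction = "faster" if new < old else "slower"
    return percent, direction


# Min timings for three of the benchmarks listed above.
for name, old, new in [
    ("ai", 0.466231, 0.457457),
    ("django", 1.003850, 0.922123),
    ("slowspitfire", 0.659548, 0.661412),
]:
    percent, direction = percent_change(old, new)
    print("%s: %.2f%% %s" % (name, percent, direction))
```

Under the same convention a value that shrinks to a small fraction of its old
size is reported as more than 100% smaller, which would account for figures
like the slowpickle Stddev line.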

--with-tsc stats:
Dapper; i486-linux-gnu; gcc 4.0.3; Core 2 6600 @ 2.40GHz
PYTHONPATH=../tests/lib/django/ ./python ../tests/performance/bm_django.py -n 1

trunk@head:
for transitions from LOAD_GLOBAL_ENTER_LLVM to LOAD_GLOBAL_EXIT_LLVM:
occurrences: 1668332
median: 414.0
inter-decile mean: 419.504765986
min delta: 297
max delta: 301086
inter-decile stddev: 43.909562956

this patch:
for transitions from LOAD_GLOBAL_ENTER_LLVM to LOAD_GLOBAL_EXIT_LLVM:
occurrences: 1668332
median: 279.0
inter-decile mean: 279.6894444
min delta: 252
max delta: 96075
inter-decile stddev: 10.0954254049
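
A note on the statistics: the sketch below assumes that "inter-decile" means
the mean and standard deviation are computed only over samples between the
10th and 90th percentiles. That definition is an assumption, and
inter_decile_stats is a hypothetical helper rather than the code that produced
the numbers above. It shows how trimming the tails keeps a handful of huge
deltas (context switches, for example) from dominating the summary.

```python
import math


def inter_decile_stats(samples):
    """Mean and population stddev of the samples between the deciles.

    Assumed definition: sort, drop the lowest and highest 10% of the
    samples, then summarize what is left. Requires a non-empty input.
    """
    ordered = sorted(samples)
    n = len(ordered)
    cut = n // 10  # number of samples dropped from each tail
    kept = ordered[cut:n - cut]
    mean = sum(kept) / float(len(kept))
    variance = sum((x - mean) ** 2 for x in kept) / float(len(kept))
    return mean, math.sqrt(variance)


# Made-up TSC deltas: nine ordinary samples and one extreme outlier.
deltas = [270, 272, 274, 276, 278, 280, 282, 284, 286, 500000]
plain_mean = sum(deltas) / float(len(deltas))
trimmed_mean, trimmed_stddev = inter_decile_stats(deltas)
print("plain mean:        %.1f" % plain_mean)
print("inter-decile mean: %.1f" % trimmed_mean)
print("inter-decile std:  %.3f" % trimmed_stddev)
```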

Original comment by collinw on 28 Aug 2009 at 10:27