vasanthaganeshk / unladen-swallow

Python 2 JIT compiler
https://code.google.com/archive/p/unladen-swallow/

Collect feedback only after code objects have been determined to be "warm" #139

Open · GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Right now our eval loop is ~60% slower than CPython's because we collect feedback. In theory this isn't a problem: with a good hotness metric, eval loop performance doesn't matter, because the hot code gets compiled regardless.
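
A minimal sketch of the warm-gating idea (the names below, like co_hotness and FEEDBACK_THRESHOLD, are hypothetical, not taken from the actual tree): the eval loop only pays for feedback bookkeeping once a counter crosses a warmth threshold.

```cpp
// Warm-gated feedback sketch (hypothetical names and threshold).
struct CodeObject {
    long co_hotness = 0;          // bumped on every call and loop backedge
    bool co_collecting = false;   // flips once the code is considered warm
};

static const long FEEDBACK_THRESHOLD = 1000;  // made-up warmth cutoff

inline void bump_hotness(CodeObject *co) {
    if (!co->co_collecting && ++co->co_hotness >= FEEDBACK_THRESHOLD)
        co->co_collecting = true;  // from now on, record type feedback
}

inline void record_feedback(CodeObject *co) {
    if (!co->co_collecting)
        return;  // cold code: skip the expensive bookkeeping entirely
    // ... update per-opcode type/callsite counters here ...
}
```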

However, I think a slow eval loop is bad for a couple of reasons:
- Code that runs only once but still matters for performance, such as import time.
- The eval loop still shows up relatively high in our profile, so it can't hurt to speed it up.
- Short-lived scripts like hg and bzr do care about performance, but unless we happen to run a CPU-intensive command, they won't benefit from JIT compilation.

Finally, if we delay collecting feedback, we can lazily load our JIT code and LLVM. The feedback code (currently) depends on LLVM's DenseMap, so if we lazily loaded it without going through a warm-up period, we would need to load LLVM right away.

Lazily loading LLVM will likely improve the startup time of short-lived scripts even more.
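
The lazy-loading side could look something like the following one-time-initialization sketch (hypothetical API; the real tree would initialize LLVM and the DenseMap-backed tables at this point):

```cpp
#include <cstdio>

// Defer the JIT backend's startup cost until the first code object
// actually goes warm; this stand-in just models the one-time work.
static bool jit_backend_loaded = false;

static void ensure_jit_backend() {
    if (jit_backend_loaded)
        return;
    std::puts("initializing LLVM and DenseMap-backed feedback tables");
    jit_backend_loaded = true;
}

void on_code_became_warm(/* CodeObject *co */) {
    ensure_jit_backend();  // pay LLVM's startup cost only when first needed
    // ... allocate feedback storage and enqueue the code for compilation ...
}
```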

I remember Jörg started to implement this, but it drastically hurt our performance on html5lib with inlining because without the early feedback, we mistook a polymorphic callsite for a monomorphic one and pessimized the code. Therefore, this feature should probably wait until we can free and recompile JITed code.
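
To illustrate the failure mode (a toy model, not the actual feedback structures): if feedback only starts after warm-up, a callsite whose other callee appears only during the early calls ends up looking monomorphic.

```cpp
#include <iostream>
#include <set>
#include <string>

// Hypothetical per-callsite feedback record.
struct CallsiteFeedback {
    std::set<std::string> seen_callees;  // distinct callees observed so far
    void record(const std::string &callee) { seen_callees.insert(callee); }
    bool looks_monomorphic() const { return seen_callees.size() == 1; }
};

int main() {
    CallsiteFeedback site;
    // Calls made before the code goes warm are never recorded:
    // site.record("TextNode.toxml");   // happened during warm-up: dropped
    site.record("Element.toxml");       // only the post-warm-up callee is seen
    // The inliner sees one callee and specializes for it, so every later
    // call through the other callee bails out of the optimized code.
    std::cout << std::boolalpha << site.looks_monomorphic() << "\n";  // true
}
```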

Original issue reported on code.google.com by reid.kle...@gmail.com on 20 Feb 2010 at 7:50

GoogleCodeExporter commented 8 years ago
The html5lib bails were actually a bug in my guards. Fixed now.

My warming-phase changes were hurt by the fact that they got some cold-path optimizations wrong, which resulted in a lot of bails. The time saved in the eval loop was not enough to offset that.

Original comment by joerg...@gmail.com on 20 Feb 2010 at 8:24

GoogleCodeExporter commented 8 years ago
So I actually finished implementing this, and it was not a win. It helped startup time somewhat (~15% IIRC), but hurt macrobenchmark performance almost across the board. I tracked down a degradation in Django to a branch misprediction where skipping even the first 50 runs would miss valuable profiling data.
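
A toy model of that kind of regression (made-up numbers, not the actual Django data): if a branch is taken mostly during the earliest runs, warm-only feedback records the opposite bias and the compiled code optimizes the wrong path.

```cpp
#include <cstdio>

// Hypothetical branch feedback counter.
struct BranchFeedback {
    long taken = 0, not_taken = 0;
    void record(bool t) { t ? ++taken : ++not_taken; }
    bool predict_taken() const { return taken > not_taken; }
};

int main() {
    BranchFeedback full, warm_only;
    for (int run = 0; run < 100; ++run) {
        bool t = (run < 60);          // branch mostly taken in early runs
        full.record(t);
        if (run >= 50)                // warm-gated: first 50 runs skipped
            warm_only.record(t);
    }
    // Full feedback predicts "taken" (60/100); warm-only sees 10 taken vs.
    // 40 not-taken and predicts the opposite.
    std::printf("full: %d  warm-only: %d\n",
                full.predict_taken(), warm_only.predict_taken());
}
```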

I'm happy to talk about this further, but I no longer think it's an obvious win. I think we'd need to get fairly sophisticated to avoid the macro degradations.

Original comment by collinw on 20 Feb 2010 at 10:09

GoogleCodeExporter commented 8 years ago
I think it will be worth having in the long run so that we can lazily load LLVM 
and
improve our startup time.  Once we can recompile things, missing feedback won't 
be
such of an issue.  When we merge the background thread patch, the extra 
compilation
time won't hurt the benchmarks significantly.  I agree we should wait on this 
until
recompilation lands.

Original comment by reid.kle...@gmail.com on 22 Feb 2010 at 4:42