lukego / blog

Luke Gorrie's blog

How speculative optimizations work in RaptorJIT #26

Open lukego opened 6 years ago

lukego commented 6 years ago

Static C/C++ compilers optimize code based on what they can determine with certainty at compile time. For example, when compiling the code a + b, the compiler might think,

I know that a and b are double floats and therefore I can add them with an addsd instruction.

And this would naturally lead to code like this:

addsd xmm0, xmm1

Tracing JIT compilers like RaptorJIT (i.e. LuaJIT family) make optimizations based on speculations that they make at runtime. The compiler runs code at least once before it decides how to optimize it. So the tracing JIT compiler might see the code a + b and think,

I just ran this code and saw that a and b were both double floats. Supposing they will tend to also be double floats in the future that would mean I could add them with an addsd instruction.

which would naturally lead to code more like this:

ucomisd xmm0, xmm1      ; check prediction that both arguments are floats
jp misprediction        ; exit on misprediction (NaN)
addsd xmm0, xmm1

In this case the "real work" is done in the same way by both compilers but the tracing JIT includes some extra checks due to the speculative nature of its optimization.

So which is better? I would say that the answer is neither: they are just different. The static compiler can make optimizations with certainty but it is limited to information that can be inferred from the source code. The tracing JIT has to make optimizations speculatively but it can specialize code using all of the information available at runtime.

Overall there are a couple of main advantages and disadvantages to speculative just-in-time compilation.

The advantages are that it is flexible and that it uses runtime information effectively. Flexible because you can specialize the generated code based on any predictions you care to make. Effective because you can actually run code before you optimize it, and that helps you to make informed predictions about how it will run in the future.

The disadvantages are that the predictions have to be checked at runtime and that the optimizations are only beneficial when the predictions usually come true. The generated code always runs guard instructions to test predictions before running the specialized machine code. If the guards succeed then the specialized code can safely run. If the guards fail then it is necessary to exit (branch) through a chain of alternative implementations that can pick up from the current point of execution and continue based on different predictions. This search for suitably specialized code will hurt performance when it happens frequently.
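To make the guard-and-exit pattern concrete, here is a minimal C sketch. Everything in it is hypothetical: the `Value` type, the `side_exits` counter, and the function names are invented for illustration and are not RaptorJIT's. A real trace would exit to another compiled trace or to the interpreter rather than call a C function, so treat this only as a model of the control flow.

```c
#include <stdint.h>

/* Hypothetical tagged value for illustration; not RaptorJIT's real layout. */
typedef enum { T_NUM, T_INT } Tag;
typedef struct {
    Tag tag;
    union { double num; int64_t i; } u;
} Value;

static Value make_num(double d)  { Value v; v.tag = T_NUM; v.u.num = d; return v; }
static Value make_int(int64_t i) { Value v; v.tag = T_INT; v.u.i = i;   return v; }

/* Counts failed guards, i.e. how often the prediction was wrong. */
static int side_exits = 0;

static double to_double(Value v) {
    return v.tag == T_NUM ? v.u.num : (double)v.u.i;
}

/* Less specialized fallback: works for any mix of tags, but does more work. */
static Value add_generic(Value a, Value b) {
    return make_num(to_double(a) + to_double(b));
}

/* Specialized code: predicts that both operands are doubles. */
static Value add_speculative(Value a, Value b) {
    if (a.tag != T_NUM || b.tag != T_NUM) {  /* guard: test the prediction */
        side_exits++;
        return add_generic(a, b);            /* exit to the fallback */
    }
    return make_num(a.u.num + b.u.num);      /* the "real work": one addition */
}
```

Calling `add_speculative` with two doubles stays on the fast path. Passing an integer operand trips the guard, increments `side_exits`, and still produces the right sum through `add_generic`, only more slowly. If that counter climbs quickly, the prediction was a poor one.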

emmericp commented 6 years ago

It's not immediately obvious why the second assembler listing works. The trick is: LuaJIT/RaptorJIT uses NaN tagging, i.e., pointers and other non-number values look like NaN values when interpreted as doubles. NaN values have ~50 bits free to encode arbitrary information in them, so this works.

Thought that tidbit might be useful to someone not familiar with the internals of LuaJIT/RaptorJIT :)
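For readers who want to see the idea in code, here is a rough C sketch of NaN tagging. The bit layout is an assumption chosen for illustration and is not the actual LuaJIT/RaptorJIT encoding: `TAG_BOXED`, `PAYLOAD_MASK`, and all function names are hypothetical, and the 48-bit payload assumes pointers that fit in 48 bits.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration; not the real LuaJIT/RaptorJIT one. */
#define TAG_BOXED    UINT64_C(0xfffc000000000000) /* sign + exponent + top 2 mantissa bits */
#define PAYLOAD_MASK UINT64_C(0x0000ffffffffffff) /* low 48 bits carry the payload */

typedef uint64_t Slot;  /* one 64-bit slot holds either a double or a boxed value */

/* Reinterpret bits with memcpy to avoid undefined behaviour from pointer casts. */
static Slot box_double(double d)    { Slot s; memcpy(&s, &d, sizeof s); return s; }
static double unbox_double(Slot s)  { double d; memcpy(&d, &s, sizeof d); return d; }

/* Hide a payload (e.g. a pointer's bits) inside a NaN bit pattern. */
static Slot box_payload(uint64_t p)   { return TAG_BOXED | (p & PAYLOAD_MASK); }
static uint64_t unbox_payload(Slot s) { return s & PAYLOAD_MASK; }

/* Exact check: does the slot carry our tag bits? */
static bool is_boxed(Slot s) { return (s & TAG_BOXED) == TAG_BOXED; }

/* The floating-point style of check: a NaN never compares equal to itself,
 * so any boxed slot fails this test. A genuine arithmetic NaN fails it too,
 * which makes the check conservative rather than exact. */
static bool looks_like_number(Slot s) {
    double d = unbox_double(s);
    return d == d;
}
```

With this layout, `looks_like_number(box_double(3.25))` is true, while `looks_like_number(box_payload(0x1234))` is false because the boxed bits decode as a NaN. That is why a floating-point comparison can double as a type check.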

lukego commented 6 years ago

Quite so :).

I'm using a bit of artistic license too. The JIT would actually check the type while loading the slot from memory and you wouldn't really have other values making it into xmm registers. Hopefully I can be forgiven and the points that I am trying to make still stand though :)