In #27 we said that "every call to a Lua function is inlined, always, without exception." But what are these calls inlined into? What is the unit of compilation?
A reasonable first approximation is to say that the JIT compiles each of the innermost loops in the program separately. Innermost loops are any loops that don't themselves contain loops, neither directly in their source code nor indirectly via a function they call. The unit of compilation is then from the start of each inner loop to its end, and all function calls made within the loop are inlined into the generated code.
This inlining means that each time you call a function in your source code the JIT will compile a separate copy of that function. Consider this inner loop that calls the same function multiple times:
function add(a, b) return a + b end
local x = 0
for i = 1, 100 do
x = add(x, 42)
x = add(x, "42")
end
Here we have two calls to the function add and parameter b is an integer in the first call but a string in the second call. (This is not an error because Lua automatically coerces strings to numbers.) Is the diversity of types for the b parameter going to cause problems with speculative optimization? No: the JIT will compile each of the calls separately and specialize the code for each call for separate argument types. This code compiles efficiently because at runtime each of the calls uses self-consistent types.
Let's be advanced and consider the merits of a couple of higher-order functions:
function apply(fn, x)
return fn(x)
end
function fold(fn, val, array)
for i = 1, #array do
val = fn(val, array[i])
end
return val
end
Is this code appropriate for a tracing JIT? The potential problem is that each time we compile a call to these functions the JIT will speculatively inline the body of the function that we pass in as fn. This is extremely efficient if we are always passing the same value for fn but otherwise it is expensive. If our intention is to call these library functions many times with different function parameters then we have to consider how those calls will compile.
The answer is that apply is fine but fold is problematic. The reason that apply is fine is that each call will be compiled and optimized separately and so there will not be any interaction between the different fn values. The reason that fold is problematic is that it passes many different values of fninto a loop and calls it there. The loop in fold will be its own compilation unit, meaning that only one copy of this loop body will be compiled, and this compiled code will see all of the different values of fn at runtime. This will force the compiler to repeatedly mispredict the value that fn will have and to generate less-than-optimal code as a result.
The moral of this story is that if you want to optimize the same code for many different uses then you should be careful to make sure each use is compiled separately. Loosely speaking that means to avoid sharing the same inner loop in your source code for multiple uses.
How does that work in practice? In simple terms it means that instead of writing code like this:
local x = 0
x = fold(fn1, x, array)
x = fold(fn2, x, array)
You should simply write two separate loops (for...end) in your source code.
local x = 0
for i = 1, #array do x = fn1(x, array[i]) end
for i = 1, #array do x = fn2(x, array[i]) end
Here fn1 and fn2 are called from separate inner loops and so each will be compiled and optimized separately. Simple enough, eh? (This is assuming that these really are inner loops i.e. that neither fn1 nor fn2 contains a loop!)
If you are really determined to write higher-order functions that have attractive performance characteristics then take a look at advanced tricks like https://github.com/luafun/luafun/pull/33.
(Incidentally: These units of compilation are called traces.)
In #27 we said that "every call to a Lua function is inlined, always, without exception." But what are these calls inlined into? What is the unit of compilation?
A reasonable first approximation is to say that the JIT compiles each of the innermost loops in the program separately. Innermost loops are any loops that don't themselves contain loops, neither directly in their source code nor indirectly via a function they call. The unit of compilation is then from the start of each inner loop to its end, and all function calls made within the loop are inlined into the generated code.
This inlining means that each time you call a function in your source code the JIT will compile a separate copy of that function. Consider this inner loop that calls the same function multiple times:
Here we have two calls to the function
add
and parameterb
is an integer in the first call but a string in the second call. (This is not an error because Lua automatically coerces strings to numbers.) Is the diversity of types for theb
parameter going to cause problems with speculative optimization? No: the JIT will compile each of the calls separately and specialize the code for each call for separate argument types. This code compiles efficiently because at runtime each of the calls uses self-consistent types.Let's be advanced and consider the merits of a couple of higher-order functions:
Is this code appropriate for a tracing JIT? The potential problem is that each time we compile a call to these functions the JIT will speculatively inline the body of the function that we pass in as
fn
. This is extremely efficient if we are always passing the same value forfn
but otherwise it is expensive. If our intention is to call these library functions many times with different function parameters then we have to consider how those calls will compile.The answer is that
apply
is fine butfold
is problematic. The reason thatapply
is fine is that each call will be compiled and optimized separately and so there will not be any interaction between the differentfn
values. The reason thatfold
is problematic is that it passes many different values offn
into a loop and calls it there. The loop infold
will be its own compilation unit, meaning that only one copy of this loop body will be compiled, and this compiled code will see all of the different values offn
at runtime. This will force the compiler to repeatedly mispredict the value thatfn
will have and to generate less-than-optimal code as a result.The moral of this story is that if you want to optimize the same code for many different uses then you should be careful to make sure each use is compiled separately. Loosely speaking that means to avoid sharing the same inner loop in your source code for multiple uses.
How does that work in practice? In simple terms it means that instead of writing code like this:
You should simply write two separate loops (
for
...end
) in your source code.Here
fn1
andfn2
are called from separate inner loops and so each will be compiled and optimized separately. Simple enough, eh? (This is assuming that these really are inner loops i.e. that neitherfn1
norfn2
contains a loop!)If you are really determined to write higher-order functions that have attractive performance characteristics then take a look at advanced tricks like https://github.com/luafun/luafun/pull/33.
(Incidentally: These units of compilation are called traces.)