Open lukego opened 7 years ago
(Added a semi-minimal example.)
My understanding was that allocation sinking could work across traces. Is it that allocation sinking doesn't work across sub-traces that are loops?
@wingo I don't know the answer. Traces with loops? Traces with common parents? Traces with different roots? Side-traces referencing values from their parent traces? etc. This is all still a bit of a mystery to me. If we could write this down in a comprehensible way then perhaps we would have one half of the problem solved.
One of the biggest problems that we have in Snabb development is when 64-bit values (integers or FFI pointers) in local variables are allocated on the heap. This is a problem for two related reasons.
The first problem is that it is very difficult to predict whether a line of code like
local x = pointers[i]
will cause a heap allocation of a "box" for the pointer, or whether the allocation will be "sunk" so that the raw value is kept in a register. The answer can be complex: it may be that the same line of code is compiled several times in different traces, and that some of those traces will cause allocations while others will not. The answer can also be that the behaviour is non-deterministic and will vary from one execution of the program to the next depending on how JIT heuristics interact with the workload.The second problem is that there is a wide performance gulf between the two cases. On the one hand a heap allocation is extremely expensive and requires invocation of the garbage collector. On the other hand a sunk allocation is extremely fast and requires basically no instructions at all.
These problems are related. The wide difference in performance means that it is critical to always sink allocations, but the unpredictability of the JIT heuristics makes this requirement extremely difficult for application developers to satisfy.
Some solution is needed. Either the allocation sinking must be made predictable so that all programmers can depend on it, or the box allocation needs to be made efficient enough that the difference is not a big deal. (Or both.)
Example
Here is a simple example program:
The line
local x = pointers[i]
looks like it should be cheap. It's only loading a pointer into a local variable and then never even using it. However it is actually very expensive. I believe the issue is that the variablex
is still in scope when we transition to a new trace for the inner loop and this causes the pointer to be heap-allocated as a boxed Lua object.The expense is partly due to requiring calls to the C functions
lj_gc_step_jit()
andlj_mem_newgco()
.This is not satisfactory.