Problem: Expensive heap-allocated boxes for 64-bit values (integers and pointers)

lukego commented 7 years ago

One of the biggest problems that we have in Snabb development is when 64-bit values (integers or FFI pointers) in local variables are allocated on the heap. This is a problem for two related reasons.

The first problem is that it is very difficult to predict whether a line of code like local x = pointers[i] will cause a heap allocation of a "box" for the pointer, or whether the allocation will be "sunk" so that the raw value is kept in a register. The answer can be complex: it may be that the same line of code is compiled several times in different traces, and that some of those traces will cause allocations while others will not. The answer can also be that the behaviour is non-deterministic and will vary from one execution of the program to the next depending on how JIT heuristics interact with the workload.

The second problem is that there is a wide performance gulf between the two cases. On the one hand a heap allocation is extremely expensive and requires invocation of the garbage collector. On the other hand a sunk allocation is extremely fast and requires basically no instructions at all.

These problems are related. The wide difference in performance means that it is critical to always sink allocations, but the unpredictability of the JIT heuristics makes this requirement extremely difficult for application developers to satisfy.

Some solution is needed. Either the allocation sinking must be made predictable so that all programmers can depend on it, or the box allocation needs to be made efficient enough that the difference is not a big deal. (Or both.)

Example

Here is a simple example program:

local ffi = require("ffi")
local pointers = ffi.new("void *[101]")

-- Loop that references pointers and compiles into multiple traces
for i = 1, 100 do
   local x = pointers[i]  -- Load a pointer into a local variable
   for i = 1, 100 do end  -- This loop will require a separate trace
end

The line local x = pointers[i] looks like it should be cheap. It's only loading a pointer into a local variable and then never even using it. However it is actually very expensive. I believe the issue is that the variable x is still in scope when we transition to a new trace for the inner loop and this causes the pointer to be heap-allocated as a boxed Lua object.

The expense is partly due to requiring calls to the C functions lj_gc_step_jit() and lj_mem_newgco().

This is not satisfactory.

lukego commented 7 years ago

(Added a semi-minimal example.)

wingo commented 7 years ago

My understanding was that allocation sinking could work across traces. Is it that allocation sinking doesn't work across sub-traces that are loops?

lukego commented 7 years ago

@wingo I don't know the answer. Traces with loops? Traces with common parents? Traces with different roots? Side-traces referencing values from their parent traces? etc. This is all still a bit of a mystery to me. If we could write this down in a comprehensible way then perhaps we would have one half of the problem solved.

raptorjit / raptorjit

Problem: Expensive heap-allocated boxes for 64-bit values (integers and pointers) #91

Example