Tested with the following synthetic benchmark, run with luajit -jdump. The new code produces an inner loop with fewer instructions and without the calls to lj_carith_modu64 and lj_carith_divu64 that the current code produces.
local switch_this = true  -- true: cast/shift variant, false: modulo/division variant
local ffi = require'ffi'
ffi.cdef[[int printf(const char *format, ...);]]
local a, b
for i=1,100 do
  if switch_this then
    -- low and high 32 bits via cast and right shift
    a = ffi.cast("uint32_t", ffi.cast("uintptr_t", ffi.C.printf) + i)
    b = bit.rshift(ffi.cast("uintptr_t", ffi.C.printf) + i, 32)
  else
    -- low and high 32 bits via 64-bit modulo and division
    a = (ffi.cast("uintptr_t", ffi.C.printf) + i) % 2^32
    b = (ffi.cast("uintptr_t", ffi.C.printf) + i) / 2^32
  end
end
The patch appears to work. Tested with dynasm_demo.lua and with bf.lua.