pallene-lang / pallene

Pallene Compiler
MIT License
668 stars 29 forks source link

Optimization: Use the stack for local fixed length arrays #511

Open ewmailing opened 2 years ago

ewmailing commented 2 years ago

I encountered some algorithms that use temporary fixed length arrays. For example, here is a popular compute which day of the week algorithm which I translated to Pallene:

-- https://en.wikipedia.org/wiki/Determination_of_the_day_of_the_week
-- Tomohiko Sakamoto algorithm for finding the day of the week
-- It returns 0 = Sunday, 1 = Monday, etc. 
function m.DayOfWeek(y:integer, m:integer, d:integer) : integer
    local t:{integer} = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if ( m < 3 ) then
        y = y - 1;
    end
    -- note t[m-1] becomes t[m] in Lua 1-indexed arrays
    return (y + y//4 - y//100 + y//400 + t[m] + d) % 7;
end

After reflecting on what Pallene might do and then poking at the emitted C code, I think Pallene is creating an array in Lua on the heap, using it, and then the Lua garbage collector will have to clean it up later. Since this table is fixed size and is local to only this function, a much more performant solution would be to create the array on the stack in C, which avoids both the allocator hits and avoids creating unnecessary work for the garbage collector.

I expect this is probably a long way off for Pallene and will require things like doing escape analysis. But I thought I would file this as a potential nice-to-have optimization way down the line.

hugomg commented 2 years ago

A limitation of using the stack is that we would still need to initialize the "array" elements every time the function is called. In this particular case where the array elements are compile-time constants it might be better to move the array outside the function:

local t:{integer} = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4}
function m.dayOfWeek(i: integer)
    return t[i]
end

Still, it's an intriguing idea...

ewmailing commented 2 years ago

In response to your suggestion of moving the array outside the function, yes, that is a possible workaround, but it has trade-offs which I am not always willing to make. Specifically, since Lua has a limit on the number of local variables you can have in a scope (something I sometimes hit my head on), I am reluctant to do that when working in larger modules (or working with teams of people). Also, bug #508 is still causing me a lot of problems, and I find that one terrifying because I am aimlessly changing code hoping my program will build again, so I've been limiting my use of global local variables hoping it helps.

As for this optimization specifically, I guess there are two cases (and I guess I'm assuming basic primitive types too).

  1. There is a case where the array has compile time constants and is compile time fixed length.
  2. There is a more dynamic case where the array elements and length are only known in the run.

For case 1, I think the the Pallene compiler could emit:

static int function_42_lua(lua_State *L)
{ 
    ...
    static const int[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    ...
}

This would allow the C compiler to optimize out needing to initialize the array elements every time the function is called.

As for case 2, I would argue that even though there is still work in initializing the array on the stack every time the function is called, it is still a win. I consider avoiding memory allocations on the heap, and avoiding work on the garbage collector to be significant wins. Perhaps I am hyper-sensitive to these issues because I spent too much time working in game engines (and also dealing with pathologically stupid and slow garbage collectors). From a soft-realtime perspective, constant latency factors are manageable, but it is the arbitrary random latencies that get you. It seems to me that Pallene is in a unique situation that it can optimize some of these cases, so that makes it a huge win if it does.

I know this is a bit advanced for the current stage Pallene is in right now, and would need things like escape analysis and constant analysis. But I wanted to put it on the radar.