Idea: Optimization of "struct" and "fields" objects

This is just an idea.

"struct" and "fields" allow for defining types that have a fixed structure. May be this information can be used for more efficient access to these objects both in terms of memory usage and performance?

The following optimizations could be potentially possible:

1) More efficient layout of objects in memory. In theory, fields could be mapped to equivalent of a C-struct with corresponding fields.

Or at least to an Lua-table which has only array elements. Each field name gets mapped to an array-index. obj.fieldX becomes obj[idX]. This could be more memory efficient than hash-table based storage of those fields.

2) (1) is also somewhat similar to Python's named tuples: https://docs.python.org/2/library/collections.html#collections.namedtuple They may use less memory and be accessed more efficiently.

3) May be LuaJIT/compiler may make use of the fact that the structure of objects does not change during run-time? This information could be used e.g. for copy propagation and some other optimizations... I'm not sure though that LuaJIT is able on its own to understand that structure of objects(i.e. tables) is fixed (this needs to be tested yet). Eventually, it may require some help, e.g. some hints ( a-la PyPy JIT hints).

Note: Overall, it would be interesting to understand if a compiler or runtime of Shine may somehow provide hints to LuaJIT to enable better optimizations. I understand that currently LuaJIT does not provide support for this. But if it would, what could be done? This kind of input could interesting for Mile Pall to convince him that supporting some sort of hints could be very valuable.

4) Since Lua is very dynamic, I can imagine that Shine compiler could potentially only statically make use of the fixed layout only in the scope where a variable was declared. But when it gets passed to a different function (or another scope, in general), then this statically known information is most likely lost (unless some sort of very clever escape analysis is used). I don't know if LuaJIT can do any better using traces and propagating this information along traces.

As I said, these are just some ideas. I'm not sure how feasible they are and if something like this is possible at all.

What do you think?

On 4/6/14 11:19 PM, romix wrote:

This is just an idea.

"struct" and "fields" allow for defining types that have a fixed structure. May be this information can be used for more efficient access to these objects both in terms of memory usage and performance?

The following optimizations could be potentially possible:

1) More efficient layout of objects in memory. In theory, fields could be mapped to equivalent of a C-struct with corresponding fields.

Well, if the values are not cdata types then that wouldn't work.

Or at least to an Lua-table which has only array elements. Each field name gets mapped to an array-index. obj.fieldX becomes obj[idX]. This could be more memory efficient than hash-table based storage of those fields.

I've experimented with this kind of thing. So you do save on space. However it also means that these structs respond to both obj[1] as well as obj.fieldX, and saying the latter is an indirection: you need a reverse mapping from fieldX to the slot index. So you can't really get rid of the hash lookup for fieldX - it either references the member value directly, or it references the slot which references the value.

What's more, there's also no guarantee that the members are going to be actually stored in the "array part" of the table. That depends on the order in which the fields are initialized. If fieldX was slot 1 and fieldY slot 2, then setting fieldY first will make the underlying table use the hash part anyway, so you don't gain anything.

The thing is that Lua's tables are fast because the don't re-hash the string every time. They use pointer hashing - which is essentially integer hashing. Strings are interned (well except for short strings, but that's an implementation detail).

I've had cases where using LuaJIT's tables are faster than C structs. It does some badass optimizations.

2) (1) is also somewhat similar to Python's named tuples: https://docs.python.org/2/library/collections.html#collections.namedtuple They may use less memory and be accessed more efficiently.

3) May be LuaJIT/compiler may make use of the fact that the structure of objects does not change during run-time? This information could be used e.g. for copy propagation and some other optimizations... I'm not sure though that LuaJIT is able on its own to understand that structure of objects(i.e. tables) is fixed (this needs to be tested yet). Eventually, it may require some help, e.g. some hints ( a-la PyPy JIT hints).

LuaJIT has a TDUP instruction. What the compiler does is store the constant parts tables in the bytecode, and then duplicate these instead of reconstructing them. So if you had

t = { answer = 42, greet = function() print("Hi") end }

Then the {answer = 42} part is constant, whereas the function is a GC object so needs to be attached after the TDUP. This definitely speeds things up, but has limited use for our case, because struct definitions have other tables as their field values (i.e. the type).

It's not like this stuff is actually slow, right?

Note: Overall, it would be interesting to understand if a compiler or runtime of Shine may somehow provide hints to LuaJIT to enable better optimizations. I understand that currently LuaJIT does not provide support for this. But if it would, what could be done? This kind of input could interesting for Mile Pall to convince him that supporting some sort of hints could be very valuable.

It's been suggested before on the mailing list to make LuaJIT more friendly as a compilation target for other languages. I can't remember Mike's response exactly, but it didn't seem to be a goal of his. It's actually hard to erase the underlying Lua semantics. For instance, in many languages 0 and the empty string evaluate to false. Not so in Lua. So if you wanted that, you'd have to wrap every boolean context evaluation in your own function call.

If your language has mutable strings, then you'd need to replace the native string type and half the VM instructions become useless to you. Then there's the GC. If you wanted to have real low level classes and objects (not based on tables, but language primitives as in Java), then you'd pretty much need to do your own tracing and garbage collection because you don't have mark and sweep hooks.

So it's actually a fairly non-trivial job to make the LuaJIT VM generic in this way.

4) Since Lua is very dynamic, I can imagine that Shine compiler could potentially only statically make use of the fixed layout only in the scope where a variable was declared. But when it gets passed to a different function (or another scope, in general), then this statically known information is most likely lost (unless some sort of very clever escape analysis is used). I don't know if LuaJIT can do any better using traces and propagating this information along traces.

Then struct declarations would need to be added to the language, and not be part of a library (they're just functions). Right now the compiler just sees a function call.

An interesting idea would definitely be to integrate the FFI more into the language so that C declarations are parsed directly, but with Shine syntax. I've toyed with this too:

struct Buffer
   size is size_t
   data is char*
end

You'd then get the ffi.cdef stuff done for you behind the scenes.

This gets hairy though, because what does char* mean in the rest of the language? Can I call it? What if I want to cast it? If casts have their own syntax, what if I try to cast a table? Tricky stuff.

Right now the fact that you need to pass strings to ffi.cdef and do all the operations via a library makes it clear that you're dealing with stuff that's external to the language (i.e. the foreign in ffi).

As I said, these are just some ideas. I'm not sure how feasible they are and if something like this is possible at all.

What do you think?

Sure, all good ideas. For now though, I feel that Shine is squeezing about as much out of LuaJIT as a sugar language can hope to, without going down a complexity rabbit hole.

Sure, all good ideas. For now though, I feel that Shine is squeezing about as much out of LuaJIT as a sugar language can hope to, without going down a complexity rabbit hole.

Absolutely! It was not proposed as a high-priority feature for a near future. I just wanted to have it mentioned for later, should we ever detect serious performance problems with classes/structs.

And of course I'm interested in understanding how to use LuaJIT as a target for other languages. In particular for things that have a semantic diverging from Lua's semantics. And for things that may be CPU-intensive, i.e. where certain optimizations (or lack of them) start playing an important role... But this is of course out of scope for Shine, at least in its current development stage. Concentrating on optimizations makes sense (if at all), once the language has stabilized and performance becomes an issue...

It's been suggested before on the mailing list to make LuaJIT more friendly as a compilation target for other languages. I can't remember Mike's response exactly, but it didn't seem to be a goal of his.

It was actually me who asked for it ;-) I still think that LuaJIT could be a wonderful target for many languages if it would be a bit more friendly... I haven't given up on this idea yet. But this is another story ...

On 4/7/14 10:52 AM, romix wrote:

It was actually me who asked for it ;-) I still think that LuaJIT could be a wonderful target for many languages if it would be a bit more friendly... I haven't given up on this idea yet. But this is another story ...

If you really wanted to diverge, then you actually could get pretty far. I would start by implementing my own tagged value types, so the language has its own idea of strings, numbers, booleans, arrays, hashes, classes, namespaces etc., and it would use functions as "basic blocks".

Then I'd move all my VM "instructions" into a C library and use only branching and CALL instructions which call into said C library using the FFI.

So a = b + c becomes: a = op.add(b, c) where a, b and c are all your own language's tagged values, and a = foo.bar() would become a = op.callmeth(foo, 'bar'), and so on.

With this scheme you're still using the stack slots and GC roots, but you get control over everything else. Dispatching to a C function via the FFI on my system has about a 2 nanosecond overhead. Half a billion ops/second is probably fast enough (plain LuaJIT will still blow it out of the water though).

Then the only problem you have is for non-scalar types: arrays, hashes and so forth. For mallocd data, __gc hook set via ffi.gc doesn't actually free anything (obviously), but it can be used to implement a simple reference counting scheme. You'd only be notified if the reference is no longer anchored. Then, if it is an array, for example, you'd traverse it yourself and decrement the refs on its elements recursively, freeing those which hit zero. The added bonus for this is that if you use an mmap based malloc (jemalloc, for instance), then you don't hit the 1GB limitation on x64.

Coroutines could get tricky, but it's still doable.

One of the main reasons I didn't go this route is because I wanted to be able to leverage existing Lua libraries. Starting an ecosystem around an entirely new language is hard if there aren't any libraries you can use (of course, you'd still have access to C libraries via the FFI).

richardhundt / shine

Idea: Optimization of "struct" and "fields" objects #52