titan-lang / titan

The Titan programming language
http://titan-lang.org
MIT License
406 stars 13 forks source link

Maps #229

Closed hishamhm closed 6 years ago

hishamhm commented 6 years ago

This PR adds maps to Titan, declared with { type : type } syntax and initialized with {[exp] = exp, ... }. Semantics are supposed to be straightforward, and the implementation followed that of arrays.

There are two items I'd love to have some help with:

I needed to de-static-ify one function in the Lua VM, and then added a forward declaration in the preamble. Not sure if this is the way we've been doing this.

Also, I don't know if the implementation I did in the coder for setting the Lua table is the most efficient one — but it seems to pass the tests! (except for that one...)

Marking this as "wip" until we figure out the pending test bug and the map-of-records tests.

codecov-io commented 6 years ago

Codecov Report

Merging #229 into master will decrease coverage by 0.45%. The diff coverage is 94.51%.

Impacted Files Coverage Δ
titan-compiler/parser.lua 96.32% <ø> (ø) :arrow_up:
titan-compiler/types.lua 86.41% <100%> (+0.92%) :arrow_up:
titan-compiler/ast.lua 100% <100%> (ø) :arrow_up:
titan-compiler/checker.lua 89.92% <89.47%> (-0.69%) :arrow_down:
titan-compiler/coder.lua 96.71% <96.62%> (+0.32%) :arrow_up:
titan-compiler/driver.lua 63.08% <0%> (-10.75%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b473e32...8494cc2. Read the comment docs.

mascarenhas commented 6 years ago

The strange bug was that you were using the type of the values instead of the keys to decide how to look for the key in codeindexmap. 😄 You also cannot use double quotes inside the code that we are passing to os.execute.

The failing tests with records where due to the module name that you were passing to checker.check being different from the one passed to driver.compile, and this was confusing the code generator.

All tests are passing now, but I need to give this a proper review which I will only be able to do on monday.

About getgeneric, I really do not want to have to mess with Lua sources apart from luaconf.h... what I did for other static functions that I needed to call was just to copy them over. The problem here is that getgeneric uses mainposition, which is also static, which uses l_hashfloat, which is also static, which thankfully does not use other static functions, but all of these three would have to be copied over to the preamble.

Blindly copying them over will lead to "unused function" warnings for code that uses maps, though, so we also need a flag in code generator that will only add those functions to the preamble if the module has code that calls getgeneric.

On the plus side, copying this code over lets us specialize getgeneric to be less generic, as Titan has more type information available, this might speed up some cases. The big switch in mainposition is not needed for most Titan types used as keys, so we could just inline the correct case in the specialized getgeneric functions.

By the way, for maps with integer keys you can just call the same code that generates code for arrays, as their semantics will be identical. It is an open question if we want to make them the same type or not.

Having maps is going to be very cool! 😄

hugomg commented 6 years ago

I agree with what Fabio said: instead of inlining the spefific "get" code (one if statement per table get) you could create a separate function for each case (get_string, get_float, and so on). This is how we have been doing it in the lablua branch and it helps keep the generated code more readable.

About making functions not static this is something that we will need to clear up when we decide how Titan is going to be packaged. Will it continue to rely on a patched Lua or will we copy over all functions that we need? Anyway, making a Lua function not static is a bit of a code smell but it is also partially in line with what we are already doing...

By the way, do you know how much we gain by this optimization of calling getgeneric directly? It is possible that sticking with the more generic luaH_get could have been fine all along (and the code would have less cases to test).

hishamhm commented 6 years ago

you could create a separate function for each case (get_string, get_float, and so on).

Honest question: is that acceptable performance-wise? I learned to distrust the GCC inliner over the years, but those memories may be out-of-date and I defer to your recent experience on performance measurements.

Will it continue to rely on a patched Lua or will we copy over all functions that we need?

This sounds like a great idea! There are not that many minor Lua versions anyway, so we could even support all of 5.3.x, keeping one big "include" file for each minor (and some of them might even be the same) and testing them in parallel on Travis.

By the way, do you know how much we gain by this optimization of calling getgeneric directly? It is possible that sticking with the more generic luaH_get could have been fine all along (and the code would have less cases to test).

No idea, I just saw the switch statement there and thought it would make sense to "unwrap" it, because that would produce heavily biased branches (which reminds me I forgot to use TITAN_LIKELY!), and because the implementation of array access was already "unwrapped" from the lua* function, doing things like ->array and such. But yeah, mea culpa on premature optimization.

hugomg commented 6 years ago

I learned to distrust the GCC inliner

If you want to inline those functions then yes, not putting them in separate functions would be the way to go.

So far in the lablua branch the functions we split off are mostly about error handling and IO, which don't care about being inlined. In this case I don't know if inlining would help or not because I haven't tested it.

which reminds me I forgot to use TITAN_LIKELY

In this case I am not sure we actually want to add the LIKELY annotation. I think in all cases both branches are equaly likely (short vs long string, integer-valued floats vs other floats).

and because the implementation of array access was already "unwrapped"

We are actually following PUC Lua here :) Check out luaV_fastgeti in lvm.h

The motivation for accessing ->array directly is that it completely avoids functions calls in the fast case (integers in the array part). For other types of key there is always a function call so inlining won't necessarily help. Lua only calls luaH_get, luaH_getstr and luaH_getshortstr in this manner (see grep luaV_gastget) so it is possible that just using these would suffice and we wouldn't need to de-staticfy getgeneric. But the only way to know for sure would be testing it.

hishamhm commented 6 years ago

In this case I am not sure we actually want to add the LIKELY annotation. I think in all cases both branches are equaly likely (short vs long string, integer-valued floats vs other floats).

Yes, after posting I went back and read the code and came to the same conclusion :)

hishamhm commented 6 years ago

We are actually following PUC Lua here :) Check out luaV_fastgeti in lvm.h

no luaV_fastgeti in Lua 5.3! :D

hishamhm commented 6 years ago

By the way, for maps with integer keys you can just call the same code that generates code for arrays, as their semantics will be identical.

Oh, good call! I can change this in a future iteration if it's not too pressing.

It is an open question if we want to make them the same type or not.

Currently these types have one interesting semantic difference I'd like to highlight: # is not available for maps (I left it out deliberately in this PR, as IMO the Lua meaning of # is pretty accidental on the Lua objects backing maps; # in tables is meant for tables-as-arrays and we have a separate type for that).

mascarenhas commented 6 years ago

I agree with what Fabio said: instead of inlining the spefific "get" code (one if statement per table get) you could create a separate function for each case (get_string, get_float, and so on). This is how we have been doing it in the lablua branch and it helps keep the generated code more readable.

My suggestion was that we actually have enough type information that the first hash lookup could actually be fully inlined, leaving just the probing in case this fails to another function, this second function might as well be generic.

mascarenhas commented 6 years ago

Currently these types have one interesting semantic difference I'd like to highlight: # is not available for maps (I left it out deliberately in this PR, as IMO the Lua meaning of # is pretty accidental on the Lua objects backing maps; # in tables is meant for tables-as-arrays and we have a separate type for that).

Yeah, I noticed that after my comment, when I was reviewing the code. Let's keep them separate!

mascarenhas commented 6 years ago

This sounds like a great idea! There are not that many minor Lua versions anyway, so we could even support all of 5.3.x, keeping one big "include" file for each minor (and some of them might even be the same) and testing them in parallel on Travis.

I don't really see the need to expend any effort on supporting different point releases instead of tracking the latest and maybe the one just before that, to give some time for people to upgrade. Upgrading from 5.3.x to 5.3.(x+1) is supposed to be painless unless you are relying on one of the bugs that has been fixed.

But yeah, mea culpa on premature optimization.

Exploiting the type information that we have, specially when it is such low-hanging fruit, is not premature optimization, it is one of the points of having the types in the first place!

hishamhm commented 6 years ago

I don't really see the need to expend any effort on supporting different point releases instead of tracking the latest and maybe the one just before that, to give some time for people to upgrade. Upgrading from 5.3.x to 5.3.(x+1) is supposed to be painless unless you are relying on one of the bugs that has been fixed.

Just so it works out of the box with "any Lua 5.3" out there (people use distros that have outdated packages often, or aren't in a position where they can upgrade). If it's not too much work, of course. I don't know how much internal refactoring usually happens in Lua point releases, but I don't expect it to be a lot.

mascarenhas commented 6 years ago

Right now it is impossible to use Titan with a distro Lua, anyway, even if the distro packages Lua 5.3.4, as that Lua will not export the private non-static API functions (by default they are marked as __attribute__((visibility("hidden")))).

We cannot just link all of Lua statically with each Titan module, as each Lua will have its own luaO_nilobject, as well as its own indexes in the registry for the hook table and C library cache, as well as trip the luaL_checkversion call in luaL_openlib in case the Lua linked by Titan tries to load a another C library.

It will be a lot more trouble than just changing a line in luaconf.h and patching some functions from static to non-static, but I think we will end up having a lua.c with a curated list of all the private Lua functions that Titan depends on, patched to get around the luaO_nilobject issue, but this way we will be able to load Titan modules in stock Lua without worry.

Of course things are much easier in the other direction, where we are creating a Titan executable that embeds Lua, here we expose static functions that we rely on as LUAI_FUNC and just link the Titan modules with the object files of this slightly patched Lua source, and everything will work fine.

mascarenhas commented 6 years ago

I think we can merge this now, and fix the getgeneric issue properly later.

mascarenhas commented 6 years ago

We cannot just link all of Lua statically with each Titan module, as each Lua will have its own luaO_nilobject, as well as its own indexes in the registry for the hook table and C library cache, as well as trip the luaL_checkversion call in luaL_openlib in case the Lua linked by Titan tries to load a another C library.

From what I have seen of the Lua 5.3.4 source code it appears that luaO_nilobject is a nonissue, as it is never stored as is inside any data structure, just returned by several functions as a sentinel value, so we should be ok with Lua inside a Titan module and "main" Lua, or Lua inside another Titan module, having different luaO_nilobject.

The version check looks like it can be worked around by just not linking statically with lapi.c, where lua_version is defined; this file is part of the public API, so its functions will be found at runtime (and do the right thing). Same for &HOOKKEY and &CLIBS, which are in ldblib.c and loadlib.c, respectively.

mascarenhas commented 6 years ago

I am already working on a PR that will do away with the need for patched Lua both for compiling and for loading Titan modules (we will still need the Lua sources to access the object files for the Lua internals, though).

hishamhm commented 6 years ago

@mascarenhas great news! "wip - do not merge" label removed, so feel free to approve and merge!