richardhundt / shine

A Shiny Lua Dialect
231 stars · 18 forks

Some issues with arrays #38

Open romix opened 10 years ago

romix commented 10 years ago
a4 = [ [1,2] , [3,4]]
print a4 

results in `1 2 3 4`. So either the array is formed in a wrong way or the printing of arrays is not quite correct.

nyanga> a1 = [1,2,3,4,5,6,7,8,9]
nyanga> print a1.slice(1,3)
[string "nyanga.core"]:0: attempt to perform arithmetic on global 'i' (a nil value)

P.S. BTW, Richard, have you seen this language called Croc? It is a mixture of Lua, D, Squirrel, etc: http://jfbillingsley.com/croc/wiki/Lang

Overall, it is rather similar to Nyanga. There could be some interesting things there.

richardhundt commented 10 years ago

On 3/5/14 8:44 AM, romix wrote:

  • Some operations on arrays seem to have strange semantics:

    a4 = [ [1,2], [3,4] ]
    print a4

  results in `1 2 3 4`. So either the array is formed in a wrong way or the printing of arrays is not quite correct.

That's just stringification... the structure is correct. I guess it's time for a decent data dumper implementation for stringifying this stuff.

  • Another thing I detected:

  • Another thing I detected:

    nyanga> a1 = [1,2,3,4,5,6,7,8,9]
    nyanga> print a1.slice(1,3)
    [string "nyanga.core"]:0: attempt to perform arithmetic on global 'i' (a nil value)

Yep, that was a bug. Fixed in git head.

  • Shouldn't arrays support slices by means of indexing using ranges, just like strings do? `a[1,10]` could produce a slice of an array.

This is now implemented (although using the standard range syntax: a[1..10])

  • Would it make sense to have a concatenation operator for arrays? `[1,2,3] ~ [4,5,6]` should produce `[1,2,3,4,5,6]`. And the `~=` operator should change its left operand accordingly.

Hmm... this is where it gets hairy. What happens when one operand is not an array? What does `[1,2,3] ~ {4}` mean? I guess the best would be to make it strict and raise an error unless both operands are arrays. I'll have to think about it a bit more.
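The strict behaviour described above can be sketched in plain Lua. This is purely illustrative — the `Array` metatable and `is_array` helper are hypothetical names, and plain Lua's `..`/`__concat` stands in for the proposed `~` operator:

```lua
-- Hypothetical sketch: strict concatenation that errors unless both
-- operands are arrays. `Array` and `is_array` are illustrative names,
-- not Shine's actual internals; Lua's __concat stands in for `~`.
local Array = { }

local function is_array(v)
   return getmetatable(v) == Array
end

function Array.__concat(a, b)
   if not (is_array(a) and is_array(b)) then
      error("cannot concatenate array with non-array", 2)
   end
   local out = setmetatable({ }, Array)
   for i = 1, #a do out[#out + 1] = a[i] end
   for i = 1, #b do out[#out + 1] = b[i] end
   return out
end
```

With this shape, `[1,2,3] ~ {4}` would raise at the call site rather than silently producing a mixed result.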

P.S. BTW, Richard, have you seen this language called Croc? It is a mixture of Lua, D, Squirrel, etc: http://jfbillingsley.com/croc/wiki/Lang

Overall, it is rather similar to Nyanga. There could be some interesting things there.

Yeah, I've run into it before.

  • For example, the idea of decorators could be interesting.

This would be pretty easy to add. I'll add it to my todo list (along with case guards and macros). I'd like to let things stabilize for a bit though, before I go ripping into the parser again. Also, I need to make some progress on the project which was the reason for all the recent hacking on Nyanga in the first place. So next up is an HTTP library (based on Joyent's parser).


  • Also, metaprogramming and reflection are useful. Does Nyanga support metaprogramming or reflection right now? Can I e.g. get information about all (static) methods of a class, all (static) variables of a class, etc.?

Well, a class is a Lua table, so:

nyanga> class A
      | foo() end
      | end
nyanga> for k,v in A do print k, v end
__newindex    function(o, k, v): 0x0b0593d0
__tostring    function(self, ...): 0x0b0446b8
__members__    table: 0x0b065808
__index    function(o, k): 0x0b065830
__name    A
__getters__    table: 0x0b101640
__setters__    table: 0x0b101668

A class's internals are therefore visible and can be changed at runtime. The main method table is __members__. The __getters__ and __setters__ tables are for methods declared as get or set. Also, classes don't cache their lookups on instances, so if you change a class's __members__ table, the change is reflected in all existing instances.

Here you can see methods:

nyanga> for k,v in A.__members__ do print k, v end
__tostring__    function(o): 0x0b0593f8
foo    function(self): 0x0b06e448

Only `__tostring__` is predefined. To get a class from an object, just use getmetatable. I've tried to keep the internals as plain as possible. No magic. My feeling is that it should be OK for somebody who knows what they're doing to manipulate the language at this level.
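Because lookups aren't cached, patching a class at runtime is just a table store. A plain-Lua sketch, assuming `obj` is some existing instance of the class `A` shown above (names follow the dump, not an official API):

```lua
-- Plain-Lua sketch: retrofitting a method onto all live instances.
-- Assumes `obj` is an existing instance; the class is its metatable.
local A = getmetatable(obj)

A.__members__.foo = function(self)  -- swap the method in place
   return "patched"
end

-- Every existing instance now sees the new `foo`, because method
-- lookup goes through __members__ on each call (no caching).
```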

— Reply to this email directly or view it on GitHub https://github.com/richardhundt/nyanga/issues/38.

romix commented 10 years ago

Thanks for fixing the slicing bug. Works now!

This is now implemented (although using the standard range syntax: a[1..10])

Yes, range syntax. Comma syntax was a spelling error on my side ;-) But it doesn't work for me yet. Have you committed it already?

nyanga> a = [1,2,3,4,5,6]
nyanga> print a[1..3]
nil
richardhundt commented 10 years ago

On 3/5/14 11:51 AM, romix wrote:

Thanks for fixing the slicing bug. Works now!

This is now implemented (although using the standard range syntax: a[1..10])

Yes, range syntax. Comma syntax was a spelling error on my side ;-) But it doesn't work for me yet. Have you committed it already?

Forgot to push :( Should be there now.


romix commented 10 years ago

OK. Works now!

BTW, I'm wondering if open-ended slices like in Python make sense? e.g. a[3..] or a[..10]?

And what about negative indexes? In Lua they are OK. In Python they mean counting from the end, i.e. a[-i] means a[a.length-i]?

romix commented 10 years ago

A few other small things:

function f(a=1, b="Name", c=3.0)
     print "a=${a} b=${b} c=${c}"
end

it should be possible to invoke it like:

f(a=5)
f(b='World', a=7)
f(c=1.2345)
f(10, c=88.99)

Obviously, positional arguments cannot occur in the invocation after named ones. The implementation should be almost trivial, I guess. But it makes life much easier. What do you think?

richardhundt commented 10 years ago

On 3/6/14 8:22 AM, romix wrote:

A few other small things:


  • I think concatenation could also be interesting for tables. The semantics should be the same as dict1.update(dict2) in Python.

This implies that all tables have a metatable. It used to be that way. There used to be a Table type but I removed it for the sake of simplicity. The reason was that LPeg table captures would have to be tweaked to set a metatable for consistency, which was a bit ugly. It also means that passing these cooked tables into plain Lua code might lead to surprises. I think we'd need some more motivating reasons for this sort of change.


  • A more interesting feature: You have implemented default parameters now, which is cool. But it would be nice to support named parameters, at least a static variant (so, no dictionaries with named arguments passed around like in Python). E.g. if I have this:

    function f(a=1, b="Name", c=3.0)
       print "a=${a} b=${b} c=${c}"
    end

it should be possible to invoke it like:

f(a=5)
f(b='World', a=7)
f(c=1.2345)
f(10, c=88.99)

Obviously, positional arguments cannot occur in the invocation after named ones. The implementation should be almost trivial, I guess. But it makes life much easier. What do you think?

This is harder than it seems. There are two ways of doing this. 1) early binding. 2) debug.getinfo

1) With early binding, the compiler would need to adjust the parameters at the call site. In order to do this, it would need to fetch the function signature at compile time so that it'd know the order of the parameters. This, in turn, means that each module would have to export static information (function signatures) along with its code so that the compiler can pull it out without actually running the code. Obviously this won't work if I'm calling into Lua.

2) Use Lua 5.2 debug info, where you can do it dynamically. So f(a=5) would compile to (the equivalent of):

local info = debug.getinfo(f, 'u')    -- 'u' gives nparams (Lua 5.2)
local args = { }
for i = 1, info.nparams do
   local name = debug.getlocal(f, i)  -- name of f's i-th parameter
   if name == 'a' then
      args[i] = 5                     -- the value supplied as a=5
   end
end
f(unpack(args, 1, info.nparams))

As you can see, though, even this would mean creating a temporary table for the call (the debug info can be cached). You also can't get around the short for loop, and LuaJIT doesn't like short loops. So all in all, you're just better off passing in a table in this case.
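The "just pass a table" alternative the reply recommends can be sketched in plain Lua — a single options table instead of named parameters, with defaults applied inside the function (the `or` fallback here deliberately ignores the nil-vs-false distinction for brevity):

```lua
-- Plain-Lua sketch of the recommended alternative: one options table
-- instead of named parameters; defaults applied in the function body.
local function f(opts)
   opts = opts or { }
   local a = opts.a or 1        -- note: `or` also overrides false
   local b = opts.b or "Name"
   local c = opts.c or 3.0
   print(("a=%s b=%s c=%s"):format(a, b, c))
end

f{ a = 5 }                 -- callers name only what they override
f{ b = "World", a = 7 }
```

This costs one table per call, same as option 2) above, but with no debug-library machinery and no compiler support.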

If we went with 1), I'd need to add an extern (Lua) ... end construct which would tell the compiler to use Lua-style bindings within the block to maintain interoperability with Lua code.

This is all a can of worms which would dramatically increase complexity. Early binding gives you certain benefits, of course. You can implement lazily evaluated parameters, and such, but I think the engineering cost outweighs the value that it brings. Certainly at this stage in the project.

Unless you can come up with some brilliant idea for doing this, from where I'm sitting it all looks kinda hard ;)


richardhundt commented 10 years ago

On 3/5/14 3:28 PM, romix wrote:

OK. Works now!

BTW, I'm wondering if open-ended slices like in Python make sense? e.g. `a[3..]` or `a[..10]`?

This is just another way of saying: a[3..#a-1] and a[0..10].

However, ranges are objects. So r = 3.. would have to mean r = Range(3, math.huge). Could do it, but I'd then probably need to change the syntax to r = [3..] to make the parser work, and then split up parsing ranges and parsing slices (otherwise you end up with a[[3..]] which is ugly).
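The "ranges are objects" point can be sketched in plain Lua: a Range value carries its bounds, and the array's __index metamethod recognises it and returns a slice. The names `Range`/`Slice` are illustrative, not Shine's internals, and plain Lua tables are 1-based here whereas Shine arrays are 0-based:

```lua
-- Plain-Lua sketch: ranges as first-class values used as index keys.
local Range = { }
local function range(lo, hi)
   return setmetatable({ lo = lo, hi = hi }, Range)
end

local Slice = { }
function Slice.__index(a, k)
   if getmetatable(k) == Range then
      local out = setmetatable({ }, Slice)
      for i = k.lo, k.hi do
         out[#out + 1] = rawget(a, i)   -- copy the requested span
      end
      return out
   end
   -- plain missing key: fall through to nil
end

local a = setmetatable({ 1, 2, 3, 4, 5, 6 }, Slice)
local s = a[range(2, 4)]   -- a fresh array holding 2, 3, 4
```

Because the key is an object, `a[2..-1]` and friends only need the metamethod to normalise negative bounds before copying.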

And what about negative indexes? In Lua they are OK. In Python they mean counting from the end, i.e. a[-i] means a[a.length-i]?

This should be working now. Let me know if it's not.


romix commented 10 years ago

You are right. Coming mostly from a strongly-typed, statically compiled world, I tend to forget from time to time that things are done differently in the very dynamic Lua world :-)

romix commented 10 years ago

OK. Negative indexes work now. But if I use them in ranges, it is not quite correct yet, IMHO:

nyanga> a = [1,2,3,4,5]
nyanga> print a[-3..-1]
3, 4, 5, 1
nyanga> print a[2..-2]
3, 4, 5
nyanga> print a[2..-1]
3, 4, 5, nil
richardhundt commented 10 years ago

On 3/6/14 9:34 AM, romix wrote:

OK. Negative indexes work now. But if I use them in ranges, it is not quite correct yet , IMHO:

Should be working now. Thanks!



romix commented 10 years ago

Yep. Ranges with negative indexes work now. Thanks!

BTW, current build is broken:

../boot/bin/ngac -g -n "data.queue" data/queue.nga  /Users/roman/Documents/Private/src/git/nyanga/tmp/nyanga/build/data_queue.o
../boot/bin/ngac -g -n "data.buffer" data/buffer.nga  /Users/roman/Documents/Private/src/git/nyanga/tmp/nyanga/build/data_buffer.o
tvmjit: src/ngac.lua:96: data/buffer.nga: No such file or directory
stack traceback:
    [C]: in function 'assert'
    src/ngac.lua:96: in function 'start'
    ../boot/bin/ngac:2: in main chunk
    [C]: at 0x01000010f0
richardhundt commented 10 years ago

On 3/7/14 9:06 PM, romix wrote:

Yep. Ranges with negative indexes work now. Thanks!

BTW, current build is broken:

Thanks! Took a while to fix because I had a lot of new code on the cooker (the net.http namespace), but it should be working now.


romix commented 10 years ago

Hi Richard,

Current trunk does not build for me:

../boot/bin/ngac -g -n "net.socket" net/socket.nga  build/net_socket.o
../boot/bin/ngac -g -n "net.http" net/http.nga  build/net_http.o
tvmjit: src/lang/translator.lua:128: nyanga: net/http.nga:255: "HttpResponse" used but not defined

stack traceback:
    [C]: in function 'error'
    src/lang/translator.lua:128: in function 'abort'
    src/lang/translator.lua:197: in function 'close'
    src/lang/translator.lua:1265: in function 'translate'
    src/ngac.lua:109: in function 'start'
    ../boot/bin/ngac:2: in main chunk
    [C]: at 0x0100001110

BTW, a small proposal for the REPL: It would be nice if it could be a little bit more user-friendly, i.e. if one could go through the history with the arrow keys, etc.

Any progress on other features during the last week?

richardhundt commented 10 years ago

BTW, a small proposal for the REPL: It would be nice if it could be a little bit more user-friendly, i.e. if one could go through the history with the arrow keys, etc.

yeah, I'm thinking of pulling in linenoise, but that's a bit cosmetic atm.

Any progress on other features during the last week?

Yep, I'm accumulating a pretty big commit here. Reworking the async/io stuff and removing Joyent's http parser. I really don't like callback-based interfaces :(

Stay tuned :)

romix commented 10 years ago

Ah, the new async/io changes are there. Cool!

One thing I noticed:

richardhundt commented 10 years ago

On 3/20/14 11:27 AM, romix wrote:

Ah, the new async/io changes are there. Cool!

It still needs some work, but without much optimization test/httpd.nga does about 14k req/sec when benchmarked with `wrk -d 10 -c 500 -t 12 http://127.0.0.1:8000/`, versus about 11k req/sec for node.js, so it's pretty fast ;-)

One thing I noticed:

  • test/system.nga is broken. There are a few problems with imports. Once those are fixed, there is a problem with `await async => ...`. It looks like you removed await from Nyanga, right? What is the alternative now? How can you wait for the results of an async computation?

If you want to wait for results then you call join on the fiber.

answer = async =>
   return 42
end
print answer.join()

I've updated test/system.nga to reflect this

I'm still toying with the idea of making async and await keywords in the language. However this means pulling in the io loop into the core, which makes it all less modular.


romix commented 10 years ago
richardhundt commented 10 years ago

On 3/20/14 1:15 PM, romix wrote:


  • Thanks for the explanation regarding `join`. I understand your concern regarding async/await as keywords vs the library approach. IMHO, having coroutines and generators in the language makes it in many cases unnecessary to include async/await in the language, because they can be easily expressed using coroutines. Or do you see any specific benefits (e.g. optimizations, CPS-like transformations, etc.) from making them part of the language?

Nah, we have coroutines, so we don't need CPS transformations. I was thinking more along the lines of sugar, so:

await async foo.do_something()

which would desugar to:

async(function() return foo.do_something() end).join()

The key idea being that you don't see the closure around the call. Then again, I could just put back the await function and you can do what you did before:

await async => foo.do_something()

Or make await a bit smarter and auto-detect if it's given a plain function and just wrap that for you: await => ...

Which is only 3 characters more and makes it clear that you're creating a closure. So yeah, I think I'm gonna do that. Keep a small core and put the rest in libraries. The syntax is flexible enough.
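A library-level `await` like the one described — spawn the function, run it to completion, hand back its result — can be sketched with plain Lua coroutines. This is a toy stand-in for Shine's fiber scheduler (which interleaves fibers; this version runs eagerly):

```lua
-- Plain-Lua sketch of a library `await`: run a function on a
-- coroutine until it finishes and return its results. A real fiber
-- scheduler would suspend the caller instead of spinning here.
local function await(fn)
   local co = coroutine.create(fn)
   local results = { }
   while coroutine.status(co) ~= "dead" do
      results = { assert(coroutine.resume(co)) }
   end
   return select(2, unpack(results))  -- drop the `true` status flag
end

print(await(function() return 42 end))  -- prints 42
```

The point of the sugar debate above is only whether the closure around the call is visible; the runtime mechanics stay roughly this shape either way.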


  • Regarding your http performance figures: Quite impressive.

  • BTW, one interesting use-case/test-case that comes to my mind is to implement something like Redis using Nyanga. After all, Nyanga has tables, arrays, etc. And the HTTP part is also in place now. So, assuming that implementing a Redis protocol parser does not take too long, because it is so dead simple (and one could use an existing C implementation for it), it is all about non-blocking processing of incoming commands and sending responses. The actual processing of commands is as difficult as adding a key/value to a table or reading a value for a given key, and so on.

Building an in-memory store would be interesting, but I'd probably go for persistence like NDS from the start. You could do this by building it on top of tokyo cabinet, or kyoto cabinet. In fact there was an early version of NDS which was backed by kyoto cabinet before they switched to lmdbm.


  • It would be cool if the outcome of the exercise would be something like: "Redis in Nyanga is only X% slower than real Redis, but Nyanga's Redis implementation is 100 times smaller" ;-)

Are you sure it will be slower? ;-) It's funny: the memory usage is 8mb vs 22mb for Nyanga vs node.js respectively, for 1000 concurrent connections (less with fewer connections and smaller buffer sizes, around 5mb with 100 connections). In this case it's both smaller and faster.

Btw, we now have a mailing list: www.freelists.org/lists/nyanga - if you're quick you can be the first subscriber besides myself :P

Next up for libraries, though is:

On the language side:

Also, check out ./test/type_guards.nga. You can extend this idea with a `like` function which constructs an object that compares its members, giving you structural type matching:

if o is like { x = 'number', y = 'number' } then
   ...
end

-- or a faster cached version

PointLike = like { x = 'number', y = 'number' }
if o is PointLike then
   ...
end
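A naïve version of `like`, matching the description above, might look like this in plain Lua. In Shine the matcher would sit behind the `is` operator via an __istype handler; here it is a plain predicate call, so the names and wiring are illustrative only:

```lua
-- Plain-Lua sketch of structural matching: `like` builds a matcher
-- that checks each declared member's type on the subject.
local function like(shape)
   return function(o)
      if type(o) ~= "table" then return false end
      for k, want in pairs(shape) do
         if type(o[k]) ~= want then return false end
      end
      return true
   end
end

local PointLike = like { x = "number", y = "number" }
print(PointLike({ x = 1, y = 2 }))   -- true
print(PointLike({ x = 1 }))          -- false: y is missing
```

Caching the matcher (the `PointLike` form) avoids rebuilding the shape table on every check, which is the "faster cached version" above.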

Let me know what you think.

romix commented 10 years ago

+1 for NDS. I had that in mind, but didn't want to go too far with my wishes ;-)

BTW, why not lmdbm? I think it has a more liberal license. But maybe I'm wrong.

It would be cool if the outcome of the exercise would be something like: "Redis in Nyanga is only X% slower than real Redis, but Nyanga's Redis implementation is 100 times smaller" ;-)

Are you sure it will be slower? ;-)

Of course not ;-) I meant the possible worst case. I.e. even in the worst case it is only X% slower. But the source code size is 100 times smaller and much cleaner.

richardhundt commented 10 years ago

On 3/20/14 3:31 PM, romix wrote:

+1 for NDS. I had that in mind, but didn't want to go too far with my wishes ;-)

BTW, why not lmdbm? I think it has a more liberal license. But maybe I'm wrong.

lmdbm has a crappy API (even the redis developers complained about it). Kyoto Cabinet has fragmentation issues, but it's faster. Actually Tokyo Cabinet (the predecessor to Kyoto Cabinet) is faster still (I get 2M+ ops/sec) and has some really nice features (full-text reverse index and you can store Lua tables directly without worrying about serialization).

    It would be cool if the outcome of the exercise would be something like: "Redis in Nyanga is only X% slower than real Redis, but Nyanga's Redis implementation is 100 times smaller" ;-)

Are you sure it will be slower? ;-)

Of course not ;-) I meant the possible worst case. I.e. even in the worst case it is only X% slower. But the source code size is 100 times smaller and much cleaner.

:)


  • Regarding the mailing list: May I ask why not Google Groups? I think it is more popular these days. And it also does not bombard you with all messages. You can simply visit groups.google.com and watch what I like.

The last time I tried Google Groups I was getting a lot of spam bots selling me viagra and penis enlargement therapy. Maybe they've tightened it up by now. It was a few years back. I'm also old fashioned (and oldish).


  • type_guards.nga is interesting, yes. I'd like to understand better your idea about structural type matching and the 'like' construct. I'm not sure I understood the proposed semantics of it.

Well, it'd be just like the enum example, but in the __istype handler it iterates over the subject and checks its members' types, returning false if there's a mismatch. I've updated test/type_guards.nga with a naïve implementation to make it clearer.


  • But I was surprised to see "a is number" in parameter declarations. I never realized Nyanga supports it. This gives us the possibility to have optional typing for parameters, right? If so, I'd suggest extending the idea to variables as well. I.e. if I provide a type for a variable, then when I perform an assignment, Nyanga should check (at runtime) that I assign something compliant with this type. Moreover, it would be nice to have an option to switch these checks off. One could then use type checks when developing/debugging and eventually switch them off for releases (if the performance hit due to type checks is too high).

Yep. I've been thinking about this. All it means is that the compiler wraps assignments in assertions, so it's not hard to do. I just wanted to get some feedback on the argument checking first. I like the idea of being able to disable assertions for a release build.
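What "the compiler wraps assignments in assertions" might desugar to can be sketched in plain Lua. The `__check__` helper is hypothetical, purely to illustrate the shape; a release build would simply omit the wrapper:

```lua
-- Plain-Lua sketch of what a typed assignment might compile to.
-- `__check__` is a hypothetical runtime helper, not a real Shine API.
local function __check__(v, t)
   assert(type(v) == t, ("expected %s, got %s"):format(t, type(v)))
   return v
end

-- `local x is number = v` could become:
local x = __check__(42, "number")     -- ok, x = 42
-- x = __check__("42", "number")      -- would raise at runtime
```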


romix commented 10 years ago

Hi Richard,

The usual ping: Trunk seems to be broken again :-) I cannot run a single file. I always get something like this:

./build/nyanga sample/while.nga 
Error: .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:113: sample/while.nga:4: unexpected symbol near +
stack traceback:
    [C]: in function 'error'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:113: in function '_lexerror'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:117: in function 'syntaxerror'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:718: in function 'primaryexpr'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:728: in function 'suffixedexp'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:1163: in function 'exprstat'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:1241: in function 'statement'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:567: in function 'statlist'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:884: in function 'block'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:957: in function 'whilestat'
    ...
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:884: in function 'block'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:957: in function 'whilestat'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:1199: in function 'statement'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:567: in function 'statlist'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:1247: in function 'mainfunc'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:1260: in function 'translate'
    .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:1275: in main chunk
    [C]: in function 'require'
    src/main.lua:33: in main chunk
    [C]: at 0x01022649d0

I did the usual steps, but they do not help:

make clean
make 
sudo make install
richardhundt commented 10 years ago

On 3/28/14 10:18 AM, romix wrote:

Hi Richard,

The usual ping: Trunk seems to be broken again :-) I cannot run a single file. I always get something like this:

./build/nyanga sample/while.nga Error: .../Private/src/git/nyanga/deps/tvmjit/src/lua/lunokhod.lua:113: sample/while.nga:4: unexpected symbol near + stack traceback:
I can't reproduce. Can you try a `sudo make uninstall && make realclean && make` in case you have an old `nyanga.so` lying around?
richardhundt commented 10 years ago

I've just realized: this looks like you have an old version of lunokhod.lua lying around. The Makefile wasn't doing a `git submodule update --init deps/tvmjit`. Fixed in git head.

romix commented 10 years ago

Thanks! Works again now! :-)

BTW, I had a brief look at your recent changes: serialization, threads, channels, data exchange between threads. Very nice!

I think it is getting closer to the turf of Go and Erlang, which is very good. It would be nice to add a few samples ported from Go or Erlang to demonstrate how one can write apps in their style. Maybe the infamous ring benchmark and a few more things?

A few thoughts about those concurrency issues:

Let me know what you think about these proposals.

richardhundt commented 10 years ago

On 3/28/14 11:19 AM, romix wrote:

Thanks! Works again now! :-)

BTW, I had a brief look at your recent changes: serialization, threads, channels, data exchange between threads. Very nice!

I think it is getting closer to the turf of Go and Erlang, which is very good. It would be nice to add a few samples ported from Go or Erlang to demonstrate how one can write apps in their style. Maybe the infamous ring benchmark and a few more things?

Yep, that's the plan, basically. However, we need to separate Go and Erlang, as they have different concurrency models. Go is based on Communicating Sequential Processes (CSP), whereas Erlang uses the Actor Model. The Actor Model is an event-driven system at its core, whereas CSP uses synchronization primitives (channels). This makes Nyanga more like Go. The two are theoretically equivalent in that you can implement actors using CSP and vice versa. Nyanga is less strict than either, though. Since channels can be passed through channels, the topology isn't fixed, so it's not pure CSP either, but rather lets you build CSP-style applications. And you can definitely build an actor system on top of it.

A few thoughts about those concurrency issues:


  • Lua/Nyanga have very nice support for coroutines, which makes it pretty easy to mimic Go and Erlang. Moreover, each coroutine has its own LuaState and hence its own heap, or? One of Erlang's benefits is that processes (i.e. coroutines) share nothing. Each has its own heap. Each such heap is GCed separately from the others. Due to this, the time for GC is very low, as it does not need to process huge heaps.

Each thread has its own global Lua state, but coroutines in a given thread share a global state. This means that there is one GC per thread, but there are N coroutines per GC. It doesn't really matter, because coroutines are about as cheap as closures, so you can create thousands. The main limitation is that you only get 1GB per OS process on x64 with LuaJIT, no matter how many global states or threads you have. The new GC is supposed to fix that, but I'm not sure whether Mike got the funding for it and whether it'll be shipped any time soon. Either compile in 32-bit mode and only get 4GB, or keep lots of data off the LuaJIT heap (i.e. by using ffi, cdata and an mmap-based malloc such as jemalloc).

But yes, traversing one LuaJIT heap per thread beats a global GC lock like in some languages.

On the shared-nothing side: coroutines share their upvalues. If you look at the loop in samples/tcp_server.nga you'll see that the client is accepted in the main coroutine, but accessed from the inner one directly. Note that although they're nested, the inner coroutine is not a child of the outer one. They are scheduled as peers. They do, however, share the same scope.

Threads are a little different. They really are shared nothing, which is needed because the Lua VM is re-entrant, but not threadsafe. So upvalues are serialized. The serializer has a hook, though, so synchronization primitives (mutexes and condition variables) can pass their cdata pointers through the thread boundary (same applies to the ØMQ context), but by default nothing is shared.

Serialization isn't dirt cheap either. I can pump about 100k messages/second through a pipe between two threads, whereas channels can move tens of millions. So you're better off with a small number of "worker" threads, each of which runs a bunch of coroutines. I want to introduce "tasks", which are shared-nothing coroutines that can be spawned in a thread pool for exactly this purpose.


  • It would be nice to provide a library with some Erlang features related to fault tolerance and the like, e.g. links between parents and children, the ability to detect when a child process has crashed, supervisors, etc.

This is where CSP and Actor systems diverge quite a bit. Actor systems are hierarchical. You get a tree, basically (or DAG, or whatever you want to call it). Actors have 1 parent (except the root actor) and N children. In CSP all the processes are peers. There's no concept of ownership. Actors are also addressable. CSP processes aren't. You have channels instead.

However, building an actor system isn't that hard and you might be able to do it with coroutines too, although a simple react callback which is called when a message is delivered to the mailbox should be enough. I would build it starting with the I/O loop too, just as with the fiber scheduler.


  • It would also be nice to show that any coroutine (process) can crash without affecting the others. Is that the case now? Or will the whole app crash?

The app crashes, but explicitly. What I mean is that coroutine.resume(coro) doesn't raise an exception if there's an error, but assert(coroutine.resume(coro)) does, and I'm explicitly using an assert. I've been meaning to add an on_error handler to fibers which defaults to raising an error and which you can override. I guess it's time to add that :)
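The on_error idea can be sketched around a protected resume in plain Lua. The `Fiber` table and its fields are illustrative, not Shine's actual fiber API; the default handler re-raises, matching the current assert behaviour:

```lua
-- Plain-Lua sketch of a fiber with an overridable on_error handler.
local Fiber = { }
Fiber.__index = Fiber

function Fiber.new(fn)
   return setmetatable({
      co = coroutine.create(fn),
      on_error = function(err) error(err, 0) end,  -- default: crash
   }, Fiber)
end

function Fiber:resume(...)
   local ok, err = coroutine.resume(self.co, ...)
   if not ok then
      self.on_error(err)  -- override this to keep other fibers alive
   end
   return ok, err
end

local f = Fiber.new(function() error("boom") end)
f.on_error = function(err) print("fiber died:", err) end
f:resume()   -- logs the failure instead of taking the app down
```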


  • Another interesting thing in Erlang is that no single process (coroutine) can consume all of the CPU, even by mistake. There is no explicit yield/resume there, so it is impossible to forget to yield. It would be nice to (optionally) allow the same mode of operation, i.e. any (or all) coroutines configured with this kind of (preemptive) multithreading would automatically yield after a configurable time interval. This can be achieved either at a very low level (needs modifications to LuaJIT?) or by Nyanga automatically injecting checks at basic-block or function entries (such a check would test whether it is time to yield, and yield if required). Obviously, if a native call is currently in progress, it most likely cannot be interrupted in the general case without complications, so one would have to wait until it is over.

I've seen implementations which use debug.sethook on line events and give each coroutine N lines to run before forcing a yield. I'm not so keen on this idea. These hooks themselves will consume a ton of CPU for little benefit. In Nyanga, you have other ways of constraining the amount of work done by each fiber, without an explicit yield. The easiest is to limit the size of channels. Saying chan = Channel(1) gives you an unbuffered channel. It can hold only one value and an attempt by a producer coroutine to add more data makes it suspend until a consumer has taken the data out. For I/O this can be done by limiting read buffer sizes.
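The backpressure point — a size-limited channel suspends the producer instead of letting it run unboundedly — can be sketched with coroutines. This is a toy, scheduler-free illustration (a real scheduler would park the producer and resume it when a consumer drains the buffer), and the names are not Shine's actual Channel API:

```lua
-- Plain-Lua sketch of a bounded channel: put() yields the producer
-- coroutine while the buffer is full, so it cannot outrun consumers.
local Channel = { }
Channel.__index = Channel

function Channel.new(size)
   return setmetatable({ size = size, buf = { } }, Channel)
end

function Channel:put(v)
   while #self.buf >= self.size do
      coroutine.yield("full")   -- producer gives up the CPU here;
   end                          -- must be called inside a coroutine
   self.buf[#self.buf + 1] = v
end

function Channel:get()
   return table.remove(self.buf, 1)  -- nil when empty
end
```

With `Channel.new(1)` a producer can never buffer more than one value ahead, which is exactly the throttling effect described above.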

I think for the most part, the applications Nyanga is designed for will be mostly I/O bound. I want to write high-performance, low-footprint, scalable, distributed or network-centric applications with it. The httpd.nga sample shows this idea of using sources, sinks and filters, and just letting the data flow through the system in chunks, by connecting little processes up using channels or pipes.

Let me know what you think about these proposals.

You're definitely thinking along the right lines. These concurrency models have been a focus for Nyanga and foremost in my mind when designing the runtime libraries for a while now. So to summarize, eventually Nyanga can have an actor system for really robust self-healing "let-it-crash" style applications, but for now it's more CSP, like Go (or New Squeak, or Limbo, or other stuff by Rob Pike).

By the way, I'm thinking of renaming Nyanga. Nyanga is kinda hard to pronounce and I didn't care that much about it when I started (it was just a quick experiment to toy with Lua's syntax). Now, though, it's growing up and I want to have a good name for it. The name I came up with is "Shine". It's still a nod to Lua's moon theme, and "moonshine" is a homemade alcoholic beverage. And it's shiny :)

What do you think?

romix commented 10 years ago

Richard, thanks a lot for your very elaborate answers!!! Very interesting and informative.

Serialization isn't dirt cheap either. I can pump about 100k messages / second through a pipe between two threads, whereas channels can move 10's of millions.

I see. I'm wondering if it could be improved. One trick that Erlang uses is based on structural sharing. If the same term (i.e. a string or symbol) is referred to multiple times in different messages, it is serialized only the first time; later on, only a reference to it (i.e. an integer ID) is sent instead. The cache for terms is limited in size. If you need to put a new entry into it, then the oldest one or the least recently used one is evicted and can no longer be referred to.

It could be that more general structural sharing ideas could be used to refer to object graphs that were seen before.

Of course, it does not cure all problems. If you send tons of data and most of it is unique then you would have problems anyway, I guess.

So to summarize, eventually Nyanga can have an actor system for really robust self-healing "let-it-crash" style applications, but for now it's more CSP, like Go

Yes. That was clear to me that Nyanga is more like Go when I was writing my previous message. I was simply thinking about the future.

The name I came up with is "Shine". It's still a nod to Lua's moon theme, and "moonshine" is a homemade alcoholic beverage. And it's shiny :)

Hmm. Not too bad.

Here are a few ideas from my side:

What do you think?

richardhundt commented 10 years ago

On 3/28/14 1:47 PM, romix wrote:

Richard, thanks a lot for your very elaborate answers!!! Very interesting and informative.

Serialization isn't dirt cheap either. I can pump about 100k
messages /
second through a pipe between two threads, whereas channels can move
10's of millions.

I see. I'm wondering if it could be improved. One trick that Erlang uses is based on structural sharing. If the same term (i.e. a string or symbol) is referred to multiple times in different messages, it is serialized only the first time; later on, only a reference to it (i.e. an integer ID) is sent instead. The cache for terms is limited in size. If you need to put a new entry into it, then the oldest one or the least recently used one is evicted and can no longer be referred to.

But in which Lua state is it cached? I mean, it could be cached as a blob on the C heap, but then you're still back to allocating strings in the Lua state which receives the message. However, the idea is not all bad. I was thinking of creating a shared global lua_State which is protected by mutexes and used for exchanging data between states. A kind of shared heap.

There are other things one can do too. Mozilla's Rust language has some interesting ideas for passing values between threads using "boxes". Boxes are allocated by the sender, then treated as immutable, and the receiver is responsible for freeing them. You could create tagged values for the boxes:

ffi::cdef"""
typedef struct box {
   int    type;
   size_t size;
   union {
      char*  strval;
      double numval;
      /* ... etc */
   } u;
} box_t;
"""

It gets a little trickier with tables. You'd probably need a custom sharable hashmap implementation which mimics Lua's native tables in behaviour, but is allocated on the C heap and does its own garbage collection (reference counting would be simplest).
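Reference counting for such a shared structure could look roughly like this (sketched in plain Lua for clarity; a real implementation would manage memory on the C heap):

```lua
-- Each shared box carries a reference count; the last releaser frees it.
local function new_box(value)
  return { rc = 1, value = value }
end

local function retain(box)
  box.rc = box.rc + 1
  return box
end

local function release(box)
  box.rc = box.rc - 1
  if box.rc == 0 then
    box.value = nil   -- stands in for freeing the C-heap payload
  end
  return box.rc
end

local box = new_box("shared string")
retain(box)                 -- a second Lua state takes a reference
release(box)                -- first state done; payload still live
assert(box.value == "shared string")
release(box)                -- last reference gone; payload "freed"
assert(box.value == nil)
```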

You could also use a fast data-store like Kyoto Cabinet and get a kind of "software transactional memory on disk". All of this, however, can be done in libraries. Not sure if this stuff should ship with the language. Like Akka is part of the Scala ecosystem, without being core (ØMQ bindings are moving to a separate github project, for example).

I could spend several months working on this kind of thing, and it's all really interesting, but right now I need to stabilize the language and finish the reference manual (hopefully done by early next week).

Here are a few ideas from my side:

Thanks for the suggestions! I'm a little hesitant to go with a name containing "script". There's already moonscript, and in my head, Nyanga is to Lua what C++ is to C. So it's more of a big brother.

I'll probably end up going with "Shine" anyway. The packaging system can be called "beam" and a package can be a "quanta" or a "photon". Yeah... we need a build and packaging system... maybe use luarocks as a base. Damn, there's so much to do, I'd better get coding ;)

romix commented 10 years ago

But in which Lua state is it cached? I mean, it could be cached as a blob on the C heap, but then you're still back to allocating strings in the Lua state which receives the message. However, the idea is not all bad. I was thinking of creating a shared global lua_State which is protected by mutexes and used for exchanging data between states. A kind of shared heap.

First of all, I meant this atom cache thingy in Erlang when I proposed this optimization: http://erlang.org/doc/apps/erts/erl_ext_dist.html

Atoms are immutable constants in Erlang. Moreover, as far as I understand, on the same Erlang VM there is a single area containing atoms. It is shared between all "logical" processes running on this VM. It is safe because these atoms are immutable.

But in principle, each state could have its cache. And I'd say it is for each channel, because different channels may send different things of different structure. So, no need to share the cache between states. Each state builds its own when it receives data. And each end of the channel knows what it has seen already and can refer to the previously seen things by id/offset/etc.

But as you said, it is just an optimization. It is not so important at the moment. If it requires too much time to implement, this can be done much later.
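The per-channel term cache described above might look something like this (a plain-Lua sketch; all names are illustrative, nothing here is part of Shine):

```lua
-- Sender side: replace already-seen strings with small integer IDs.
local function encoder()
  local seen, next_id = {}, 0
  return function(term)
    if seen[term] then return { ref = seen[term] } end
    next_id = next_id + 1
    seen[term] = next_id
    return { id = next_id, term = term }
  end
end

-- Receiver side: rebuild an identical cache from what has arrived.
local function decoder()
  local cache = {}
  return function(msg)
    if msg.ref then return cache[msg.ref] end
    cache[msg.id] = msg.term
    return msg.term
  end
end

local enc, dec = encoder(), decoder()
assert(dec(enc("user")) == "user")  -- first send carries the full term
local m = enc("user")               -- second send: only a reference
assert(m.ref == 1)
assert(dec(m) == "user")
```

Since each end of the channel builds its own cache, no state is shared between the sender and receiver, matching the point above.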

I'll probably end up going with "Shine" anyway. The packaging system can be called "beam" and a package can be a "quanta" or a "photon". Yeah... we need a build and packaging system... maybe use luarocks as a base. Damn, there's so much to do, I'll better get coding ;)

Don't overdo it! :-) build and packaging systems are nice to have, but I'd say they are of a secondary priority. Language and core libs are the most important things to get right first. Then one can start working on the rest of the infrastructure.

BTW,

I've seen implementations which use debug.sethook on line events and give each coroutine N lines to run before forcing a yield. I'm not so keen on this idea. These hooks themselves will consume a ton of CPU for little benefit.

I know this first hand. I actually implemented something like this for Lua (by modifying Lua VM a bit). It was not too slow, but had a measurable overhead. And it worked only on Lua 5.2, not on LuaJIT, because LuaJIT does not support those hooks in the same way.

But my proposal is different. I do not suggest counting lines. I suggest something like this:

  • a flow of execution may eat a lot of time without yielding, typically due to explicit looping or indirect looping via recursive calls or something like this.
  • So, let's insert checks at the entries of basic blocks that are at the start of a loop, and at entry points of functions.
  • each check does something like if must_yield_flag then yield() end
  • the must_yield_flag flag can be set from outside, e.g. by a scheduler based on elapsed time, etc.
  • Obviously, there is no need to perform any expensive checks on each iteration of a loop. One can do it every N iterations...

In some situations, where it can be statically determined that a loop has just a few iterations (i.e. it cannot take too long), one can omit inserting those checks, as they would not help anyway. I hope I managed to explain the idea. I actually did something along these lines for Java using bytecode instrumentation, and it was quite OK. Even if your Java code contained tight loops, your Java thread would not occupy all of a CPU.
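A hand-written simulation of the injected check (plain Lua; in the proposal the compiler would insert the check automatically at loop heads):

```lua
-- Each fiber tests a counter at the top of its loop and offers to
-- yield every N iterations, so no fiber can monopolize the scheduler.
local N = 1000
local finished = {}

local function make_fiber(name, iters)
  return coroutine.create(function()
    local ticks = 0
    for _ = 1, iters do
      ticks = ticks + 1
      -- the injected check: cheap test, occasional yield
      if ticks % N == 0 then coroutine.yield() end
    end
    finished[#finished + 1] = name
  end)
end

local fibers = { make_fiber("a", 2500), make_fiber("b", 2500) }

-- round-robin scheduler: resume every live fiber until all are done
repeat
  local live = false
  for _, co in ipairs(fibers) do
    if coroutine.status(co) == "suspended" then
      assert(coroutine.resume(co))
      live = true
    end
  end
until not live
```

Both fibers finish despite neither containing an explicit yield in its own logic; the interleaving comes entirely from the injected checks.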

richardhundt commented 10 years ago

On 3/28/14 4:08 PM, romix wrote:

But in principle, each state could have its cache. And I'd say it is for each channel, because different channels may send different things of different structure. So, no need to share the cache between states. Each state builds its own when it receives data. And each end of the channel knows what it has seen already and can refer to the previously seen things by id/offset/etc.

Caching inside thread::Pipe might be worth doing. Thanks for the idea.

Don't overdo it! :-) build and packaging systems are nice to have, but I'd say they are of a secondary priority. Language and core libs are the most important things to get right first. Then one can start working on the rest of the infrastructure.

Well, I'm kinda hoping that if I throw a packaging system out there and get a user base going, that others can expand the libraries :-) But yeah, first things first...

But my proposal is different. I do not suggest counting lines. I suggest something like this:

  • a flow of execution may eat a lot of time without yielding typically due to explicit looping or indirect looping via recursive calls or something like this.
  • So, let's insert checks at the entries of basic blocks that are at the start of a loop and at entry points of functions.
  • each check does something like |if must_yield_flag then yield() end|
  • the |must_yield_flag| flag can be set from outside, e.g. by a scheduler based on elapsed time, etc.

Umm there's the catch. Nothing can set the flag from the outside unless it is running. When the scheduler passes control to the coroutine, the scheduler is suspended... it's concurrent, but not parallel. There's nothing stopping you from creating a single coroutine, putting that into the scheduler and having it busy loop without ever passing control back.

The only way around this that I can see is that each coroutine would need to keep its own clock, but I think these things should be solved by the application and not by the libraries. If you're starving other coroutines, and it's a problem, simply don't do it. Call yield.

Moreover, context switches have a non-zero cost, so you're better off tuning your application by yielding explicitly than relying on some heuristics enforced on you by the runtime. What if you just want your coroutine to do its thing until it's finished, instead of paying the cost of switching constantly? Response time is not always a priority. When it is, then it's usually a client/server application which is probably going to be I/O bound anyway.

That was the reasoning for having threads in the first place... if it's CPU intensive, then run the computation in a thread. The only thing missing is to have a pipe-to-channel bridge which suspends the calling coroutine during get and not the calling thread.

Actually, now that I think about it, that really needs to happen inside the sys::thread::Pipe implementation anyway (by checking whether it's being called from the main coroutine or not, and just blocking the main thread if there are no other coroutines or events to be serviced). That may be non-trivial.

  • Obviously, there is no need to perform any expensive checks on each iteration of a loop. One can do it every N iterations...

In some situations where it statically can be determined that a loop has just a few iterations (i.e. it cannot take too long), one can omit inserting those checks as they would not help anyway. I hope I managed to explain the idea. I did actually something along these lines for Java using bytecode instrumentation it was quite OK. Even if your Java code would contain tight loops, your Java thread would not occupy all of a CPU.

I think I understand what you're saying. I just think that if you want preemption, then use threads; otherwise know that fibers/coroutines are scheduled cooperatively, in which case you should let them cooperate, and some of the objects are smart enough to do it for you (channels, sockets, file streams, and semaphores).

romix commented 10 years ago

I just think that if you want preemption, then use threads; otherwise know that fibers/coroutines are scheduled cooperatively, in which case you should let them cooperate, and some of the objects are smart enough to do it for you (channels, sockets, file streams, and semaphores).

Well, I'm after preemption a-la Erlang. Erlang allows for thousands or even millions of processes on a single VM. Essentially, each process is as light-weight as a coroutine. But none of those processes need to yield explicitly and none of them can eat all of the CPU time. Typically, they wait for a new event to arrive - that's easy to implement with Nyanga. But sometimes they may do something very CPU-intensive, i.e. they do not wait and just do something all of the time. In this case, Erlang still preempts them after some time. Maybe it counts something (e.g. the number of executed VM instructions) or maybe it performs some checks like I proposed. And I'd like to have something like what I just described. Threads do the job, but how many threads can you start? Certainly not 100000 or a million...

romix commented 10 years ago

Caching inside thread::Pipe might be worth doing. Thanks for the idea.

Yes. Small remark: Keep in mind that in Erlang the sender and receiver can be on different machines. Therefore I guess it caches on each end of the pipe separately instead of using a single cache for both ends...

richardhundt commented 10 years ago

On 3/28/14 6:12 PM, romix wrote:

Caching inside |thread::Pipe| might be worth doing. Thanks for the
idea.

Yes. Small remark: Keep in mind that in Erlang the sender and receiver can be on different machines. Therefore I guess it caches on each end of the pipe separately instead of using a single cache for both ends...

This is exactly why I originally had ØMQ as a core dependency. I knew it would come up sooner or later :)

I might put it back in, but I'm kinda happy keeping a smaller core with a light-weight alternative and then releasing a ØMQ-based scheduler and threading library as a separate module. There we can go mad with implementing a heavy duty actor system, with network transparency and all the rest.

For now I'd like to see how far I can get with a lightweight alternative. Sockets present a similar interface to channels, so it's not hard, but they're duplex and channels aren't (duplex channels are actually pretty easy to implement too). I just don't want to end up re-implementing ØMQ. No point.
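For what it's worth, a duplex channel really can be built from two one-way queues, as suggested above (a hypothetical sketch in plain Lua, not Shine's actual API):

```lua
-- one-way FIFO queue
local function queue()
  local q, first, last = {}, 1, 0
  return {
    put = function(v) last = last + 1; q[last] = v end,
    get = function()
      if first > last then return nil end
      local v = q[first]; q[first] = nil; first = first + 1
      return v
    end,
  }
end

-- a duplex endpoint pair: each side sends on one queue and
-- receives on the other
local function duplex()
  local a2b, b2a = queue(), queue()
  return { send = a2b.put, recv = b2a.get },
         { send = b2a.put, recv = a2b.get }
end

local a, b = duplex()
a.send("ping")
b.send("pong")
local at_b, at_a = b.recv(), a.recv()
```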

richardhundt commented 10 years ago

On 3/28/14 6:02 PM, romix wrote:

I just think that if you want preemption, then use threads; otherwise know that fibers/coroutines are scheduled cooperatively, in which case you should let them cooperate, and some of the objects are smart enough to do it for you (channels, sockets, file streams, and semaphores).

Well, I'm after preemption a-la Erlang. Erlang allows for thousands or even millions of processes on a single VM. Essentially, each process is as light-weight as a coroutine. But none of those processes need to yield explicitly and none of them can eat all of the CPU time. Typically, they wait for a new event to arrive - that's easy to implement with Nyanga. But sometimes they may do something very CPU-intensive, i.e. they do not wait and just do something all of the time. In this case, Erlang still preempts them after some time. Maybe it counts something (e.g. the number of executed VM instructions) or maybe it performs some checks like I proposed. And I'd like to have something like what I just described. Threads do the job, but how many threads can you start? Certainly not 100000 or a million...

Nope. But 100000 coroutines is feasible. Unlike Erlang, this stuff isn't built into the language. There's nothing to stop somebody from knocking themselves out implementing something like this. The question is whether it should be shipped with the standard libraries or not. I'll see what I can cook up when I ship the ØMQ bindings (called zsys because it includes all the czmq utilities as well, so it's more like an alternative to the sys and async namespaces).

Anyway, thanks for all your feedback so far. You've given me plenty to think about.

romix commented 10 years ago

Trunk seems to be broken. Just one of the failures:

shine sample/array.shn 
Error: Unexpected end of input
stack traceback:
    [C]: in function 'error'
    [string "shine.lang.tree"]: in main chunk
    [C]: in function 'parse'
    [string "shine.lang.loader"]: in function 'loadchunk'
    [string "shine"]: in main chunk
    [string "shine"]: in main chunk
    [C]: at 0x010a6a1330

I even did a new git clone into an empty directory to be sure that I don't have any old files...

richardhundt commented 10 years ago

On 4/3/14 7:30 PM, romix wrote:

Trunk seems to be broken. Just one of the failures:

shine sample/array.shn
Error: Unexpected end of input
stack traceback:
    [C]: in function 'error'
    [string "shine.lang.tree"]: in main chunk
    [C]: in function 'parse'
    [string "shine.lang.loader"]: in function 'loadchunk'
    [string "shine"]: in main chunk
    [string "shine"]: in main chunk
    [C]: at 0x010a6a1330

I even did a new git clone into an empty directory to be sure that I don't have any old files...

I thought of warning you... basically I was seeing exponential parsing times for deeply nested expressions (like 22 seconds for a 10 line test), so I had to make some really drastic changes to the parser. Destructuring in local declarations is also broken.

Basically, right now only the => <block> end form of short functions is supported, not the => <expr> (so you need a full function body).

I'm still on the case, basically, but thanks for the heads up.

romix commented 10 years ago

I thought of warning you... basically I was seeing exponential parsing times for deeply nested expressions (like 22 seconds for a 10 line test), so I had to make some really drastic changes to the parser.

Ah, OK. Now I understand. PEG strikes back! :-)

On the theoretical side these times are probably a result of the fact that LPEG does not cache lookahead like many other PEG parsers. Therefore it parses the same thing many, many times... But shouldn't it be possible to limit backtracking somehow?

richardhundt commented 10 years ago

On 4/3/14 8:01 PM, romix wrote:

I thought of warning you... basically I was seeing exponential parsing
times for deeply nested expressions (like 22 seconds for a 10 line
test), so I had to make some really drastic changes to the parser.

Ah, OK. Now I understand. PEG strikes back! :-)

Yeah, quadratic complexity FTL :(

Anyway, hopefully it's all working now (I went through all of the samples). Actually, I should be able to re-instate the super short function with expression syntax again. I just need a breather.

On the theoretical side these times are probably a result of the fact that LPEG does not cache lookahead like many other PEG parsers. Therefore it parses the same thing many, many times... But shouldn't it be possible to limit backtracking somehow?

Tried everything I could think of. I must have put about 16 hours into it. I even tried forcing evaluations using match-time captures. I'm a little surprised actually. It wasn't lookahead assertions causing the problem. It was the following pattern (simplified, obviously):

infix_expr <- <term> <binop> <term> / <term>

So it doesn't seem to cache descent on right recursion, if I've understood it at all. I would have thought it would figure out that it had seen <term> and just backtrack to before <binop>. It could also have been the way I mixed it in with all the rest. Basically I just flattened all that stuff out and I'm doing more folding in src/lang/tree.lua.

I nearly thought I'd have to use LPeg just as a dumb lexer and pretty much hand-craft a parser. But it's getting there again, and it is noticeably faster all round.

richardhundt commented 10 years ago

Just a heads up.

Shine now has decorators.

They work exactly the way they do in Croc. They apply to function, class, module, grammar and local declarations.

I haven't documented them yet, but there's ./sample/decorators.shn for starters.

Have fun :)

romix commented 10 years ago

Regarding decorators: Very nice! I like it. I'd suggest adding (or describing on Wiki) more meaningful examples from http://jfbillingsley.com/croc/wiki/Lang/Decorators to the decorators.shn , e.g. function call counter for functions, toString generator for classes (to show that decorator may change the class), etc.

Question: As far as I understand, decorators are not preserved as meta-information once they are applied. I'm wondering if it would help to be able to do so. Java has annotations which are preserved at runtime, so that you can do introspection and detect them. This is often used in many scenarios (e.g. custom serialization to/from XML using JAXB, exposure of your classes as a Web Service, and many others). So, I'm wondering if Shine could support something like this, at least for class fields. I guess doing it for functions or variables is problematic, as normally no meta-information is associated with them, i.e. there is no place to store this information, or?

Regarding variable declaration decorators: right now they only get the assigned values as parameters. Would it also be possible to provide the variable names as parameters? It would also be nice to provide a name for a function or a class, if it is not anonymous...

Coming back to parser issues: It is a bit of a pity that the short lambda syntax is not possible now... As far as I understand you, the problem is that the LPEG parser sometimes exhibits excessive lookahead and/or backtracking, e.g. when parsing expressions. I think that some typical workarounds or tricks should exist for such cases. Maybe you should ask on the Lua mailing list how such typical situations are supposed to be solved with LPEG? I guess Roberto & Co should be able to provide their expert advice. And it would be useful to understand how to limit lookahead/backtracking for the future, when new syntactical elements are added to the parser. What do you think?

richardhundt commented 10 years ago

On 4/4/14 11:43 AM, romix wrote:

Regarding decorators: Very nice! I like it. I'd suggest adding (or describing on Wiki) more meaningful examples from http://jfbillingsley.com/croc/wiki/Lang/Decorators to the decorators.shn , e.g. function call counter for functions, toString generator for classes (to show that decorator may change the class), etc.

Roger, just finishing off the macros now. Then I'll update the docs, etc.

Question: As far as I understand, decorators are not preserved as meta-information once they are applied.

Yeah, this is a tricky problem, mainly because not everything allows you to associate arbitrary data with it (functions, coroutines, cdata, etc.). However - and I think this is far more flexible - there's nothing stopping you from setting it up yourself:

meta = { }
function deco(f, v)
   meta[f] = v
   return f
end

@deco(42)
function foo()
end

assert meta[foo] == 42

Regarding variable declaration decorators: right now they get only assigned values as parameters. Would it also be possible to provide variable names as parameters to it? It would be also nice to provide a name for a function or a class, if it is not anonymous...

Hmm... not sure about this. Locals in Lua don't really have meaningful run-time names. They're just stack slots. The only reason you see a name at all in an exception is when the debug segment is not stripped from the bytecode. I admit that for local declarations, decorators are probably not as useful.

But macros are in-bound... then you'll get the name of the variables and all the rest. There's potential for deep magic there... enough rope, as they say, to bind-and-gag the neighborhood, rig a Spanish galleon, and still have some left over to hang yourself from the yardarm.

Coming back to parser issues: It is a bit pity that the short lambda syntax is not possible now...

I figured this one out. I had a bug in my whitespace handling (remember the expression form required no line-break after the =>). I'll put it back in shortly. Other than that, I think I've pretty much got the parsing complexity under control now. I'm learning :)

richardhundt commented 10 years ago

Macros are in, and short lambda expressions are back.

romix commented 10 years ago

Cool!

Noticed a minor problem. This used to work before; now the second line results in an error:

j is Number = 0
local i is Number = 1
romix commented 10 years ago

Other than that, I think I've pretty much got the parsing complexity under control now.

Can you explain how to get it under control? What are the tricks to achieve it?

richardhundt commented 10 years ago

Thanks for the bug report. Fixed in head.

Regarding parser complexity. I did this kind of transformation...

from:

infix_expr <- <term> (<binop> <term>)+ / <term> -> infixExpr

to:

infix_expr <- <term> (<binop> <term>)* -> infixExpr

In the first case, if there's no <binop> then it will fail and backtrack, then match the second clause <term>. After the transformation, the rule succeeds even if it is not an infix expression. This means that the infixExpr handler needs to detect whether it has a <binop> and if so, fold it, otherwise just return the first capture (<term> in this case).

Simplifying the explanation like that just makes it clear that this is really how recursive descent parsers usually handle precedence anyway. They don't backtrack that much.
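The handler described above is essentially a left fold over a flat capture list. A sketch of what such an infixExpr handler might do (illustrative only, not Shine's actual source):

```lua
-- Captures arrive flat, e.g. { t1, op1, t2, op2, t3 }, because the
-- rule is <term> (<binop> <term>)*. Fold them left into an AST.
local function infixExpr(...)
  local caps = { ... }
  if #caps == 1 then
    return caps[1]  -- plain <term>, no <binop>: return it unchanged
  end
  local expr = caps[1]
  for i = 2, #caps, 2 do
    expr = {
      type     = "BinaryExpression",
      operator = caps[i],
      left     = expr,
      right    = caps[i + 1],
    }
  end
  return expr
end

-- left fold: 1 + 2 * 3 parsed flat becomes ((1 + 2) * 3) here, since
-- this sketch ignores precedence for simplicity
local ast = infixExpr(1, "+", 2, "*", 3)
```

A real handler would also take operator precedence into account while folding; the point here is just that no backtracking is needed once the captures are flat.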

romix commented 10 years ago

Richard, one proposal regarding local variables declarations.

If the code at the end of match:LocalDeclaration is changed to this:

      for i=#node.decorators, 1, -1 do
         local deco = node.decorators[i]
         local args = self:list(deco.arguments)
         local names = {}
         for i=1, #decl do
            names[#names + 1] = tvm.quote(decl[i])
         end
         local namesk = tvm.quote("names")
         local valuesk = tvm.quote("values")
         local argsk = tvm.quote("args")
         local argstable = {}
         argstable[namesk] = Op{OpList(names)}
         argstable[valuesk] = Op{OpList(decl)}
         argstable[argsk] = Op{OpList(args)}
         local decoargsOp = Op(argstable)
         frag[#frag + 1] = Op{'!massign', Op{decl},
            Op{Op{'!call', self:get(deco.name), decoargsOp }}
         }
      end

then it is possible to check the names of the variables being declared, which can be useful. And here is an example of a decorator for local declarations which shows how to use this feature:

function vars(d)
   -- Simply print information about the declaration
   print "DECLARED VARS:", ...d.names
   print "VALUES:", ...d.values
   print "ARGS:", ...d.args
   -- return values unchanged
   return ...d.values
end

@vars(45,46,47)
local a, b, c = 1, 2, 3

print "local vars:", a,b,c

which outputs:

DECLARED VARS:  a   b   c
VALUES: 1   2   3
ARGS:   45  46  47
local vars: 1   2   3

What do you think? Would you include it into trunk?

romix commented 10 years ago

BTW, having this feature in place, one could implement a bit of AOP: e.g. a user could set a default local declarations decorator, which would apply to all local declarations even though they are not explicitly decorated by the user. This way one could see how all local variables are initialized in a program, which is useful for debugging.

And if one went further in the AOP direction, one could say "apply this default local declarations decorator to all local declarations annotated/decorated in a specific way". Then a user could mark all interesting declarations, and only they would be used to invoke the default decorator (this can be decided statically during compilation). One interesting aspect here is that a decorator can be used as an annotation, i.e. it does not directly result in any code being generated from it. But it can be used by other parts of the compiler to find annotated places.

richardhundt commented 10 years ago

Grats on getting your hands dirty with the translator.

I'll need to digest this a bit.

If I change local decorators, I'll need to change them all. None of the other decorators know the names of their declaration either (it's incidental that classes carry their names as part of their value, whereas functions don't). I like the fact that they operate on values only, and in this way the local declaration decorators are consistent as they are now (i.e. all decorators intercept their values). This is how it's done in Python (and in Croc). It's familiar and most of all, simple and easy to reason about.

You see, once you have the name of a variable, other than logging, there's nothing else you can reliably do with it. You can't even pass it to debug::setlocal which expects a stack slot. Variables are compile-time aliases to stack slots, at best.

So to boil it down: I'd either need to come up with a protocol for decorators which has consistent meta-data for all of them or leave them as they are.

Consistency really matters to me. Programming itself is complex enough without needing the additional cognitive load from trying to remember edge cases in the language you're using.

Lastly, I'm actually using decorators now, so I'm getting a feel for what their limitations might be. Here's a snippet from an application I'm building:

import Controller, route from "swarm.app"
import JSON       from "codec.json"
import RESTClient from "swarm.rest.client"

module Features include Controller

   rest = RESTClient('localhost', 9200)

   @route('GET', '/')
   index(req, args, query)
      return { 200, { }, { "ALIVE" } } 
   end 

   @route('POST', '/feature')
   create(req, args, query)
      local data = req.entity.stream.read(1024)

      local json = JSON.decode(data)
      local resp = rest.post('/features/feature', { }, JSON.encode(json))
      local esjs = resp.entity.stream.read(1024)
      local code = resp.status
      local hdrs = { 
         ['Content-Type'] = 'application/json'
      }   
      return { code, hdrs, { esjs } } 
   end 

end

Features.start('localhost', 8000)
romix commented 10 years ago

Grats on getting your hands dirty with the translator.

Thanks! ;-)

Regarding your example: I know this kind of use of decorators, oh, sorry, annotations, very well. Anyone who has used Java with JAXB and/or JAX-WS does more or less this kind of thing...

As for your reaction to my proposal: I expected exactly that, really. I totally understand your arguments about consistency. But my point is that, while consistent, it limits the application of decorators too much, IMHO.

For example, if I only intercept the values of variables being declared, I cannot do much that is interesting in the decorator, as I don't know what it is being applied to. OK, I can manipulate a value, but blindly. And I'd imagine that when I intercept a variable declaration, I typically want to do it for the following reasons:

So to boil it down: I'd either need to come up with a protocol for decorators which has consistent meta-data for all of them or leave them as they are.

It would be ideal, if it would be possible and without a (big) run-time hit. But I'm not 100% convinced that consistency in this case is so important or gets so broken.

After all, by definition, in a PL a declaration associates a name with a type and optionally assigns an initial value. So, it is a tuple (name, type, initial value). Since this is a dynamic language, the type is not present. And since we often also have names missing, in the case of anonymous declarations, only the value remains in most situations. But this is incidental. In an ideal world one would expect the tuple above, instead of declaring the worst case to be the normal case. So, if variable (and class?) declarations can provide 2 of the 3 ingredients, we should be glad and not upset about it...

richardhundt commented 10 years ago

just for the hell of it:

function let_impl(ctx, expr)
   util = require("shine.lang.util")
   vars = ctx.op(ctx.list(expr.params))
   vals = ctx.op(ctx.list(expr.body.body[1].arguments))
   return ctx.op({'!define', ctx.op(vars), ctx.op(vals) })
end

macro let = let_impl 

let (a, b, c) => 1, 2, 3

This twisted little piece of code uses the short lambda syntax to declare variables. I can't use assignment because macros just accept expressions (I'm thinking of relaxing this restriction, in which case you could really say let a, b, c = 1, 2, 3 with a custom definition of let).