rns / kollos-luif-doc

Reference documentation for LUIF (Lua Interface) of Kollos (Libmarpa + Lua) project
MIT License

D2L simplification: S for sequences, bare literals for symbols #33

Open rns opened 9 years ago

rns commented 9 years ago

I'm inclined to use luif.S{ item, quantifier, separation_specifier, separator } to define sequences (rather than symbols), drop luif.Q(), and write symbols as bare literals, except for the strings '|', '||', '%%', '~' and '%'. The rationale is clarity and good Huffman coding -- symbols are used most frequently, so they should get the shortest notation, while sequences need a strict notation.

With this, the LUIF calculator grammar will become:

  Script = S{ 'Expression', '+', '%', L',' },
  Expression = {
    { 'Number' },
    { '|' , '(', 'Expression', ')' },
    { '||', 'Expression', L'**', 'Expression', { action = pow } },
    { '||', 'Expression', L'*', 'Expression', { action = mul } },
    { '|' , 'Expression', L'/', 'Expression', { action = div } },
    { '||', 'Expression', L'+', 'Expression', { action = add } },
    { '|' , 'Expression', L'-', 'Expression', { action = sub } },
  },

vs. the current

  Script = { S'Expression', Q'+', '%', L',' },
  Expression = {
    { S'Number' },
    { '|' , '(', S'Expression', ')' },
    { '||', S'Expression', L'**', S'Expression', { action = pow } },
    { '||', S'Expression', L'*', S'Expression', { action = mul } },
    { '|' , S'Expression', L'/', S'Expression', { action = div } },
    { '||', S'Expression', L'+', S'Expression', { action = add } },
    { '|' , S'Expression', L'-', S'Expression', { action = sub } },
  },
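
To illustrate that the proposed form is plain Lua table syntax, here is a minimal sketch. The `L` and `S` functions below are hypothetical stand-ins for luif.L and luif.S, and the field names (`type`, `item`, `quantifier`, `separation`, `separator`) are assumptions made for this example, not the actual D2L representation.

```lua
-- Hypothetical stand-in for luif.L: wraps a literal string in a tagged table.
local function L(text)
  return { type = "literal", value = text }
end

-- Hypothetical stand-in for luif.S: takes the positional table
-- { item, quantifier, separation_specifier, separator } from the proposal.
local function S(spec)
  return {
    type = "sequence",
    item = spec[1],        -- bare string, i.e. a symbol name
    quantifier = spec[2],  -- '+' in the calculator grammar
    separation = spec[3],  -- '%' in the calculator grammar
    separator = spec[4],   -- a literal such as L','
  }
end

-- The Script rule from the proposed calculator grammar.
local script = S{ 'Expression', '+', '%', L',' }
```

Because the quantifier and separation specifier are positional inside `S{ }`, they do not need wrappers of their own, which is what would make luif.Q() unnecessary in this sketch.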

jeffreykegler commented 9 years ago

Looks good. Is this a move away from the notation in Roberto's paper?

rns commented 9 years ago

Roberto uses bare literals for terminals and notation like lpeg.V"A" for non-terminals (V for variables) [1] -- his grammars don't have terminal symbols, unlike LUIF's structural grammars.

D2L (now) uses luif.S'symbol' for symbols (both terminals and non-terminals), luif.L'literal' for literals, and luif.C'[0-9]' for charclasses.

Symbols get the most use in structural grammars, and sequences need stricter syntax -- so using bare literals for symbols and luif.S{ } for sequences simplifies both processing and syntax, as I saw while prototyping D2L early on.

[1] http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html#grammar
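
As a rough sketch of why this could simplify processing: with bare strings for symbols, one type check is enough to sort right-hand-side items. The `classify` function, the `reserved` table and the `type` tag on tables are all assumptions made for this example and are not taken from the D2L code.

```lua
-- Strings the proposal excludes from being bare symbols.
local reserved = {
  ['|'] = true, ['||'] = true, ['%%'] = true, ['~'] = true, ['%'] = true,
}

-- Hypothetical classifier for one item of a rule's right-hand side.
local function classify(item)
  if type(item) == "string" then
    -- Any bare string that is not reserved is taken to be a symbol name.
    if reserved[item] then return "operator" end
    return "symbol"
  elseif type(item) == "table" then
    -- Tagged tables are assumed to come from constructors such as L, C or S;
    -- an untagged table is assumed to hold adverbs, e.g. { action = pow }.
    return item.type or "adverbs"
  end
  error("unexpected RHS item of type " .. type(item))
end
```

For example, `classify('Expression')` gives "symbol" and `classify('||')` gives "operator", with no S'...' wrapper needed on the symbol.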

rns commented 9 years ago

Done in 47cde97163523b16bf1cfb532857446fc639b13b, 869d4e134032e2b0aa29a29067768e80bb638694, 9e21535, https://github.com/rns/libmarpa-bindings/commit/999e31fb39fa83fd6a64fba339c22c71a58033c5

rns commented 9 years ago

A side effect is that bare literals involve no function call, so their location is reported less accurately than, e.g., for literals (luif.L()), which can get the line number via debug.getinfo().

This of course applies only to D2L used in Lua source files -- the LUIF parser will be able to add full location objects as needed.
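
A minimal sketch of the difference, assuming a hypothetical `L` constructor (the real luif.L may record location differently): a function call can ask debug.getinfo() for its caller's line, while a bare string is just a value and carries no such information.

```lua
-- Hypothetical literal constructor that records where it was called.
local function L(text)
  -- Level 2 is the code that called L; "S" fills in short_src and
  -- "l" fills in currentline.
  local info = debug.getinfo(2, "Sl")
  return {
    type = "literal",
    value = text,
    source = info.short_src,
    line = info.currentline,
  }
end

-- The literal knows its line; the bare symbol strings do not.
local rule = { 'Expression', L'+', 'Expression' }
```

Here `rule[2].line` holds the line of the `L'+'` call, whereas `rule[1]` and `rule[3]` are plain strings, so any location for them would have to be inferred from something else, such as the enclosing rule.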

Ideas:

jeffreykegler commented 9 years ago

That LPeg can find line-within-grammar is interesting. That code might be worth looking at for Kollos.

