til-lang / til

An easy to extend command language

Simplifying syntax #2

Open dumblob opened 3 years ago

dumblob commented 3 years ago

I started reading some source code written in Til and immediately asked myself why we need:

  1. proc p (name) { ... } instead of simpler proc p name { ... }
  2. if ($msg == "QUIT") {break} instead of simpler if $msg == "QUIT" {break}
  3. only the verbose construction of a dict set d1 [dict.create (a "alfa") (b "beta") (c "gama")] instead of providing a second "constructor" set d1 [dict.create2 a "alfa" b "beta" c "gama"]
  4. the verbose math ($x * $y) instead of math $x * $y

Looking into this further, points (1) and (2) are both similar to "set destructuring a simplelist" (set a b c (1 2 3)).

cleberzavadniak commented 3 years ago

Hello again. I really appreciate your questions.

  1. Because it's faster to assume it's always going to be a SimpleList than to check every time whether the programmer sent a single Atom or a proper SimpleList. I'm trying to keep the code as simple and straightforward as possible, and this decision allows me to get rid of if (parameter.type == ... things.
  2. Because it's almost impossible to know where the condition actually ends without putting a lot of semantics in the parser. It's already identifying pre-defined "operators", but that's all: there is no actual validation and if the programmer decides to use if ($x my_special_operator $y) there's no impediment to that (yeap, in this example there's only the body at the end, but if can be a chain of "if/else if/else if/else", too).
  3. That would be possible, but I don't like how the key/value semantics get kind of visually lost. Also, it allows eventual extension by saying dict (a 1 int) (b 2 str) for, IDK, maybe JSON serializing.
  4. Again, semantics: you're asking "math" to resolve one formula, so you pass one argument. Besides, the implementation gets much simpler. And we could eventually solve multiple formulae in one "math" run. -- Also, I believe the docs are lacking on this part, but "multi items returns" are actually "liquid": you could say math [proc1 $x] [proc2 $y] [proc3 $z] with each one returning N formulae and everything would count as "spliced" (in Lisp terms) arguments. And that behavior makes me prefer making lists-as-arguments always explicit.

About the "destructuring set", again, it's always the case the last argument is a list-or-not, but if can be an N-sized chain of new conditions and bodies.
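To make the trade-off in point (1) concrete, here is a minimal Python sketch. It is not Til's actual implementation: `Atom`, `SimpleList`, `params_strict` and `params_lenient` are hypothetical stand-ins that only model the "always a SimpleList" rule against the "Atom or SimpleList" alternative.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str


@dataclass
class SimpleList:
    items: list


def params_strict(arg):
    # Current rule (assumed): the parameter argument is always a SimpleList,
    # so there is a single code path and no type check.
    return [a.name for a in arg.items]


def params_lenient(arg):
    # Proposed rule (assumed): a bare Atom is accepted as well,
    # which forces a branch on the argument type.
    if isinstance(arg, Atom):
        return [arg.name]
    return [a.name for a in arg.items]
```

In this model `proc p (name)` corresponds to `params_strict(SimpleList([Atom("name")]))`, while `proc p name` would need `params_lenient(Atom("name"))`.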

dumblob commented 3 years ago

Ok, sounds like a lot of typing to me objectively. Subjectively it definitely feels like more burden than necessary, as most command invocations are short and thus become visually buried in all those parentheses/braces/brackets, which account for a significant part of the code base.

If you don't mind, I'll summarize some (pseudo)arguments I have in mind :wink:.

  1. I like strict syntax - so no "more ways to write the same thing". I.e. no alternative constructors etc. I'm writing this to make it clear that my proposals are meant to replace the current situation and not to add "alternative" syntax (so I believe your point (1) becomes void).

  2. Regarding your point (2) I think I might be misunderstanding how if is implemented. Currently I don't see the reasons why it should put any semantics into the parser. I assume one thing here: commands can be "overloaded" depending on argument values known in parse time (i.e. depending on the type of the token or the root token type in case of a compound structure like flat or arbitrarily nested list) they get as arguments. Maybe if else was a separate command (if it's not already), things would look differently (I don't know whether better or worse). Could you elaborate on "lot of semantics in the parser would be needed"?

  3. About point (3): interesting - I find myself less visually lost - mainly because set builds around that syntax (that's why I made the proposal :wink:). But how about extending the syntax in the future? Simply said I don't think that kind of syntax extension shall happen. Simply because such extensions require being globally applicable and not command-specific (JSON has to work with many commands, same goes for types). For such global things some kind of "compile time macro" (maybe loosely similar in idea to Java annotations or Python type hints?) seems to be a much better fit.

    Btw. in case of dict one would probably also want to specify the type of key, not just type of the value :wink:.

  4. About (4): one argument, one formula sounds logical but less practical IMHO. But that's subjective, so let's focus on a more problematic issue. Namely, I'm really surprised by how inconsistently (compared to other commands) splicing in Til works according to your example. If math gets only one expression (a list of atoms representing a mathematical expression), then your example (math [proc1 $x] [proc2 $y] [proc3 $z]) must not compile: although consecutive [ ... ] blocks form a splice, it's still only a splice and not a list, so it must be enclosed in ( ), like math ([proc1 $x] [proc2 $y] [proc3 $z]).

    On the other hand I'd prefer the math [proc1 $x] [proc2 $y] [proc3 $z] variant because it's consistent with my proposal math $x * $y or even math $x*$y (and thus violates your premise "one argument one formula").

  5. From my experience I'm also strongly convinced that more complicated parsing to achieve non-negligibly less visual clutter is always worth it. And no, I'm not talking about having lots of sugar. The basic principle has to be robust, otherwise no sugar will save the disaster :wink:.

cleberzavadniak commented 3 years ago

In the math [proc1 $x] [proc2 $y] [proc3 $z] case, I expect each proc to return SimpleLists. But if I want to build a complicated mathematical expression, I would love to allow some procedures to just add new terms to it without "joining" things by hand, like math ($x + $y * [very_complicated_common_part_with_variable_terms_returning_depending_on $z]) or math ($x * [complicated_parameters_loading_and_calculation]) or even math ($x [operator_depending_on $y] $y). (And I'm using math as an example, but that applies to many other cases).

About splicing: it works basically like a Unix shell: math $(proc1 $x) $(proc2 $y) $(proc3 $z). It's up to each command to group its returned values or not. Nothing really new here, I just prefer to be explicit when separating things instead of when joining them.

And I love it when that saves me from writing annoying ifs.
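The shell analogy above can be sketched in Python. This is one reading of the thread, not Til's real behavior: results of `[ ... ]` substitutions are modeled as Python lists that get flattened into the argument list, while an explicit `( ... )` SimpleList is modeled as a tuple that stays a single argument. The function name `splice` is hypothetical.

```python
def splice(args):
    """Flatten one level: each list result contributes its items as separate arguments."""
    out = []
    for arg in args:
        if isinstance(arg, list):
            out.extend(arg)   # substitution result: spliced into place
        else:
            out.append(arg)   # atom or explicit SimpleList (tuple): kept as one argument
    return out
```

Under these assumptions, `splice([[1, "+", 2], ["*"], [3]])` hands the command five separate arguments, whereas `splice([(1, "+", 2)])` hands it one grouped argument, which is why grouping has to be written explicitly.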

dumblob commented 3 years ago

Ah, thanks for the explanation. I didn't know math accepts more than one argument.

Is the rule "it's up to each command to group or not SimpleList's values" common for all Til commands? That doesn't sound intuitive.

E.g. in the case of math I doubt one would abstract it like math [proc1 $x] [proc2 $y] [proc3 $z], because that would most probably result in an invalid expression (assuming each of proc1 proc2 proc3 has no knowledge of its syntactic surroundings). That said, one needs to design proc1 proc2 proc3 so that each of them separately returns a valid expression and then use math as follows:

math (\() [proc1 $x] (\) + \() [proc2 $y] (\) * \() [proc3 $z] (\)) - (\( $a / \( $b * 5 \)\))

Which is... just plain unreadable nonsense IMHO. Or maybe I'm again misunderstanding something about Til :wink:.

If proc1 proc2 proc3 returned just spliced tokens instead, and math accepted both SimpleLists and non-SimpleLists while treating SimpleLists as mathematical subexpressions (to avoid quoting \( \)), we could write the example above as:

math ( [proc1 $x] ) + ( [proc2 $y] ) * ( [proc3 $z] ) - ( $a / ( $b * 5 ) )

(for safety I can't assume proc1 proc2 proc3 returned spliced tokens (or AST nodes in some form) all enclosed in parentheses, so I have to write them here at the receiver side just to be sure)

cleberzavadniak commented 3 years ago

math (\() [proc1 $x] (\) + \() [proc2 $y] (\) * \() [proc3 $z] (\)) - (\( $a / \( $b * 5 \)\)) - no idea where this came from.

If the procedures don't return SimpleLists themselves, it would be a matter of just wrapping them:

math ([proc1 $x]) + ([proc2 $y]) * ([proc3 $z]) - ( $a / ( $b * 5 ))

dumblob commented 3 years ago

> math (\() [proc1 $x] (\) + \() [proc2 $y] (\) * \() [proc3 $z] (\)) - (\( $a / \( $b * 5 \)\)) - no idea where this came from.

Hm, could you then elaborate on what you meant with the following?

> 4. Again, semantics: you're asking "math" to resolve one formula, so you pass one argument. Besides, the implementation gets much simpler. And we could eventually solve multiple formulae in one "math" run. -- Also, I believe the docs are lacking on this part, but "multi items returns" are actually "liquid": you could say math [proc1 $x] [proc2 $y] [proc3 $z] with each one returning N formulae and everything would count as "spliced" (in Lisp terms) arguments. And that behavior makes me prefer making lists-as-arguments always explicit.

I understood that math will always accept only one argument. Which appears to be untrue. But then, why does one see math ($x * $y) instead of math $x * $y in the code (as I pointed out in my original question (4) in the comment above)?

Also, could you comment on the point (2) from https://github.com/til-lang/til/issues/2#issuecomment-845095083 before it gets buried in the discussion :wink:?

cleberzavadniak commented 3 years ago

math accepts one argument today. That's not a rule. But I want it to be able to handle multiple arguments.

About if and semantics in the parser: Til is a command language. if is a command like any other.

Your example was if $msg == "QUIT" {break}. The reality would be:

if $msg == "QUIT" {
  return
} else if $msg == "not quit" {
  continue
} else {
  break
}

First note: It's less typing, for sure, but it also seems weird (it's not for my taste and even reminds me of Golang - argh!).

Without pushing if semantics to the parser: if implementation would need to "discover" everything at runtime. As it already does, except that it enforces a simple-to-follow format where all the conditions and bodies fall into pre-defined positions so, no guessing at all. No if (this argument type is this) else if (this argument type is that) and no further semantic analysis: if doesn't have to do any lookahead operations, like "well, I received $msg, is it the condition itself? Not sure, must check the next argument. Oh, it's an operator, so let's see the next one" and so on. (And no loading of all arguments to visit them in reverse and then re-analyze everything again.)

If all that was pushed into the parser, it would need to check "is this an if command?" and checking for specific commands is not the parser's job (you see, even include is a regular command).
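The "pre-defined positions" argument can be illustrated with a short Python sketch. It assumes a hypothetical token model (conditions as tuples, bodies as `Body` objects, keywords as plain strings) and a toy `evaluate` that knows only `==`; none of this is Til's actual code.

```python
from dataclasses import dataclass


@dataclass
class Body:
    label: str  # stands in for a {...} block


def evaluate(cond, env):
    """Evaluate a three-item condition such as ("$msg", "==", "QUIT")."""
    left, op, right = cond
    value = env[left[1:]] if left.startswith("$") else left
    if op == "==":
        return value == right
    raise ValueError(f"unknown operator: {op}")


def run_if(args, env):
    """Select a body by position: cond, body, [else, if, cond, body]..., [else, body]."""
    i = 0
    while True:
        cond, body = args[i], args[i + 1]
        if evaluate(cond, env):
            return body.label
        i += 2
        if i >= len(args):
            return None                # no branch matched and there is no else
        # args[i] is the keyword "else" here
        if args[i + 1] == "if":
            i += 2                     # "else if": the next pair is cond, body
        else:
            return args[i + 1].label   # final else body
```

Because each condition arrives already grouped, `run_if` never has to inspect a token to decide whether the condition has ended; with the flat form `if $msg == "QUIT" {...}` that decision would have to be made token by token.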

dumblob commented 3 years ago

> math accepts one argument today. That's not a rule. But I want it to be able to handle multiple arguments.

Thanks for the explanation. I'm looking forward to math accepting more than one argument :wink:!

> but it also seems weird (it's not for my taste and even reminds me of Golang - argh!).

:rofl: Ok, but there are (many) other options - e.g. swap the semantics of {} and [] and it won't remind you of Golang that much any more :wink:.

Don't get me wrong, but I really don't think this personal "downside" outweighs the "more lightweight and airy" syntax.

> Without pushing if semantics to the parser: if implementation would need to "discover" everything at runtime. As it already does, except that it enforces a simple-to-follow format where all the conditions and bodies fall into pre-defined positions so, no guessing at all. No if (this argument type is this) else if (this argument type is that) and no further semantic analysis: if doesn't have to do any lookahead operations, like "well, I received $msg, is it the condition itself? Not sure, must check the next argument. Oh, it's an operator, so let's see the next one" and so on. (And no loading of all arguments to visit them in reverse and then re-analyze everything again.)
>
> If all that was pushed into the parser, it would need to check "is this an if command?" and checking for specific commands is not the parser's job (you see, even include is a regular command).

Ok, now I understand. That's a fair argument, but it sounds more academic than practical. I know parsers (not to be confused with lexers) that do lookaheads quite normally (and this one is quite cheap).

As I pointed out in (5) above, decades of language development show that sacrificing comfort to avoid a slightly more complicated parser destines the language to a premature death.

Til could take a hybrid approach: analyze at compile time where the if command's conditions end, and then at run time pay no extra cost for repeating that analysis over and over.

Fortunately this is a forward-compatible restriction. So if any time later Til decides to allow for more comfort, it's possible without adjusting existing source code.
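A minimal Python sketch of the hybrid approach, under stated assumptions: the hypothetical `group_if_args` runs once (at "compile time") and rewrites the flat arguments of an if command into the positional form that the current implementation already expects. `Body` stands in for a `{...}` block, and the sketch assumes a condition never contains a block itself.

```python
from dataclasses import dataclass


@dataclass
class Body:
    label: str  # stands in for a {...} block


def group_if_args(tokens):
    """Rewrite flat `if` arguments into positional form in a single pass.

    Tokens seen since the start of a branch are collected into one condition
    tuple when the next Body is reached; "else"/"if" keywords and bodies pass
    through unchanged.
    """
    grouped, cond = [], []
    for tok in tokens:
        if isinstance(tok, Body):
            if cond:
                grouped.append(tuple(cond))  # close the pending condition
                cond = []
            grouped.append(tok)
        elif tok in ("else", "if") and not cond:
            grouped.append(tok)              # keyword between branches
        else:
            cond.append(tok)                 # part of the current condition
    return grouped
```

The pass looks at each token exactly once, so the lookahead cost is paid a single time per if command rather than on every execution.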