jamiebuilds / ghost-lang

:ghost: A friendly little language for you and me.

Implicit Returns & Expression Statements & Whitespace Insignificance & Unary/Binary Operators & No Semicolons! #33

Open jamiebuilds opened 5 years ago

jamiebuilds commented 5 years ago

There are a few aspects of the language that, when considered together, create a design problem:

The Features

Implicit Returns - Automatically returning the "last" value in a function

let add = fn (addend, augend) {
  let total = addend + augend
}

let total = add(40, 2) # 42

Expression Statements - Allow expressions in "statement" positions

if (cond) {...} # if-else is an expression, not a statement, but it should be usable in statement positions

let add = fn (addend, augend) {
  addend + augend # implicit returns will often not use a `let` statement
}

Whitespace Insignificance - Whitespace (spaces, tabs, & newlines) should be allowed in between "parts" of language constructs

let value = if (
  someReallyLongFunctionCall(someReallyLongValueName) ||
  someOtherReallyLongFunctionCall()
) {...}

# even wacky stuff...

let       total
  =
      addend
  + 
           augend

Note: Allowing the "wacky" whitespace stuff greatly simplifies the language grammar. Otherwise you're constantly specifying what whitespace is allowed where, and people will end up being able to do wacky stuff anyway, so the language should just have a formatter.

Unary/Binary Operators - There are generally two types of operators: "binary" operators and "unary" operators

# Binary operators have values on either side of them:
let total = addend + augend

# Unary operators either prefix or suffix a value:
let opposite = !boolean
let negative = -number
let not = ~number

No Semicolons - Lots of languages have separators between statements so that you know exactly where one ends, generally a semicolon. Ghost doesn't:

let no = true
let semicolons = true

The Problem

Combining all these things you end up with:

let fn = fn (num) {
  let val = 42
  -num
}

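It's ambiguous: the body could be read as one statement, let val = 42 - num (binary minus), or as let val = 42 followed by the expression statement -num (unary minus). It needs to be resolved.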

It's not an option to just look at the whitespace and see they are two separate statements. That would greatly complicate the language grammar.
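To make the ambiguity concrete, here is a minimal Python sketch (not Ghost's actual lexer; the `tokenize` helper and its token pattern are assumptions made for illustration). With whitespace thrown away, the two-line body above and a one-line binary subtraction produce exactly the same tokens, so a parser working from tokens alone has nothing to tell them apart.

```python
import re

def tokenize(source):
    # Whitespace is never matched, so spaces and newlines simply
    # vanish from the token stream (whitespace insignificance).
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/=]", source)

two_lines = "let val = 42\n-num"   # intended: a let, then a unary expression statement
one_line = "let val = 42 - num"    # intended: a single binary subtraction

print(tokenize(two_lines))  # ['let', 'val', '=', '42', '-', 'num']
print(tokenize(one_line))   # ['let', 'val', '=', '42', '-', 'num']
```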

The Possibilities

In order to solve this, Ghost has a few options:

If we were to get rid of any one of the above language features, suddenly this wouldn't be a problem anymore.

  1. If Ghost had semicolons, it would know where all the statements ended.
let value = 42;
-num
  2. If Ghost didn't have unary/binary operators, there'd be no way to connect an expression statement to the statement above it (or the other way around with suffixes).
let value = 42
num * -1
  3. If Ghost had whitespace significance, it could terminate statements.
let value = 42
  - num # binary

let value = 42
-num # unary
  4. If Ghost didn't have expression statements, it would never introduce the problem.
let value = 42
let other = -num

let value = 42
-num # must be continued from above
  5. If Ghost didn't have implicit returns, you'd effectively be getting rid of expression statements.
let value = 42
return -num
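As a rough illustration of the "no expression statements" option, the Python sketch below parses a toy subset where every statement must start with `let`. The `parse_program` function, the tuple-shaped tree, and the flat left-to-right operator handling (no precedence) are all assumptions for the example, not Ghost's grammar. Because a new statement can only begin at `let`, a leading `-` always continues the expression above it.

```python
import re

def tokenize(source):
    # Whitespace (including newlines) is discarded entirely.
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/=]", source)

def parse_program(source):
    """Parse a sequence of `let NAME = EXPR` statements; nothing else is allowed."""
    tokens = tokenize(source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_unary():
        if peek() == "-":
            take()
            return ("neg", parse_unary())
        tok = take()
        if tok == "let" or not re.fullmatch(r"[A-Za-z_]\w*|\d+", tok):
            raise SyntaxError(f"expected a value, got {tok!r}")
        return tok

    def parse_expr():
        # Keep consuming binary operators for as long as one follows.
        left = parse_unary()
        while peek() in ("+", "-", "*", "/"):
            op = take()
            left = (op, left, parse_unary())
        return left

    statements = []
    while peek() is not None:
        if take() != "let":
            raise SyntaxError("every statement must start with `let`")
        name = take()
        if take() != "=":
            raise SyntaxError("expected `=`")
        statements.append(("let", name, parse_expr()))
    return statements

print(parse_program("let value = 42\nlet other = -num"))
# [('let', 'value', '42'), ('let', 'other', ('neg', 'num'))]
print(parse_program("let value = 42\n-num"))
# [('let', 'value', ('-', '42', 'num'))]
```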

The Solution

Out of all those options, the one I like the best is getting rid of expression statements:

let value = 42
let negated = -num

If naming variables is too annoying, I would say to use the Junk Binding (the value should still be returned, it just won't create a binding):

let value = 42
let _ = -num
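One way to picture the Junk Binding semantics is the small Python sketch below. It is not an interpreter for Ghost: `run_block` and the representation of a statement as a (name, function) pair are hypothetical. It only shows the two properties described above: the last value is still returned, and `_` never ends up in the environment.

```python
def run_block(statements, env=None):
    """Run `let` statements in order; return the last value and the final environment."""
    env = dict(env or {})
    result = None
    for name, compute in statements:
        result = compute(env)
        if name != "_":  # junk binding: evaluate, but create no binding
            env[name] = result
    return result, env

# Models:  let value = 42
#          let _ = -num
body = [
    ("value", lambda env: 42),
    ("_", lambda env: -env["num"]),
]

result, env = run_block(body, {"num": 7})
print(result)  # -7
print(env)     # {'num': 7, 'value': 42}
```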

The Problem With The Solution

Through most of this I've been focusing on expression statements in "return" positions. But in order to be a good language for writing scripts, you want to write lots of expression statements that have effects.

Using the above solution, it would look something like this:

let deploy = fn (tag, message) async {
  let _ = await updateYaml(tag: tag)
  let _ = await gitAddAll()
  let _ = await gitCommit(message: message)
  let _ = await gitTagAdd(tag: tag)
  let _ = await gitPush(remote: 'origin', branch: 'master')
  let _ = await triggerCi(workflow: 'deploy', tag: tag, message: message)
}

Personally, I actually quite like this, it makes it very obvious that you're doing side-effectful shit. But others might find it annoying.

I might even take it further by copying something from Rust, and saying:

let deploy = fn (tag, message) async {
  let _yaml = await updateYaml(tag: tag)
  let _status = await gitAddAll()
  let commit = await gitCommit(message: message)
  let _tag = await gitTagAdd(tag: tag)
  let _status = await gitPush(remote: 'origin', branch: 'master')
  let _pipeline = await triggerCi(workflow: 'deploy', tag: tag, message: message, commit: commit)
}

This would allow you to name everything without creating tons of bindings, while making it clear you're running functions for their side-effects instead of their return value.
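A tiny Python sketch of that rule, assuming it means "any name starting with an underscore is discarded rather than bound" (the `bound_names` helper is hypothetical). Note that this is stricter than Rust itself, where an underscore-prefixed name is still bound and only the unused-variable warning is suppressed. Under this reading, repeating `_status` is harmless because neither occurrence creates a binding.

```python
def bound_names(let_names):
    """Return the names that would create bindings, in order."""
    return [name for name in let_names if not name.startswith("_")]

# The names used by the `deploy` example above, top to bottom.
deploy_lets = ["_yaml", "_status", "commit", "_tag", "_status", "_pipeline"]

print(bound_names(deploy_lets))  # ['commit']
```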

The Other Problem With The Solution

With the simplest version of this proposal, it's adding quite a bit of typing:

# Before:
let fn = fn (numbers: Iter<Number>) async {
  Iter.reduce(numbers, 0, fn (total, number) { total + number })
}

# After:
let fn = fn (numbers: Iter<Number>) async {
  let total = Iter.reduce(numbers, 0, fn (total, number) {
    let total = total + number
  })
}

For convenience, I'm tempted to say that you can have an expression statement if there are no other statements above it (including other expression statements).

So you could still get:

let fn = fn (numbers: Iter<Number>) async {
  Iter.reduce(numbers, 0, fn (total, number) { total + number })
}

But as soon as you add statements, you need to use let:

let fn = fn (numbers: Iter<Number>) async {
  let numbers = doubleAll(numbers)
  let total = Iter.reduce(numbers, 0, fn (total, number) { total + number })
}

I'm 50/50 on this. It's very convenient (saving a minimum of 5 keystrokes at a time adds up fast), but it's adding a syntax to the language that you can't "cut and paste" and put anywhere.

It also means that if you add a statement to a function, you might need to rewrite a function below it in order for it to be syntactically valid.

Like when I wanted to add that let numbers = doubleAll(numbers) to the above function, I also needed to add let total = (or at least let _ =). It's not terrible, but it's something I try to avoid in Ghost, because it's annoying, and it's the sort of thing that will confuse beginners.
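The "only if nothing is above it" rule is simple to state as a check. Here is a hedged Python sketch, where `check_block` and the "let"/"expr" labels are made up for the example: a block is described by the kind of each statement, top to bottom, and an expression statement is accepted only in the first position. It also shows the downside described above, since inserting a `let` at the top turns a previously valid block into an invalid one.

```python
def check_block(kinds):
    """Return the index of the first disallowed expression statement, or None if the block is valid."""
    for index, kind in enumerate(kinds):
        if kind == "expr" and index > 0:
            return index
    return None

print(check_block(["expr"]))         # None: a lone expression statement is fine
print(check_block(["let", "let"]))   # None: every statement uses let
print(check_block(["let", "expr"]))  # 1: adding a let above made the old body invalid
```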

The Alternative Solutions

If expression statements had a syntax of their own, it could also solve the problem. The most obvious answer would be a prefix. Sadly, my favorite prefixes would be ! or ~, but I can't use those or any of these:

+ # a + b
- # a - b, -a
/ # a / b
* # a * b
% # a % b
** # a ** b
& # a & b
| # a | b
^ # a ^ b
~ # ~a
<< # a << b
>> # a >> b
>>> # a >>> b
== # a == b
=== # a === b
&& # expr && expr
|| # expr || expr
! # !expr
: # :symbol 

Funnily enough... semicolons could work:

let deploy = fn (tag, message) async {
  ;await updateYaml(tag: tag)
  ;await gitAddAll()
  ;await gitCommit(message: message)
  ;await gitTagAdd(tag: tag)
  ;await gitPush(remote: 'origin', branch: 'master')
  ;await triggerCi(workflow: 'deploy', tag: tag, message: message)
}

There are lots of other things you could do, but it should be much easier to write than let _ = or it's not really worth it. Personally I prefer let.
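To see why a statement prefix removes the ambiguity, here is a minimal Python sketch. The `split_statements` helper is hypothetical, and it assumes `let` and `;` can only appear at the start of a statement. Every statement now announces itself with its first token, so the token stream can be cut into statements without looking at whitespace.

```python
import re

def split_statements(source):
    """Group tokens into statements, starting a new one at every `let` or `;`."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/=;]", source)
    statements = []
    for tok in tokens:
        if tok in ("let", ";") or not statements:
            statements.append([])
        statements[-1].append(tok)
    return statements

print(split_statements("let value = 42\n;-num"))
# [['let', 'value', '=', '42'], [';', '-', 'num']]
print(split_statements("let value = 42\n-num"))
# [['let', 'value', '=', '42', '-', 'num']]
```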

j-f1 commented 5 years ago
  1. If Ghost didn't have implicit returns, you'd effectively be getting rid of expression statements
let value = 42
return -num

They’d still be useful, for cases like your examples in “The Problem With The Solution.”

How about adding a new syntax for functions?

let fn = fn (numbers: Iter<Number>) async {
  Iter.reduce(numbers, 0, fn (total, number) => total + number)
  # or (if you prefer)
  Iter.reduce(numbers, 0, fn (total, number) -> total + number)
}

Like in JS, this would allow exactly one expression, which would be implicitly returned.

avegancafe commented 5 years ago
let deploy = fn (tag, message) async {
  let _ = await updateYaml(tag: tag)
  let _ = await gitAddAll()
  let _ = await gitCommit(message: message)
  let _ = await gitTagAdd(tag: tag)
  let _ = await gitPush(remote: 'origin', branch: 'master')
  let _ = await triggerCi(workflow: 'deploy', tag: tag, message: message)
}

This actually reminds me quite a bit of what you do in OCaml when you need a side effect executed but don't care about the value:

let add_with_side_effect = fun x y ->
  let () = do_a_side_effect () in
  x + y
;;

While they do have semicolons, I think they specifically do this for the type system (in OCaml, () is the unit value, the closest thing it has to "null"), because then it can validate that do_a_side_effect has a return type of unit. I noticed you didn't mention type considerations in the OP, but if you take the implementation of _ even further than described, assigning to _ could actually provide real value! I thought I'd pitch in yet another positive of this proposed solution; it's a pretty small thing but it makes it feel really natural in OCaml and encourages the programmer to not have methods with side effects that also return values (a general anti-pattern as described by Robert Martin's Clean Code).

jamiebuilds commented 5 years ago

it's a pretty small thing but it makes it feel really natural in OCaml and encourages the programmer to not have methods with side effects that also return values (a general anti-pattern as described by Robert Martin's Clean Code)

I don't know that I agree that returning a value from a side-effectful function is an anti-pattern. But interesting.