Function syntax is too ambiguous

creationix commented 14 years ago

I like the change to make parentheses optional, but now I think the language is too ambiguous.

For example:

a b, c => b + c

Could mean

a(function (b, c) { return b + c; } )

or

a(b, function (c) { return b + c; })

Both are pretty common cases in JavaScript. I don't want optional parens to end up like optional semicolons in JavaScript: sure, you can omit them, but the results aren't well defined, so the best practice is to not omit them.
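To make the two readings concrete, here is a small JavaScript sketch. The function a is a hypothetical stand-in that only reports the type of each argument it receives, and b is given an arbitrary value so that the second reading has something to pass and close over.

```javascript
// Hypothetical stand-in for `a`: returns the type of each argument it receives.
function a(...args) {
  return args.map((arg) => typeof arg);
}

// Arbitrary outer value so the second reading has a `b` to pass and close over.
const b = 1;

// Reading 1: `b, c => b + c` is one function with two parameters.
const reading1 = a(function (b, c) { return b + c; });

// Reading 2: `b` is its own argument, followed by a one-parameter function.
const reading2 = a(b, function (c) { return b + c; });

console.log(reading1); // [ 'function' ]
console.log(reading2); // [ 'number', 'function' ]
```

The same source text yields a different number of arguments at the call site, and nothing in the syntax says which one the programmer meant.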

I think we should change the syntax for function definitions to make the grammar still easy to read, but unambiguous.

One option would be to add a new reserved keyword fun and require all function definitions to start with it.

// Two parameters
a b, fun c => b + c
// One parameter that's a function with two arguments.
a fun b, c => b + c

Or we could use bars to group the arguments of the function.

// Two parameters
a b, |c| => b + c
// One parameter that's a function with two arguments.
a |b, c| => b + c

Or we could think of something else, but the point is to make it well defined so programmers aren't surprised by unexpected behavior.

jashkenas commented 14 years ago

I've been feeling the same way about this for a while now. There are starting to be lots of cases where function definitions need to be better delimited. Here are a couple:

[x => x, x => x * x]
vs.
[(x => x), (x => x * x)]

element.bind('click', event, el => el.show())
vs.
element.bind('click', event, (el => el.show()))

The basic issue is the inability to distinguish between the end of the argument list and the beginning of the function's parameter list, because functions don't have a symbol that marks their beginning. I'd like to see some suggestions too; I don't have anything in particular in mind.

One nice solution could be to enclose all parameter lists in parentheses, mirroring the syntax you use to invoke a function. So:

square: (x) => x * x

Or, creationix's example from above:

a (b, c) => b + c

Ideas?

creationix commented 14 years ago

Hmm, since Jison is LALR(1), it doesn't like using parens for both method calls and method definitions. If at all possible, I would like syntax that can be expressed with an LALR(1) grammar.

creationix commented 14 years ago

And | isn't working either, but that's because my lexer can't tell it apart from the bitwise OR operator.

creationix commented 14 years ago

I think having a keyword at the start will work best. Something I think would be really neat is allowing Unicode symbols.

// Using regular ascii characters
square: fun x -> x * x
// Using unicode characters
square: λ x → x * x

I think most editors now support Unicode and programmable snippets. In TextMate, for example, you could make a snippet that expands fun(tab) to the Unicode version of a function shell. People who don't want to mess with Unicode can just use the fun keyword.

zmthy commented 14 years ago

How about making it a requirement to name every function? That way we don't need a keyword, and the start of every function is obvious:

a anon: b, c => b + c

vs.

a b, anon: c => b + c

zmthy commented 14 years ago

Also, a (b, c) => b + c already means a(b, c, function () { return b + c; });

creationix commented 14 years ago

I vote for symmetric parens. It feels more like the CoffeeScript way. Reminds me of using : for assignment just like in JSON objects.

Also, I think requiring a name for everything is a bad idea. I use anonymous functions all the time, and that would be a huge burden, as well as bloating the generated JS with unneeded variables and names.

jashkenas commented 14 years ago

That works too, but I'd find it hard to justify, because it makes a change in semantics mandatory (despite the fact that it's probably a good idea). It would make functions the only type of expression with that requirement. It would also make the generated JS just a little bit uglier when it doesn't need to be. Finally, thinking of good names is hard enough for variables you reference later, and mandating a throwaway name for a single-use function seems doomed to produce a lot of "f" and "g" functions.

That said, I think it's the best alternative to mandatory parentheses surrounding parameter lists, which is something that both creationix and I are taking a stab at, as we speak.

['toast', 'wine', 'cheese'].each (food) => print(food.capitalize())

vs.

['toast', 'wine', 'cheese'].each capitalize: food => print(food.capitalize())

By the way, it's working just fine on my branch, I just want to change all of the tests to the new syntax before pushing it.

creationix commented 14 years ago

I never liked the syntax where the parens were around part of the parameters in a function call anyway. We'd have to axe that option so as to make function definitions unambiguous.

a (b, c) => b + c

would now mean

a(function (b, c) { return b + c; })
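The practical effect of the change on a (b, c) => b + c can be sketched in JavaScript. The function a below is a hypothetical stand-in that simply returns the arguments it was called with, and the values of b and c are arbitrary; the two calls follow the old meaning zmthy described and the new meaning above.

```javascript
// Hypothetical stand-in for `a`: returns the arguments it was called with.
function a(...args) {
  return args;
}

// Arbitrary outer values for the old reading to pass and close over.
const b = 2;
const c = 3;

// Old meaning: `(b, c)` is a parenthesized argument list, and the function
// that follows takes no parameters and closes over the outer b and c.
const oldCall = a(b, c, function () { return b + c; });

// New meaning: `(b, c)` is the parameter list of a single function argument.
const newCall = a(function (b, c) { return b + c; });

console.log(oldCall.length);     // 3
console.log(oldCall[2]());       // 5
console.log(newCall.length);     // 1
console.log(newCall[0](10, 20)); // 30
```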

jashkenas commented 14 years ago

Yeah, it's not a backwards-compatible change. The one thing about it that feels overly ugly is functions that take no arguments, which now look like this:

() => do something...

Perhaps a necessary evil, but if there's a better way to mark a no-argument function, I'm all ears.

creationix commented 14 years ago

Can't you just leave off the parens for that case, or am I missing something?

jashkenas commented 14 years ago

Perhaps you can -- I thought there was an ambiguity with a parenthesized expression right before it, but I guess that can't happen.

Edit:

It would conflict with our block syntax, which would be nice to keep:

func(array) =>
  code...

vs.

func(array) () =>
  code...

creationix commented 14 years ago

By the way, while we're breaking stuff, is there a reason for the => instead of ->? It's a single arrow in Lambda Calculus and OCaml.

Also, the double arrow reminds me of hash objects (PHP and Ruby).

creationix commented 14 years ago

I agree we need the block syntax, but since parens are optional in function calls it still looks nice.

func array, =>
  code...

jashkenas commented 14 years ago

No reason; it just lines up a little better in Monaco, I guess. -> looks a little off-kilter, and the hyphen is shorter in non-monospaced fonts. Check out the difference in this textbox. I think the equals-arrow comes out a bit more readable.

creationix commented 14 years ago

Also, with the old block syntax, you can't handle functions like setTimeout, where the callback is not the last argument. With the new syntax it should look OK:

setTimeout =>
  # code...
, 500

setTimeout(=>
  # code...
, 500)

jashkenas commented 14 years ago

Alright. Mandatory parens around parameter lists in function definitions are now on master. Kick the tires and see if it feels right.

weepy commented 14 years ago

How about using pipes, like |x, y|? I think it might be a bit more readable, since there would be less ambiguity with other nearby parens. E.g.:

(square: |x| -> x * x)
a(|b, c| -> b + c)

zmthy commented 14 years ago

There are a great many parens going around. The only problem with | is that it's also an operator, but should we really be supporting bitwise operators? They're not the greatest in JS.

weepy commented 14 years ago

We should probably support bitwise operators, but they could be keywords instead. It seems silly that such nice syntax opportunities as ^ | & are used up by such seldom-used operators!

liamoc commented 14 years ago

Why not use the Haskell lambda notation?

square: \x -> x * x

zmthy commented 14 years ago

I like.

jashkenas commented 14 years ago

There are a great deal of parens, but hopefully they all make sense. Currently, you use them for three different purposes, two of which are mirrors of each other:

Grouping expressions, as in math: (a + b) * (c + d)
Function definitions: sum: (x, y) -> x + y
Function calls: sum(5, 6)

All of those uses of parentheses make fairly good sense after exposure to basic algebra, I think, with its f(x)...

In general, I'd like to avoid introducing cryptic symbols that can't be easily read (which is one of the reasons I've been so resistant to @variables; even Ruby programmers don't agree on how to pronounce it). [] () ! - , are all things that scan well. I remember this being one of the ideas behind _why's Potion as well, although the link seems to have disappeared.

So, to change the subject, the last thing missing from this change is the removal of our block literals, which are no longer required and can be replaced with sans-paren function calls. There are a couple of ambiguities that still need to be worked out, the main one being this:

# A method call.
each [1, 2, 3]

# An indexed reference.
array[1]

It looks like we'd have to start watching for the space between the identifier and the [ or ( character. Is that acceptable?
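A minimal sketch of what watching for the space could mean in a lexer, assuming the simplest possible rule: an opening [ or ( that directly touches the preceding identifier (or a closing bracket or paren) continues that expression as an index or call, while one preceded by whitespace starts a new expression, such as the argument of an implicit call. classifyOpener and its result labels are hypothetical; this is not CoffeeScript's actual lexer code.

```javascript
// Hypothetical lexer helper (not CoffeeScript's real lexer): classify the
// "[" or "(" at `index` by looking only at the character just before it.
function classifyOpener(source, index) {
  const opener = source[index];
  const prev = index > 0 ? source[index - 1] : "";
  // Touching an identifier, a closing bracket, or a closing paren means the
  // opener continues the previous expression (indexing or calling it).
  const touchesPrevious = /[A-Za-z0-9_$\])]/.test(prev);
  if (opener === "[") {
    return touchesPrevious ? "index" : "array-literal";
  }
  if (opener === "(") {
    return touchesPrevious ? "call" : "grouping-or-params";
  }
  throw new Error(`not an opener: ${opener}`);
}

const examples = [
  "each [1, 2, 3]",                      // implicit call with an array argument
  "array[1]",                            // indexed reference
  "items.forEach(item) -> item.trim()",  // explicit call that swallows `item`
  "items.forEach (item) -> item.trim()", // parameter list of a passed function
];

for (const source of examples) {
  const index = source.search(/[[(]/); // position of the first "[" or "("
  console.log(`${source} => ${classifyOpener(source, index)}`);
}
```

A one-character lookbehind keeps the rule cheap and predictable, at the price of making that whitespace significant, which is exactly the trade-off being asked about.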

creationix commented 14 years ago

I think so. While we're adding significant spaces in, let's not allow this:

items.forEach(item) -> item.trim()

but require a space before the anonymous function.

items.forEach (item) -> item.trim()

weepy commented 14 years ago

items.forEach (item) -> item.trim()

A good example of why I think it would be better to have | to make it explicit. | scans well too.

jashkenas commented 14 years ago

Quick question:

list[3] arg

method() arg

Should these be syntax errors, or be implicit calls, producing this:

list[3](arg)

method()(arg)
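For reference, both proposed expansions are already valid JavaScript whenever the indexed element or the call's return value is itself a function. The list, method, and arg values below are hypothetical, chosen only to make that visible.

```javascript
// Hypothetical values: the fourth list element is a function,
// and `method` returns a function.
const list = [null, null, null, (x) => x * 2];

function method() {
  return (x) => x + 1;
}

const arg = 20;

// What `list[3] arg` would mean as an implicit call:
console.log(list[3](arg));  // 40

// What `method() arg` would mean as an implicit call:
console.log(method()(arg)); // 21
```

If the value turns out not to be a function, the expanded JavaScript throws a TypeError at runtime, so the choice is roughly between a compile-time syntax error and a possible runtime one.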

What do you think?

jashkenas commented 14 years ago

This is all now on master, with passing tests. Block literals are gone. Use sans-parentheses invocations if you'd like to pass nested functions, like so:

tables.each (table) ->
  table.rows.each (row) ->
    row.show()
    row.highlight()
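Roughly, those nested invocations correspond to nested function expressions passed as the argument of each call (the real compiler output also inserts implicit return statements, omitted here). To make the sketch runnable, it assumes a hypothetical Collection class with an each method and row objects that just log what happens; neither comes from the thread.

```javascript
// Hypothetical collection type with an `each` method, standing in for
// whatever library supplies `tables.each` in the snippet above.
class Collection {
  constructor(items) {
    this.items = items;
  }
  each(fn) {
    this.items.forEach((item) => fn(item));
  }
}

// Hypothetical rows that record which methods were called, and in what order.
const log = [];
const makeRow = (id) => ({
  show() { log.push(`show ${id}`); },
  highlight() { log.push(`highlight ${id}`); },
});

const tables = new Collection([
  { rows: new Collection([makeRow("a1"), makeRow("a2")]) },
  { rows: new Collection([makeRow("b1")]) },
]);

// The nested CoffeeScript invocations, written out as plain JavaScript:
tables.each(function (table) {
  table.rows.each(function (row) {
    row.show();
    row.highlight();
  });
});

console.log(log.join(", "));
// show a1, highlight a1, show a2, highlight a2, show b1, highlight b1
```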

Closing this ticket. If you run into any problems with bits that should be function calls but aren't, or vice versa, leave a note. Otherwise, if all goes well, this can be CoffeeScript 0.3.0.

jashkenas / coffeescript

Function syntax is too ambiguous #127