Function Syntax - Githubissues

keean commented 8 years ago

One of the first things we need to do is decide on a function syntax, for application and definition.

keean commented 8 years ago

I don't think we want something too weird and new, so we should follow the conventions of existing language families. Do we want to follow the functional style which makes variable and function assignment similar (and maybe support partial application):

x = 1
f x = x
g x y = (x, y)

or do we want something from the 'C' family:

int x = 1
int f(int x) = x
(A, B) f<A,B>(A x, B y) = (x, y)

or something like Rust (without the types for inference):

let x = 1
fn f(x) { x }
fn g(x, y) { (x, y) }

Or perhaps python like:

x = 1
def f(x):
    ...
def g(x, y):
    ...

shelby3 commented 8 years ago

@keean wrote:

Do we want to follow the functional style which makes variable and function assignment similar (and maybe support partial application):
x = 1
f x = x
g x y = (x, y)

I have argued here, here, here, and here to not to adopt that syntax.

Also that syntax is more ameniable to global type inference, because it otherwise requires a duplicate line to declare the type of the function.

Insurmountable negative factor is that vast majority of the world's programmers will be unfamiliar with it (unnatural, not second nature, fights their ingrained second nature), because it is too different from the most popular languages, JavaScript, Java, C, and C++.

I argued it raises the cognitive load too much both at declaration and at the call site, because it requires computing too many factors (inference, precedence, declaration site types and arity).

From the author of the acclaimed Learn You A Haskel:

I failed to learn Haskell approximately 2 times before finally grasping it because it all just seemed too weird to me and I didn't get it.

@keean wrote:

However I do think you have a point about people not used to the style.

You seem to have agreed?

Btw, partial application can be done with the other syntax styles. I will explain later.

keean commented 8 years ago

@shelby3 I agree I just want to capture everything relevant here.

I also remember we wanted something like Python style indenting rather than curley brackets.

Do we want to have a keyword like 'def' or 'fn' for function definitions?

keean commented 8 years ago

I actually quite like Rust's syntax for definitions, rather than Python or TypeScript. One problem is with combining TypeScript style signatures with Python style indenting:

inc(x : Int) : Int :
    ...

The problem here is we might want to leave out types:

dec(x) :
   ...

It makes it impossible to tell whether the return type is missing. Whereas with Rust style:

inc(x : Int) -> Int :
    ...

shelby3 commented 8 years ago

@keean wrote:

The problem here is we might want to leave out types:

I was just about to post a similar point. We can't use the C/Java-style of type precedes instance name:

int x = 1
int f(int x) = x

Also I wouldn't want some verbose gaudy type as the prefix any way.

I wrote that 2 days ago:

If non-exported function signatures can be inferred, then a syntax with optional type annotations would seem better.

Agreed. Thus we need the suffixed : type and not the C/Java prefixed type declaration.

Haskell puts the optional types on a separate line instead which I think is unfamiliar, consumes extra lines and violates DNRY, which I don't like (then afaik all the types have to be provided, not the option of just some of them).

shelby3 commented 8 years ago

@keean wrote:

Do we want to have a keyword like def or fn for function definitions?

Since yesterday, I've been trying to think if there is a way we don't have to prefix the function declaration (aka definition) with fn or def (because I hate noise, unless it aids readability). I was going to work on the LL(k) parser grammar so we can prove it is a context-free grammar (CFG), which is very important. Also I want consistency with declaring methods in the body of typeclass definitions. So if we need to prefix with fn or def for local functions, then we should do same for methods, but I hate that boilerplate.

I've been thinking about all these issues too. :wink:

Inside the body of the typeclass declarations we certainly don't need the prefix on methods. But for functions declared inside the body of another function, we would have a grammar conflict with function calls. One way to get around that is to force a space between the function name and ( for declarations (e.g. f (x...)) and that whitespace not allowed for function calls/application (e.g. f(x...)). I may have other ideas. Let me think about it for a while.

The grammar is really a holistic design and we can't design it piecemeal and be sure to not have CFG conflicts.

Feedback?

shelby3 commented 8 years ago

@keean wrote:

I also remember we wanted something like Python style indenting rather than curley brackets.

inc(x : Int) : Int :
   ...
The problem here is we might want to leave out types:
dec(x) :
  ...
It makes it impossible to tell whether the return type is missing

I don't like the alternative syntax with an arrow inc(x : Int) -> Int :. It is inconsistent! Consistency is very important. And that trailing : is very confusing no matter what we do.

In the LL(k) grammar I was working on in May, I devised a solution which eliminates the need for the trailing : to indicate the start of an indented code block.

I presumed we will enforce a standard indent as agreed by @SimonMeskens.

The rule is that if the following line is indented by the standard amount, then it is a new code block. If it is indented more than the standard amount, then it is the continuation of the prior line.

I realize that enforcing indenting spacing is something that programmers will disagree on because they all have their preferential amount (and some want tabs and others want spaces), but this is open source and consistency (for readability) is more important than allowing people that freedom to make code unreadable for others (such as when the tabs don't line up!).

Btw, I am hoping we decide for 2 spaces indent to conserve horizontal space for deeply indented blocks? I notice from your examples, that you like more spaces when you indent (4 or 5?), but the really skilled programmers I've seen use 2 spaces, because 2 spaces is enough and also it helps align at the tail of the if on the line above:

if x
  something
else
  something else

Three spaces uses 50% more horizontal space and it aligns with the x which is confusing or consistent depending on the way one thinks about it:

if x
   something
else
   something else

Any block indenting greater than 3 spaces is going to piss off some programmers. Any programmers who think they need 4 spaces for block indenting are amateurish. Our eyes can't easily detect it at 3 spaces, and I find 2 is enough (and I am blind in one eye and blurred in the other eye).

What is your opinion? I could tolerate 3 but I don't think it is ideal. But 4 makes me not happy:

if x
    something
else
    something else

And 5 is ridiculous:

if x
     something
else
     something else

Most of my coding life I used 3 spaces, but when I saw that expert coders were using 2 spaces, I realized I didn't need 3.

SimonMeskens commented 8 years ago

What about the wildly popular arrow syntax? It's familiar, due to being included in JavaScript, TypeScript, Java and C# and it's quite concise (unlike using a keyword like def or fn). I'm using indents, but this would work with curlies or anything else too (I personally prefer required indentation). In Coffeescript, return is optional, but I really don't like that. What I do like is optional return for one liners. I kinda like the conciseness of not having to use brackets, but I agree that it makes the language rather dense to read and hard to get into. I agree with earlier sentiment that semi-colons are evil. You pay a penalty for all your code for that one time where you need them, usually to make your code less readable. I picked a thin arrow (->) instead of a fat arrow (=>), which is used more often, because I think the fat arrow is visually quite busy.

Block lamda (daft sample to pad it out a bit):

inc(x: Int): Int ->
  let y = x + 1
  return y

Notice that absence of type on y means the compiler implicitly assigns type, not that y is untyped.

Now, the problem arises for me, when you want to type curried functions or just simple lambdas. What do we use for indicating a type goes from one to the other? TypeScript for example uses the fat arrow for both, this is syntactically very confusing. I think for this reason, we could also replace the arrow with double colon. I personally think double colon is the nicest separator, but it's slightly less popular.

I'll introduce something else I like too: using := for assignment, like in Pascal. The reason being that this gels nicely with :: (both are a type of assignment, one assigns code, one assigns a value) and because it allows you to use = for comparison, as == for comparisons is visually busy and comparisons already tend to be the busiest syntax. I might even go as far as to suggest and and or for logical operators to further fix that.

filter(list: List of Item, pred: Item -> bool): List of Item ::
  let out := List(Item)
  for(item of list)
    if(pred(item))
      out.add(item)
  return out

filter(myList, x => x = 5)

Notice that we might as well just drop the ::, as with forced indent, it's clear that the block indicates a function body.

inc(x: Int): Int
  return x + 1

I kept the code quite imperative for now, to not assume too much yet. If we do get rid of brackets, I might prefer something like this I think? Still keeping brackets for calls maybe? Starting to get alien. Still less confusing than Haskell though.

filter
  list: List of Item
  pred: Item -> bool
  return List of Item
=>
  let out := List(Item)
  for item of list
    if pred(item) and not item.empty
      out.add(item)
  return out

filter(myList, x => x = 5)

shelby3 commented 8 years ago

@SimonMeskens wrote:

What about the wildly popular arrow syntax? It's familiar, due to being included in JavaScript, TypeScript, Java and C# and it's quite concise (unlike using a keyword like def or fn).
inc(x: Int): Int ->
  let y = x + 1
  return y

That solves the : at end to start a new block, but I offered also an enforced indenting style as a solution instead, which removes that noisy arrow.

Your suggestion won't work for eliminating the def or fn as I explained where I offered inc ( with the space between inc and ( instead, because we need efficient LL(k) left-to-right parsing with minimal constant maximum number 'k' of tokens lookahead, but yours is an unknown variable unbounded number of tokens lookahead (for the parser to differentiate it from a function call, as parser has to reach a : to distinguish from function call, and : is optional on each argument for non-top-level functions).

I potentially like your idea of consistency between the syntax of anonymous lambda functions and regular function implementations (but maybe they shouldn't be equated since one is not a singleton, is not named, and other is a named, singleton?), but you've rendered the new block syntax inconsistent with if as follows (and I'm not sure which consistency should be our the priority choice?):

if x
  something

Also this would be inconsistent with -> to start new block:

if x:
  something

SimonMeskens commented 8 years ago

That seems fair enough, was just offering some ideas. Personally, I really like homoiconic languages, but I'm not sure how feasible that is for this one. Here's a good article on it:

https://blogs.oracle.com/blue/entry/homoiconic_languages

shelby3 commented 8 years ago

I am very sleepy so I hope the following is coherent.

On further thought, I must overrule my prior suggestion for an enforced number of spaces for indenting. As much as I would like to have this consistency, I don't think it will be popular because any choice will offend most people who don't like that choice. Number of spaces for indenting is a personal preference.

I looked at the grammar I wrote in May and now I remember how I solved this problem and maximized consistency.

First we must consider that these and if-else will also optionally be used without a code block and also that if-else will replace the ternary operator to simplify the language (more consistency) and because the if x: or if (x) at the start has lesser chance of precedence errors than "string" + x ?. Also I force a code block for nested if (as shown below).

So I suggest that if (x) is superior to if x: because the : when a new code block doesn't follow (i.e. what follows is on same line), then it could make be confused with a type annotation (or even a cast if we ever used that : Type format for casts). Also the : is more invisible when crowded.

if (x)
  true
else
  false

if x:
  true
else
  false

if (x) true else false
if x: true else false

if (x)
  if (y)       // may not put this at end of prior line, because it is a nested `if`
    true
  else
    false
else
  if (y)       // may not put this at end of prior line, because it is a nested `if`
    false
  else
    true

So if we choose if (x), then consistency with function definitions is lost in terms of the character that separates the code block, or code on same line. Thus I am comfortable with @SimonMeskens' idea except I favor the fat arrow (=>) because JavaScript and Scala use it and it looks more like an assignment symbol (=) so we can consistently use the same symbol for code block, one line function code (see below), and lambda (aka arrow) functions. Also afaik -> has typically been used for type signatures and not anonymous (unnamed) lambda functions, although Java 8 uses ->. The following definition now looks like assigning a lambda function to the function name, which is actually how JavaScript does it for { inc : (x) => return x + 1 }.

inc = (x: Int): Int =>
  let y = x + 1
  return y

inc = (x: Int): Int => return x + 1

Notice above instead of def, fn, or my hokey space idea, I used an = to differentiate from a function call inc(x...) needing only k = 2 tokens of lookahead (note inc no matter many characters is one token returned by the lexer to the parser). It is not shorter than def or fn, but it is more consistent with the usage of lambda functions, because we assign the unnamed lambda function to the function name.

Some people suggest to reverse the direction of the arrow?

SimonMeskens commented 8 years ago

That seems pretty clear to me :+1:

keean commented 8 years ago

Some interesting ideas, I quite like the uniform assignment syntax. Haskell and other functional languages define functions as:

x = 1
f x = 1

It just means you need an LR parser. So I don't see a problem with:

x = 1
f(x) = x

I would suggest using '=' consistently. However I also like the anonymous function syntax and using normal assignment:

f = (x) => 
    x

That looks odd for something like the identity statement.

I would suggest the block begins if there is nothing on the same line after the '=>' .

I think block indent depth should be set by the indent of the first line. Line continuations would be any kind indented more than this.

However there would be problems where there is a block and a line continuation like:

set_callback((x) => 
        print(x)
        x + 1
    ) // Must be at least as indented as the original, but less than the block indent.

What about optionally naming the anonymous function to help debugging. JS allows this:

set_callback(move_right(x) => x + 1)

I think these forms will require an LR parser, but I am okay with that.

keean commented 8 years ago

Should we allow no brackets for a single argument function?

f = x => x + 1

shelby3 commented 8 years ago

@shelby3 wrote:

On further thought, I must overrule my prior suggestion for an enforced number of spaces for indenting. As much as I would like to have this consistency, I don't think it will be popular because any choice will offend most people who don't like that choice. Number of spaces for indenting is a personal preference.

But could we at least agree to disallow tabs for indentation (i.e. only spaces allowed)? My attitude is if it pisses off a few people, then I am unsympathetic to their preference, because if I as the reader don't know what tab width settings the programmer was using, then the blocks of code don't align on columns when I view it. This is an era of open source and code must be transferable without ambiguity.

Most programming text editors these days have an option to automatically convert all [Tab] keypresses to spaces.

Could I get a vote on the above suggestion? (the thumbs or thumbs down on this comment)

shelby3 commented 8 years ago

Another requested vote is should we disallow lines which contain only spaces? This would remind the programmer (and his IDE) to delete such lines from the code. Such lines make for clutter in version control differencing when they are later removed (cleaned up). Not putting them there in the first place would be better.

shelby3 commented 8 years ago

@keean wrote:

Should we allow no brackets for a single argument function?

f = x => x + 1

No. I am going to argue now at this early stage that we need to have some guiding principles. And I think one of those principles should be to not unnecessarily create multiple ways to do the same thing when we can easily avoid doing so (which is one of the complaints against Scala). Especially not creating another variant of syntax just to save two characters. This is inconsistency for the reader. And remember typically there can be 10 - 100 times more readers of each source code base, then there were coders of that source code. We want to minimize the number of variant cases the new programmer has to learn.

keean commented 8 years ago

Okay, that makes sense to me, so it has to be:

f = (x) => x + 1

Presumably we are happy with all sorts of variations on type information like:

f = (x : Int) => x + 1
f = (x) : Int => x + 1
f = (x : Int) : Int => x + 1

shelby3 commented 8 years ago

I am not yet 100% firm on my prior suggestion to unify function definition around the =. I have another comment post coming about that and also responding to your prior comment about that, where you noticed some of the same issues I did.

@keean wrote:

Presumably we are happy with all sorts of variations on type information like:

The multiple ways to optionally annotate type declarations on arguments and result value, seems acceptable to me, because it is a consistent rule that also will apply to reference declarations as well.

shelby3 commented 8 years ago

On this tangent of variant syntax to define an anonymous (unnamed) function, I have liked a shorthand from Scala which enabled very concise expressions:

@keean wrote:

set_callback((x) => x + 1)

set_callback(_ + 1)

That is saving us much more than two characters and it can really remove a lot of verbosity if we further enable it to handle more than one argument.

I disliked that Scala used multiple _ to represent multiple arguments (or did Scala only allow one input argument for this shorthand? I forget), as these are undifferentiated:

set_callback2((x, y) => x + y)

set_callback2(_ + _)

I suggest instead with a limit of 3 arguments:

set_callback(_ + 1)
set_callback2(_ + __)

Note these should not cross function call lexical parenthetical scopes, i.e. the following are not equivalent:

setCallback((x) => f(() => x))
setCallback(f(_))

Normally I would argue against creating a new variant to do the same thing. But the above is so geek-cool, that I think it would be a beloved :heart_eyes: feature. My marketing senses/intuition prioritize love/passion :heart: over rigid design guidelines. Also I think I can make a case that the significant increase in brevity makes it more than just another way to do the same thing. We can't get that same brevity with the other syntax. Thus I argue it is necessary.

Am I mistaken on the love for this alternative syntax?

P.S. afair, Scala even allows the following, which I argue against, as it violates the guideline for not unnecessarily creating variant syntax:

set_callback(_:Int + 1)
set_callback2(_:Int + __)

keean commented 8 years ago

I quite like the _ syntax, but there is probably quite a lot of other uses for it too. If we use it I would suggest we use it as a De Bruijn index, so that we have:

set_callback(_1 + _2)

shelby3 commented 8 years ago

@keean wrote:

set_callback(_1 + _2)

I had that idea also, but it makes it more verbose for the single argument case. If we prefer that, I'd perhaps suggest instead:

set_callback(_ + _2)

But I didn't suggest that because it is inconsistent, violating one of the other guidelines I have for not proliferating special case rules in order to keep the cognitive load for the new programmer reasonable.

Also in Scala that conflicts with the tuple member names. We might want to not reserve numbered names when they can be useful in general for names.

Also there is another reason I favored this:

set_callback(_ + __)

And that is because it visually literally looks like a slot (i.e. "fill in the blank"); whereas, _2 doesn't.

I do understand that _ and __ are somewhat similar in appearance and maybe not as readily differentiated to the eye, but on balance I think I prefer "fill in the blanks" than _2 which doesn't connote anything visually. And beyond 3 arguments is a diminishing benefit and we really need named arguments, because it gets too confusing to track the correspondence (I guarantee bugs :smirk:) even for ____ or _4 (and verbose compared to single letter argument names).

Another possible design choice, is only support it for one argument.

Anyone else want to share an opinion?

P.S. I know the choice is imperfect (I admit this _ + __ looks unbalanced and a bit weird) and I also consider not having the feature because it is not perfect, applying a guideline of not adding marginal syntax that can be done another way which isn't egregious. But I do love the brevity as being visually beautiful/elegant :gem: , which I think is another reason I prefer the connotation visually to slots. What do others think?

keean commented 8 years ago

I find it hard to distinguish _ and __, and it gets even harder with ___ and ____. I think I prefer the Scala way where each slot is simply _ and you have to provide an argument for every slot. I think that is more readable and consistent. We can allow a 'let expression' if you need a slot more than once:

set_callback(let x = _ in x + x)

Although that makes me think how do we differentiate between a single like if and a multi line one? In Python function definition uses : at the end of the line so it would make sense for if. I think blocks (other than functions) might be a different topic, but it might not. It depends if we want consistency between functions and other statement blocks?

shelby3 commented 8 years ago

In the interest of brain dump disclosure...

I wish we could use Unicode symbols, then we could use (although it is a slippery slope towards "symbol soup" dingbats source code):

setCallback(⎵¹ + ⎵²)

Or:

setCallback(⑴ + ⑵)

Unfortunately Unicode doesn't define the character set for numbers above horizontal brackets, so the above are the two best available choices.

I realize we can't use Unicode for something a programmer needs to be able to type in quickly, but the world is changing and there are now configurable customized hardware keyboards and otherwise it is possible on Android, Windows, and Linux to software remap the keys on any hardware keyboard. A programmer can remember to type [Ctrl]+[1] keys simultaneously to generate a remapped ⎵¹ or ⑴. Since this is an optional shorthand feature that programmers don't have to use (and they can always resort to copy+paste if they haven't mapped their keyboard), I am not entirely joking. :laughing:

But I loathe special requirements on system configuration and/or specialized hardware. So this would need to be super compelling and it isn't.

It would also for the time being be a discrimination against the developing world, where they may not have access nor funds (or be forced to use a shared computer in a netcafe) to purchase the custom hardware keyboard.

Edit: Dec. 3, 2017 it’s possible to render with HTML+CSS the Unicode “bottom brackets” and “open box” with the superscripted number shifted in the desired position:

numbered_slots

So I’m thinking the programmer could type _1 and IDEs could optionally render it as above. And __1 for the arguments of inline functions within a function call that is nested within an inline function already employing the _1 shorthand.

Special typing assistance isn’t required, i.e. the text can contain either _1 or ⎵¹. The editor (IDE) could optionally display them as Unicode as I above to make it more concise/pretty. That does cause an issue with inconsistent columnar alignment with different renderings, but I think that is an acceptable tradeoff. There’s no Unicode character which draws the superscripted number over the bottom brackets.

Ditto for example I will propose to allow ➜ instead of ->, but in this case the IDE could also optionally render -> as ➜ but many users might not choose to do so (as again it changes the displayed columnar alignment). I wish I could find a Unicode right arrow which is two characters wide.

shelby3 commented 8 years ago

@keean wrote:

I find it hard to distinguish _ and __

I don't (and I'm blind in one eye), but __ and ___ is borderline for me, especially if I am rushed or my second nature is on auto-pilot.

and it gets even harder with ___ and ____

Agreed. That is why I wrote we couldn't go beyond 3 arguments, maybe not even more than 2.

I think I prefer the Scala way where each slot is simply _ and you have to provide an argument for every slot.

But that isn't orthogonal to ordering, which sometimes can't be re-ordered to fit. And doesn't allow using the slot more than once (which you acknowledged).

And I find that confusing semantically (it is so inconsistent with the way variable names work). It adds cognitive load. I remember that being a source of confusion for me when I was learning Scala and still might be a gotcha when I am not paying careful attention.

We can allow a 'let expression' if you need a slot more than once:

set_callback(let x = _ in x + x)

That violates the generalized guiding principle that I thought we agreed on. It adds specialized syntax to fix the shorthand when the programmer can otherwise revert back to the normal syntax for unnamed function. And afaics that proposal gains nothing significantly justifiable in brevity:

set_callback((x,y) = y in x + x)

keean commented 8 years ago

@shelby3 wrote:

That violates the generalized guiding principle that I thought we agreed on. You are proposing to add specialized syntax to fix the short-hand when the programmer can otherwise revert back to the normal syntax for unnamed function. And afaics you are proposing to gain nothing in brevity:

I thought having an assignment that you can use in an expression would be in line with the 'everything is an expression principal'. It would be usable in any expression, not just in the special short function syntax.

But that isn't orthogonal to ordering, which sometimes can't be re-ordered to fit. And doesn't allow using the slot more than once (which you acknowledged).

If you want to use each variable more than once or in a different order, use the original longer syntax :-)

I am not a huge fan of the _ syntax, but I can live with it, but the further we go down that route, the less I like it. I am tending to think we should stick to 'only one way to do thingsand say you just do(x) => x + x` its not really much longer and I think readability beats conciseness in this case.

shelby3 commented 8 years ago

@shelby3 wrote:

And beyond 3 arguments is a diminishing benefit and we really need named arguments, because it gets too confusing to track the correspondence (I guarantee bugs :smirk:) even for ____ or _4 (and verbose compared to single letter argument names).

and it gets even harder with ___ and ____

Agreed. That is why I wrote we couldn't go beyond 3 arguments, maybe not even more than 2.

The proposed shorthand isn't suitable for more than 2 or 3 arguments, regardless of syntax we choose. It isn't justifiable to have _4, and the programmer should revert to the normal unified syntax for unnamed function definition in that case of so many arguments, because the brevity will be roughly the same due to the fact that single letters a, b, c, d are 4 characters shorter than _1, _2, _3, _4 not counting using a slot more than once.

Thus I logically conclude (including the logic in my prior comments) that the shorthand should only support at most 2 or 3 arguments and the syntax should be _ + __.

Otherwise choose not to have a shorthand syntax.

@shelby3 wrote:

P.S. I know the choice is imperfect (I admit this _ + __ looks unbalanced and a bit weird) and I also consider not having the feature because it is not perfect, applying a guideline of not adding marginal syntax that can be done another way which isn't egregious. But I do love the brevity ...

shelby3 commented 8 years ago

@keean wrote:

I thought having an assignment that you can use in an expression would be in line with the 'everything is an expression principal'. It would be usable in any expression, not just in the special short function syntax.

I failed to communicate my point. I am not making a decision here about whether an "assignment-as-expression" is desirable or not (since afaics that is mostly an orthogonal design issue, and since this wasn't a compelling use-case for it). Rather I am stating it doesn't make the use-case of the shorthand any less verbose.

But that isn't orthogonal to ordering, which sometimes can't be re-ordered to fit. And doesn't allow using the slot more than once (which you acknowledged).

If you want to use each variable more than once or in a different order, use the original longer syntax :-)

That logic is incongruent with my other point:

And I find that confusing semantically (it is so inconsistent with the way variable names work). It adds cognitive load. I remember that being a source of confusion for me when I was learning Scala and still might be a gotcha when I am not paying careful attention.

Point is that if we add the shorthand, then it should respect normal semantics of variable names. This is one of those insights into language design that Scala's designers apparently didn't do well in some (many or most?) cases.

I am not a huge fan of the _ syntax, but I can live with it, but I am tending to think we should stick to 'only one way to do things...

Didn't you write otherwise upthread? Or perhaps you meant all design uses for _ such as partial application?

I quite like the _ syntax, but there is probably quite a lot of other uses for it too.

Any way, holistic design requires a lot of thought, so it is easy to end up flipflopping.

I generally agree of course in the stated principle that "readability beats conciseness".

...and say you just do (x) => x + x its not really much longer and I think readability beats conciseness in this case.

It isn't just the issue of being concise in string length, but also the crowdedness (symbol soupy-ness) of the juxtaposed double (( followed in close proximity by ) =>:

setCallback((x) => x + x)
setCallback(_ + _)
setCallback((x,y) => x + y)
setCallback(_ + __)

I have one more alternative syntax idea for the shorthand, assuming that we will set a rule that all variable names are lowercase (or at least that all variable and function names begin with a lowercase letter or underscore):

setCallback((x) => x + x)
setCallback(A + A)
setCallback((x,y) => x + y)
setCallback(A + B)

That would conflict with the convention of uppercase letters for type parameters (such as inside of typeclass method bodies), but we could require type parameters to be a minimum of two characters (or just disallow A, B, C, ... as many arguments we want for type parameters).

But I still think I prefer the underscore as it looks like a slot. Otherwise drop the shorthand proposal.

shelby3 commented 8 years ago

I love that we are putting all this design thought into the open source. :clap:

I think I may have a unified design solution to the shorthand goal. Let me think it out first.

keean commented 8 years ago

I think structs/objects would probably start with an upper case letter. On reflection I think I would prefer to keep the 'one right way to do things' principle, and sacrifice the '_' for now. We can revisit later (maybe when we have used it a bit - self hosting the compiler should be the first project).

keean commented 8 years ago

I have updated the parser for single line function definitions (block syntax is not done yet). Given the following source code (note the function is given a name to help debugging if it is used as a callback etc):

id = id(x) => x

The parser generates the following (not really abstract) AST for this:

'blk' : [{
    'ass' : 'id', 'exp' : {
        'fn' : 'id', 'args' : ['x'], 'body' : {
            'blk' : [{'var' : 'x'}]
        }
    }
}]

and feeding that into the code generator results in:

id=function id(x){x;};

shelby3 commented 8 years ago

@keean wrote:

Some interesting ideas, I quite like the uniform assignment syntax.

I will attempt herein to salvage (rescue) it (against flaws discovered). Seems others such as perhaps @SimonMeskens are amenable to it also?

I would suggest using = consistently. However I also like the anonymous function syntax and using normal assignment:

Agreed.

If we use (x, y, x) for the tuple construction (and destructuring) sugar syntax (presuming we will reserve [ ... ] correspondingly for arrays), then the grammar I proposed was not LL(k) for any bounded k, because the (x, y, x) => at start of anonymous function is not distinguishable from a tuple coming from the left side (LL(k)).

Thus to replace most of the brevity and reduction of crowded symbol soupy-ness provided by the stillborn, proposed shorthand, my new idea is to replace it with a different syntax for the anonymous function (assuming this won't cause any LL(k) grammar conflicts):

setCallback(x => x + x)
setCallback(x,y => x + y)

And so then consistency of assignment is restored, because the following is not instantiation of a new function reference inc rather it is assignment of a new function to an existing reference:

inc = x:Int => x + 1

Result (return) types will nearly always be inferred.

To instantiate a new function within a function body and not declare a result type:

let inc = x:Int => x + 1

Alternatively to instantiate a new function within a function body (and if necessary to declare the result type):

let inc(x: Int): Int => x + 1

The let is required to distinguish from a function call/application. I wanted to eliminate fn or def in as many scenarios as I could, and by doing that I think we should unify on the same keyword for declaring new references, not a special keyword for function references.

To instantiate a new function outside a function body, such as typeclass interface methods (i.e. with the body of a trait or what ever keyword we end up with for typeclass interfaces):

inc(x: Int): Int => x + 1

I think these forms will require an LR parser, but I am okay with that.

I think my suggestion above is LL(k) and a CFG, but I need to check it after we've homed in our design decisions.

Isn't LR going to make the grammar more obtuse and less efficient than a LL(k) lookup table approach such as provided by the SLK tool? I realize you are employing parser combinators to prototype this, but I am also thinking about JIT performance later.

keean commented 8 years ago

Actually, even this:

inc(x: Int): Int => x + 1

is LL(k) because the 'k' means we can backtrack indefinitely. The prototype parser can already distinguish these cases. First it tries the function-literal parser (the function definition is literal like an integer literal) then it tries the function-application parser. We lose a little efficiency at the moment, but I don't think it will cause a performance problem.

What we do need is a way of doing scoped assignment.

shelby3 commented 8 years ago

Also we need to design about how typeclass bounds and function-level type parameters fit into this.

Afair, Rust always employed a type parameter with a trait (i.e. typeclass) bound, isn't that unnecessary? Why do we ever need function-level type parameters if we are offering typeclass bounds? So can't we treat typeclass bounds the same as annotation of a data type?

Do we need higher-ranked types and why? Remember the example from the Rust forum.

shelby3 commented 8 years ago

@shelby3 wrote:

Result (return) types will nearly always be inferred.

To instantiate a new function within a function body and not declare a result type:
let inc = x:Int => x + 1
Alternatively to instantiate a new function within a function body (and if necessary to declare the result type):
let inc(x: Int): Int => x + 1

I have an idea to unify this further.

Seems above the only reason to need two ways, is because inability to declare the return type on the anonymous function syntax. But we can fix that simply by optionally allowing parenthesis but only when a return type is declared:

let inc = (x: Int): Int => x + 1

There is more justification than your idea which I downvoted. I say this so as to not appear I am being arbitrary in my logic.

keean commented 8 years ago

Because type-classes can be multi-parameter and express relations between types like this:

f<A, B>(x : A, y : B) where Compatible<A, B> => ...

Rust makes the first type parameter special, and puts it before the typeclass like this:

f<A, B>(x : A, y : B) where A : Compatible<B> => ...

but really in a relation there is no 'special' type, so I am not sure. The Rust way does help with namespacing and IDE method lookup though.

I think its important functions can be declared inline in an exression (the value of the function literal is the function itself), and that there be a form of scoped assignment that can be used in expressions.

shelby3 commented 8 years ago

@keean okay so in your example B isn't bounded and is instead Any (the top type) unless Compatible was defined at its declaration site to have bounds on its second type parameter. Correct?

keean commented 8 years ago

@shelby3 I am not sure I understand? Its much simpler, you must pass x and y of the correct types. Those types are determined by the instances of Compatible, for example using Rust syntax:

trait Compatible<B> {}
impl Compatible<Int> for Int {}
impl Compatible<Float> for Int {}

f(3, 3) // okay
f(3, 3.0) // type error

I prefer something more like Haskell where there is not implicit self in the traits:

trait Compatible<A, B> {}
impl Compatible<Int, Int> {}
impl Compatible<Float, Int> {}

keean commented 8 years ago

Actually a list is an interface (type-class) right?

trait List<A> {...}

Remember the advantage of an interface is that the user can define new implementations.

But one way to implement a list is something like (in Rust syntax):

pub enum List<A> {
    Nil,
    Cons(A, Box<List<A>>),
}

I would prefer the more concise Haskell datatype syntax:

data List x = Cons x List | Nil

The way to understand this is List is the type/class name, Cons and Nil are alternative constructors for the tagged-union. In Haskell types are boxed hence no need for Rust's Box.

shelby3 commented 8 years ago

@keean wrote:

Remember the advantage of an interface is that the user can define new implementations.

What I am saying is that some type parameters of a typeclass may not need any implementation, because the typeclass's methods treat that parameter as the top type and never access any properties of it, which is the case for a List<A>. A List only needs one implementation. Also some type parameters (of a typeclass) may be bounded by typeclasses.

We need a syntax for expressing this in the typeclass (trait) definition. We have this requirement because our design extends Rust's design to include unions of data type not just trait objects for polymorphism.

keean commented 8 years ago

Personally I don't like type systems that have a 'top' type (any). I don't see that we need one so far, I am not sure why you think we need one.

I don't think I get your point :-(

shelby3 commented 8 years ago

@keean actually I was also thinking that "top type" may not be the correct term to refer to what I am describing. I not sure what the term is to describe a type parameter that can be anything and which is treated as a black box because none of its properties are accessed. In a type system with subtyping, it is the top type. I think we only need Any (top type) for the compiler's internal logic, not in source code.

Why don't you get my point? A List stores references. None of the implementation of List<A> cares what the type of A references are. Only its callers care what A is. And A needs to be tracked so callers respect its subtyping invariance. We will have subtyping of unions (can't avoid that with structural unions), i.e. an Int (tagged) instance is assignable to an Int|String reference.

shelby3 commented 8 years ago

@keean wrote:

Actually, even this:

inc(x: Int): Int => x + 1

is LL(k) because the 'k' means we can backtrack indefinitely.

You mean the parser can backtrack from an unbounded amount of lookahead from the => to determine whether it was a function call/application or a function definition/declaration.

But afaik this makes it impossible to build a LL(k) lookup table in order to make parsing super fast?

Note SLK does support infinite backtracing so if we must have that, then we can still use SLK to double-check the parser combinator prototype's correctness:

Predictive backtracking, also called LL(inf) or PEG, can be used selectively on the nonterminals that LL(k) is unable to handle. Predictive backtracking predicts which production to use by trying to parse each one in turn, and returning the first one that parses without error to the mainline parser. Semantic actions are not executed during backtracking, so the process is transparent to the programmer.

Also the advantage of putting the grammar in an ENBF file and checking it with SLK, is we have separation-of-concerns w.r.t. to grammar specification and other implementation code.

The SLK parser generator can also be used to aid in the development of hand-written recursive descent compilers. The -Da option can be used to display the parse table in readable text form. Only the valid entries are included, making it well suited to translation to recursive descent. The conflict table is also displayed as well as the FIRST and FOLLOW sets.

Unfortunately SLK isn't open source, but EBNF is more-or-less a transportable grammar format. I haven't found an open source tool that is as succinct.

shelby3 commented 8 years ago

In other words, since a List is always prepended with a Cons, then invariance is not broken when prepending to the union a new type that wasn't already in the list. The compiler will naturally detect this from the types and the code structure. But for Array the compiler will issue an error, because Array is not an invariant (immutable) data structure. Other references to an instance of Array would become invalid if Array<A> was allow to be variant on A.

let x = Array(0, 1, 2)
x[0] = "string"    // compiler error, Array<Int> can't assign a String to an Int

Because x might be a reference far away from our lexical scope that we don't know about when we modify the parameter type of the Array.

Scala, Java, TypeScript (unsoundly), and N4JS are examples of languages which try to support covariant and contravariant type parametrization, but we will force all to be invariant for unions. Because there is no good way to handle the variance without subsuming to a subclass or trait object. Since we discarded subclassing, we have no use for variance except maybe (?) for trait objects. So our compiler must enforce invariance.

Edit: more coherent explanation.

keean commented 8 years ago

EBNF does not control the actions to be taken. What you want is something like an "Attribute Grammar" but even then its hard to produce the side effects you want (especially error reporting). No production language that I know of uses a parser-generator because it does not give you enough control over the parsing process. In my experience of writing parsers, its better to have everything in one place. Parser combinators allow nice things like this:

var identifier = Parsimmon.regexp(/[a-z][a-zA-Z_]*/);
var variable = identifier.map(function(id) {
    return {'var' : id};
});
var arg_list = Parsimmon.sepBy(identifier.skip(space), comma);

So the output of the arg_list parser is a list of variable definition objects, and we can continue to combine parsers, each of which individually produce the correct AST fragments like this:

var expression = fn_lit.or(fn_app).or(variable).or(int_lit).skip(space);

Which means an expression is a function definition, function application, variable or literal, followed by (0 or more) spaces (or tabs).

I find the combinators much easier to work with, they are an idea that came from functional programming, see the Parsec paper: http://research.microsoft.com/en-us/um/people/daan/download/papers/parsec-paper.pdf

We may end up having to hand-roll the parser anyway to get really good and readable error reports, but for now I would rather use the parser combinators.

Their modular nature makes managing and especially changing big parsers much easier than dealing with recursive decent. Parsimmon is also the second fastest JavaScript parser tool see: https://sap.github.io/chevrotain/performance/

keean commented 8 years ago

An array with a type parameter Array<A> is monomorphic, so a function generic on the array type parameter will accept an array of containing any type, but all elements in the array must be the same.

However those 'all the same' elements can be anonymous structural unions, but the union would be the same for all array elements. This means once the array is created you cannot add a type that is not in the union, and the type of the union would have to be computable statically by the type-system.

shelby3 commented 8 years ago

@keean I am copacetic and enthusiastic about your prototyping with Parsimmon. My points are:

That if we can avoid LL(inf) aka LL(∞) and prefer bounded 'k', then better.
I will probably check the grammar in SLK to be sure, and this will provide a concise specification (for handrolling later or what ever). Also this lets me experiment with checking the conflicts and 'k' lookahead required from grammar changes faster (more interactively).

EBNF does not control the actions to be taken. What you want is something like an "Attribute Grammar" but even then its hard to produce the side effects you want (especially error reporting). No production language that I know of uses a parser-generator because it does not give you enough control over the parsing process.

I haven't tried yet, but I think I can integrate into the table lookup efficiency of SLK. I will dig into those details later. Any way, it is great to have the Parsimmon prototype. We need a working language to experiment with. What ever is best long-term will work itself out. This is open source.

keean commented 8 years ago

I think the goal should be to self-host the language, so we will need some kind of parsing tool to generate ZenScript... In a way some kind of built-in EBNF to replace regular expressions as part of an Attribute Grammar might be a way to do that.

Personally I wouldn't bother implementing the 'non-prototype' in any language other than ZenScript, if it really is what we are hoping it will be.

shelby3 commented 8 years ago

@keean Agreed!

shelby3 commented 8 years ago

@keean wrote:

An array with a type parameter Array<A> is monomorphic, so a function generic on the array type parameter will accept an array of containing any type, but all elements in the array must be the same.

However those 'all the same' elements can be anonymous structural unions, but the union would be the same for all types. This means once the array is created you cannot add a type that is not in the union.

Array<A> is monomorphic but the instances of any quantified type for A don't have to be monomorphic. The type of A employed may be a virtualized subclass (in a subclassing language) or a trait object (in Rust). This distinction is important because for example the setElement method for an array will accept any subtype of A. So the compiler has to enforce covariance of A w.r.t. Array<A> when assigning elements.

For our union types, a Bool is not a subtype of an Int|String, so it can not be assigned. So our compiler will refuse to allow variant Array<A> for unions. But the design of List<A> is that when a new element is prepended, the Cons<A> points to the prior List<Int|String> in its tail and the new type Bool in its head, with the result new type of List<Bool|Int|String>. The compiler will not complain because no subtyping relationships have been violated.

I mispoke to describe that in the past as "immutability". Rather it is enforcing covariance. For union types, the type parameter must be invariant.

Edit: Is Rust and will our version of "trait objects" be invariant when assigning a trait object to another trait object, i.e. Array<B> to Array<A> because afair trait objects don't extend trait objects? Remember we were discussing whether traits (typeclasses) have any subtyping relationship.

keean / zenscript

Function Syntax #6