keean / zenscript

A trait based language that compiles to JavaScript
MIT License

Syntax summary #11

Open shelby3 opened 7 years ago

shelby3 commented 7 years ago

I will maintain in this OP a summary of current proposed syntax as I understand it to be. Note this is not authoritative, subject to change, and it may be inaccurate. Please make comments to discuss.

: Type is always optional.

  1. Sum, Recursive, Record, and Product parametrized data type with optional Record member names and optional Record (e.g. Cons) and Product (e.g. MetaData) names:

    data List<T> = MetaData(Nil | Cons{head: T, tail: List<T>}, {meta: Meta<T>})

  2. Typeclass interface and implementation:

    typeclass Name<A<B>, ...>   // or 'pluggable'?; optional <B> for higher-kinds¹?
     method(B): A<B>           // TODO: add default arguments, named arguments?
    
    List<A> implements Name
     method(x) => ...

  3. References:

    let x:Type = ...      // final assignment, not re-assignable (same as const in ES6)
    var x:Type = ...      // (same as let in ES6)

  4. Functions:

    Type parameters do not require declaration <A,B>.

    someCallback(x:Type y:Type(:Type) => x + y)     //            also  ():Type => x + y
    var f = x:Type y:Type(:Type) => x + y           // not named, also  ():Type => x + y
    f = x:Type y:Type(:Type) => x + y               // not named, also  ():Type => x + y
    let f(x:Type, y:Type):Type => x + y             // named f,   also f():Type => x + y
    let parametrized(x: A, y: B):A|B => x || y
    let parametrizedWhere(x: A, y: B):A|B where ... => x || y

    Note that iterator types can be specified for the return value to return a lazy list as a generalized way of implementing generators. The optional (:Type) is required for generator functions. Note the (x: Type y: Type): Type => x + y form is unavailable.

  5. Assignment-as-expression:

    if ((x = 0))      // force extra parentheses where expected type is Boolean

† Not yet included in the syntax, as it would be a premature optimization. May or may not be added.

¹ https://github.com/keean/zenscript/issues/10

keean commented 7 years ago

I am not sure I like pluggable for a type-class. If it's going to have that many letters either interface or typeclass would be better.

interface List<A>

I am not sure we want to use | for both sum types and union types.

I prefer having 'implementation' before the type-class; having the type class first for implements seems inconsistent to me. I also prefer to treat all type class parameters equally. The first is not special, so why give it special syntax?

implement List<A>

I am not sure why you put types in the function call syntax? I don't think you need or want them; you only want types in function definitions.

I don't like that the method syntax is different from the function definition syntax. I think we should have a unified record/struct syntax. If we have:

data List<A> = List(
    append : (l1 : A, l2 : A) : A
)

data MyList
let mylist : List<MyList> = List(
    append : (l1 : MyList, l2 : MyList) : MyList =>
        ... definition ...
)

A record above is like a type-class but you can pass it as a first class value.
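As a rough illustration of the record-as-type-class idea, here is a sketch in Python rather than ZenScript. The names `ListOps`, `py_list_ops` and `concat_all` are made up for this example and are not part of any proposal; the point is only that the record is an ordinary value holding the operations, and generic code receives it explicitly:

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")


@dataclass(frozen=True)
class ListOps(Generic[A]):
    """Hypothetical stand-in for `data List<A> = List(append : ...)`."""
    append: Callable[[A, A], A]


# One "implementation": a record value whose append works on Python lists.
py_list_ops: ListOps[list] = ListOps(append=lambda l1, l2: l1 + l2)


def concat_all(ops: ListOps[A], first: A, *rest: A) -> A:
    """Generic code takes the record explicitly, like `mylist.append(...)`."""
    result = first
    for item in rest:
        result = ops.append(result, item)
    return result
```

Promoting the record to implicit use would amount to the compiler supplying the `ops` argument itself, which this sketch does not attempt.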

If we can 'promote' this to implicit use, we can have a single unified definition syntax. Maybe:

let list3 = mylist.append(list1, list2) // use explicitly
use mylist // make mylist implicit
let list6 = append(list4, list5) // use implicitly

shelby3 commented 7 years ago

@keean wrote:

I am not sure I like pluggable for a type-class. If it's going to have that many letters either interface or typeclass would be better.

Can't be interface because it would be confused with the way interface works in many other OOP languages. To me as a n00b, typeclass means class, so more misunderstandings. pluggable has some meaning to a n00b such as myself. Sorry I am not an academic and they are only something like 0.01 - 0.1% of the population.

Q: "What is a pluggable API?" A: "It means that you can replace the implementation."

I personally can tolerate typeclass.

I am not sure we want to use | for both sum types and union types.

Why not? Sum types are an "or" relationship. Unions are an "or" relationship.

I prefer having 'implementation' before the type-class having the type class first for implements seems inconsistent to me.

Inconsistent with what? implementation Thing Nil or implementation Nil Thing are not sentences and it is not clear which one is which. Nil implements Thing is a sentence and very clear which is the typeclass.

I am not sure why you put types in the function call syntax?

Afaik, I didn't. What are you referring to?

keean commented 7 years ago

Ah I see:

someCallback(x:Type,y:Type => x + y)

This is ambiguous... is it calling someCallback with 'x' as the first parameter and y => x + y as the second? This would seem less ambiguous:

someCallback((x:Type,y:Type) => x + y)

shelby3 commented 7 years ago

@keean wrote:

This is ambiguous... is it calling someCallback with x as the first parameter and y => x + y as the second?

Good catch. I missed that one. It indeed conflicts with comma delimited groups in general, not just function calls. I will remove after sleep.

You didn't point out that problem to me when I suggested it. Remember I was trying to make the inline syntax shorter, to avoid the _ + __ shorthand problems.

Edit: there is another option (again :Type are optional):

someCallback(x:Type y:Type => x + y)

But that is still NFG! It is LL(inf), because without the leading ( the parser must backtrack from the space, unless we require Type to be a single token in that context (i.e. use type if we need to define a complex type as one token).

keean commented 7 years ago

Personally I would rather have a single syntax for function definitions. If that is (with Type optional):

let f = (x:Type, y:Type) : Type => x + y

Then passing to a callback would be:

someCallback((x:Type, y:Type) : Type => x + y)

and then things are consistent. I think keeping things short is important, but I think consistency is even more important.

shelby3 commented 7 years ago

@keean the only point was to have an optional shorthand syntax (instead of the inconsistent semantics of _ + _ or the obfuscating _ + __) for inline functions and to get rid of those gaudy juxtaposed parentheses someCallback((....

Thus we don't need : Type for the shorthand syntax, so I propose the following optional shorthand syntax which eliminates the LL(inf) problem as well:

someCallback(x y => x + y)

Which is shorter than and removes the garish symbol soup (( from:

someCallback((x,y) => x + y)

That being generally useful shorthand, enables your request for an optional syntax in the special case of single argument (which I was against because it was only for that one special case):

someCallback(x => x + x)

Instead of:

someCallback((x) => x + x)

However, it isn't that much shorter and the reduction in symbol soup isn't so drastic, so I am wondering if it is violating the guiding principle that I promoted?

Short inline functions might be frequent? If yes, then I vote for having the shorthand alternative, since it would be the only way to write a more concise and less garish inline function in general for a frequent use case. Otherwise I vote against.

keean commented 7 years ago

Are we optimising too soon? I have implemented the basic function parser for the standard syntax, is that good enough for now? I think maybe we should try writing some programs before coming back to optimise the notation. I would suggest sticking to "only one way to do things" for now, because that means there is only one parser for function definitions, which will keep the implementation simpler for now. What do you think?

shelby3 commented 7 years ago

Thanks for reminding me about when I reminded you about when you reminded others to not do premature optimization.

I agree with not including the shorthand for now. Then we can later decide if we really benefit from it. I'll leave it in the syntax summary with a footnote.

keean commented 7 years ago

The compiler can now take a string like this:

id = (x) => x
id(42)

and compile it to:

id=function (x){return x;};id(42);

Next thing to sort is the block indenting, and then it should be able to compile multi-line function definitions and application.
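To make the input/output pair above concrete, here is a toy translation sketched in Python. It is regex-based and only handles single-line definitions and pass-through expression statements, so it is not how the actual compiler works; `compile_line` and `compile_source` are hypothetical names used only for illustration:

```python
import re

# Matches a one-line definition such as `id = (x) => x`.
DEFINITION = re.compile(r"^\s*(\w+)\s*=\s*\(([^)]*)\)\s*=>\s*(.+?)\s*$")


def compile_line(line: str) -> str:
    """Translate one source line into a JavaScript statement."""
    match = DEFINITION.match(line)
    if match:
        name, params, body = match.groups()
        # The single expression body becomes the function's return value.
        return f"{name}=function ({params}){{return {body};}};"
    # Anything else is passed through as an expression statement.
    return line.strip() + ";"


def compile_source(source: str) -> str:
    """Compile every non-blank line and join the resulting statements."""
    return "".join(
        compile_line(line) for line in source.splitlines() if line.strip()
    )
```

Calling `compile_source("id = (x) => x\nid(42)")` reproduces the JavaScript string shown above. Multi-line bodies would need the block indenting support mentioned as the next step.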

keean commented 7 years ago

I think we should have a provisional section, so we can split the syntax into currently implemented, and under consideration.

shelby3 commented 7 years ago

@keean wrote:

I think we should have a provisional section, so we can split the syntax into currently implemented, and under consideration.

I'll do that if the † instances become numerous enough to justify duplication.

shelby3 commented 7 years ago

Link to discussion about unification of structural lexical scope syntax.

keean commented 7 years ago

Are we sure having keywords let and var is the right way to go? If we have keywords for these we might want to have a keyword for functions? I quite like Rust's fn for introducing function definitions?

shelby3 commented 7 years ago

@keean wrote:

Are we sure having keywords let and var is the right way to go? If we have keywords for these we might want to have a keyword for functions? I quite like Rust's fn for introducing function definitions?

Instead I have proposed unified functions around let and var.

What would be the alternative to not having let and var? I can't think of one that makes any sense. How would you differentiate re-assignment from initialization? Remember we already decided we can't prefix the type for reference initialization, because types are optionally inferred.

shelby3 commented 7 years ago

@keean wrote:

I think structs/objects would probably start with an upper case letter.

Agreed.

My suggestions on types of name tokens for the lexer to produce:

The exclusivity for type parameters for all uppercase is so they don't have to be declared with <A,B...>.

Edit: the distinction between named functions and non-function references will be useful, because unnamed function references should be rarer. However, I was incorrectly thinking that it wouldn't make any sense to give function naming to unnamed function references (which have re-assignable references), because the name would indicate a function but I wrongly assumed the reference could be reassigned a non-function type (reference types can never change after initial assignment). So I think it would be safe to change the above to:

The other advantage of that is the lexer can tell the parser to expect a function, which is more efficient and provides two-factor error checking.

Note the compiler must check that the inferred type of the reference matches the function versus non-function token for the name.

keean commented 7 years ago

(Aside: Very few languages have a clean lexer and often you end up with lexer state depending on compiler state (string literals are a classic example). One of the advantages of parser combinators like Parsec is that you can write lexer-less parsers, and that cleans up the spaghetti of having the lexer depend on the state of the parse. )

Conclusion, nothing is going to be perfect.

My favourite would be:

datatypes and typeclasses : [A-Z][a-zA-Z0-9][']+
functions and variables : [a-z][a-zA-Z0-9][']+

This would have both type variables and value variables lower case.

I like the mathematical notation of having a 'prime' variable:

let x' = x + y

shelby3 commented 7 years ago

Comment about function syntax. Edited the OP to reflect this change.

@keean note where is already documented for functions in the OP.

shelby3 commented 7 years ago

@keean wrote:

  • If we do not introduce type-variables, we need to have different cases for type variables and types.

Agreed.

For readers, by "If we do not introduce type-variables" you mean if we do not prefix <A, B, ...> in front of functions. I am not proposing to remove that when it is a suffix of a type name.

  • There are not enough cases, as I would like to have something different for variables, types, and type-classes...

You have a point, but it is not an unequivocal one. We can require typeclasses begin with a lowercase i or uppercase I followed by a mandatory uppercase letter. If we choose the lowercase variant, we can disallow this pattern for function names.

It not only helps to read the code without syntax highlighting (and even 'with', if you don't want rainbow coloring of everything), it also speeds up the parser (because the lexer provides context).

Very few languages have a clean lexer and often you end up with lexer state depending on compiler state (string literals are a classic example)

If the string literal delimiters are context-free w.r.t. the rest of the grammar, then the lexer can solely determine what is inside a string literal and thus not parse those characters as other grammar tokens (aka terminals). Edit: the proposed paired delimiters will resolve this issue.

I believe if the grammar is context-free (or at least context-free w.r.t. name categories) this will reduce conflation of lexer and parser state machines. That is why I suggested that we must check the grammar in a tool such as SLK, so that we can be sure it has the desirable properties we need for maximum future optimization. I am hoping we can also target JIT and that ZenScript becomes the replacement for JavaScript as the language of the world. Perhaps the type checker for our system will be simpler than subclassing and thus performant. Even Google now realizes that sound typing is necessary to solve performance and other issues.

One of the advantages of parser combinators like Parsec is that you can write lexer-less parsers, and that cleans up the spaghetti of having the lexer depend on the state of the parse.

I still need to come up to speed on the library you are using to know what I think about tradeoffs. Obviously I am in favor of sound principles of software engineering, but I really can't comment about the details yet due to lack of sufficient understanding. I will just say I am happy you are working on implementing and I am hoping to catch up and also look at other aspects you may or may not be thinking about.

I don't like camel case :-( and prefer values and functions to_be_named_like_this.

The _ is verbose (also symbol soup) and I try to avoid it wherever I can. I try to use non-function references that are single letters or words. But function references very often can't be single words. Also calling methods with . gets symbol soup noisy when there are also _ symbols in there. I do understand that camel case for values (references) is similar to the camel case that is in type names, the only difference being the proposed upper vs. lowercase first letter (and then further overloaded by the i variant of the proposal above for distinguishing typeclasses); but this is irrelevant because function names do not appear in typing declarations (unless we opt for nominal typing of functions, which I am not sure what that would mean).

Note I had a logic error in my prior comment, in that single word function and non-function names were indistinguishable in what I proposed. But that doesn't destroy the utility of the cases where function names are camel case.

datatypes and typeclasses : [A-Z][a-zA-Z_0-9][']+

I want to make what I think should be a convincing rational point about proper names.

I don't like _ in type names. For me a type name should read like a proper name, headline or title where each word has its first letter capitalized. We don't put such punctuation in a title normally in English. Simulating spaces with _ is ugly symbol soup. It removes the elegance of a title. It is better to just keep the first-letter capitalization and smash together without the spaces. Instead you prefer to remove the first-letter capitalization and convert spaces to _, which is removing the first-letter capitalization attribute of a title which is the sole attribute that differentiates a proper name from other forms of English. Spaces are not the differentiating attribute of proper names. If you instead proposed to retain first-letter capitalization after each _, you would have a more consistent argument (but I would still argue that the _ is noise symbol soup redundancy since we have the camel case to distinguish words).

So I can objectively conclude your preference is not consistent with types as proper names, headlines, or titles, which is what they are.

<joke>You are British, so you should be more proper than me, lol.</joke> Although my last name is "Moore" and first name was a family name "Shelby" originating from north England meaning "willow manor". And I've got "Hartwick" (German), "Primo" (southern France/Italian) and "Deason" (diluted Cherokee native American) ancestry as well.

I like the mathematical notation of having a 'prime' variable:

let x' = x + y

I don't think I have an objection to this as a suffix only. ~Why not allow unicode subscript characters as well?~ (Edit: we have array indices for this)

Edit: however one issue with camel case and no underscores is when an entire word which is an acronym is not delimited by the capitalization of the word which follows it, e.g. NLFilter (for NL as an acronym for newline). In that example, I might prefer to name it NL_Filter, i.e. the underscore only allowed when it follows and is followed by a capitalized letter.

shelby3 commented 7 years ago

@keean wrote:

datatypes and typeclasses : [A-Z][a-zA-Z_0-9][']+
functions and variables : [a-z][a-zA-Z_0-9][']+

You didn't differentiate from ALLCAPS type parameters above. Also your regular expression seems incorrect, as + means 1 or more. Perhaps you are employing a syntax that is peculiar to your Parsec library?
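To check that point about + concretely, here is a small Python sketch. `AS_WRITTEN` is the quoted pattern verbatim; `ASSUMED_INTENT` is only a guess at what was meant (zero or more body characters, then optional primes) and is not something confirmed in this thread:

```python
import re

# The datatype/typeclass pattern exactly as quoted above.
AS_WRITTEN = re.compile(r"[A-Z][a-zA-Z_0-9][']+")

# Assumed intent (a guess): any number of body characters, optional primes.
ASSUMED_INTENT = re.compile(r"[A-Z][a-zA-Z_0-9]*[']*")


def accepts(pattern: re.Pattern, name: str) -> bool:
    """True when the whole name, not just a prefix, matches the pattern."""
    return pattern.fullmatch(name) is not None
```

As written, the pattern accepts only names that are exactly two characters followed by at least one prime, such as `Ab'`, and rejects an ordinary name like `List`. The assumed form accepts `List` and `List'`.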

Note that JavaScript allows $ in names, so if we want full interoperability then we need to allow it. Perhaps there are other ways we could work around and support interoperability with the $? Note JavaScript also supports some Unicode, but if we support that we are allowing ZenScript source code to resemble Dingbats art. Perhaps we should only allow $ and Unicode in names that have been declared as FFI?

So the ' will be emitted to JavaScript names as $prime, same as for PureScript, because neither it nor the correct ′ symbol is a valid character in identifier names? Or we could convert these to single and double x̿ (x̿) overline characters (or single and double vertical line above characters), which are valid in JavaScript identifier names. Should we also offer the π, τ, , , (or more correctly gamma γ), 𝑒, and φ symbols or entire Greek alphabet αβγδεζηθικλμνξοπρςτυφχψω as identifier names since they are valid for JavaScript? Ditto double-struck alphanumerics 𝕒𝕓𝕔𝕕𝕖𝕗𝕘𝕙𝕚𝕛𝕜𝕝𝕞𝕟𝕠𝕡𝕢𝕣𝕤𝕥𝕦𝕧𝕨𝕩𝕪𝕫𝔸𝔹ℂ𝔻𝔼𝔽𝔾ℍ𝕀𝕁𝕂𝕃𝕄ℕ𝕆ℙℚℝ𝕊𝕋𝕌𝕍𝕎𝕏𝕐ℤ𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡, mathematical gothic 𝔞𝔟𝔠𝔡𝔢𝔣𝔤𝔥𝔦𝔧𝔨𝔩𝔪𝔫𝔬𝔭𝔮𝔯𝔰𝔱𝔲𝔳𝔴𝔵𝔶𝔷𝔄𝔅ℭ𝔇𝔈𝔉𝔊ℌℑ𝔍𝔎𝔏𝔐𝔑𝔒𝔓𝔔ℜ𝔖𝔗𝔘𝔙𝔚𝔛𝔜ℨ (also 𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅), and mathematical script 𝓪𝓫𝓬𝓭𝓮𝓯𝓰𝓱𝓲𝓳𝓴𝓵𝓶𝓷𝓸𝓹𝓺𝓻𝓼𝓽𝓾𝓿𝔀𝔁𝔂𝔃𝓐𝓑𝓒𝓓𝓔𝓕𝓖𝓗𝓘𝓙𝓚𝓛𝓜𝓝𝓞𝓟𝓠𝓡𝓢𝓣𝓤𝓥𝓦𝓧𝓨𝓩 (also 𝒶𝒷𝒸𝒹ℯ𝒻ℊ𝒽𝒾𝒿𝓀𝓁𝓂𝓃ℴ𝓅𝓆𝓇𝓈𝓉𝓊𝓋𝓌𝓍𝓎𝓏𝒜ℬ𝒞𝒟ℰℱ𝒢ℋℐ𝒥𝒦ℒℳ𝒩𝒪𝒫𝒬ℛ𝒮𝒯𝒰𝒱𝒲𝒳𝒴𝒵)?
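A minimal sketch of the $prime option, in Python, assuming primes are allowed only as a suffix (as discussed above) and that each prime maps to one `$prime`. The function name `mangle_primes` is hypothetical:

```python
def mangle_primes(name: str) -> str:
    """Rewrite trailing primes so the name is a valid JavaScript identifier."""
    stem = name.rstrip("'")
    prime_count = len(name) - len(stem)
    if "'" in stem:
        # Assumption: a prime anywhere but the end is a lexical error.
        raise ValueError(f"prime is only allowed as a suffix: {name!r}")
    return stem + "$prime" * prime_count
```

Under these assumptions `x'` is emitted as `x$prime` and `x''` as `x$prime$prime`. A clash with a user-written name containing `$prime` is one reason to restrict $ to names declared as FFI.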

Here is what I arrive at now in compromise:

I like the leading I on typeclasses, so we capture the notion they are interfaces without conflating the keyword with the incompatible semantics of interface in other programming languages.

Edit: no need to allow uppercase in non-function references. Who on God's earth is using camel case for variable (i.e. non-function) reference names? :laughing:

shelby3 commented 7 years ago

@keean wrote:

Please no all caps, it's like shouting in my text editor :-(

:eyes:

Type parameters will nearly always be a single letter. We both must compromise to what is rational. I have compromised above forsaking required camel case on functions. I also compromised (well more like I fell in love once we eliminated need for subclassing syntax) and accepted Haskell's data unification of Sum, Product, Record, and Recursive types.

I don't want the noise of declaring <A, B...> on functions. That is egregiously more DNRY noisy, than any choice between uppercase and lowercase single letters. Function declarations are too cluttered.

Also the lowercase letter choice for type parameters is not idiomatic and it has no visual contrast in the x: a arguments. You can't compare to Haskell, because Haskell puts the function type declaration on a separate line. Sorry the lower case type names don't work once we merge typing into the same line.

Also type parameters are types, thus they should not be lowercase. That would be inconsistent with our uppercase first-letter on all types.

The lowercase type parameters of Haskell (combined with lack of <>) still causes me to not be able to read Haskell code quickly. It took me many attempts at learning Haskell where I failed, because of differences like that from mainstream Java, C++ languages.

If you are making a Haskell language, I don't think it will be popular. I am here to make a popular language, thus I will resist you on this issue.

One of my necessary roles here is to provide the non-Haskell perspective.

Let's do something very cool and eliminate the need to declare <A, B...>. We need advantages to our language in order to attract love and attention. Programmers love DNRY.

ML prefixes type variables with a character (happens to be 'a but could be anything)

:anguished:

I absolutely hate that. First time I saw that, I was totally confused. And I hate Rust's lifetime annotations littering the code with noise. I don't like Haskell and ML syntax. Not only am I lacking familiarity (not second-nature) with their syntax, but I dislike much of the syntax (and even some of the concepts) of those academic languages for logical reasons which I have explained in prior comments. I realize their target market is the 0.01 - 0.1% of the population that are academics (and whatever subset of that which are programmers). If you want to bring in most of the syntax and the obtuseness from those languages, then I think we have different understanding of what the mainstream wants.

I am not a verbal thinker. I always score higher on IQ tests that are measuring visual mathematical skills, rather than verbal skills. My I/O engine is weaker than my conceptual thought engine (I think this is why I get fatigued with discussions because my I/O engine can't keep up with my thoughts). My reading comprehension of English is 99th percentile, but my articulation and vocabulary are in the high 80s or low 90s. So apparently I dislike complex linguistic computation. I seem to struggle more with sequencing or the flattening out what I "see" in multi-dimensions into a sequential understanding. My math and conceptual engine is higher (more rare) than 99th percentile, but not genius.

So someone with more highly developed linguistic computation than myself, would probably find my desire for linguistic structure to be arbitrary and unnecessary. I've been working on my weakness, but I do find it takes energy away from my thought engine, which is where I feel more happy and efficient.

Also I note you want to get rid of the type parameter list, you do realise sometimes you need to explicitly give the parameters if they cannot be inferred like:

let x = f<Int>(some_list)

Please differentiate between function declaration and function call.

I had written about that 3 days ago:

It is much less noisy and often these will be inferred at that function call site, so we won't often be doing f<Int,Int,Int>(0, 1) so the explicit correspondence to <A, B, C> [on function declaration] probably isn't needed for aiding understanding.


@keean wrote:

This also makes me realise another ambiguity in our syntax, it is hard to tell the difference between assigning the result of a function and assigning the function itself as a value. In the above you might have to pass to the end of the line to see if there is a '=>' at the end. This requires unlimited backtracking which you said you wanted to avoid.

Please catch up with recent corrections to the syntax.

keean commented 7 years ago

@shelby3 wrote:

I absolutely hate that. First time I saw that, I was totally confused. And I hate Rust's lifetime annotations littering the code with noise.

So I am totally with you on the above. The problem is without introducing the type variables, how do we distinguish between types and variables, for example:

let x(a : A, b : B) : C

Are they single letter types, or type variables?

We often want to re-use type variables like 'A' a lot consider:

let f(x : A) =>
    let g(y : A) =>
        // is 'y' the same type as 'x' ?

The problem with making type variables all uppercase is it does not distinguish type names. Do we insist that all type names have more than one letter?

shelby3 commented 7 years ago

@keean wrote:

Are they single letter types, or type variables?

Type variable per the regular expressions I proposed.

:thought_balloon: I see you are preferring "type variables" to the term "type parameters". I suppose this is to distinguish from function parameter (arguments).

Do we insist that all type names have more than one letter?

Yes. However...

I see now our conflict in preferences. I am thinking type names should be informational; single letter proper names are extremely rare and not self documenting, so I thought it was okay to just not allow them. You are apparently thinking of supporting math notation in code. Which is evident by your data R example and your suggestion to allow ' at end of all names.

Mainstream programmers typically don't (or rarely) want to do math notation in code.

In my proposal they can still get math notation with data R' instead of data R.

:bulb:

I think there is another solution which would give you single-letter data, and keep my desire to eliminate the <A, B...> declaration noise. When a data type is intended, then for the first mention of the single-letter, put x: data.R. If the first mention is a product (tuple) or record constructor in code, then data.R(...) or data.R{...}.

:hammer:

And when there is a single-letter data name in conflict with a type parameter in scope, then I think we should have a compiler warning that must be turned off with compiler flag. The warning should tell the programmer to use data. prefix if that is what is intended (which turns off the warning) else use compiler flag or remove the conflict from scope. Or alternatively, we could not allow single-letter data names in scope, unless compiler flag is turned on.

Would that solve the problem for you? I don't think the single-letter data will be used by most or often, so those who need it can pay this slight verbosity and special case cost, so that everyone else can enjoy brevity and simplicity more often.

shelby3 commented 7 years ago

@shelby3 wrote:

Also I note you want to get rid of the type parameter list, you do realise sometimes you need to explicitly give the parameters if they cannot be inferred like:

let x = f<Int>(some_list)

Please differentiate between function declaration and function call.

I had written about that 3 days ago:

It is much less noisy and often these will be inferred at that function call site, so we won't often be doing f<Int,Int,Int>(0, 1) so the explicit correspondence to <A, B, C> [on function declaration] probably isn't needed for aiding understanding.

There is a problem remaining. The order of the type parameters in the optional <...> list on function calls may be ambiguous? I think we can adopt the rule that it is the ~order in which they appear in the function declaration.~ (Edit: I propose instead alphabetical order so that the programmer has more flexibility to order them so that the implicit ones can be first on function call <…> annotations, and this will also defeat some refactoring bugs.)

:warning:

Edit: and that leads to a very obscure and probably very rare programmer error: if not all the type variables are specified in the argument list (i.e. some are only in the where clause), a change to the where clause that doesn't change the call site type can still change the order of the type variables. But all programming languages have some sort of rare obscure programmer errors.

Edit#2: and note it should be quite odd and extremely rare that the programmer wants to constrain at the call site, a type variable that is not in the argument list or result value. Also the following function call is much more informational than f<Int,Int,Int>(0, 1):

:bulb:

f(x:Int, y:Int): Int

And allows us to specify only some constraints:

f(x:Int,y):Int

And it is more consistent with the syntax of function declaration.

Or if we prefer:

(Int)f((Int)x,y)

So maybe we can disallow <A,B...> on functions (declaration and call site) except for specifying type variables which don't appear in the argument list or result? Which should be almost never.

keean commented 7 years ago

The number and function of the type parameters is not the same as the number and type of the arguments, some type parameters may only occur in the where clause. Consider:

f<A>(x : B) where Subtype<B, A>

Note, Rust would not allow this, as you have to introduce all type parameters, which makes them less useful.

Really we have to have the type parameters if we want to have parametric types (that is types that are monomorphisable). If we are happy to give up monomorphisation we can have universally quantified types instead, and then there is no need to have type parameters at all.

In some regards I would prefer universally quantified types from a purely type system perspective, but it is much easier to implement monomorphisation with parametric types.

If you really want to get rid of the type parameters, then let's switch to universal quantification.

keean commented 7 years ago

I would rather say minimum two letters the second of which must be lower case for datatypes, and all caps for type variables.

Also we can use universal quantification to get rid of type parameters (although it does change what types are valid in the type system).
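The two-letter rule can be sketched as a pair of patterns, here in Python. How digits and the letters after the second one are treated is an assumption of this sketch, and `classify_type_name` is a made-up name:

```python
import re

# Datatypes: at least two letters, the second of which is lowercase.
# (Allowing digits and mixed case afterwards is an assumption.)
DATATYPE = re.compile(r"[A-Z][a-z][A-Za-z0-9]*")

# Type variables: all caps (trailing digits allowed as an assumption).
TYPE_VARIABLE = re.compile(r"[A-Z][A-Z0-9]*")


def classify_type_name(name: str) -> str:
    """Classify a name using only its spelling, as a lexer would."""
    if DATATYPE.fullmatch(name):
        return "datatype"
    if TYPE_VARIABLE.fullmatch(name):
        return "type variable"
    return "neither"
```

With these patterns `List` and `Int` are datatypes, `A` and `AB` are type variables, and a name that starts with an acronym such as `NLFilter` falls into neither category, so that case would still need its own rule.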

keean commented 7 years ago

I would suggest lexical scoping for type variables, so in my example above the A would be the same for both.

I think this satisfies the principle of least surprise.

shelby3 commented 7 years ago

@keean wrote:

The number and function of the type parameters is not the same as the number and type of the arguments, some type parameters may only occur in the where clause. Consider:

f<A>(x : B) where Subtype<B, A>

Did you not read the comment of mine immediately before yours?

I also explained that exact issue and offered a solution.

keean commented 7 years ago

Here's an interesting one, we need to write the type of a function, and we agree function definition should be an expression.

let x : (Int, Int) : Int = x y => ...

This should be possible too because I will want to pass functions to other functions:

let f(g : (Int, Int) : Int) : Int =>

keean commented 7 years ago

@shelby3 wrote:

Did you not read the comment of mine immediately before yours?

I also explained that exact issue and offered a solution.

Type parameters are like function arguments but for types; you can't just have some rule for inferring them from the rest of the declaration.

If you don't want type-parameters for functions, can we move to universal quantification, which does not require them?

shelby3 commented 7 years ago

@keean wrote:

If we are happy to give up monomorphisation we can have universally quantified types instead, and then there is no need to have type parameters at all.

No. We long ago realized that we can't do these higher-order features and have global inference. Impossible or at the minimum beyond our available resources, time, and brain power.

In some regards I would prefer universally quantified types from a purely type system perspective, but it is much easier to implement monomorphisation with parametric types.

I wouldn't prefer to have universally quantified types. We discussed this already. I don't want to repeat that discussion. I explained that all public APIs need types. I gave my reasons. No need to repeat here.

If you really want to get rid of the type parameters

I never proposed to get rid of type variables aka type parameters. Why are you introducing a tangent? Please, we need to stop wasting time on tangents that were already decided before.

shelby3 commented 7 years ago

@keean wrote:

I would rather say minimum two letters the second of which must be lower case for datatypes, and all caps for type variables.

That was my original proposal.

You said we need single-letter data names, so I offered a solution that retained my proposal for brevity of type variables.

shelby3 commented 7 years ago

@keean wrote:

Here's an interesting one, we need to write the type of a function, and we agree function definition should be an expression.

let x : (Int, Int) : Int = x y => ...

This should be possible too because I will want to pass functions to other functions:

let f(g : (Int, Int) : Int) : Int =>

Yes and afaics there is no grammar conflict. Why do you mention it? My specification : Type includes function types.
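
For comparison, here is a rough TypeScript analogue of the two declarations quoted above. It is only a sketch: TypeScript spells the function type as `(a: number, b: number) => number` rather than `(Int, Int) : Int`, and the names `BinaryIntOp`, `add` and `applyToThreeAndFour`, as well as both function bodies, are invented for illustration because the originals are elided.

```typescript
// A function type, named once so it can be reused in annotations.
type BinaryIntOp = (a: number, b: number) => number;

// Analogue of `let x : (Int, Int) : Int = x y => ...` (the body is hypothetical).
const add: BinaryIntOp = (x, y) => x + y;

// Analogue of `let f(g : (Int, Int) : Int) : Int => ...` (the body is hypothetical).
function applyToThreeAndFour(g: BinaryIntOp): number {
  return g(3, 4);
}

const sum = applyToThreeAndFour(add); // 7
```

The function type sits in the same annotation position as any other type, which is the point about `: Type` made above.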

shelby3 commented 7 years ago

@keean wrote:

Type parameters are like function arguments but for types; you can't just have some rule for inferring them from the rest of the declaration.

Why do you say I "can't"? I already explained how to do it.

You seem to love inference in other areas of the type system. Why are you resistant to inference here? It is actually not type inference. It is data scraping based on syntactical structure, i.e. no typing is involved.

keean commented 7 years ago

@shelby3 wrote:

You said we need single-letter data names, so I offered a solution that retained my proposal for brevity of type variables.

I can give up single-letter type names in order to get something we both agree on :-)

keean commented 7 years ago

@shelby3 wrote:

I never proposed to get rid of type variables aka type parameters. Why are you introducing a tangent? Please, we need to stop wasting time.

Type variables and type parameters are different things. Type parameters are like function arguments, and type variables are like function variables. Function arguments are variables, but they are not the same thing as variables in general: variables can also occur outside of function declarations, etc.

So you are proposing we get rid of type parameters (we are keeping type variables). This means the type system is no longer parametric (we don't have type parameters any more).

If we don't have type parameters, the alternative is universal quantification. This means all type variables in a function signature are universally quantified. Now we don't have to worry about passing type parameters nor the order of parameters at all.

I don't think you understood this either:

f<A>(x : B) where Subtype<B, A>

It would mean if I call it like this:

f<Int|String>(x)

x must have a type that is a subtype of Int|String. So you can see type parameters supply different information than the argument types of a function.
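
A rough TypeScript analogue of this example, offered only as a sketch. It assumes `B extends A` is an acceptable stand-in for `where Subtype<B, A>` and `number | string` for `Int|String`. Unlike the call `f<Int|String>(x)` above, TypeScript has no partial type-argument lists, so both arguments are written out. The function body and the variable names are hypothetical.

```typescript
// Analogue of `f<A>(x : B) where Subtype<B, A>`: B must be a subtype of A.
function f<A, B extends A>(x: B): A {
  return x; // widening from B to A is always allowed
}

// A is chosen by the caller; it cannot be recovered from the argument alone.
const widened = f<number | string, number>(42);      // typed as number | string
const text = f<number | string, string>("hello");    // typed as number | string

// f<number | string, boolean>(true);
// rejected by the type checker: boolean is not a subtype of number | string
```

The argument only fixes `B`; the bound `A` is extra information supplied at the call site.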

shelby3 commented 7 years ago

@keean wrote:

So you are proposing we get rid of type parameters

If you continue to accuse me of something I never implied, that is not productive.

keean commented 7 years ago

Let me rephrase, you are proposing to infer the type parameters? How about we infer the function arguments too?

shelby3 commented 7 years ago

Try reading my comments above again and find your mistake in reading comprehension.

So maybe we can disallow <A,B...> on functions (declaration and call site) except for specifying type variables which don't appear in the argument list or result? Which should be almost never.

See the bolded.
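
As a point of reference for the common case described in the bolded sentence, here is a minimal TypeScript sketch. TypeScript still requires `<A, B>` on the declaration, so it only illustrates the call-site half of the proposal: when every type variable appears in the argument list, nothing needs to be written at the call. The function `pair` is hypothetical.

```typescript
// Both type variables appear in the argument list.
function pair<A, B>(first: A, second: B): [A, B] {
  return [first, second];
}

// No `<number, string>` is needed here; both are inferred from the arguments.
const inferred = pair(1, "one");

// Writing them out is allowed but adds nothing the arguments don't already say.
const explicit = pair<number, string>(2, "two");
```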

keean commented 7 years ago

I already gave an example where your rules don't work:

f<A>(x : B) where Subtype<B, A>

keean commented 7 years ago

@shelby3 wrote:

So maybe we can disallow <A,B...> on functions (declaration and call site) except for specifying type variables which don't appear in the argument list or result? Which should be almost never.

I apologise for getting it wrong above, you were not trying to get rid of type parameters.

However, the rule doesn't work in cases where the function argument types are not type parameters, although handling those cases probably does require universal quantification (which it sounds like neither of us really wants).

shelby3 commented 7 years ago

I am just approaching this from a different perspective: that of the typical programmer who sees all this declaration as unnecessary¹ most of the time. You come with more experience than me in more languages, and you see type variables as a generalized concept, as an expert library programmer would.

From your past examples at the Rust forum, I became aware in May of the utility of type variables that are not in the argument list (although I had some notion of this from Scala's views, etc.). But I wasn't aware of the terminology distinction between type parameters and type variables.

In any case, sometimes someone with a different perspective will bring something new. And then the person with other experience will find issues and then together they will arrive at something maybe good.

That is (one reason) why I don't want to try to do it all by myself. You have valuable knowledge and experience that I don't have. But it doesn't mean that I don't know something you don't or that I won't ever have a new idea that works.

Anyway, I now want us to try to move as fast as possible to the final syntax.

I will read up on Parsec to see if I can learn why we should be doing it that way or another way.

  1. Sometimes it is worse to remove all the boilerplate and keywords. The language can become too terse. We will have to look at code samples to decide. I just don't like all that <A,B...> noise on all my methods in an interface declaration, when it isn't needed. Also as I pointed out upthread, the casts on arguments and result value are more informational to the reader at the call site than f<Int,Int,Int>(....

shelby3 commented 7 years ago

@shelby3 wrote:

I don't like _ in type names. For me a type name should read like a proper name, headline or title where each word has its first letter capitalized. We don't normally put such punctuation in a title in English. Simulating spaces with _ is ugly symbol soup. It removes the elegance of a title. It is better to just keep the first-letter capitalization and smash the words together without the spaces. Instead you prefer to remove the first-letter capitalization and convert spaces to _, which removes the sole attribute that differentiates a proper name from other forms of English. Spaces are not the differentiating attribute of proper names. If you instead proposed to retain first-letter capitalization after each _, you would have a more consistent argument (but I would still argue that the _ is redundant symbol-soup noise, since we have the camel case to distinguish words).

Another factor is that I think the type name should appear to be a cohesive unit, not separated by space, such as class TheEnd versus class The_End. The second one confuses my eye as to whether End is a separate token, because the capitalization of the E dominates the _.

I would also prefer not to allow a capital letter to abut a _ in function names.

keean commented 7 years ago

The underscore lets you separate name parts like this: ThisType_Has_AVeryLongName. I generally prefer writing readable literate code. That means a long variable name is preferable to a short variable name with a comment explaining what it is.

shelby3 commented 7 years ago

@keean wrote:

That means a long variable name is preferable to a short variable name with a comment explaining what it is.

Disagree.

I prioritize readable code, which to me means explaining what a type or variable is where it is defined, then not repeating that verbiage every where in the code, because IMO verbosity obscures the algorithm.

I agree with using meaningful names when they are short enough and/or not repeated often. But never can I imagine that justifying a proper name long enough to require _. I am okay with _ instead of camel case on reference names, so as to further differentiate them (in addition to the first-letter capitalization) from proper names for types.

keean commented 7 years ago

I disagree because I don't want to hunt through the code to find where xyz is defined to find a comment in some obscure include file. It's really not helpful when a nice name like options_selected will tell me what it does without having to look at other code.

When you work on large projects written by multiple teams in different locations, or as often has happened in my past, coming into a project that is going a bit wrong to fix things, you do not have time to read and understand the whole code-base.

The key property is "local reasoning": I need to be able to look at a function and understand what it does without having to look up anything in other parts of the source code. This means meaningful function and variable names, and type signatures. For example consider:

let x = get_events()

vs

let event_list : List<Events> = get_all_events_as_list()

I know which I would rather see in code I had to work with.

shelby3 commented 7 years ago

I think pattern matching within function arguments is looking to be not worth it, especially given that it can't pattern match on the implicit this. Better not to provide more than one way to do pattern matching, given that it has very little additional utility and that function overloading has fewer degrees of freedom for brevity of expression: as I had pointed out, we can nest match but not pattern-matching function overloads.

@shelby3 wrote:

However, the counter argument is that the permutation of function declarations duplicates more code if for example the case (code branch) of one argument applies to all three traits of another argument. So in that case seems the match format would be more DNRY.

There is one case where I think I prefer to overload function names (in addition to @keean's proposal to support named optional arguments and default arguments).

shelby3 commented 7 years ago

@keean wrote:

The key property is "local reasoning": I need to be able to look at a function and understand what it does without having to look up anything in other parts of the source code.

Of course I agree, because I used the same argument of "local reasoning" to argue that we must use parentheses for function calls.

I disagree because I don't want to hunt through the code to find where xyz is defined to find a comment in some obscure include file. It's really not helpful when a nice name like options_selected will tell me what it does without having to look at other code.

When you work on large projects written by multiple teams in different locations, or as often has happened in my past, coming into a project that is going a bit wrong to fix things, you do not have time to read and understand the whole code-base.

There is a tradeoff (tension) between longer descriptive names for readers (even re-reading our own code in the future) and cluttering the algorithm with repeating long descriptive names over and over again in the same local area of code. So what we really want is to see the descriptive name (or comment) once locally and not over and over and over and over again. That is why I said I prefer the comment.

But I see your point that the comment isn't carried over to other modules, so for the reader (not the writer), to have to go back to all the referenced modules to lookup the comments is very time consuming and destroys local reasoning because they can't see everything at once.

There is a solution that Scala (and I think Java) offers (and I believe the new import syntax for ES6next/ES7), which is the option to rename the verbose name on import. So then the reader need only look for the local import to see the original name.

So it seems to me the rule we are after here is that exported names can always contain underscores. And we need a renaming on import feature.

Does that satisfy you?

This means meaningful function and variable names, and type signatures.

  • I see no need for capital letters in non-function reference names, regardless if exported or not.
  • I see no need for capital letters combined with underscores in function references, regardless if exported or not. Choose camel case or underscores, don't mix them.
  • I see no need for lowercase letters following an underscore in type names, and this would only apply to exported names, since per the proposal above, then non-exported type names may not have underscores. I would make one exception which is non-exported type names may begin with an underscore, but exported names not, because sometimes we need to use an underscore locally as I did for the implementation of Nil in the TypeScript example code.

The rules are a bit complex, but will lead to the cleanest naming we can enforce. The compiler can explain to the programmer.
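
To make the three bullets concrete, here is a minimal TypeScript sketch of how a compiler pass might check them. It is one reading of the rules above, not an agreed specification: the function `checkName`, the `NameKind` type and the wording of the messages are all hypothetical.

```typescript
type NameKind = "value" | "function" | "type";

// Returns a complaint for the programmer, or null when the name is acceptable.
function checkName(name: string, kind: NameKind, exported: boolean): string | null {
  const hasCapital = /[A-Z]/.test(name);
  const hasUnderscore = /_/.test(name);

  if (kind === "value") {
    // Bullet 1: no capital letters in non-function reference names.
    return hasCapital ? "non-function references may not contain capital letters" : null;
  }
  if (kind === "function") {
    // Bullet 2: camel case or underscores, but not both.
    return hasCapital && hasUnderscore
      ? "function names use camel case or underscores, not both"
      : null;
  }
  // Bullet 3: type names.
  if (exported) {
    if (/^_/.test(name)) return "exported type names may not begin with an underscore";
    if (/_[a-z]/.test(name)) return "a lowercase letter may not follow an underscore in a type name";
    return null;
  }
  // Non-exported type names: an underscore is allowed only as the first character.
  return /._/.test(name)
    ? "non-exported type names may only use an underscore as their first character"
    : null;
}
```

Under this reading, `checkName("_Nil", "type", false)` and `checkName("The_End", "type", true)` both return `null`, while `checkName("The_end", "type", true)` and `checkName("get_All_events", "function", false)` return a message.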

keean commented 7 years ago

I think that's a good compromise. I would probably use Nil' rather than _Nil so I would be happy with no underscores in the first character.

I do not really agree with making function references different from non-function references. A function is a value like any other value, so it would seem odd to name it differently.

For example in the polymorphic function:

let id(x) => x

x could be an integer number or a function, so what naming convention to use for x?
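
The same point in TypeScript terms, as a small sketch (TypeScript needs the type variable `T` declared explicitly; the helper `double` is hypothetical):

```typescript
// Analogue of `let id(x) => x`.
function id<T>(x: T): T {
  return x;
}

const double = (k: number) => k * 2;

const sameNumber = id(42);       // here x is a number
const sameFunction = id(double); // here x is a function
```

Inside `id`, nothing about the name `x` can depend on which of the two it will be.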

shelby3 commented 7 years ago

I would probably use Nil' rather than _Nil so I would be happy with no underscores in the first character.

Hmmm, but isn't trailing ' supposed to be for math notation? Also that becomes Nil$prime in JavaScript emitted (copying PureScript). The JavaScript way is underscores.

I do not really agree with making function references different from non-function references.

The only difference is no camel case (no capital letters) option for non-function references. You said you don't like camel case, so why should you care?

I want to use camel case for my functions and not camel case for my non-functions. You want to make your non-function references indistinguishable (not factoring in context), because you think lowercase first-letter camel case looks like a type name. We both acknowledge that single word names are indistinguishable between function and non-function references.

A function is a value like any other value ... x could be an integer number or a function.

When we know the type is not a function, then we can enforce no capitals. For polymorphic types, I suggest enforcing naming as per non-function references, because we can't do any operations on a polymorphic type unless we have a trait bound, and when we have a trait bound then we know it isn't a function type.

keean commented 7 years ago

You can have a trait bound on a function, and instances of traits can be declared for function types, so that's not a way to distinguish. In general the only difference between a function value and a non-function value is that you can call a function.
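
A minimal sketch of this in TypeScript, which has no typeclasses, so the trait is passed explicitly as a dictionary object. The trait `Describe`, its two instances and the function `report` are all hypothetical names chosen for illustration.

```typescript
// A "trait" modelled as an explicit dictionary of operations.
interface Describe<T> {
  describe(x: T): string;
}

// An instance for a non-function type.
const describeNumber: Describe<number> = {
  describe: (x) => "number " + x,
};

// An instance for a function type: the trait is implemented for functions too.
const describeUnaryFn: Describe<(n: number) => number> = {
  describe: (fn) => "function of arity " + fn.length,
};

// `report` is bounded by Describe<T>, yet T may still be a function type.
function report<T>(dict: Describe<T>, x: T): string {
  return dict.describe(x);
}

const a = report(describeNumber, 7);                         // "number 7"
const b = report(describeUnaryFn, (n: number) => n + 1);     // "function of arity 1"
```

Knowing that `T` satisfies the bound therefore says nothing about whether `x` is callable.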