The Syntax Bikeshedding Dojo, round 4: Basics

yannham commented 4 years ago

@garbas asked me if the function syntax was final. Since nothing is final yet, I think it is a good idea to create a dedicated issue for the basic constructs of the language, even if most seem non-controversial, so that people can voice concerns or make alternative proposals.

Functions
Records
Comments
Others

Functions

Definition

Currently, Nickel uses an ML-like anonymous function syntax fun arg1 .. argn => body. There are other possibilities:

Nix-like: arg1: arg2: ..: body, but : will be used for typing, so it's probably a bad idea (see section below).
C-like: fun(arg1,..,argn) ... ReasonML does this, for example, although it also adopts the corresponding call syntax f(arg1,..,argn), which looks less natural in languages with currying. This can be confused with tuples, but Nickel does not have tuples currently.
C-like body: .. { body }. Independently of the syntax of arguments, the body could be written as a block.
variation: use a different keyword than fun: fn, def, etc.

We could also define a non-anonymous function in a more concise way. For example, in OCaml, you can do:

let f arg1 .. argn = body in
// sugar for
let f = fun arg1 .. argn => body in
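To make "sugar" concrete, here is a minimal Python sketch of that desugaring as an AST rewrite. The node classes (Var, Fun, Let, LetFun) are hypothetical and are not Nickel's actual AST. The point is only that a named multi-argument definition becomes a plain let bound to nested single-argument functions.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical AST nodes, for illustration only.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fun:
    param: str
    body: "Expr"


@dataclass(frozen=True)
class Let:
    name: str
    value: "Expr"
    cont: "Expr"


@dataclass(frozen=True)
class LetFun:
    name: str
    params: Tuple[str, ...]
    body: "Expr"
    cont: "Expr"


Expr = Union[Var, Fun, Let, LetFun]


def desugar(e: Expr) -> Expr:
    """Rewrite `let f a1 .. an = body in cont` as `let f = fun a1 => .. fun an => body in cont`."""
    if isinstance(e, LetFun):
        value = desugar(e.body)
        # Wrap from the last parameter outwards, so a1 ends up as the outermost binder.
        for param in reversed(e.params):
            value = Fun(param, value)
        return Let(e.name, value, desugar(e.cont))
    if isinstance(e, Fun):
        return Fun(e.param, desugar(e.body))
    if isinstance(e, Let):
        return Let(e.name, desugar(e.value), desugar(e.cont))
    return e
```

For example, `desugar(LetFun("f", ("x", "y"), Var("x"), Var("f")))` yields `Let("f", Fun("x", Fun("y", Var("x"))), Var("f"))`, which is the curried form `fun x => fun y => x`.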

Typing

It has not been discussed yet, but a nice thing to have would be a syntax to declare the arguments, their types and the return type directly. In ML-derived languages, you can usually write:

let f (x: Num) (y: Num) (z: Num) : Num = body
//instead of
let f : Num -> Num -> Num -> Num = fun x y z => ...

As in the previous section, there are many variations and possible choices:

Rust-y return type syntax: let f (x: Num) (y: Num) (z: Num) -> Num = body
and others depending on the choice of the function definition syntax

One Nickel-specific issue would be to have a similar syntax for function contracts. For example, we can replace : Num with | type Num or | contract Num (see #183):

let f (x | type Num) (y | type Num) (z | type Num) | type Num = body

to indicate that pre- and post-conditions must be checked at run time, but that we don't wish to statically typecheck the body.
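Here is a rough Python sketch of what "checked at run time, not statically typechecked" means. The function body is left alone, and each argument and the result are checked against a predicate when the function is called. The names with_contracts, is_num and ContractError are hypothetical. Real Nickel contracts are curried and carry blame labels, and this simplified, uncurried version omits both.

```python
from typing import Any, Callable, Sequence

Contract = Callable[[Any], bool]


class ContractError(Exception):
    pass


def is_num(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def with_contracts(f: Callable[..., Any], arg_contracts: Sequence[Contract],
                   ret_contract: Contract, name: str = "f") -> Callable[..., Any]:
    """Wrap `f` so arguments (pre-conditions) and the result (post-condition) are checked on every call."""
    def checked(*args: Any) -> Any:
        if len(args) != len(arg_contracts):
            raise ContractError(f"{name}: expected {len(arg_contracts)} arguments, got {len(args)}")
        for i, (contract, arg) in enumerate(zip(arg_contracts, args), start=1):
            if not contract(arg):
                raise ContractError(f"{name}: argument {i} violates its contract: {arg!r}")
        result = f(*args)
        if not ret_contract(result):
            raise ContractError(f"{name}: return value violates its contract: {result!r}")
        return result
    return checked
```

With `f = with_contracts(lambda x, y, z: x + y + z, [is_num] * 3, is_num)`, `f(1, 2, 3)` returns 6, while `f(1, 2, "3")` raises ContractError before the body ever runs.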

Note that issue #81 covers destructuring, which would also be available directly on function arguments.

Records and Lists

Nickel uses Nix-like record definition:

{
  field = content;
  ..
  field = content;
}

We should stay close to JSON syntax. Pure JSON syntax using : would clash with typing annotations, and the field = content syntax is pretty standard among languages that don't use : (C++, OCaml, Haskell, Dhall, etc.). The delimiter ; could be changed to , or something else, but note that , is already used for lists and more (which is not a blocker, but can become confusing).

Lists use exactly the JSON syntax:

[1, 2, "3", false]

Comments

There are currently no comments. I propose the ubiquitous C-like line and block comments:

// line comment
/* 
 Block
 Comment
*/

One could argue that long documentation belongs in meta-values, and that we can do without block comments, which would also eliminate the issue of nested comments (how should /* /* /* foo */ */ */ be parsed?).
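The parsing question comes down to whether comments nest. If they do, the lexer needs a depth counter instead of a simple "skip to the next */" rule, because the first */ only closes the innermost comment. Below is a minimal Python sketch of the nesting variant. It ignores string literals for brevity, which a real lexer could not do, and the function name is hypothetical.

```python
def strip_block_comments(src: str) -> str:
    """Remove /* ... */ comments, allowing them to nest."""
    out = []
    depth = 0  # how many unclosed /* we are currently inside
    i = 0
    while i < len(src):
        pair = src[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(src[i])
            i += 1
    if depth != 0:
        raise SyntaxError("unterminated block comment")
    return "".join(out)
```

With non-nesting comments (as in C), the same input `/* /* /* foo */ */ */` would end at the first `*/` and leave `*/ */` as stray tokens. That is exactly the ambiguity avoided by not having block comments at all.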

Others

I've left out let-in declarations, as they will get their own round of bike-shedding.

garbas commented 3 years ago

I would like to add my thoughts.

Typing

I do like that Nix does not require any special keyword to start writing a function. Using fun feels weird if you are coming from Nix. I understand we can not use the same syntax as it is with Nix, but a Haskell-style way of defining functions seems much nicer.

I kinda dislike having types embedded in the function definition (I really hate that Rust did it this way). Specifying types inside the function definition makes it much harder to think in terms of types and how they relate. Even now, I write the types first, even if only in a comment, before writing any Nix code.

I guess I'm trying to propose a Haskell- or Elm-like syntax here.

f : Num -> Num -> Num
f a b = a * b

The above only covers named functions. There should also be a syntax for anonymous functions, and in that case I would allow inline type definitions. The important use case to consider is file-level anonymous functions, which are quite popular in Nix.

\{ a : Num
 , b : Num
 } = mkDerivation {
   name = "hello"
 }

Records

Records would also use the same delimiter as lists (, instead of ;), as suggested in the Records and Lists section. This deviates from Nix, but it brings the syntax closer to JSON, which I think is going to be more familiar.

edolstra commented 3 years ago

We should stay close to JSON syntax.

Yeah, it occurred to me today that it would be nice if Nickel is a strict superset of JSON, so piping a JSON file into Nickel yields the same (or canonicalized) JSON as output. So this would mean changing = to :, ; to , and supporting double-quotes around field names (which would be useful to have in any case).

Pure JSON syntax using : would clash with typing annotations,

Maybe we can use :: (like Haskell)?

mboes commented 3 years ago

Yeah, it occurred to me today that it would be nice if Nickel is a strict superset of JSON

That would be nice, though JSON's choice of : eats up prime syntax real estate. It would also be nice to keep similar syntax for let bindings and records (hence using =). Also, a : preceded by :: for the type wouldn't look too great. So maybe what we can do, purely for backward compatibility with legacy JSON, is to allow : in lieu of =, but only when the record field name is double-quoted? We could furthermore say that a type annotation is not allowed in that case (JSON never has those anyway).
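Here is a small Python sketch of that rule for a single field, purely to show that it is decidable from the first token. A double-quoted name followed by : is read as a JSON-style field with no annotation. An unquoted name followed by : starts a type annotation that must be followed by = value. The grammar, the regexes and the Field/parse_field names are hypothetical simplifications: values are kept as raw text, and escapes in names are not decoded.

```python
import re
from typing import NamedTuple, Optional


class Field(NamedTuple):
    name: str
    annotation: Optional[str]
    value: str


NAME = r"[A-Za-z_][A-Za-z0-9_]*"
QUOTED = r'"((?:[^"\\]|\\.)*)"'

# "name": value  (JSON compatibility: quoted names only, no annotation)
_JSON_STYLE = re.compile(rf"\s*{QUOTED}\s*:\s*(.+?)\s*")
# name : Type = value
_ANNOTATED = re.compile(rf"\s*({NAME})\s*:\s*(.+?)\s*=\s*(.+?)\s*")
# name = value, or "name" = value
_PLAIN = re.compile(rf"\s*(?:{QUOTED}|({NAME}))\s*=\s*(.+?)\s*")


def parse_field(src: str) -> Field:
    m = _JSON_STYLE.fullmatch(src)
    if m:
        return Field(m.group(1), None, m.group(2))
    m = _ANNOTATED.fullmatch(src)
    if m:
        return Field(m.group(1), m.group(2), m.group(3))
    m = _PLAIN.fullmatch(src)
    if m:
        name = m.group(1) if m.group(1) is not None else m.group(2)
        return Field(name, None, m.group(3))
    raise SyntaxError(f"cannot parse field: {src!r}")
```

Under this rule, `foo: 1` is rejected (an unquoted name followed by : must be an annotation), while `"foo": 1`, `"foo" = 1`, `foo = 1` and `foo : Num = 1` are all accepted.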

When it comes to , though, I think we should use that uniformly everywhere (just like JSON). Lists use that too but it doesn't cause problems in other languages as far as I'm aware.

yannham commented 3 years ago

Indeed, being JSON-compatible is feasible, although it means having a slightly less nice and less consistent syntax, for the reasons pointed out by @mboes. By the way, if the goal is just to be able to import JSON directly into Nickel, I think it would be reasonably easy to support let x = import "foo.json" using the deserialization capabilities of serde, without having to make the syntax JSON-compatible.

mboes commented 3 years ago

Come to think of it, you're right: dealing with JSON is easy via an import. We could go a step further, though, and have the Nickel interpreter natively understand multiple syntaxes. TOML would be one, JSON another. There is no particular reason to give JSON special status. The interpreter would disambiguate based on the filename, and if that's not available, we could always have an -fmt json flag on the interpreter.
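Here is a minimal Python sketch of that dispatch: pick a parser from the file extension, or from an explicit format name standing in for the proposed -fmt flag. The registry names and the load/detect_format helpers are hypothetical. Only JSON is wired up here (via the standard json module). TOML would be registered the same way. Letting an explicit format win over the extension is a design choice the comment does not specify.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: format name -> parser from source text to a value.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "json": json.loads,
}

# Hypothetical mapping from file extension to format name.
EXTENSIONS: Dict[str, str] = {
    ".json": "json",
}


def detect_format(path: Optional[str], fmt: Optional[str] = None) -> str:
    """An explicit format (like `-fmt json`) wins; otherwise fall back to the file extension."""
    if fmt is not None:
        return fmt
    if path is not None:
        ext = Path(path).suffix.lower()
        if ext in EXTENSIONS:
            return EXTENSIONS[ext]
    raise ValueError("cannot infer the input format; pass it explicitly")


def load(text: str, path: Optional[str] = None, fmt: Optional[str] = None) -> Any:
    name = detect_format(path, fmt)
    if name not in PARSERS:
        raise ValueError(f"unsupported format: {name}")
    return PARSERS[name](text)
```

For input read from stdin there is no filename, so `load(text, fmt="json")` plays the role of the flag.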

n87 commented 3 years ago

Is there a particular reason behind // for line comments? In configuration languages like YAML or TOML # seems more prevalent. Nix also uses # for comments and // for record updates. If you don't aim for C-like syntax, I'd go with # single-line comments, and I'd probably not bother with multiline at all.

Removing the fun keyword would also be nice if you can make the parser understand this:

let plus = x y => x + y

To make it work, => should get higher precedence than function application I guess, and I'm not sure if it's always sensible.

Alternatively, if you have currying (sorry, I didn't see it in the docs), this is still better than fun, I think:

let plus = x => y => x + y

Profpatsch commented 3 years ago

I'd go with # single-line comments, and I'd probably not bother with multiline at all

seconded

(Off-hand comment, @n87: if you use the edit functionality to add content to a comment, the changes will not show up in the notification emails that GitHub sends, so people might miss part of your comment.)

yannham commented 3 years ago

Is there a particular reason behind // for line comments? In configuration languages like YAML or TOML # seems more prevalent. Nix also uses # for comments and // for record updates. If you don't aim for C-like syntax, I'd go with # single-line comments, and I'd probably not bother with multiline at all.

I agree about not bothering with multiline comments. Currently, # is already used in several constructs: to denote custom contracts, as in let val | #SomeContract = exp in exp (the pipe syntax | Foo is not yet pushed to master but is probably the future syntax for contract application/metavalues, see #186). It's also used for string interpolation (see #224), to avoid clashing with the common ${ interpolation syntax of bash and co. While neither is set in stone yet, changing comments to # means finding a new syntax for these, so if it's just a matter of taste, I don't know if it is worth it. There is also the question of the overriding record update operator (which may not be needed if we choose to go with merging + priority, cf. the discussion in #240).

Profpatsch commented 3 years ago

How about using % for interpolation? That is something I've seen in other template languages.

I feel like // is an extremely “dusty” comment syntax, plus it's two keystrokes for something that shouldn't have any friction.

It makes sense to stay consistent with the other configuration languages (Python/Starlark, TOML, YAML); otherwise, people will be weirded out.


mboes commented 3 years ago

I do think it's worth following the precedent set by Nix here, which matches the syntax used by Bash (a common object language), Python, Ruby and a number of other scripting languages, but also existing configuration languages like YAML, TOML and HCL (which supports all of #, // and /* ... */). The value in doing so is high enough, I think, to justify looking for an alternative syntax for #SomeContract. FWIW, we originally stole the interpolation syntax from Ruby, which also uses # for comments. Since interpolation happens inside a string, it won't clash with comments.
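The "inside a string" argument is easy to see in a lexer: # starts a comment only when the lexer is not already inside a string literal, so an interpolation like "#{x}" is never mistaken for a comment. Here is a minimal Python sketch. It assumes single-line, double-quoted strings with backslash escapes, and the function name is hypothetical.

```python
def strip_hash_comments(src: str) -> str:
    """Drop `# ...` line comments, but keep any `#` that appears inside a double-quoted string."""
    out = []
    in_string = False
    i = 0
    while i < len(src):
        ch = src[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(src):
                # Copy the escaped character verbatim (it may be a quote).
                out.append(src[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "#":
            # Comment: skip up to (but not including) the end of the line.
            while i < len(src) and src[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

For example, `let x = "#{y}" # note` keeps the interpolated string intact and drops only the trailing comment.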

mboes commented 3 years ago

And I also agree that multiline comments aren't worth it. I wouldn't emulate HCL's support of all three syntaxes: I think supporting only # is preferable.

ggPeti commented 3 years ago

I'd like to reiterate one thing:

fun is not fun.

It looks like an identifier. It makes the code harder to scan visually, it makes the text-to-symbol ratio high, and it looks especially terrible for inline lambdas. And best of all, it's redundant: the => is clear on its own.

Either.mapBoth (x => onething) (x => otherthing) myval

It doesn't have to be =>, it can be any common symbol.

I'd also like to suggest that binding a name to a function - or any value - should stay a separate issue from defining a function.

milahu commented 2 years ago

@garbas said

we can not use the same syntax as it is with Nix

why not? : >

or: can nickel replace nix? → general purpose language to declare configs, packages, systems

one thing i miss in nix is "typed strings", to allow text editors to parse embedded languages, for example

mkDerivation {
  builder = sh''
    gcc $src
  '';
  src = c''
    int main() {
      printf("yay\n");
    }
  '';
}

garbas commented 2 years ago

@garbas said

we can not use the same syntax as it is with Nix

why not? : >

: operator already being used for static types. Using it for function definitions (like Nix) wouldn't make sense then.

or: can nickel replace nix? → general purpose language to declare configs, packages, systems

one thing i miss in nix is "typed strings", to allow text editors to parse embedded languages, for example

Do you maybe have any resource on how editors detect embedded languages?

milahu commented 2 years ago

how editors detect embedded languages

the language must be declared explicitly like sh''echo hello''

: operator already being used for static types

smells like typescript : /

can we limit the type system to function interfaces? so that interface declarations can be pseudo wrapper functions

{
  f =
    type (string: int)
    inputString: (builtins.stringLength inputString)
  ;
}

silverraven691 commented 2 years ago

In regards to the function syntax, I would like people without a functional programming background to be considered: args => body may be obvious to you, but will it be obvious to someone who's only ever dabbled in Go or Python? I would expect Nickel to end up in the editors of SREs, and I don't think all of them have a keen interest in FP.

I would like to add that I personally don't see strong arguments in favor of removing the fun keyword; "make it like Haskell" certainly does not seem like one.

ammkrn commented 2 years ago

FWIW, I was "forced" to start using fun args => body recently by Lean 4, and after a week or two I actually ended up liking it. Speaking somewhat to silverraven691's point, you kind of have to know what a lambda is (and how it's written) to begin to understand the intuition behind \x => ... I would vote to keep fun args => body.

Also, I personally like the ML typing convention, and for records, another option would be := for assignment, e.g. { field := val, .. }.

yannham commented 2 years ago

Update after the standardization meeting. This includes an almost verbatim reproduction of https://github.com/tweag/nickel/issues/494#issuecomment-1002970371 to avoid losing related discussion.

We went in the direction of adding an ML-like function definition syntax for named functions, in the form of let f x y z = x + y + z, keeping fun for anonymous functions, and of extending piecewise definitions (which are already supported for other meta attributes, see #84) to type signatures (#496), for consistency and to work better with the above syntax. In OCaml, for example, it's not totally obvious how to write inline annotations for such functions: should it be f (x : Num) (y : Num) : Num = ... or f x y z : Num -> Num -> Num? What's more, in Nickel, type annotations trigger typechecking. So a partial annotation in ML like let f (x: int) y = ... wouldn't have an obvious semantics in Nickel with respect to what needs to be typechecked. On the other hand, putting the whole annotation at the end is verbose. Instead, with piecewise signatures, we can just decouple type annotations from function argument declarations:

{
  map : forall a b. (a -> b) -> List a -> List b,
  map f list = ...
}
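One way to picture definition by pieces is to collect, per field, attributes that may be written in separate declarations (here a type piece and a value piece) and to reject an attribute given twice. Here is a simplified Python sketch with hypothetical names (merge_pieces and the "type"/"value" attribute keys). Nickel's actual merging of record fields is richer than this.

```python
from typing import Any, Dict, Iterable, Tuple

# (field name, attribute such as "type" or "value", payload)
Piece = Tuple[str, str, Any]


def merge_pieces(pieces: Iterable[Piece]) -> Dict[str, Dict[str, Any]]:
    """Group separately written pieces of record fields by field name."""
    fields: Dict[str, Dict[str, Any]] = {}
    for name, attr, payload in pieces:
        field = fields.setdefault(name, {})
        if attr in field:
            raise ValueError(f"field {name!r} defines {attr!r} more than once")
        field[attr] = payload
    return fields
```

Under this model, the two lines for map above become one field whose signature and definition are stored side by side. The typechecker can then read the signature without the argument list having to carry any inline annotations.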

We also discussed the option of removing fun, but I recall attendees didn't like it for a bunch of reasons. IIRC the general feeling was that it's just three characters that make it explicit from the beginning that we are parsing a function definition and not something else, both for the user and for the parser. Otherwise, in exp1 ... expn => body, you need to go all the way to the end to decide whether what you parsed before is actually valid (up to expn, it can be both an application and a function definition, but the two allow different things to be there). There is related discussion on the OCaml forum.

Nix destructuring looks like it faces the same problem (when you parse {stuff}:, you don't know until the end of stuff whether you are defining a function or a record value). Some tests seem to indicate that the syntaxes for patterns and values are distinct, so the parser can decide early. In Nickel, I don't think we can hope for this, as we have renaming (let {foo=foo_} = bar in), ellipsis for values (used for open record contracts), inline contract annotations in patterns (let {foo | Num} = bar in ...), and so on. So the sets of valid expressions for applications and function definitions have a non-empty intersection but are not equal.

It is still probably doable by parsing something that looks like an application in a syntax that is a superset of the two, and deciding validity once we've reached the end, but it may be painful (without even considering the specific parsing lib we currently use). Some languages choose this path, like ReScript, which dropped the fun keyword. They use parentheses, as in (x,y) => x + y, but I imagine they still have to disambiguate tuples from function definitions/destructuring.
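Here is a toy Python sketch of that superset strategy. First accept a flat sequence of atoms that is valid under either reading. Only once => is found (or the input ends without one) are the atoms re-checked against the stricter rules for parameters. The token format, the parse_head name, and the rule that numbers may appear in applications but not as parameters are all simplifying assumptions. They stand in for Nickel's real differences (renaming, contract annotations in patterns, and so on).

```python
import re
from typing import List, Tuple, Union

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_']*")
NUMBER = re.compile(r"[0-9]+")

Parsed = Union[Tuple[str, List[str], List[str]], Tuple[str, List[str]]]


def parse_head(tokens: List[str]) -> Parsed:
    """Parse `a1 .. an => body` or an application `a1 .. an`, without a leading `fun`."""
    atoms: List[str] = []
    i = 0
    # Superset phase: accept anything that is valid in either reading.
    while i < len(tokens) and tokens[i] != "=>":
        tok = tokens[i]
        if not (IDENT.fullmatch(tok) or NUMBER.fullmatch(tok)):
            raise SyntaxError(f"unexpected token {tok!r}")
        atoms.append(tok)
        i += 1
    if i == len(tokens):
        # No `=>` anywhere: it was an application all along.
        return ("app", atoms)
    # Only now do we know it is a function: re-validate every atom as a parameter.
    if not atoms:
        raise SyntaxError("function without parameters")
    bad = [a for a in atoms if not IDENT.fullmatch(a)]
    if bad:
        raise SyntaxError(f"invalid parameter(s): {bad}")
    return ("fun", atoms, tokens[i + 1:])
```

The error for `x 1 => x` can only be reported after reaching `=>`, which is the late decision a leading fun avoids.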

We actually discussed doing something like ReScript too (we don't have tuples, so it would not be ambiguous), but that looks odd next to ML-style application. ReScript just did this to be more JavaScript-like, and changed function application accordingly. We also discussed using something shorter like Haskell's \, but a lambda sounded more idiosyncratic for people working on configuration than a keyword related to the word "function", so we decided to keep fun and go with the ML-like let-function definition syntax.

milahu commented 2 years ago

the "function pipe" operator |> is too long. both nix and nickel are based on bash, so why not use | as pipe? (edit: "based on bash"? naah. rather: based on haskell) maybe this could be merged with the "contract pipe" operator? so contracts would be functions or use $ as "contract pipe" operator?

arithmetic operators are probably needed less often (nickel is not a calculator) (frequency should drive the encoding → frequent tokens should be short) so we could require a (( ... )) context for arithmetic operators, like in bash. or, completely remove the infix notation for arithmetic operators 1 + 2 == 3, and only support add 1 2 == 3 benefit: more characters remain for other operators. for example, we could use + for string-concat (++), array-concat (@), record-concat (&) (but then we could also overload the infix + for numbers ...) (i guess overloaded operators are harder to parse, so this is avoided in nickel?)

yannham commented 2 years ago

the "function pipe" operator |> is too long. both nix and nickel are based on bash, so why not use | as pipe? maybe this could be merged with the "contract pipe" operator?

Contract application is unfortunately not the same as standard application, although it bears some similarities:

it accepts records as an applicant (and, in the future, we can imagine literals like 2 also being usable as a contract meaning "equal to 2")
the interpreter adds a label parameter, so it's a bit like application with implicit parameters
the arguments of contracts are handled specially, for error-reporting purposes.
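To make those differences concrete, here is a simplified Python sketch of contract application. A record (here a dict) can itself act as a contract. A label is threaded through without the user passing it. That label is used to report where a violation happened. Label, Blame and apply_contract are hypothetical names, and this is not how Nickel implements contracts. Fields not mentioned in a record contract are simply let through here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union

ContractT = Union[Callable[[Any], bool], Dict[str, Any]]


@dataclass(frozen=True)
class Label:
    source: str     # the annotation as written, e.g. "{port | Num}"
    path: str = ""  # where we are inside the checked value, e.g. ".port"


class Blame(Exception):
    pass


def apply_contract(contract: ContractT, label: Label, value: Any) -> Any:
    """Apply a predicate or a record contract, extending the label as we descend."""
    if isinstance(contract, dict):
        if not isinstance(value, dict):
            raise Blame(f"{label.source} at '{label.path or '.'}': expected a record, got {value!r}")
        result = dict(value)
        for field, sub in contract.items():
            if field not in value:
                raise Blame(f"{label.source} at '{label.path}.{field}': missing field")
            sub_label = Label(label.source, f"{label.path}.{field}")
            result[field] = apply_contract(sub, sub_label, value[field])
        return result
    if not contract(value):
        raise Blame(f"{label.source} at '{label.path or '.'}': contract violated by {value!r}")
    return value
```

Applying the record contract `{"port": is_num}` to `{"port": "80"}` blames the path `.port` rather than the whole value, which is the kind of error-reporting information a plain function application would not carry.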

|> is used in Elm and in OCaml. It has the advantage of indicating the direction (we can also add <| for the reverse version).
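Here is a small Python sketch of the two directions, with hypothetical helper names. `pipe(x, f, g)` behaves like `x |> f |> g`, and `pipe_back(g, f, x)` like `g <| f <| x`. Both compute `g(f(x))`. The only difference is the reading order.

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *fns: Callable[[Any], Any]) -> Any:
    """Forward application: feed `value` through `fns` from left to right."""
    return reduce(lambda acc, f: f(acc), fns, value)


def pipe_back(*args: Any) -> Any:
    """Reverse application: the last argument is the value, applied right to left."""
    *fns, value = args
    return pipe(value, *reversed(fns))
```

For example, `pipe(3, lambda x: x + 1, lambda x: x * 2)` evaluates to 8, as does `pipe_back(lambda x: x * 2, lambda x: x + 1, 3)`.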

(i guess overloaded operators are harder to parse, so this is avoided in nickel?)

The problem with overloaded operators is mainly typing (the parser doesn't care about the semantics of things, and can already parse [] + []: only the interpreter will yell at you later). Because Nickel is gradually typed, basic operators must have (well, should have, in an ideal world) a type that can be expressed in the type system. The type system doesn't support overloading (this is arguably complex and not warranted in a config language, at least at this point), so we wouldn't know which type to give to + besides something like Dyn -> Dyn -> Dyn, which makes it a pain to use in typed code.

I honestly don't know whether arithmetic will necessarily end up being less used than, say, string concatenation (especially with string interpolation being an option) in practice. I don't expect number crunching in Nickel, but simple index calculations or integer conditions still happen. We could look at existing Nix codebases to get an idea. Even if it is less used, we also have to pragmatically consider how familiar some symbols are to developers. Bash syntax is... well... not the most intuitive I know.

In general, while

frequency should drive the encoding → frequent tokens should be short

makes sense, IMHO it's only one criterion among others. The fact that it is consistent with the remaining syntax, or that one lexeme is being used in the vast majority of other languages, making it familiar, are also important considerations.

yannham commented 2 years ago

I am also closing this issue. The idea was to discuss the basic syntax, which has been debated and implemented as of release 0.1.0. This is not a way of shutting down the discussion, but rather of keeping issues well-scoped and avoiding lingering ones. Although we'll consider backward-incompatible changes to the syntax very carefully, they are still possible, so feel free to open well-scoped and focused issues if needed.

milahu commented 2 years ago

|> is used in Elm and in OCaml. It has the advantage of indicating the direction (we can also add <| for the reverse version).

so let's use > as the forward pipe operator, similar to >> (forward composition) in F#


-- get maximum nesting depth of parentheses
-- based on https://github.com/codereport/LeetCode/blob/master/0210_Problem_1.hs
-- via https://www.youtube.com/watch?v=zrOIQEN3Wkk

maxDepth :: String -> Int
maxDepth s =
  let
    -- > = forward pipe operator (locally shadows Prelude's comparison >)
    --(>) v f = f v
    v > f = f v
  in
  s
  > filter (`elem` "()")
  > map (\c -> if c == '(' then 1 else -1)
  -- scanl with a 0 seed, so input without parentheses gives 0 instead of crashing maximum
  > scanl (+) 0
  > maximum

-- maxDepth "((x))" == 2

the less-often used arithmetic operators could be hidden in a math scope

math.gt  -- >
math.gte -- >=
math.lt  -- <
math.lte -- <=
