racket / rhombus

Rhombus programming language
Other
350 stars 62 forks source link

Square notation #172

Open michaelballantyne opened 3 years ago

michaelballantyne commented 3 years ago

Square notation

This proposal is an attempt to design a syntax similar to Shrubbery notation (https://github.com/racket/rhombus-brainstorming/pull/122), but where each syntactic element has a single role that can be described and understood separately.

A key syntax in Shrubbery notation has multiple roles: parentheses. Parentheses are used both as list-like syntax for function application, tuples, etc., and to indicate grouping in arithmetic expressions.

The central design decision in this proposal is to use parentheses only to disambiguate grouping and to use square brackets for all list-like structures, such as function argument lists. Thus the name.

Here are some examples in square notation:

define identity [x]: x

define fib [n]:
  cond
   | n == 0: 0
   | n == 1: 1
   | else: fib [n - 1] + fib [n - 2]

define print_sexp [v]:
  match v
   | empty: display ["()"]
   | cons [a, d]:
       if is_list [d]
        | display ["("]
          print_sexp [a]
          for [v = in_list [d]]:
            display [" "]
            print_sexp [v]
          display [")"]
        | display ["("]
          print_sexp [a]
          display [". "]
          print_sexp [d]
          display [")"]
   | v: print_atom [v]

Basic syntax

Items include atoms, groups, sequences, blocks, and alternative groups.

Atoms

This proposal concerns the grouping forms, so ignores the syntax of atoms. Let's assume atoms similar to Shrubberies. Note that it is not necessary to separate operators from other identifiers, however.

Groups

A sequence of whitespace-separated items on a line forms a group. Here's a group with five elements:

1 + 2 + 3

Pairs of ( and ) form a nested group that can span lines. The following is a group with three elements, where the last element is a group with five elements.

1 + (2
     + 3
     + 4)

All elements of a group must begin at the same or greater column as the first element, but the first element may be at an earlier column than any introductory (. This is legal:

1 / (
 2
   * 3
   * 4
 + 5
)

This is not:

1 / (2
 * 3
 * 4
+ 5)

Sequences

Sequences are formed by pairs of [ and ], and items within are separated into groups by ,. Extra or trailing , are disallowed. Line breaks are allowed within groups in a sequence.

This is a sequence with two groups:

[1, 2
    + 3
    + 4]

All lines beginning with elements of a sequence must begin at the same column as the first element, but multiple , separated elements may appear on a line, elements that are groups may span multiple lines, and the first element may be at an earlier column than the introductory [.

This is legal:

list [
 1, 2
     + 3
     + 4, 5,
 6]

Blocks

Blocks start with a : following an item, optionally followed by an indent and newline. These examples have a block containing one group:

define identity [x]: x

define distance [Point [x, y]]:
  sqrt [x^2 + y^2]

Blocks may be nested on the same line:

define add_curried [x]: lambda [y]: x + y

Without use of parentheses to disambiguate, the elements following the last : belong to the innermost block.

Within a block, groups are separated by ; or newline.

when true: display ["true!"]; error []

when true:
  displayln ["true"]
  error []

All lines of a block must begin on at the same column, but there may be multiple ; separated elements on a line. These are legal:

a: b; c
   d; e

long [expression]:
  b; c
  d; e

but these are not:

long [expression]: b; c
  c; d

a: b; c
      d; e

Continuing a group across lines in a block requires parentheses:

define distance [Point [x, y]]:
  val sum = (x^2
             + y^2)
  sqrt [sum]

A block within a group is always the last element of a group. Where it ends depends on the surrounding context: a block nested within an outer block ends at a dedent; a block in a sequence ends at , or ]; a block in a parenthesized group ends at ). Some examples:

define curried_add [x]:
  define inner [y]:
    x + y
  inner

foo [lambda [x]:
       val res = x + 1
       xres,
     lambda [x]:
       val res = x - 1
       res]

((begin:
   display [x]
   x)
  + 1)

Because an item is required before the :, a block is never the first element of a group.

Alternative groups

An alternative group is opened by a | following an item, either on the same line or a new line. Subsequent alternatives in the group are opened with |. The contents of each alternative in an alternative group are parsed as a block.

if is_null x
 | empty
 | val res = append [rest [l1], l2]
   cons [first [l1], res]
if is_null x | empty
             | val res = append [rest [l1], l2]; cons [first [l1], res]
if is_null x | empty | val res = append [rest [l1], l2]; cons [first [l1], res]

Note that because an alternative group must directly follow an item, a : preceeding an alternative group is not allowed.

All lines of an alternative group must begin on the same line, but there may be multiple | separated elements on a line. If an alternative's block contains an un-parenthesized : block, the alternative must appear beginning on its own line.

These are legal:

if test | then | else

cond a | (b: c; d) | (e:
                       f)
       | g:
           h
           i
       | else: j

These are not:

cond | a: b | c: d

cond | a: b | c:
                d
                e

Like a block, an alternative group will always be the last element of a surrounding group and will always be preceeded by at least one other item.

Parsed representation

The parsed representation uses syntax objects with paren-shape annotations. Atoms represent themselves; (item ...) represents a multi-element group; [item ...] represents a sequence, and {item ...} represents a block.

Because groups cannot begin with a block, blocks cannot occur directly nested. Thus {{item ...} ...} is available to unambiguously represent alternative groups. This is probably too cute. Maybe tags as in the Shrubbery proposal are a better idea, but I'll offer this as an alternative.

Parentheses are only used for disambiguation, so extra parentheses are removed at read-time. The parsed representation of single element groups has no parentheses, and that for multiple element groups has a single set of parentheses. So the following all have the same parsed representation:

lambda [x]: 1 + 2
(lambda [x]: 1 + 2)
lambda ([x]): 1 + 2
lambda [(x)]: 1 + 2
lambda [x]: (1 + 2)
lambda [x]: ((1 + 2))

of

(lambda [x] {(1 + 2)})

Here are the parsed representations of the examples from the earlier sections of this document:

{(define identity [x] {x})

 (define fib [n]
   {(cond
     {{(n == 0 {0})}
      {(n == 1 {1})}
      {(else {(fib [(n - 1)] + fib [(n - 2)])})}})})

 (define print_sexp [v out]
   {(match v
      {empty {(display ["()"])}}
      {(cons [a d]
         {(if is_list [d]
           {{(display ["("])
             (print_sexp [a])
             (for [(v = in_list [d])]
               {(display [" "])
                (print_sexp [v])}
             (display [")"])}
            {(display ["("])
             (print_sexp [a])
             (display [". "])
             (print_sexp [d])
             (display [")"])}})}
      {(v {(print_atom [v])})})})
(1 + 2 + 3)
(1 + (2 + 3 + 4))
(1 / (2 * 3 * 4 + 5))
[1 (2 + 3 + 4)]
(list [1, (2 + 3 + 4), 5, 6])
(define identity [x] {x})
(define distance [(Point [x y])]
  {(sqrt [(x^2 + y^2)])}
(define add_curried [x] {(lambda [y] {(x + y)})})
(when true {(display ["true!"]) (error [])})
(when true
  {(displayln ["true"])
   (error [])})
(a {b c d e})
(long [expression]
  {b c d e})
(define distance [(Point [x y])]
  {(val sum = (x ^ 2 + y ^ 2))
   (sqrt [sum])})
(define curried_add [x]
  {(define inner [y]
     {(x + y)})
   inner})
(foo [(lambda [x]
        {(val res = x + 1)
         xres})
      (lambda [x]
        {(val res = x - 1)
         res})])
((begin {(display [x]) x}) + 1)
(if is_null x
  {{empty}
   {(val res = append [rest [l1]] l2])
    (cons [first [l1], res]}})
(if test {{then} {else}}
(cond a 
 {{(b {c d}}
  {(e {f})}
  {(g {h i})}
  {(else {j})}})

More examples

struct Posn [x, y]:
  property prop_equal_and_hash:
    let [hc = lambda [a: posn, hc]:
                hc [a.x] + hc [a.y],
         eql = lambda [a: posn, b: posn, eql]:
                 (eql [a.x, b.x) &&
                  eql [a.y, b.y])]:
      values [eql, hc, hc]
define apply_type_constraint [tc]:
  lambda [u]:
    lambda [st]:
      let* [type_pred = type_constraint_predicate[tc],
            term = walk [u, state_s[st]]]:
        cond
         | type_pred [term]: st
         | is_var [term]:
             let* [c = lookup_c [st],
                   T = c_T [c]]:
               cond
                | is_eq [T, tc] : st
                | not [T]       : set_c [st, term, c_with_T [c, tc]]
                | else          : false

miniKanren

define_relation appendo [l1, l2, l3]:
  fresh [x, y]:
    conde
     | l1 == null; l2 == l3
     | fresh [head, rest, res]:
         l1 == cons [head, rest]
         l3 == cond [head, res]
         appendo [rest, l3, res]

Parsed:

(define_relation appendo [l1 l2 l3]
  {fresh [x y]
    {(conde
       {{(l1 == null) (l2 == l3)}
        {(fresh [head rest res]
          {(l1 == cons [head rest])
           (l3 == cons [head rest])
           (appendo [rest l3 res])})}})}})

OMeta

Adapated from here

ometa BasicCalc <: Parser [
  Digit   = | super@d                  -> d.ToDigit [],
  Number  = | Number@n Digit@d         -> (n * 10 + d)
            | Digit,
  AddExpr = | AddExpr@x "+" MulExpr@y  -> (x + y)
            | AddExpr@x "-" MulExpr@y  -> (x - y)
            | MulExpr,
  MulExpr = | MulExpr@x "*" primExpr@y -> (x * y)
            | MulExpr@x "/" primExpr@y -> (x / y)
            | PrimExpr,
 PrimExpr = | "(" Expr@x ")"           -> x
            | Number,
 Expr =     | AddExpr
]

Parsed:

(ometa BasicCalc <: Parser
  [(Digit    = {{(super @ d                     -> d . ToDigit [])}}
   (Number   = {{(Number @ n Digit @ d          -> (n * 10 + d))}
                {Digit}}
   (AddExpr  = {{(AddExpr @ x "+" MulExpr @ y   -> (x + y))}
                {(AddExpr @ x "-" MulExpr @ y   -> (x - y))}}
   (MulExpr  = {{(MulExpr @ x "*" primExpr @ y  -> (x * y))}
                {(MulExpr @ x "/" primExpr @ y  -> (x / y))}
                {PrimExpr}})
   (PrimExpr = {{("(" Expr @ x ")".             -> x)}
                {Number}}
   (Expr     = {{AddExpr}})])

Tradeoffs

Advantages

cond | null: 0 | list(x): 1
     | cons(a, d): f(a) + f(d)
x | a | b
y

can be written

(x | a | b); y

Both parse as:

{(x {{a} {b}}) y}

Disadvantages:

Implementation

None.

mflatt commented 3 years ago

What if you switched the roles of () and [], so that parentheses are used in most places and [] is for grouping? Or what if {} was used for grouping and () and [] left more like shrubbery notation?

FYI, I changed the parser in the package implementing #163, because I think the pop quiz here pointed to a mismatch between the prose spec and the implementation. The point stands that the result is not obvious.

michaelballantyne commented 3 years ago

What if you switched the roles of () and [], so that parentheses are used in most places and [] is for grouping? Or what if {} was used for grouping and () and [] left more like shrubbery notation?

Assuming a parsed representation that uses tags, neither of these deeply bothers me but I think they run counter to the most common associations of each delimiter.

gus-massa commented 3 years ago

I think that Mathematica use a similar system, where scare brackets are used for function definition and application, and parenthesis are used only in math expressions.

mflatt commented 3 years ago

I'm trying to figure out how essential it is to equate a group with one item and that item. It seems like the sort of representation choice that usually goes wrong in subtle ways. For example, suppose a macro f rewrites f [X, Y] to X + Y. If X in a us turns out to be an identifier that is implemented by a macro, then X + Y ends up meaning that X receives + and Y. Writing the result template as (X) + (Y) doesn't do anything if (X) is the same as X at the reader level. Something like values [X] + values [Y] would mean the right thing, but you wouldn't want values [X] as macro boilterplate. Those problems are resolved if grouping is always reflected in the representation, even for single-item groups, but I'm not sure whether that creates other problems. Is representing a single-item group with a basic goal here, so that extra parentheses can be wrapped around any group whether or not its in an expression position, or is it less important than that overall?

It looks like the area several bugs in the parsing of fib. Here's an attempt to repair:

 (define fib [n]
   {(cond
     {{(n == 0 {0})}
      {(n == 1 {1})}
      {(else {(fib [(n - 1)] + fib [(n - 2)])})}})})
michaelballantyne commented 3 years ago

Thanks for the repair to fib; I've updated the text.

The problem with dropping parens is interesting. The goal of normalizing parens was to avoid situations where macros break due to extra parens that are used to disambiguate. For example, writing cond on one line requires parens, and I'd like to make sure all these parse the same:

cond | (x < 5: #t) | (else: #f)

cond
| (x < 5: #t)
| (else: #f)

cond
| x < 5: #t
| else: #f

That way the cond macro doesn't have to have additional cases for an explicitly-parenthesized alternative body.

The idea of x + y and (x) + y both successfully parsing but with different answers bothers me. This gets back to my other comment on enforestation. I would prefer a more structured system in which a macro never consumes a variable amount of syntax, and where parsing an undelimited region can be accomplished using information about operator arities and argument contexts before macros expand. But I'd need to read your enforestation and Rhombus syntax proposals more deeply to understand how compatible that goal is with the rest of the design.

mflatt commented 3 years ago

Thanks for the clarification.

I can appreciate the goal of allowing parentheses to group without interfering with matching. I expected that kind of interference with shrubbery parentheses, but for what it's worth, it hasn't turned out to be a problem. Implementing parentheses as a specific expression, binding, etc., form has worked well. I also consider it a feature that (fun) (x): x and fun ((x)): x are not equivalent to fun (x): x. The only place where I remember extra parentheses as a problem was in an intermediate ad hoc attempt at propagating static information; the problem was resolved there by an approach that's less ad hoc.

A similar grouping idea might be applied to shrubbery notation by removing the ability of{} to create blocks, but still allow them as extra wrappers to delimit blocks. The old rules about removing redundant blocks had the same goal, but I can see how splitting the role of {} from : and | might make things simpler and better.

michaelballantyne commented 3 years ago

for what it's worth, it hasn't turned out to be a problem

In that case, the parenthesis-dropping idea isn't essential. I'm not really sure I like

lambda ([x]): 1 + 2

parsing the same as

lambda [x]: 1 + 2

anyway.

So if we set that aside, as well as the goal of having orthogonal grouping and sequencing forms and its big tradeoffs, I think there are still two ideas in here I'd like to make sure come across:

  1. There is only one major lexical syntax for a given parsed syntax. I like the idea of removing {} as a way to create blocks in Shrubberies. I could see them functioning either in the way you describe as an optional delimiter for the extent of blocks:
cond | x > 5: { #t } | x < 5: { #f }

or as another grouping / sequencing form like () and [], which could be useful for things like dictionary and set literals.

  1. The notation is described not in terms of individual punctuation syntaxes, but in terms of bigger concepts. I think the Shrubbery proposal could adopt this idea. Right now the headings of the Shrubberies proposal are:

I think a syntax very similar to Shrubberies (especially if you drop {} blocks) could be described with this outline:

If you keep {} blocks, they would be described as a second syntax in the blocks section rather than the sequences section.

I would find that description much easier to follow.

mflatt commented 3 years ago

I like the idea of completely separating {} from block notation. The connection between : and {} was the main idea in tonyg's racket-something, which was a starting point of shrubbery notation. But maybe because of other parts of shrubbery notation, {} hasn't been especially useful for blocks in the Rhombus experiment prototype.

The main use of {} has been for macros that consume or return definition sequences, such as implementations of binding forms. Patterns and templates for those macros used to be written as ?{ .... }, sometimes in the middle of a group. It works to change each macro protocol to use a layer of parentheses, so block patterns and templates are written as ?(: .... ). That's not as nice as ?{ .... }, but it's close, and since contexts that manipulate blocks are relatively rare, it seems like a fine tradeoff. My experiment so far haven't imposed the rule that a block must have something before it, otherwise more would be needed between ( and :.

I tried going entirely without a delimiting form for shrubbery blocks, but then there's no way to render some shrubberies that are naturally constructed through templates. For example, if t | x | y might be spliced in place of EXPR in if a | EXPR | b. (A template splice would produce the correct shrubbery structure; it's just a question of representing that structure as text.) Delimiting is rarely needed, though, so my current experiment uses « ... » for delimiting blocks, leaving {} potentially for sets and maps. A delimiting « ... » pair is allowed for the three block-creating positions: just after :, just after |, and around a sequence of |s:

x: y:« a; b »; c
if t | «if f | a | b» | y
x: if t «| a | b»; y
michaelballantyne commented 3 years ago

Hmm. Now that I better understand the situations where {} is useful I feel more ambivalent. Thanks for the explanation, in any case!

michaelballantyne commented 3 years ago

These do seem like situations where Square's idea of having extra parentheses be insignificant works well, however. In Square I think the first two examples would be:

x: (y: a; b); c
if t | (if f | a | b) | y

I can see how in expansion it would be useful to manipulate blocks and alts separately from their surrounding group. Perhaps Square should make (: a; b) parse as {a b} and (| a | b) parse as {{a} {b}}. Then a ?(: ... ) pattern would make sense without needing extra parens in the protocol.