Open michaelballantyne opened 3 years ago
What if you switched the roles of ()
and []
, so that parentheses are used in most places and []
is for grouping? Or what if {}
was used for grouping and ()
and []
left more like shrubbery notation?
FYI, I changed the parser in the package implementing #163, because I think the pop quiz here pointed to a mismatch between the prose spec and the implementation. The point stands that the result is not obvious.
What if you switched the roles of () and [], so that parentheses are used in most places and [] is for grouping? Or what if {} was used for grouping and () and [] left more like shrubbery notation?
Assuming a parsed representation that uses tags, neither of these deeply bothers me but I think they run counter to the most common associations of each delimiter.
I think that Mathematica use a similar system, where scare brackets are used for function definition and application, and parenthesis are used only in math expressions.
I'm trying to figure out how essential it is to equate a group with one item and that item. It seems like the sort of representation choice that usually goes wrong in subtle ways. For example, suppose a macro f
rewrites f [X, Y]
to X + Y
. If X
in a us turns out to be an identifier that is implemented by a macro, then X + Y
ends up meaning that X
receives +
and Y
. Writing the result template as (X) + (Y)
doesn't do anything if (X)
is the same as X
at the reader level. Something like values [X] + values [Y]
would mean the right thing, but you wouldn't want values [X]
as macro boilterplate. Those problems are resolved if grouping is always reflected in the representation, even for single-item groups, but I'm not sure whether that creates other problems. Is representing a single-item group with a basic goal here, so that extra parentheses can be wrapped around any group whether or not its in an expression position, or is it less important than that overall?
It looks like the area several bugs in the parsing of fib
. Here's an attempt to repair:
(define fib [n]
{(cond
{{(n == 0 {0})}
{(n == 1 {1})}
{(else {(fib [(n - 1)] + fib [(n - 2)])})}})})
Thanks for the repair to fib
; I've updated the text.
The problem with dropping parens is interesting. The goal of normalizing parens was to avoid situations where macros break due to extra parens that are used to disambiguate. For example, writing cond
on one line requires parens, and I'd like to make sure all these parse the same:
cond | (x < 5: #t) | (else: #f)
cond
| (x < 5: #t)
| (else: #f)
cond
| x < 5: #t
| else: #f
That way the cond
macro doesn't have to have additional cases for an explicitly-parenthesized alternative body.
The idea of x + y
and (x) + y
both successfully parsing but with different answers bothers me. This gets back to my other comment on enforestation. I would prefer a more structured system in which a macro never consumes a variable amount of syntax, and where parsing an undelimited region can be accomplished using information about operator arities and argument contexts before macros expand. But I'd need to read your enforestation and Rhombus syntax proposals more deeply to understand how compatible that goal is with the rest of the design.
Thanks for the clarification.
I can appreciate the goal of allowing parentheses to group without interfering with matching. I expected that kind of interference with shrubbery parentheses, but for what it's worth, it hasn't turned out to be a problem. Implementing parentheses as a specific expression, binding, etc., form has worked well. I also consider it a feature that (fun) (x): x
and fun ((x)): x
are not equivalent to fun (x): x
. The only place where I remember extra parentheses as a problem was in an intermediate ad hoc attempt at propagating static information; the problem was resolved there by an approach that's less ad hoc.
A similar grouping idea might be applied to shrubbery notation by removing the ability of{}
to create blocks, but still allow them as extra wrappers to delimit blocks. The old rules about removing redundant blocks had the same goal, but I can see how splitting the role of {}
from :
and |
might make things simpler and better.
for what it's worth, it hasn't turned out to be a problem
In that case, the parenthesis-dropping idea isn't essential. I'm not really sure I like
lambda ([x]): 1 + 2
parsing the same as
lambda [x]: 1 + 2
anyway.
So if we set that aside, as well as the goal of having orthogonal grouping and sequencing forms and its big tradeoffs, I think there are still two ideas in here I'd like to make sure come across:
{}
as a way to create blocks in Shrubberies. I could see them functioning either in the way you describe as an optional delimiter for the extent of blocks:cond | x > 5: { #t } | x < 5: { #f }
or as another grouping / sequencing form like ()
and []
, which could be useful for things like dictionary and set literals.
;
and ,
:
|
\
I think a syntax very similar to Shrubberies (especially if you drop {}
blocks) could be described with this outline:
:
introduces a block.;
or newlines separate groups in blocks\
continues a group across newlines|
introduces alternatives.If you keep {}
blocks, they would be described as a second syntax in the blocks section rather than the sequences section.
I would find that description much easier to follow.
I like the idea of completely separating {}
from block notation. The connection between :
and {}
was the main idea in tonyg's racket-something, which was a starting point of shrubbery notation. But maybe because of other parts of shrubbery notation, {}
hasn't been especially useful for blocks in the Rhombus experiment prototype.
The main use of {}
has been for macros that consume or return definition sequences, such as implementations of binding forms. Patterns and templates for those macros used to be written as ?{ .... }
, sometimes in the middle of a group. It works to change each macro protocol to use a layer of parentheses, so block patterns and templates are written as ?(: .... )
. That's not as nice as ?{ .... }
, but it's close, and since contexts that manipulate blocks are relatively rare, it seems like a fine tradeoff. My experiment so far haven't imposed the rule that a block must have something before it, otherwise more would be needed between (
and :
.
I tried going entirely without a delimiting form for shrubbery blocks, but then there's no way to render some shrubberies that are naturally constructed through templates. For example, if t | x | y
might be spliced in place of EXPR
in if a | EXPR | b
. (A template splice would produce the correct shrubbery structure; it's just a question of representing that structure as text.) Delimiting is rarely needed, though, so my current experiment uses « ... »
for delimiting blocks, leaving {}
potentially for sets and maps. A delimiting « ... »
pair is allowed for the three block-creating positions: just after :
, just after |
, and around a sequence of |
s:
x: y:« a; b »; c
if t | «if f | a | b» | y
x: if t «| a | b»; y
Hmm. Now that I better understand the situations where {}
is useful I feel more ambivalent. Thanks for the explanation, in any case!
These do seem like situations where Square's idea of having extra parentheses be insignificant works well, however. In Square I think the first two examples would be:
x: (y: a; b); c
if t | (if f | a | b) | y
I can see how in expansion it would be useful to manipulate blocks and alts separately from their surrounding group. Perhaps Square should make (: a; b)
parse as {a b}
and (| a | b)
parse as {{a} {b}}
. Then a ?(: ... )
pattern would make sense without needing extra parens in the protocol.
Square notation
This proposal is an attempt to design a syntax similar to Shrubbery notation (https://github.com/racket/rhombus-brainstorming/pull/122), but where each syntactic element has a single role that can be described and understood separately.
A key syntax in Shrubbery notation has multiple roles: parentheses. Parentheses are used both as list-like syntax for function application, tuples, etc., and to indicate grouping in arithmetic expressions.
The central design decision in this proposal is to use parentheses only to disambiguate grouping and to use square brackets for all list-like structures, such as function argument lists. Thus the name.
Here are some examples in square notation:
Basic syntax
Items include atoms, groups, sequences, blocks, and alternative groups.
Atoms
This proposal concerns the grouping forms, so ignores the syntax of atoms. Let's assume atoms similar to Shrubberies. Note that it is not necessary to separate operators from other identifiers, however.
Groups
A sequence of whitespace-separated items on a line forms a group. Here's a group with five elements:
Pairs of
(
and)
form a nested group that can span lines. The following is a group with three elements, where the last element is a group with five elements.All elements of a group must begin at the same or greater column as the first element, but the first element may be at an earlier column than any introductory
(
. This is legal:This is not:
Sequences
Sequences are formed by pairs of
[
and]
, and items within are separated into groups by,
. Extra or trailing,
are disallowed. Line breaks are allowed within groups in a sequence.This is a sequence with two groups:
All lines beginning with elements of a sequence must begin at the same column as the first element, but multiple
,
separated elements may appear on a line, elements that are groups may span multiple lines, and the first element may be at an earlier column than the introductory[
.This is legal:
Blocks
Blocks start with a
:
following an item, optionally followed by an indent and newline. These examples have a block containing one group:Blocks may be nested on the same line:
Without use of parentheses to disambiguate, the elements following the last
:
belong to the innermost block.Within a block, groups are separated by
;
or newline.All lines of a block must begin on at the same column, but there may be multiple
;
separated elements on a line. These are legal:but these are not:
Continuing a group across lines in a block requires parentheses:
A block within a group is always the last element of a group. Where it ends depends on the surrounding context: a block nested within an outer block ends at a dedent; a block in a sequence ends at
,
or]
; a block in a parenthesized group ends at)
. Some examples:Because an item is required before the
:
, a block is never the first element of a group.Alternative groups
An alternative group is opened by a
|
following an item, either on the same line or a new line. Subsequent alternatives in the group are opened with|
. The contents of each alternative in an alternative group are parsed as a block.Note that because an alternative group must directly follow an item, a
:
preceeding an alternative group is not allowed.All lines of an alternative group must begin on the same line, but there may be multiple
|
separated elements on a line. If an alternative's block contains an un-parenthesized:
block, the alternative must appear beginning on its own line.These are legal:
These are not:
Like a block, an alternative group will always be the last element of a surrounding group and will always be preceeded by at least one other item.
Parsed representation
The parsed representation uses syntax objects with paren-shape annotations. Atoms represent themselves;
(item ...)
represents a multi-element group;[item ...]
represents a sequence, and{item ...}
represents a block.Because groups cannot begin with a block, blocks cannot occur directly nested. Thus
{{item ...} ...}
is available to unambiguously represent alternative groups. This is probably too cute. Maybe tags as in the Shrubbery proposal are a better idea, but I'll offer this as an alternative.Parentheses are only used for disambiguation, so extra parentheses are removed at read-time. The parsed representation of single element groups has no parentheses, and that for multiple element groups has a single set of parentheses. So the following all have the same parsed representation:
of
Here are the parsed representations of the examples from the earlier sections of this document:
More examples
miniKanren
Parsed:
OMeta
Adapated from here
Parsed:
Tradeoffs
Advantages
Because the syntaxes are more orthogonal, I think Square notation can be described more simply than Shrubberies. Though of course this current document is much less complete than the Shrubbery proposal.
Square notation is potentially easy to represent, print, and match with S-expressions without an extra layer of tagging. Unless people find the
{{item ...} ...}
representation is too awkward for alternative groups.Conceptually separate syntaxes for groups, sequences, blocks, and alternatives allow separate line-continuation rules for each context that avoid potential confusion. Pop quiz: how does this Shrubbery parse?
can be written
Both parse as:
Lack of
{}
means there is only one syntax for blocks, rather than two.Allowing line breaks in lists and parenthesized groups obviates
\
.Disadvantages:
Use of square brackets for function calls is less conventional, and violates one of @mflatt's original desiderata for Rhombus.
There is no special syntax for common data literals, such as lists, hashes, or sets, as there is only one sequence form:
[item, ...]
@samth's desire for convenient use of
|>
remains unsupported.And probably many I haven't noticed.
Implementation
None.