melt-umn / silver

An attribute grammar-based programming language for composable language extensions
http://melt.cs.umn.edu/silver/
GNU Lesser General Public License v3.0

Regularizing the let syntax. And its friends. #243

Open tedinski opened 6 years ago

tedinski commented 6 years ago

Some of this syntax was never really thought through carefully, we just kinda went with something and now it's stuck. I'd like to try to fix that.

Currently, our let syntax looks like this:

let x :: Integer = 2, y :: Integer = 3 in x + y end

I think there are two things about this that are not ideal:

  1. We'd like to get rid of end, both for let and for case.
  2. This syntax for a list of let declarations is not regular with anything else in the Silver language.

In another issue, I looked up the Haskell let syntax to try to draw inspiration from:

let x = 2; y = 3 in x + y
let x = 2; y = 3; in x + y
let { x = 2; y = 3 } in x + y
let { x = 2; y = 3; } in x + y
let { x :: Integer; x = 2; y :: Integer; y = 3 } in x + y  (adding in some types)

However... I'm not sure if this is actually a good choice. We have an aesthetic choice to make: should we use semicolons inside expressions, or not?

I expect the current let syntax was chosen with this in mind, though I don't think I was the one to choose it, so who knows.

Right now we break this semicolons rule in a couple other places though:

We could also change decorate. And with clauses in forwarding should probably just be deprecated entirely. (Discouraged from use because of non-interference, but also in preference to just the forward.inh = ...; equation where used anyway.)

I lean towards thinking of these syntactic constructs as being key-value maps, and as a result, I think we should go with this non-semicolon syntax for both let and decorate:

let x :: Type = val, y :: Type = val in expr
let { x :: Type = val, y :: Type = val, } in expr
decorate expr with { x = val, y = val }

i.e. braces optional for let only, use commas, permit optional trailing commas.
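To make the proposed rule concrete, here is a minimal Python sketch of a parser for the binding list: braces optional, commas between bindings, an optional trailing comma. It is not Silver's actual grammar. The names `tokenize` and `parse_let` are hypothetical, type annotations are limited to a single identifier, and expressions are kept as raw token lists.

```python
import re

# Tokens: '::', identifiers/keywords, integers, and single-character punctuation.
TOKEN = re.compile(r"\s*(::|[A-Za-z_][A-Za-z0-9_]*|\d+|[{}(),=+])")


def tokenize(src):
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_let(src):
    """Parse 'let [{] bindings [}] in body' (hypothetical helper).

    Returns (bindings, body_tokens); each binding is
    (name, type_name_or_None, value_tokens).
    """
    toks = tokenize(src)
    i = 0

    def peek():
        return toks[i] if i < len(toks) else "<eof>"

    def take():
        nonlocal i
        tok = peek()
        if tok == "<eof>":
            raise SyntaxError("unexpected end of input")
        i += 1
        return tok

    def expect(tok):
        found = take()
        if found != tok:
            raise SyntaxError(f"expected {tok!r}, found {found!r}")

    expect("let")
    braced = peek() == "{"
    if braced:
        take()
    closer = "}" if braced else "in"

    bindings = []
    while peek() != closer:  # reaching the closer here means a trailing comma
        name = take()
        type_name = None
        if peek() == "::":  # optional annotation, one identifier in this sketch
            take()
            type_name = take()
        expect("=")
        value, depth = [], 0
        # A comma only ends the binding when it is outside all parentheses.
        while peek() != "<eof>" and not (depth == 0 and peek() in (",", closer)):
            tok = take()
            depth += {"(": 1, ")": -1}.get(tok, 0)
            value.append(tok)
        bindings.append((name, type_name, value))
        if peek() != ",":
            break
        take()

    if braced:
        expect("}")
    expect("in")
    return bindings, toks[i:]
```

For example, `parse_let("let { x = f(a, b), y = 3, } in x")` yields two bindings: the commas inside `f(a, b)` stay part of the first value, and the trailing comma before `}` is accepted. The sketch also accepts a trailing comma in the brace-free form, which is one possible reading of the proposal.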

This would then also mesh well with a record type, if we ever introduce such a thing. I think such a type should have literal syntax like { key = value, key = value }. And it meshes better with the annotation syntax we already have which can look like prod(key = value, key = value)

Obviously, any changes we make here will have to still accept the current syntax, but I think that wouldn't be an issue. Any comments on this opinion?

(Aside: The other let issue: We have #38 on making let recursive. We should allow let expressions that have explicit type annotations to be fully recursive, but we should also consider allowing let expressions to NOT have explicit type annotations, at the cost of not allowing such variables to be defined recursively. I think this would be surprisingly not that hard to implement, actually. It would allow very simple short code like let x = 1, y = 2 in x + y to be valid. That'd be nice.)

The downsides to going with this preferred syntax, as I see them, are:

  1. It could maybe be considered inconsistent that the equations we give for a production use semicolons but the "inherited equations" we give in decorate use commas. But these are already different enough that I'm not sure I buy that objection. Only decorate uses key = value, normal equations are var.attr = value.
  2. It's maybe a little bit less syntactically obvious when a single variable assignment in a let actually ends when it's just another comma. e.g. let val :: Type = expr(x, y, z, w, etc), -- oh hey, notice that trailing comma is not quite like the others. The thing is, I tried this with semicolons too, and it's actually barely any better.

Actually, now that I contemplate this last point a bit, I wonder if we might also want to offer yet another option for let syntax inspired by case:

let
| val :: Type = expr
| val :: Type = expr
in expr

but I'm not sure this isn't a silly idea. But... it does appear to visually help understand complicated let expressions in the example I'm looking at. Hmm.

Anyway, here's a current let (from ProductionGraph.sv), a new let with commas, a new let with semicolons, and Ted's weird vertical bar idea, just to serve as an example:

  top.stitchedGraph = 
    let newEdges :: [Pair<FlowVertex FlowVertex>] =
          filter(edgeIsNew(_, graph),
            flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints))
    in let repaired :: g:Graph<FlowVertex> =
             repairClosure(newEdges, graph)
    in if null(newEdges) then top else
         productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints)
    end end;

  top.stitchedGraph = 
    let {
      newEdges :: [Pair<FlowVertex FlowVertex>] =
        filter(edgeIsNew(_, graph),
          flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints)),
      repaired :: g:Graph<FlowVertex> =
        repairClosure(newEdges, graph)
    } in
      if null(newEdges) then top else
        productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints);

  top.stitchedGraph = 
    let {
      newEdges :: [Pair<FlowVertex FlowVertex>] =
        filter(edgeIsNew(_, graph),
          flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints));
      repaired :: g:Graph<FlowVertex> =
        repairClosure(newEdges, graph);
    } in
      if null(newEdges) then top else
        productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints);

  top.stitchedGraph = 
    let
    | newEdges :: [Pair<FlowVertex FlowVertex>] =
        filter(edgeIsNew(_, graph),
          flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints))
    | repaired :: g:Graph<FlowVertex> =
        repairClosure(newEdges, graph)
    in
      if null(newEdges) then top else
        productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints);

krame505 commented 6 years ago

I initially preferred semicolons for consistency, but I don't really have a problem with commas if we change everything else. I often find myself forgetting the last semicolon in decorate anyway, so it might be kinda nice...

I'm not really sure what I think about the vertical bar thing. It does look kinda nice, but it might confuse users since no other language uses anything like that. Also for case it is sort of like a disjunction of clauses, one of which is "true" (i.e. matches), while there isn't really any such natural analog in the case of let, IDK.

One other thing - I am against getting rid of end in case, due to the syntactic ambiguity that this causes with nested cases. We don't have indentation sensitivity like Haskell to avoid this, and wrapping in parens is more annoying, IMHO.

tedinski commented 6 years ago

Well, we also have the concrete productions use of vertical bars, but I suppose that too could be considered a set of alternatives. Hmm.

Interesting point about nested case clauses. Also hmm.

ericvanwyk commented 6 years ago

I don't understand this statement from your original note: We have an aesthetic choice to make: should we use semicolons inside expressions, or not?

Haven't we answered this? We do not. Semicolons indicate the end of an expression, at least on sets of equations.

Thus, it seems more consistent to have semicolons at the end of equations in let expressions. Why change to commas for this form? Both are lists of equations. One has a type annotation and one doesn't, but they are more alike than different. The same argument applies to decorate and forwarding expressions.

I don't like the vertical bars on lets - as Lucas states, vertical bars look like disjunction and have that connotation.

I think we should be consistent with end. Either have it for all of if-then-else, let, and case, or none of them. I'd vote for dropping end from all. I agree the parens around nested case expressions are awkward, but OCaml/ML do it this way and I don't find it too annoying anymore. Maybe a few semesters of teaching OCaml to undergrads would convince you of this too. :)

tedinski commented 6 years ago

> I don't understand this statement from your original note: We have an aesthetic choice to make: should we use semicolons inside expressions, or not?

All I mean is that let x :: Type = expr; y :: Type = expr in expr as a whole is an expression that has semicolons in it.

Anyway, I used to think as you do, but then I considered this set of current syntax:

let x :: Type = expr, y :: Type = expr in expr
decorate expr with { name = value; name = value; }
func(anno = expr, name = expr)
hypothetical records:  { name = value, name = value }

and I realized:

  1. Presently, it's really decorate that's the odd one out.
  2. I really hate that decorate uses semicolons, now that I've thought about it!
  3. It might be nice for some future parse error repair mechanism if semicolons were a reliable signal we were all the way back out at equations or top-level declarations.
  4. And never mind error repair, I think I aesthetically prefer it, too.

So that's how I came to think maybe we should get rid of the semicolons from decorate instead of adding them to let.

RE: end. Lucas, do you have a good example of code we already have written that nests case expressions? I'm maybe a little bit worried it could result in confusing error messages, but perhaps we could fix that with a warning about the indentation of | or something...

RE: vertical bars. I agree my proposed syntax was not a good one, but I still think I want an alternative syntax for vertically-laid out large let blocks, so how about this syntax where we allow let repetition?

let x :: Type = expr
let y :: Type = expr
in expr

In the example:

  top.stitchedGraph = 
    let newEdges :: [Pair<FlowVertex FlowVertex>] =
      filter(edgeIsNew(_, graph),
        flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints))
    let repaired :: g:Graph<FlowVertex> =
      repairClosure(newEdges, graph)
    in
      if null(newEdges) then top else
        productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints);

ericvanwyk commented 6 years ago

Just to document our conversation, this looks good to me.

krame505 commented 6 years ago

ableC has plenty of nested cases, e.g. in abstractsyntax/host/TypeOps.sv. For example,

  case a, b of
  | builtinType(_, x), builtinType(_, y) ->
      case usualArithmeticConversions(x, y) of
      | nothing() -> errorType()
      | just(z) -> builtinType(nilQualifier(), z) -- qualifiers?
      end
  | pointerType(_, _), builtinType(_, _) -> a
  | builtinType(_, _), pointerType(_, _) -> b
  | pointerType(_, _), pointerType(_, _) -> builtinType(nilQualifier(), signedType(intType()))
  | vectorType(b1, s1), vectorType(b2, s2) ->
      if compatibleTypes(b1, b2, true, false) && s1 == s2 then a else errorType()
  | _, _ -> errorType()
  end;

The other thing about having end on case is that it provides a stronger visual indication of where the clauses end, all on the same indentation level, joined by vertical bars between case and end. I also don't really see having end here but not on let as an inconsistency, as in a case end marks the end of a series of clauses, which don't appear anywhere else in the language. On the other hand, the body of let is just an expression, and it makes no more sense to have end here than it would after a lambda.

Another, somewhat worse issue with removing end in case is that forgetting parens and leaving out a case could sometimes lead to type-correct code that compiles and silently gives the wrong behavior. Take for instance

case m of
| just(a) ->
    case a.blah of
    | foo(b) -> b.count
    | bar() -> 42
| _ -> -1

Here just(baz()) would give -1 instead of failing as expected. This sort of issue would probably be caught in languages like OCaml or SML because you are probably using some sort of editor supporting auto-indent, which we don't have yet for Silver ;-) Also, the types of programming we do in Silver (working with various types of ASTs) often involve more nested pattern matching than you might otherwise be doing in those languages.
Adding indentation-sensitive warnings to Silver for these cases as Ted suggested could help, I guess, but to me it really just seems like a lot of extra complexity for not much gain.
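
The problem with dropping end can be sketched with a toy parser. This is a minimal Python sketch, not Silver's parser: it assumes the ambiguity is resolved by letting the innermost case take every clause that follows (the usual "shift" preference), and it treats patterns and non-case expressions as single tokens. The names `parse_expr` and `parse` are hypothetical.

```python
def parse_expr(toks, i, use_end):
    """Parse one expression starting at toks[i]; return (tree, next_index)."""
    if toks[i] != "case":
        return toks[i], i + 1  # anything else is an atomic expression here
    scrutinee = toks[i + 1]
    assert toks[i + 2] == "of"
    i += 3
    clauses = []
    # Greedy: keep consuming '| pattern -> expr' for as long as a '|' follows.
    while i < len(toks) and toks[i] == "|":
        pattern = toks[i + 1]
        assert toks[i + 2] == "->"
        body, i = parse_expr(toks, i + 3, use_end)
        clauses.append((pattern, body))
    if use_end:
        assert i < len(toks) and toks[i] == "end"
        i += 1
    return ("case", scrutinee, clauses), i


def parse(src, use_end):
    tree, _ = parse_expr(src.split(), 0, use_end)
    return tree


# Without 'end', the final '| _ -> -1' clause is swallowed by the inner case.
without_end = parse(
    "case m of | just -> case a of | foo -> 1 | bar -> 42 | _ -> -1",
    use_end=False,
)

# With 'end', the inner case closes first and '_' belongs to the outer case.
with_end = parse(
    "case m of | just -> case a of | foo -> 1 | bar -> 42 end | _ -> -1 end",
    use_end=True,
)
```

In `without_end` the outer case has a single `just` clause whose body holds all three remaining clauses, which is the structure that makes the unmatched inner value fall through to -1. Indentation plays no role in either parse.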

Allowing repetition of let is maybe better than using |, and I don't have a specific issue with that syntax, except that it seems to be over-complicating things. In your original post, it seems to me that your 3rd option does just fine in dividing when each clause starts and stops, via indentation. Having 2 totally independent syntactic forms for let, one with braces and one involving repetition of keywords, seems like it might be more confusing than otherwise. I guess maybe I'm not totally clear, were you considering allowing more than one declaration in a row using commas without braces?

One other place where we have declarations inside expressions: do-notation, provided by the monad extension. Here we can write an expression like

do (bindList, returnList) {
  iss :: [[Integer]] <- isss;
  is :: [[Integer]] <- iss;
  sum :: Integer = foldr(add, 0, is);
  return sum * 3;
}

The more I think about it, the less I like having bindings with type signatures ended by commas... so while I am in favor of changing the decorate syntax to use commas, I guess now I don't like commas for (recursive) let quite so much. Maybe commas are fine though for non-recursive let, where we don't allow type signatures?
Hopefully some of this rambling made sense.

tedinski commented 6 years ago

> I also don't really see having end here but not on let as an inconsistency, as in a case end marks the end of a series of clauses, which don't appear anywhere else in the language. On the other hand, the body of let is just an expression, and it makes no more sense to have end here than it would after a lambda.

A very good point. Whenever I get around to addressing this issue, I may choose to leave end on case just for the moment at least. (Among other things, it looks like making it optional would be a breaking change for some code like what you posted.)

This syntax choice is consistent with Coq at least. (match...end and let... no end.)

> Adding indentation-sensitive warnings to Silver for these cases as Ted suggested could help, I guess, but to me it really just seems like a lot of extra complexity

It's actually a pretty simple thing to do. I think it would solve the confusion problem, but I might be coming around to your point of view that maybe we shouldn't create the possibility for confusion in the first place.

> Having 2 totally independent syntactic forms for let, one with braces and one involving repetition of keywords, seems like it might be more confusing than otherwise.

This is a price I think is worth paying. There's just a lot of visual clarity benefit from doing something to make vertically-aligned large let blocks emphasize their structure better than any choice of mere punctuation would allow.

> I guess maybe I'm not totally clear, were you considering allowing more than one declaration in a row using commas without braces?

Yeah, the braces will just be optional, though we'll probably stylistically encourage them. Except maybe in really short expressions?

> One other place where we have declarations inside expressions: do-notation

Ah, not sure how I missed that. But that's okay, I think that notation is a pretty explicit attempt at looking "statement-like" even though it's technically an expression. Seems like a reasonable exception to me.

> The more I think about it, the less I like having bindings with type signatures ended by commas

To be honest, this is something we'll have to try out and use to see how things shake out in practice.

I kinda suspect we'll end up almost always using the "let repetition" vertical syntax in practice, and that most of the reason we'd even offer the other forms is just to ensure that they're there if we want them in case I'm wrong about that.

And maybe quick short expressions like let x = func(z), y = other(x) in if something(y) then x else y

So I guess maybe an alternate question of your opinion on that syntax: if it was the ONLY kind of let, would you think it was an acceptable choice? Maybe I should think more about this...

krame505 commented 6 years ago

> if it was the ONLY kind of let, would you think it was an acceptable choice?

Not sure to which syntax you are referring here. But just to condense my thoughts on this:

One other comment:

let foo :: Foo = ...
let bar :: Bar = ...
in baz

initially jumps out to me as being non-recursive, since it looks too much like

let foo :: Foo = ... in
let bar :: Bar = ... in
baz

while

let {
  foo :: Foo = ...;
  bar :: Bar = ...;
} in baz

more naturally seems recursive, since it looks sort of like a production body.
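
The difference between the two readings comes down to which names each binding may refer to. A minimal Python sketch of that scoping rule, with the hypothetical helper `unbound_references` and each binding reduced to its name plus the set of variables its right-hand side mentions (an assumption made for illustration, not how Silver represents bindings):

```python
def unbound_references(bindings, outer_scope, recursive):
    """Return (binding_name, unbound_variable) pairs for a let block.

    bindings: list of (name, free_variables) in source order.
    recursive=True:  every binding sees every name in the block.
    recursive=False: a binding sees only the names bound before it,
                     as with nested 'let ... in let ... in' expressions.
    """
    errors = []
    if recursive:
        scope = set(outer_scope) | {name for name, _ in bindings}
        for name, free in bindings:
            errors.extend((name, var) for var in sorted(free - scope))
    else:
        scope = set(outer_scope)
        for name, free in bindings:
            errors.extend((name, var) for var in sorted(free - scope))
            scope.add(name)  # visible to later bindings only
    return errors


# foo mentions bar, which is bound after it; bar mentions x from outside.
block = [("foo", {"bar"}), ("bar", {"x"})]

recursive_errors = unbound_references(block, {"x"}, recursive=True)
sequential_errors = unbound_references(block, {"x"}, recursive=False)
```

Under the recursive reading the block is well-scoped, while under the sequential reading `foo` refers to a `bar` that is not yet in scope. A reader who mistakes one form for the other would therefore expect different programs to be accepted.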

tedinski commented 6 years ago

> initially jumps out to me as being non-recursive

I get that. I think I'd just get used to it, though. Among other reasons, I think there'd just be no reason to write nested lets like that. Do you have any alternative syntax suggestions?

The design criterion I want to satisfy here is making it easier to visually scan a large vertical let block. The example I pulled earlier was a pretty good one. The problem isn't visually apparent when you've got short expressions or .... But the example's let let in structure, especially with syntax highlighting on the keywords, jumps out a lot more. At least to me.

Both the comma and semicolon variants weren't good enough for me. (And we are going to go with commas I think.) Although I suppose we could try increasing the indentation a bit:

  top.stitchedGraph = 
    let {
      newEdges :: [Pair<FlowVertex FlowVertex>] =
          filter(edgeIsNew(_, graph),
            flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints)),
      repaired :: g:Graph<FlowVertex> =
          repairClosure(newEdges, graph)
    } in
      if null(newEdges) then top else
        productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints);

vs

  top.stitchedGraph = 
    let newEdges :: [Pair<FlowVertex FlowVertex>] =
      filter(edgeIsNew(_, graph),
        flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints))
    let repaired :: g:Graph<FlowVertex> =
      repairClosure(newEdges, graph)
    in
      if null(newEdges) then top else
        productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints);

I still like the second better.

There's also a kind of nice symmetry with

global var :: Type = Expr
local var :: Type = Expr
let var :: Type = Expr

(Though I grant this particular comparison also makes the semicolon seem like a reasonable choice, argh. But another reason not to use semicolons is do notation, come to think of it. let x = e; could be recognized as a do-let due to the semicolon...)

The thing is, stylistically, I suspect our advice to users would be to use

let x = a, y = b in f(x, y)

when it can fit on one or two lines. Then use

let x :: Type =
  a...
let y :: Type =
  b...
in
  f(x, y)

once things need more than several lines. So the braces and commas syntax might even be vestigial. (I'd still add it, but I'm not sure if we'd use it. But if it turns out to have a use case, it should be there...)

So I want to make sure I get it right, since it'll probably be the most commonly used syntax.

krame505 commented 6 years ago

Yeah, I get that... but after thinking about it some more I still just don't like it 😁 It just doesn't fit with the rest of the language, or let syntax in any other language. Braces-with-semicolons provide regularity with our do-notation and Haskell let syntax, at least. I also still don't think that noticing where each new individual binding starts will be an issue, since syntax highlighting will probably highlight the type expression (and possibly also the type operator); if this is still a problem we could also institute a convention of leaving a blank line between bindings.

But at this point I doubt I'm going to change your mind 😃 so I suppose I would be OK with doing both as a compromise.

By the way, if I understand correctly, you want to use commas for both let variants with braces? As you mentioned, I find this choice kinda ugly, as we are otherwise planning to be consistent throughout the language in having name = val always be separated by a comma, and name :: Type = val always terminated by a semicolon. Doing otherwise would be inconsistent with the current do notation syntax, so IMO we shouldn't go in that direction unless we also want to majorly rethink this as well.

> But another reason not to use semicolons is do notation, come to think of it. let x = e; could be recognized as a do-let due to the semicolon...

Not sure I understand your point here - are you proposing to have let also function as do in some cases? If so, I don't like that, as it doesn't make it obvious that monadic stuff is going on. But the current do syntax already allows foo :: Foo <- bar; as bind and foo :: Foo = bar; as let.

tedinski commented 6 years ago

> But at this point I doubt I'm going to change your mind 😃

Well, you're still bringing up things I haven't considered, so...

> we could also institute a convention of leaving a blank line between bindings.

Hmm, also a good thought, though I'm not sure if I like creating extra space inside expressions like that.

> But the current do syntax already allows foo :: Foo <- bar; as bind and foo :: Foo = bar; as let.

Oh. Did we ever carefully decide on that? Consistency with Haskell would have us use foo <- bar; for bind and let foo = bar; as let. I don't see a reason to be inconsistent. Possibly this choice was dictated because our current let syntax had ambiguity issues that needed solving to use the keyword there. I think those can be solved and we should change this about our do syntax.

Anyway, I'll probably pause this discussion for now, since I'm not going to get to this immediately anyway. Looks like we're in agreement on:

I think for the moment I'm convinced:

I might be persuaded:

I might be dissuaded:

Other things to think about:

And just to conclude with a few example blocks for future thought:

  top.stitchedGraph = 
    let {
      newEdges :: [Pair<FlowVertex FlowVertex>] =
          filter(edgeIsNew(_, graph),
            flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints));
      repaired :: g:Graph<FlowVertex> =
          repairClosure(newEdges, graph);
    } in
      if null(newEdges) then top else
        productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints);

  top.stitchedGraph = 
    let newEdges :: [Pair<FlowVertex FlowVertex>] =
      filter(edgeIsNew(_, graph),
        flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints))
    let repaired :: g:Graph<FlowVertex> =
      repairClosure(newEdges, graph)
    in
      if null(newEdges) then top else
        productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints);

  top.stitchedGraph = do {
    let newEdges :: [Pair<FlowVertex FlowVertex>] =
      filter(edgeIsNew(_, graph),
        flatMap(stitchEdgesFor(_, top.flowTypes, top.prodGraphs), stitchPoints));
    let repaired :: g:Graph<FlowVertex> =
      repairClosure(newEdges, graph);
    return
      if null(newEdges) then top else
        productionGraph(prod, lhsNt, flowTypeVertexes, repaired, suspectEdges, stitchPoints);
  };