scala / scala3

The Scala 3 compiler, also known as Dotty.
https://dotty.epfl.ch
Apache License 2.0

Consider syntax with significant indentation #2491

Closed odersky closed 7 years ago

odersky commented 7 years ago

I have been playing for a while now with ways to make Scala's syntax indentation-based. I always admired the neatness of Python syntax and also found that F# has benefited greatly from its optional indentation-based syntax, so much so that nobody seems to use the original syntax anymore. I had some good conversations with @lihaoyi at Scala Exchange in 2015 about this. At the time, there were some issues I was not yet happy with, notably how to elide braces of arguments to user-defined functions. I now have a proposal that addresses these issues.

Proposal in a Nutshell

Motivation

Why use indentation-based syntax?

This solves the alignment issue: The if and the elses are now vertically aligned. But it gives up even more control over vertical whitespace.

Impediments

What are the reasons for preferring braces over indentations?

But neither of these points is the strongest argument against indentation. The strongest argument is clearly

Proposal in Detail

Expanded use of with

While we are about to phase out with as a connective for types, we propose to add it in two new roles for definitions and terms. For definitions, we allow with as an optional prefix of (so far brace-enclosed) statement sequences in templates, packages, and enums. For terms, we allow with as another way to express function application. f with { e } is the same as f{e}. This second rule looks redundant at first, but will become important once significant indentation is added. The proposed syntax changes are as described in this diff.

Significant Indentation

In code outside of braces, parentheses or brackets we maintain a stack of indentation levels. At the start of the program, the stack consists of the indentation level zero.

If a line ends in one of the keywords =, if, then, else, match, for, yield, while, do, try, catch, finally or with, and the next token starts in a column greater than the topmost indentation level of the stack, an open brace { is implicitly inserted and the starting column of the token is pushed as new top entry on the stack.

If a line starts in a column smaller than the current topmost indentation level, it is checked that there is an entry in the stack whose indentation level precisely matches the start column. The stack is popped until that entry is at the top and for each popped entry a closing brace } is implicitly inserted. If there is no entry in the stack whose indentation level precisely matches the start column an error is issued.

None of these steps is taken in code that is enclosed in braces, parentheses or brackets.
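The stack rules above can be sketched as a small line-based pass. This is only an illustration under simplifying assumptions, not the implementation in #2488: the names IndentSketch, State and insertBraces are hypothetical, tokens are split on whitespace, comments and string literals are not recognized, code inside braces, parentheses or brackets is not exempted, and blocks still open at the end of the input are closed there (the rules above do not spell that case out).

```scala
object IndentSketch {
  // Tokens that, per the rule above, open a block when they end a line.
  val openers: Set[String] = Set("=", "if", "then", "else", "match", "for", "yield",
    "while", "do", "try", "catch", "finally", "with")

  // out: emitted lines in reverse order; stack: indentation levels, top first.
  final case class State(out: List[String], stack: List[Int], afterOpener: Boolean)

  /** Makes the implicit braces explicit, or reports a misaligned line. */
  def insertBraces(lines: List[String]): Either[String, List[String]] = {
    val start: Either[String, State] = Right(State(Nil, List(0), afterOpener = false))
    val result = lines.filter(_.trim.nonEmpty).foldLeft(start) { (acc, line) =>
      acc.flatMap { st =>
        val col = line.indexWhere(c => !c.isWhitespace)
        val endsWithOpener = openers.contains(line.trim.split("\\s+").last)
        if (st.afterOpener && col > st.stack.head)
          // Deeper line after an opener: insert "{" and push the new column.
          Right(State(line :: "{" :: st.out, col :: st.stack, endsWithOpener))
        else if (col < st.stack.head) {
          if (!st.stack.contains(col))
            Left(s"indentation $col matches no enclosing level: ${line.trim}")
          else {
            // Pop until the matching level is on top, one "}" per popped entry.
            val popped = st.stack.takeWhile(_ > col)
            Right(State(line :: popped.map(_ => "}") ::: st.out,
              st.stack.drop(popped.length), endsWithOpener))
          }
        } else Right(State(line :: st.out, st.stack, endsWithOpener))
      }
    }
    // Assumption: close whatever is still open at the end of the input.
    result.map(st => (st.stack.init.map(_ => "}") ::: st.out).reverse)
  }
}
```

For instance, feeding it the lines "def f =", "  val x = 1", "  x + 1", "def g = 2" yields the same lines with a "{" after the first and a "}" before the last, while a line that dedents to a column not on the stack produces the error case.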

Lambdas with with

A special convention allows the common layout of lambda arguments without braces, as in:

xs.map with x =>
  ...

The rule is as follows: If a line contains an occurrence of the with keyword, and that same line ends in a => and is followed by an indented block, and neither the with nor the => is enclosed by braces, parentheses or brackets, an open brace { is assumed directly following the with and a matching closing brace is assumed at the end of the indented block.

If there are several occurrences of with on the same line that match the condition above, the last one is chosen as the start of the indented block.
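As a rough sketch of the lookahead this rule needs, the helper below scans one line and returns the position just past the `with` after which the open brace would be assumed. The names LambdaWithSketch and bracePosition are hypothetical, and the sketch makes assumptions the rule does not: it ignores comments and string literals, and it does not check that an indented block actually follows.

```scala
object LambdaWithSketch {
  // True if idx is outside the string or not part of an identifier.
  private def boundary(s: String, idx: Int): Boolean =
    idx < 0 || idx >= s.length || !(s.charAt(idx).isLetterOrDigit || s.charAt(idx) == '_')

  /** Position right after the chosen `with`, if the line qualifies. */
  def bracePosition(line: String): Option[Int] = {
    val s = line.replaceAll("\\s+$", "") // drop trailing whitespace
    var depth = 0                        // nesting depth of (), [] and {}
    var last: Option[Int] = None         // last top-level `with` seen so far
    var i = 0
    while (i < s.length) {
      s.charAt(i) match {
        case '(' | '[' | '{' => depth += 1
        case ')' | ']' | '}' => depth -= 1
        case 'w' if depth == 0 && s.startsWith("with", i) &&
            boundary(s, i - 1) && boundary(s, i + 4) => last = Some(i + 4)
        case _ => ()
      }
      i += 1
    }
    // The trailing => must itself be outside any brackets (depth == 0).
    if (s.endsWith("=>") && depth == 0) last else None
  }
}
```

Keeping only the last top-level match mirrors the tie-breaking rule above: with several candidates on one line, the final `with` starts the block.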

Interpreted End-Comments

If a statement follows a long indented code block, it is sometimes difficult as a writer to ensure that the statement is correctly indented, or as a reader to find out to what indentation level the new statement belongs. Braces help because they show that something ends here, even though they do not say by themselves what. We can improve code understanding by adding comments when a long definition ends, as in the following code:

    def f =
       def g =
          ...
          (long code sequence)
          ...
    // end f

    def h

The proposal is to make comments like this one more useful by checking that the indentation of the // end comment matches the indentation of the structure it refers to. In case of discrepancy, the compiler should issue a warning like:

// end f
~~~~~~
misaligned // end, corresponds to nothing

More precisely, let an "end-comment" be a line comment of the form

// end <id>

where <id> is a consecutive sequence of identifier and/or operator characters and <id> either ends the comment or is followed by a punctuation character ., ;, or ,. If <id> is one of the strings def, val, type, class, object, enum, package, if, match, try, while, do, or for, the compiler checks that the comment is immediately preceded by a syntactic construct described by a keyword matching <id> and starting in the same column as the end comment. If <id> is an identifier or operator name, the compiler checks that the comment is immediately preceded by a definition of that identifier or operator that starts in the same column as the end comment. If a check fails, a warning is issued.
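A minimal sketch of how such end-comments might be recognized is shown below. EndCommentSketch, parse and misaligned are hypothetical names. The sketch approximates <id> as a run of characters that are neither whitespace nor one of the three punctuation characters, and it only compares columns; checking that the preceding construct really is the named keyword or definition would need parser information that is not modeled here.

```scala
object EndCommentSketch {
  sealed trait EndTarget
  final case class Keyword(name: String) extends EndTarget    // e.g. // end while
  final case class Definition(name: String) extends EndTarget // e.g. // end f

  val keywords: Set[String] = Set("def", "val", "type", "class", "object", "enum",
    "package", "if", "match", "try", "while", "do", "for")

  // Indentation, then "// end <id>", then end of comment or one of . ; ,
  private val EndComment = """^(\s*)// end ([^\s.;,]+)(?:[.;,].*)?\s*$""".r

  /** Column and target of an end-comment, if the line is one. */
  def parse(line: String): Option[(Int, EndTarget)] = line match {
    case EndComment(indent, id) =>
      Some((indent.length, if (keywords(id)) Keyword(id) else Definition(id)))
    case _ => None
  }

  /** Warning text if the end-comment does not start in the construct's column. */
  def misaligned(line: String, constructColumn: Int): Option[String] =
    parse(line).collect {
      case (col, _) if col != constructColumn => "misaligned // end, corresponds to nothing"
    }
}
```

An ordinary comment such as "// end of story" is not treated as an end-comment here, because "of" is followed by more text rather than by the end of the comment or a punctuation character.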

Implementation

The proposal has been implemented in #2488. The implementation is quite similar to the way optional semicolons are supported. The bulk of the implementation can be done in the lexical analyzer, looking only at the current token and line indentation. The rule for "lambdas with with" requires some lookahead in the lexical analyzer to check the status at the end of the current line. The parser needs to be modified in a straightforward way to support the new syntax with the generalized use of with.

Example

Here's some example code, which has been compiled with the implementation in #2488.

object Test with

  val xs = List(1, 2, 3)

// Plain indentation

  xs.map with
       x => x + 2
    .filter with
       x => x % 2 == 0
    .foldLeft(0) with
       _ + _

// Using lambdas with `with`

  xs.map with x =>
      x + 2
    .filter with x =>
      x % 2 == 0
    .foldLeft(0) with
      _ + _

// for expressions

  for
    x <- List(1, 2, 3)
    y <- List(x + 1)
  yield
    x + y

  for
    x <- List(1, 2, 3)
    y <- List(x + 1)
  do
    println(x + y)

// Try expressions

  try
    val x = 3
    1.0 / x
  catch
    case ex: Exception =>
      0
  finally
    println("done")

// Match expressions

  xs match
    case Nil =>
      println()
      0
    case x :: Nil =>
      1
    case _ => 2

// While and Do

  do
    println("x")
    println("y")
  while
    println("z")
    true

  while
    println("z")
    true
  do
    println("x")
    println("y")

  // end while

// end Test

package p with

  object o with

    class C extends Object
               with Serializable with

      val x = new C with
          def y = 3

      val result =
        if x == x then
          println("yes")
          true
        else
          println("no")
          false

    // end C
  // end o

nafg commented 7 years ago

Another observation: This only eliminates { }. Scala still requires lots of ( ) and [ ]. So we're not going to look like F# or Python anyway. Scala will still need lots of these symbols. So it's not even buying much of the shiny candy anyway.

tenorviol commented 7 years ago

What about double curly braces? This is clearly better (because reasons):

if (dontWorry) {{
  beAwesome()
}}

LPTK commented 7 years ago

@nafg

This only eliminates { }. Scala still requires lots of ( )

But ( ) in expressions can usually be substituted with { }, so these would also be able to be eliminated in some situations. This could be a nice way to avoid nesting many pairs of parentheses. I don't particularly like the with syntax overloading, but at least it could allow replacing:

println(f(x.foo(readInt())))

... by this:

println(f with x.foo with readInt())

nafg commented 7 years ago

@LPTK

println(f with x.foo with readInt())

:scream:

But seriously, you know today you can do

println(f apply (x foo readInt()))

pkolaczk commented 7 years ago

@pkolaczk I don't think anyone is considering outlawing braces, just making them optional. So that would make a total of 9 ways

It is not the total number of ways that counts, but the number of ways that make sense in a single project. A problem with braces is that if a project style guide enforces one style of placing them, then there is always a case where breaking the rules of the style guide will make code more readable. But if you don't enforce a consistent style guide, you get braces placed in so many different ways that it is hard to read.

nafg commented 7 years ago

@pkolaczk I think there are two separate problems with flexibility.

One is that it makes a project's code inconsistent, which makes it annoying for everyone, especially people new to scala. Your point addresses this problem.

However there is a separate problem, which was my point and which is not solved, which is that people just don't like looking into a language and being told there are 9 ways to do something, or even 2 ways. It turns people off, and more importantly it makes it harder to learn.

Part of why it makes it harder to learn is that instead of learning valid grammar and everything else is invalid, it's more like a "space of valid grammars," and when you write invalid code it's harder to figure out what you got wrong. So you have to learn both what is valid grammar, and what is invalid grammar.

At least that's my theory for why it's harder or why people don't like it. Not sure if it makes sense, but in any case my experience is that that is the sentiment, whatever the reason is.

timperrett commented 7 years ago

Firstly, I applaud the responses in this thread for their objectivity, and openness to discuss this proposal, and as such won't attempt to coerce a technical choice one way or the other; some excellent technical arguments have been made already both for and against.

Instead, I want to voice a concern as a long-time Scala community member, and engineering leader at a Fortune 15 company. I've been doing Scala for a decade, and operate a massive Scala codebase with many hundreds of engineers to educate, encourage and protect. With this frame, I cannot support a proposal such as this. Scala already has many strange and confusing corners, which for both newcomers and veteran engineers alike, prove awkward and annoying... it's not entirely clear why we would want to expand this domain of complicated and annoying corner cases, at a time when we should be shrinking this domain and driving for simplicity. The migration to dotty is already of huge concern for businesses with many millions of dollars invested in Scala, and this kind of arbitrary change to spaces over the long established braces, would - in my opinion - cause a hard fork in the community: those on dotty, and those who are not.

RichyHBM commented 7 years ago

Just a thought, but what about adding a secondary file type? .scala would continue as is, and a new ".dotty" file could require indentation, it could also provide a space for additional feature experimentation?

DarkDimius commented 7 years ago

I do like the feel of new syntax. But I'm hesitant to support this change. My biggest concern is copy-paste friendliness.

Being able to copy code from forums, StackOverflow, or chat is very important for newcomers. While we can use scalafix for local rewriting, we cannot run scalafix on all of StackOverflow to do the conversion in either direction.

Copy-paste is also a powerful tool for experts: when I play with a new library, I start by opening the tests that the library's developers wrote. I copy some snippets from the tests into the REPL to see how the library behaves and feels.

If someone asks me to help him figure out an issue, he's likely to send me his code that I'll copy-paste. Same when I'll send him back a solution, he'll copy-paste it.

I am afraid that introducing errors at the copy-paste stage might substantially hinder the ability of our community to help each other.

nafg commented 7 years ago

@RichyHBM how is that better?

RichyHBM commented 7 years ago

@nafg It was more of an idea to try and please both worlds, keeping scala as is but also providing a mechanism for developers to use these slightly more controversial features. Of course the compiler/back-end would need to support all these features equally but it could allow people developing in the language to know that a .scala file will always conform to the current scala language.

Whilst a feature is an addition to the language, this feels more like a change to what is already in the language, and as such I feel it deserves some differentiation

keithasaurus commented 7 years ago

Aside from history, I've never heard a good reason for curly braces to exist in this day-and-age. They reduce the signal-to-noise ratio of code. Significant spacing (and strict style requirements in general) lead to more focus on the logic and less focus on the format.

From personal experience, I'd also suggest not including an "end" statement. It hasn't been necessary in any of the code I've written in Python, Elm, Haskell, or CoffeeScript.

I agree with a lot of people that allowing two syntaxes is not going to be sustainable for the language. Would it be a great challenge to write an automated converter and deprecate curly braces?

DarkDimius commented 7 years ago

@keithasaurus

I've never heard a good reason for curly braces to exist in this day-and-age.

While it is not clear whether experience transfers between languages, Go, a recently designed language of our day-and-age, made a conscious decision not to use significant spacing. See above: https://github.com/lampepfl/dotty/issues/2491#issuecomment-303074554.

obask commented 7 years ago

Have you considered using Haskell's "application operator" instead of "with" keyword? For me both ways are readable:

take 1 $ filter even [1..10]
-- OR
take 1 with filter even [1..10]

https://stackoverflow.com/a/19521376/1205211

PS I'm also not a big fan of "val"s on every line, even golang doesn't have this rudiment.

keithasaurus commented 7 years ago

Go is a very specific language, though. And it's written very specifically for Google's own preferences. I mean I'm not sure the way it does garbage collection, generics, automated/enforced formatting (I do like this!) or other things is applicable to every language/situation. They may have run into issues with spacing every once in a while, but curly braces are an inefficiency in every part of your code. More characters, more lines, less homogeneous code.

For me, the lesson to learn from Python is that readability counts. Because aside from readability, there are very few things that Python does well. And yet it's one of the most widely used languages today.

shawjef3 commented 7 years ago

Readability is important, but so are write-ability and edit-ability. Braces give us the second two while taking away from the first in only the smallest way (trailing '}').

shawjef3 commented 7 years ago

Does the while block in the following go with the first or second do block?

  do
    println("x")
    println("y")
  while
    println("z")
    true
  do
    println("x")
    println("y")

keithasaurus commented 7 years ago

@shawjef3 I agree that the do/while above is weird. Although, in your example, it would likely be that the first do while is valid and then the second would look like an incomplete do while

I think the implementation should look like:

do
    println("x")
    println("y")
while (true)

// this code would be invalid.
    println("z")
    true
do
    println("x")
    println("y")

Or without parens:

do
    println("x")
    println("y")
while true //  this wouldn't be able to appear on the next line

This would also look weird with curly braces:

do {
    ....
} while {
    ....
} do {
    ....
}

I'm not sure I've seen a language where both do and while can have numerous statements like in the above, and I'm not sure I understand the point.

Either way, that's implementation-specific. I don't think it has anything to do with space significance in general.

nafg commented 7 years ago

@keithasaurus some have been given in this thread. Can you explain why you don't think they are issues?

shawjef3 commented 7 years ago

@keithasaurus My thinking is that a do block without a 'while' would be valid, because such a block in existing code is valid. Example:

class X {
  {
    println("valid")
  }
}

floating-cat commented 7 years ago

It looks like no one has mentioned Sass (a stylesheet language). Sass has two syntaxes: the indented syntax if you use the .sass file extension, and the bracket style if you use the .scss file extension. Sass 3 introduced the latter, which is fully compatible with the syntax of CSS. Because Sass is much simpler than Scala, its two visually distinct styles don't cause the problems mentioned above for Scala, but maybe we can still learn something from Sass. I am not sure, though, whether using two different file extensions is a good idea for Scala, or whether we could allow both styles in the same file.

If you use IntelliJ IDEA and copy-paste Java code into a Kotlin file, the IDE will suggest converting it. I think we could follow the same approach if we adopt this proposal: the IDE or code editor could suggest converting code to the indented or bracket style when you copy it from StackOverflow or other places. The pitfall still exists, though, if you paste your code at the wrong indentation.

som-snytt commented 7 years ago

Why did Martin use braces in the first place? To keep his pants up, of course.

This is an especially fun thread. I'm grateful to listen in on a vigorous and invigorating debate about a language getting long in the tooth.

Overloading : seems OK, as it is bound to pass _ on StackOverflow for "What are all the uses of colon in Scala?" Besides sequence wildcard, the meaning of trailing colon for operator associativity means you must remember to insert a space because colon is not punctuation. def *: Double.

White space also matters for nl inference. I forget the precise rules, so I take a refresher course on Coursera every year, the one called, nl? np!

So really we have been primed for significant white space. I don't even know why we must still call it white space, as it depends on your color scheme. White space supremacy ought to end with dotty.

Since a space is just an underscore without a score, one might honor a certain project by calling it the underscoreless, and we are concerned here with significant underscorelessness.

godenji commented 7 years ago

Can adapt to either syntax, though staying the course with existing brace based syntax is the path of least resistance (i.e. do nothing). Braces/indentation aside, it would be wonderful if Dotty somehow "fixed" the following syntactic requirements:

if(cond) x else y
if cond then x else y

foo match {
  case x: Bar =>
  case x: Baz =>
}

foo match {
  x: Bar =>
  x: Baz =>
}

If the case keyword were relegated to case class and case object definitions that would be wonderful. Apparently implicit functions will allow the removal of thousands of implicit method annotations from the Dotty compiler; eliminating case from pattern matching blocks would remove perhaps millions of case entries from current Scala code bases in the world.

AlexGalays commented 7 years ago

It has novelty value but this has got to have the lowest priority ever 😈

Never heard of anyone struggling with braces; heard about plenty of people struggling with other things.

V-Lamp commented 7 years ago

If both syntaxes are available at once, then it becomes one more choice on the developer's head: "Should I use braces or not for this expression?". A well iterated point on improving productivity is to minimize the amount of daily choices one makes. Scala has always been on the "more than one way" side, but I feel this takes it to a needless extreme.

If only one syntax is allowed, then there has to be a compiler flag. Within an organisation you can enforce a uniform coding style (and set of flags), but not across the internet. Imagine the horror on StackOverflow: "You need to set this compiler flag to X to run this snippet". What about copy-pasting "legacy" code snippets from SO into a brace-less dotty codebase?

From a language designer perspective, this is a minor parser change, but from a user perspective it is like a new language. As AlexGalays says above, the value of this compared to its risk/cost feels tiny.

As with every major language version, the risk of split on major versions is very real (Python...). It would be great for Scala to focus instead on minimizing that overhead across use cases and even after the first wave of adoption (and I know there is active work on this with Scala meta but this does not cover all use cases) and ensure a smooth and rather uniform transition. A thriving Scala is in our common interest 😃

koeninger commented 7 years ago

I've seen significant whitespace in python lead to errors that permanently destroyed end user data.

If you're changing anything about blocks, it should be to require braces even for one liners. It's safer.

LPTK commented 7 years ago

@DarkDimius

My biggest concern is copy-paste friendliness.

If braces are allowed to be used within indent-based code, copy-pasting should not be an issue. For example, say you want to copy this snippet of code:

    // sample code to copy-paste
    ls.map with
        _ + 1

The trick for copy-pasting code while conserving relative indentation is to always include the linebreak above the code snippet before pressing Ctrl-C, so that what you select would be:

    // sample code to copy-paste[   <- selection starts here
    ls.map with
        _ + 1]   <- selection ends here

Now, say you want to paste the code in the following program:

class A with
    if x > 0 then
        println(123)
        // want to insert code on the following line:

You only need to press { (modern editors will insert the closing } automatically)...

class A with
    if x > 0 then
        println(123)
        // want to insert code on the following line:
        {}

And then press Ctrl+V, which would result in valid code:

class A with
    if x > 0 then
        println(123)
        // want to insert code on the following line:
        {
    ls.map with
        _ + 1}

The IDE would then have no trouble reformatting this code using the preferred whitespace style, if desired. (Or you can easily do it manually later.)

class A with
    if x > 0 then
        println(123)
        // want to insert code on the following line:
        ls.map with
            _ + 1

DarkDimius commented 7 years ago

@LPTK, what I'm saying is not that "with proper tooling and by carefully copying proper parts of code it will work", but that you'll have substantially more errors if something doesn't work out. All tools have to agree on how you copy code and how you paste it. All users should also be careful to copy code correctly by following some "tricks", as you called them. I don't think we should add more "tricks" that people should learn.

alejandrolujan commented 7 years ago

I feel strongly against this for three main reasons:

  1. As stated above by others, copy-pasting would very likely be affected by this (until/if IDEs catch up, who knows how long) which is a huge deal IMO.
  2. I get many complaints from newcomers about the underscore, along the lines of "why does it mean 10 different things depending on context?". Introducing more overloaded terms will negatively impact the learning curve IMO.
  3. I regularly use my IDE's capacity to highlight the block of code surrounded by a pair of curly braces. How long until they catch up to meaningful whitespace and provide similar functionality?

But, fundamentally, I agree with others on the argument that this is not currently a problem at all.

lostmsu commented 7 years ago

@alejandrolujan Number 3 seems to be rather artificial. While it's bad IDEs will have to catch up, they will have to catch up with any syntax change. That should never stop one from happening to begin with.

adamw commented 7 years ago

Leaving aside considerations if this would work fine or not, I think that Scala already has too many syntactic options, which might be confusing even for somebody working with Scala for a longer time. Adding more would only add to the confusion, probably with little benefit.

ryantheleach commented 7 years ago

@lostmsu It's relevant because we are removing 'excess?' information.

Think of the verbosity of Java, which gives IDEs enough side-channel information to correct a whole ton of newbie syntax errors. That just won't fly in Scala, where meaning is far more ambiguous because of the number of overloaded and optional keywords, the different arrangements of keywords, and ultimately the more efficient compression of information.

Anything that potentially keeps IDEs from correcting and aiding newbies to the language should be avoided.

realzeus commented 7 years ago

The advantage of braces is that they provide fully deterministic syntax.

if (condition)
  println("something")
  action()

Currently this will always be reformatted, fully deterministically, to

if (condition)
  println("something")
action()

Consider the increased effort needed to detect and fix these errors.

I have worked with Python enough to say that these errors are very hard to find - every time you have to understand the entire block to check if indentation is correct. It's much more complicated than "missing brace".

Atry commented 7 years ago

I like the with lambda syntax.

@odersky Is it possible to loosen the requirement of indentation? It would be very useful for asynchronous programming, for example:

def readFromDatabase(key: String): Future[String] = ???
def shutdownDatabase(): Future[Unit] = ???
def doSomethingWithJava(platformConfiguration: String): Future[Result] = ???
def doSomethingWithDotty(platformConfiguration: String): Future[Result] = ???

readFromDatabase("language").flatMap with language =>
readFromDatabase("version").flatMap with version =>
(
  if (language == "Scala" && version == "3.0") {
    readFromDatabase("dotty").map with dottyConfiguration =>
    doSomethingWithDotty(dottyConfiguration)
  } else {
    readFromDatabase("java").map with javaConfiguration =>
    doSomethingWithJava(javaConfiguration)
  }
).flatMap with result =>
shutdownDatabase().map with case () =>
result

The with lambda syntax can be used as a replacement of async/await

realzeus commented 7 years ago

Concerning vertical and horizontal alignment: Scala, Java or Python do not depend on horizontal spacing.

val a=1+2*3
val b = 1 + 2*3
val c = 1 + 2 * 3

are all the same. So, we are free to use spaces for emphasizing some code to other readers, not compiler. And compiler's behavior is always deterministic.

We have the same freedom with braces:

With significant indentation it will be prescribed when to put a new line. It will make map and filter chaining more verbose.

LannyRipple commented 7 years ago

I wouldn't mind this at all if I could mix and match. For small things I think it would be an improvement but I'd want to be able to fall back to {} where I needed to or desired. I think the proposal for "Significant Indentation" should be changed from

an opening/closing {/} will be implicitly inserted

to

an opening/closing {/} will be implicitly inserted unless found as the next token (which, for an opening brace, will turn off Significant Spacing until it is invoked again)

Mandatory significant spacing makes a program brittle. Without significant spacing I can make the computer work for me to format the program as I like. With significant spacing I have to hand-hold the computer to reach a correct program. (Computers should work for humans and not the other way around.)

ctoomey commented 7 years ago

I'd be in favor of this for a clean-sheet language, but as others have said, at this point it's only going to make things more confusing for newcomers and introduce more disparity between different developers' and teams' code.

Instead of this, I'd rather see a standard formatting along with a tool to enforce it ala gofmt. Having all Scala code formatted the same would be a lot more beneficial than adding one more way to make it different IMO.

felixmulder commented 7 years ago

Thank you all for your feedback. Both I and the rest of the Dotty team have been delighted to see the spirited and civilized debate on this issue.

One of the key goals of Dotty is simplification, and this of course extends to the syntax of the Scala language itself. The implementation of the proposal as-is is incredibly straightforward in the parser and scanner, and does not require further alterations to other parts of the compiler pipeline, as demonstrated in #2488.

That being said: the proposal as it is now adds a bit too much leniency to the syntax of Scala. The language might be better served by evolving at a slower pace in a more principled manner.

We're very happy with the different suggestions in this thread and will perhaps draw on them as inspiration in the future. For now - I'm closing this issue as we will not be moving forward with this suggestion as is (also: our inboxes are overflowing).

If this feature, or any other is something that you feel should be further discussed, please open a thread on https://contributors.scala-lang.org/

Cheers, Felix & @lampepfl/dotty-team

tr00per commented 7 years ago

Just make Scala look like Haskell: https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17500010 ;) But seriously, Layout is cool, IMHO works better than what Python offers.

SRGOM commented 7 years ago

I don't use IDEs, I use vim and I like being able to close my eyes and hit ggVG=. Literally the only reason I avoid python is indentation-defined-blocks. It gets really hard to stay confident.

SRGOM commented 7 years ago

@felixmulder & @odersky Thank you guys for being so open about suggestions and welcoming and accepting negative feedback.

To be clear, while some remarks from people may have been dismissive and off-the-cuff, all of us have huge respect for the scala team and of course, Martin.

reverofevil commented 7 years ago

@felixmulder That was the most epic trolling in the PL community I've ever seen: introduce a long-awaited feature and discard it in the same week. Gave me a good chuckle. Thanks!

harald-gegenfurtner commented 7 years ago

Let me just repeat this one statement:

Indentation does not scale well.

guizmaii commented 7 years ago

What I love about Scala is the confidence I have in the code. This proposal could lower the confidence I have in my code. So, IMHO, it's not a good path to follow for a language that sells itself as "safe".

felixmulder commented 7 years ago

I think it's time to lock this now - thanks for the discussion guys! As aforementioned: http://contributors.scala-lang.org for more discussions :heart: