jashkenas / coffeescript

Unfancy JavaScript
https://coffeescript.org/
MIT License

Docs: Better explanations of comprehensions #2039

Closed: showell closed this issue 7 years ago

showell commented 12 years ago

There are a couple of open issues on nested comprehensions, where folks ask whether CoffeeScript should behave more like Python and Haskell. However those issues get resolved, we can start by documenting the current behavior more precisely.

First, without getting into abstract definitions, it's pretty easy to call out the two main differences between CoffeeScript and Python:

  1. Nested comprehensions are not automatically flattened (map instead of concatMap/flatMap)
  2. The index of the comprehension nested deepest varies fastest
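To make the two differences concrete, here is a hand-written JavaScript sketch (not the compiler's literal output) of the two readings of a doubly nested comprehension. It assumes the CoffeeScript expression `(x + y for x in a for y in b)`, which parses as `((x + y for x in a) for y in b)`: `y` drives the outer loop and the result is an array of arrays. Python's `[x + y for x in a for y in b]` makes `x` the outer loop and yields one flat list. The inputs and the names `coffeeStyle` and `pythonStyle` are illustrative only.

```javascript
// Illustrative inputs (hypothetical).
const a = [1, 2];
const b = [10, 20];

// CoffeeScript reading: ((x + y for x in a) for y in b).
// The trailing `for y in b` is the outer loop, the inner loop over `a`
// varies fastest, and the inner arrays are kept (a plain map, no flattening).
const coffeeStyle = b.map((y) => a.map((x) => x + y));

// Python reading: [x + y for x in a for y in b].
// The first `for` is the outer loop, the last `for` varies fastest,
// and the results are concatenated into one list (concatMap/flatMap).
const pythonStyle = a.flatMap((x) => b.map((y) => x + y));

console.log(JSON.stringify(coffeeStyle)); // [[11,12],[21,22]]
console.log(JSON.stringify(pythonStyle)); // [11,21,12,22]
```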

Second, it's important to call out the distinction between these two lines in CoffeeScript:

eat(x) for x in a # loop w/side effects
squares = (square(x) for x in a) # comprehension
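The practical difference is whether a result array gets built. Here is a rough, hand-written JavaScript picture of what the two lines amount to. It is not the compiler's exact output, and `a`, `eat`, and `square` are placeholder definitions:

```javascript
// Placeholder definitions standing in for the ones assumed above.
const a = [1, 2, 3];
const eaten = [];
const eat = (x) => eaten.push(x);
const square = (x) => x * x;

// eat(x) for x in a: a statement, so the body runs for its side effects only.
for (let i = 0; i < a.length; i++) {
  eat(a[i]);
}

// squares = (square(x) for x in a): an expression, so each body value is collected.
const squares = [];
for (let i = 0; i < a.length; i++) {
  squares.push(square(a[i]));
}

console.log(eaten);   // [ 1, 2, 3 ]
console.log(squares); // [ 1, 4, 9 ]
```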

Third, it's important to distinguish these two syntactical constructs (which produce the same code):

squares = (square(x) for x in a) # comprehension
squares = (for x in a 
  square(x)) # array expression

The two pieces of code are identical semantically, but they are different syntactically. IMHO we should use a precise definition of "comprehension" that refers specifically to the first syntax. Wikipedia and Python's PEP both provide precedence for calling out comprehensions as a specific syntactic notation for producing lists. Wikipedia emphasizes the analogy to set-builder notation, which puts the output function first in the expression.

I propose these definitions:

  1. Loops -- classic loops used to produce side effects
  2. Postfix loops -- classic loops where the one-statement body goes first
  3. List expressions -- loops in CS that are evaluated as expressions, where "for" precedes the body
  4. Comprehensions -- syntax in CS that specifically places the output expression before the "for"

I predict these definitions will be up for extensive debate. See the discussion between me and @michaelficarra at issue 2030.

Once the definitions get nailed down, I propose that we reorganize the docs to show all four forms separately, even if we can't agree on four different terms for them.

Other related issues:

  1. issue 1191
  2. issue 2038
showell commented 12 years ago

Here is more symmetrical terminology for classifying the four forms:

  1. Prefix For Loops (aka classic loops w/side effects)
  2. Postfix For Loops (only one statement in body?)
  3. Prefix Array Expressions
  4. Postfix Array Expressions (aka comprehensions, with different semantics than Python/Haskell)

#1 and #2 can produce the same JS code, but they give you syntactic flexibility to emphasize either the loop or the body.

#3 and #4 can produce the same JS code, but they give you syntactic flexibility to emphasize either the loop or the resulting elements.

#1/#2 differ from #3/#4 in the code they produce. Code from #3/#4 produces an array, which can be assigned to a variable, inlined into an outer expression, or implicitly returned to the caller. Code from #1/#2 is terser and generally slightly faster, because it doesn't build a new array.
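One place this difference shows up is the implicit return: when a loop is the last statement of a function, CoffeeScript builds and returns the array even if the caller ignores it, which is why a bare `return` is often added after such loops. A hand-written JavaScript sketch of the two shapes; `log`, `logAll`, and `logAllNoResult` are hypothetical names:

```javascript
// Hypothetical side-effecting helper; its return value is irrelevant.
const logged = [];
function log(item) {
  logged.push(item);
}

// Roughly what `logAll = (items) -> log item for item in items` turns into:
// the loop is the last statement, so its values are collected into an
// array and implicitly returned, even though nobody needs them.
function logAll(items) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(log(items[i]));
  }
  return results;
}

// With a bare `return` after the loop, it stays a plain loop:
// no result array is allocated and the function returns undefined.
function logAllNoResult(items) {
  for (let i = 0; i < items.length; i++) {
    log(items[i]);
  }
}

const wasted = logAll(["a", "b"]);
const nothing = logAllNoResult(["c", "d"]);
console.log(wasted);  // [ undefined, undefined ]
console.log(logged);  // [ 'a', 'b', 'c', 'd' ]
```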

michaelficarra commented 12 years ago
fn = -> a(); b() # "function literal"
fn = -> # any name other than "function literal"
  a()
  b()

Pretty crazy to call those two semantically equivalent constructs different things, right? That's how I feel about differentiating "comprehension" and "array expression" in the first post. Same thing with then/indent swaps. It's just an alternative syntax that generates the exact same AST. I defer to my descriptions in #2030 for my opinion on the proper categorisation and terms.

precedence

You've been using this word a lot, but I think you mean to say "precedents".

showell commented 12 years ago

I don't know why it's crazy to come up with different names for two syntactical forms. Sure, both of your examples are function literals, but the first one uses the single-line function literal form, while the second one uses the block function literal form.

What possible benefit do we get from vague terminology?

There are precedents all over the place for using names to distinguish alternative syntaxes of the same semantic constructs. You have prefix if vs. postfix if. You have infix arithmetic expressions vs. RPN. You have whitespace-based blocking vs. brace-based blocking.

michaelficarra commented 12 years ago

@showell:

I don't know why it's crazy [...]

Because they're indistinguishable to the compiler. They just look different to us humans. So they're not actually different.

[...] vague terminology?

It's not vague when it describes a set of things that are indistinguishable from one another. Go ahead and call the syntaxes whatever you like -- single-line, multi-line, whatever -- it doesn't change the fact that they all cause indistinguishable AST nodes of type Code to be inserted into the AST.

There are precedents all over the place for using names to distinguish alternative syntaxes [...]

Correct, and those names are all defining syntactic properties. Here, we are trying to categorise/distinguish the set of all AST nodes generated with any syntax containing for/while/loop/unless.

geraldalewis commented 12 years ago

They just look different to us humans. So they're not actually different.

But shouldn't documentation be written for us humans?

showell commented 12 years ago

@michaelficarra Who is "we"? You may be trying to distinguish sets of AST nodes, but I'm trying to distinguish syntactical forms, which is more important to most CoffeeScript users. The context of this discussion is the documentation here:

http://coffeescript.org/

It's a syntax document:

  1. "There's also a handy postfix form, with the if or unless at the end."
  2. "CoffeeScript implements ECMAScript Harmony's proposed destructuring assignment syntax."
  3. "Block comments, which mirror the syntax for heredocs, are preserved in the generated code."
showell commented 12 years ago

@geraldalewis Documentation written for humans? Now that's just crazy talk. :)

showell commented 12 years ago

@michaelficarra I am reminded that there are only three truly difficult things in computer science: coming up with good names for things and avoiding off-by-one errors. Gotta run. Look forward to more discussion later in the day.

showell commented 12 years ago

Here is an example cut at the docs that leads with the imperative forms.


CoffeeScript supports several looping constructs, with two interesting variations. First, you can use natural-language variations of constructs that emphasize the "body" of the loop. Second, loops can be turned into array expressions (aka "comprehensions").

Let's begin with traditional imperative loops.

Imperative Loops

The for-in loop operates on arrays.

for food in ['toast', 'cheese', 'wine']
  prepare food
  eat food
clean_dishes()

courses = ['greens', 'caviar', 'truffles', 'roast', 'cake']
for dish, i in courses
  menu i + 1, dish
return
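For readers coming from JavaScript: CoffeeScript's `for dish, i in courses` is an index-based loop over the array, not JavaScript's `for...in`. A hand-written approximation, with a hypothetical `menu` that just records its arguments:

```javascript
const courses = ["greens", "caviar", "truffles", "roast", "cake"];

// Hypothetical stand-in for `menu`: records "position: dish".
const printed = [];
const menu = (position, dish) => printed.push(`${position}: ${dish}`);

// for dish, i in courses: walk the array by index. `i` is zero-based,
// hence the `i + 1` for a 1-based menu position.
for (let i = 0; i < courses.length; i++) {
  const dish = courses[i];
  menu(i + 1, dish);
}

console.log(printed[0]); // 1: greens
console.log(printed[4]); // 5: cake
```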

The for-of loop allows you to iterate over the keys and values of an object:

children = joe: 10, ida: 9, tim: 11

for child, age of children
  console.log "#{child} is #{age}"
return
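CoffeeScript's `for...of` is the form that corresponds to JavaScript's `for...in`: it iterates over an object's enumerable keys, inherited ones included (CoffeeScript's `for own` adds a `hasOwnProperty` check). A rough JavaScript equivalent of the loop above, collecting the lines into an array instead of printing them:

```javascript
const children = { joe: 10, ida: 9, tim: 11 };

// for child, age of children: for...in over the keys, looking up each value.
const lines = [];
for (const child in children) {
  const age = children[child];
  lines.push(`${child} is ${age}`);
}

console.log(lines); // [ 'joe is 10', 'ida is 9', 'tim is 11' ]
```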

The "while", "loop", and "until" statements allow low-level looping:

num = 6
while num -= 1
  sing "#{num} little monkeys, jumping on the bed.
    One fell out and bumped his head."

loop
  repeat_same_task() # infinite loop

x = 1
until x > 1000
  x = x * 2
console.log x
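`until cond` is `while` with the condition negated, and `loop` is `while (true)` that has to be exited with `break` (or `return`). A JavaScript sketch of the three examples, assuming `x` starts at 1, collecting the sung verses into an array, and giving the infinite `loop` a hypothetical counter so that it terminates:

```javascript
// while num -= 1: the assignment's value is the test, so the body runs
// for num = 5, 4, 3, 2, 1 and stops when num reaches 0 (falsy).
let num = 6;
const verses = [];
while ((num -= 1)) {
  verses.push(`${num} little monkeys, jumping on the bed.`);
}

// loop -> while (true); a hypothetical task counter breaks out here.
let tasks = 0;
while (true) {
  tasks += 1; // stand-in for repeat_same_task()
  if (tasks === 3) break;
}

// until x > 1000 -> while (!(x > 1000)); starting from x = 1,
// x doubles until it first exceeds 1000.
let x = 1;
while (!(x > 1000)) {
  x = x * 2;
}

console.log(verses.length); // 5
console.log(x);             // 1024
```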

Postfix Loops

Imperative loops can be written in postfix form to emphasize the actions:

eat food for food in ['apple', 'banana']
console.log "#{child} is #{age}" for child, age of children
shout "Listen to me!" until heard

You can use the "when" clause to skip certain loop values:

eat food for food in foods when food isnt 'chocolate'
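The `when` clause becomes a guard inside the loop body, and `isnt` is strict inequality. A hand-written JavaScript sketch with hypothetical `foods` and `eat`:

```javascript
const foods = ["apple", "chocolate", "banana"];
const eaten = [];
const eat = (food) => eaten.push(food);

// eat food for food in foods when food isnt 'chocolate'
for (let i = 0; i < foods.length; i++) {
  const food = foods[i];
  if (food !== "chocolate") {
    eat(food);
  }
}

console.log(eaten); // [ 'apple', 'banana' ]
```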

Array Expressions

Now we get to a powerful and fundamental feature of CoffeeScript--loops can be turned into array expressions.

Let's start with a simple example:

cubes = (
  for x in values
    x * x * x
)

The above code maps values to a new array called cubes, where each element in cubes is the cube of the corresponding element from values.
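In JavaScript terms this is a map. The generated code is a loop that pushes into a fresh array, which gives the same result as `Array.prototype.map`. A sketch with hypothetical `values`:

```javascript
const values = [1, 2, 3, 4];

// Loop form, roughly what the array expression turns into.
const cubes = [];
for (let i = 0; i < values.length; i++) {
  const x = values[i];
  cubes.push(x * x * x);
}

// Same result with the built-in map.
const cubesViaMap = values.map((x) => x * x * x);

console.log(cubes); // [ 1, 8, 27, 64 ]
```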

All loops can be turned into expressions using the pattern above. Also, if a loop is the last statement in a function body, the loop is implicitly turned into an array expression, and that array becomes the function's return value.

child_statements = (ages) ->
  # this function returns an array of strings
  for child, age of ages
    "#{child} is #{age} years old"

# Nursery Rhyme
num = 6
lyrics = while num -= 1
  "#{num} little monkeys, jumping on the bed.
    One fell out and bumped his head."
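Both examples collect one value per iteration: the function because its loop is the last statement, and `lyrics` because the `while` is used as the right-hand side of an assignment. A hand-written JavaScript approximation (`childStatements` is a camel-cased stand-in for `child_statements`):

```javascript
// child_statements: the for...of loop is the function's last statement,
// so its per-iteration values are collected and returned.
function childStatements(ages) {
  const results = [];
  for (const child in ages) {
    const age = ages[child];
    results.push(`${child} is ${age} years old`);
  }
  return results;
}

// lyrics = while num -= 1 ...: a while loop used as an expression
// also collects one value per iteration.
let num = 6;
const lyrics = [];
while ((num -= 1)) {
  lyrics.push(`${num} little monkeys, jumping on the bed. One fell out and bumped his head.`);
}

console.log(childStatements({ joe: 10, ida: 9 })); // [ 'joe is 10 years old', 'ida is 9 years old' ]
console.log(lyrics.length);                         // 5
```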

Comprehensions

When you create an array from another array, you sometimes want to emphasize the mapping function, placing it first in the expression. This syntactical construct is called a list comprehension.

# build an array of cubes, using only the even numbers
# from the source array
cubes = (x*x*x for x in nums when x % 2 == 0)
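The `when` clause filters before the output expression runs, so this is a filter followed by a map (CoffeeScript's `==` compiles to `===`). A JavaScript sketch with hypothetical `nums`:

```javascript
const nums = [1, 2, 3, 4, 5, 6];

// cubes = (x*x*x for x in nums when x % 2 == 0)
const cubes = [];
for (let i = 0; i < nums.length; i++) {
  const x = nums[i];
  if (x % 2 === 0) {
    cubes.push(x * x * x);
  }
}

// Equivalent chain using built-in array methods.
const cubesViaChain = nums.filter((x) => x % 2 === 0).map((x) => x * x * x);

console.log(cubes); // [ 8, 64, 216 ]
```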

All looping constructs can be turned into comprehensions.

whitespace_rows = (row while is_whitespace row = getrow())
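Here the condition runs before each iteration, so the comprehension collects rows until the first one that is not whitespace. A JavaScript sketch in which `getrow` and `isWhitespace` are hypothetical helpers reading from an in-memory list:

```javascript
// Hypothetical input and helpers standing in for getrow / is_whitespace.
const input = ["  ", "\t", "code here", " "];
let cursor = 0;
const getrow = () => input[cursor++]; // undefined once input runs out
const isWhitespace = (s) => typeof s === "string" && s.trim() === "";

// whitespace_rows = (row while is_whitespace row = getrow())
// Read a row, test it, and keep it only while the test passes.
const whitespaceRows = [];
let row;
while (isWhitespace((row = getrow()))) {
  whitespaceRows.push(row);
}

console.log(whitespaceRows.length); // 2
console.log(row);                   // code here
```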

Folks coming from Python and other languages should understand the following behaviors of CoffeeScript for nested comprehensions:

  1. Nested comprehensions are not automatically flattened.
  2. The index of the comprehension nested deepest varies fastest.

Closure Wrappers

When using a JavaScript loop to generate functions, it's common to insert a closure wrapper so that each generated function closes over its own copy of the loop variables, rather than all of them sharing the final values. CoffeeScript provides the do keyword, which immediately invokes a passed function, forwarding any arguments.

for filename in list
  do (filename) ->
    fs.readFile filename, (err, contents) ->
      compile filename, contents.toString()
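The underlying problem is that a JavaScript `var` (which is how CoffeeScript declares loop variables) is shared across every iteration. A small JavaScript sketch of the failure and of an immediately invoked wrapper like the one `do` produces; the wrapper here is written by hand and the file names are hypothetical:

```javascript
const names = ["a.coffee", "b.coffee", "c.coffee"];

// Without a wrapper: every callback sees the same `var filename`,
// which holds the last value once the loop finishes.
var shared = [];
for (var i = 0; i < names.length; i++) {
  var filename = names[i];
  shared.push(() => filename);
}

// With a wrapper, as `do (filename) ->` provides: an immediately
// invoked function gives each iteration its own `filename` parameter.
var wrapped = [];
for (var j = 0; j < names.length; j++) {
  var current = names[j];
  (function (filename) {
    wrapped.push(() => filename);
  })(current);
}

console.log(shared.map((f) => f()));  // [ 'c.coffee', 'c.coffee', 'c.coffee' ]
console.log(wrapped.map((f) => f())); // [ 'a.coffee', 'b.coffee', 'c.coffee' ]
```

In modern JavaScript, declaring the loop variable with `let` gives each iteration its own binding, which addresses the same problem.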
showell commented 12 years ago

@geraldalewis @jashkenas @TrevorBurnham @michaelficarra

See my prior comment on this thread for a proposed rewrite of the "Loops and Comprehensions" section of the docs.

First, I introduce headings to call out the different forms that can be used:

  1. Imperative Loops
  2. Postfix Loops
  3. Array Expressions
  4. Comprehensions
  5. Closure Wrappers

Also, I try to lead with the most common form of loops, which are the imperative loops. I'm pretty sure this will put newbies on familiar ground right away, and I think that even advanced programmers will still want to learn the imperative form first. By starting with simple imperative loops, I am able to introduce for/of much earlier than the current docs do.

GeoffreyBooth commented 7 years ago

The current “Loops and Comprehensions” section of the docs is almost exactly @showell’s comment above, minus the sub-headings, which I feel make things more intimidating than they need to be. Closing as this has already been done.