New-and-improved range, slice, splice and loop syntax

TrevorBurnham commented 13 years ago

The story so far

As of the latest release, 0.9.4, CoffeeScript used a Ruby-inspired syntax to describe ranges: [x..y] meant "everything from x to y, including y"; [x...y] meant "everything from x to y, excluding y."

This range syntax had several uses:

Range literals: arr = [1..5]; console.log arr # [1, 2, 3, 4, 5]
Loops: console.log 'For sparta!' for i in [1..300]
Slices: console.log ['a', 'b', 'c', 'd'][0...-1] # ['a', 'b', 'c']
Splices: arr = [5, 8, 15, 18, 22, 42]; arr[3..4] = [16, 23]; console.log arr # [5, 8, 15, 16, 23, 42]

By default, these ranges were stepped by 1 (-1 if y < x) starting from x; in a loop, other step values could be specified using by, as in for i in [0...1] by 0.01.

However, these syntaxes had some notable flaws, and were removed from master after a brief discussion (issue 746). A new loop syntax, for i from x to y (equivalent to the former for i in [x..y]), was added to master. Broad discussion of the changes took place in issue 803. Jeremy initially announced in issue 830 that the old syntaxes would be brought back, but then had a change of heart.

This issue is to discuss a concrete set of proposals for reintroducing the old syntaxes, with several improvements to make them more versatile and intuitive.

Range literals

As of 0.9.4, these two statements generated identical JavaScript (save for the use of the variable i):

arr1 = [0..25]
arr2 = i for i in [0..25]

Range literals generated from integers <= 20 apart were automatically expanded into simple arrays rather than list comprehensions:

  [1..5] # compiled to [1, 2, 3, 4, 5]

Flaws

Each range literal (other than the simple, short ranges described above) expanded into its own for loop, making them take up a large amount of space relative to defining your own makeArray x, y function and reusing it. The resulting code also lacked readability.

Also, range literals lacked the versatility of the loop syntax, in that they did not support the by keyword.

Proposal

Restore the old syntax, but add support for by, and introduce two new helper functions (__range and __range_i, where the i is for "inclusive") rather than expanding each range literal into a list comprehension (helpers are slightly stylized here for readability):

__range = function(start, end, step) {
  var up = end > start;
  step = step || (up ? 1 : -1);
  var result = [];
  for (var i = start; up ? i < end : i > end; i += step) {
    result.push(i);
  }
  return result;
};
// __range_i is identical except for using `i <= end : i >= end` rather than `i < end : i > end`

Simple, short ranges would still compile directly to arrays. Others would compile as follows:

[x...y]      # __range(x, y)
[x...y] by s # __range(x, y, s)
[x..y]       # __range_i(x, y)
[x..y] by s  # __range_i(x, y, s)
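The compile-time choice described above (inline a short literal range, otherwise emit a helper call) can be sketched in JavaScript as follows. This is a hypothetical illustration, not CoffeeScript compiler code: compileRangeLiteral is an invented name. The 20-element cutoff follows the 0.9.4 behaviour described earlier, and the __range/__range_i names are the ones proposed here. Numbers stand for integer literals, and strings stand for variable names.

```javascript
// Hypothetical sketch: decide how a range literal [from..to] / [from...to] compiles.
function compileRangeLiteral(from, to, exclusive, step) {
  const isIntLiteral = (v) => typeof v === "number" && Number.isInteger(v);
  // Short integer ranges without an explicit step are expanded inline.
  if (step === undefined && isIntLiteral(from) && isIntLiteral(to) &&
      Math.abs(to - from) <= 20) {
    const dir = to >= from ? 1 : -1;
    const items = [];
    for (let i = from; exclusive ? i !== to : i !== to + dir; i += dir) {
      items.push(i);
    }
    return "[" + items.join(", ") + "]";
  }
  // Everything else becomes a call to one of the proposed helpers.
  const helper = exclusive ? "__range" : "__range_i";
  const args = [from, to].concat(step === undefined ? [] : [step]);
  return helper + "(" + args.join(", ") + ")";
}

console.log(compileRangeLiteral(1, 5, false));        // [1, 2, 3, 4, 5]
console.log(compileRangeLiteral("x", "y", false, "s")); // __range_i(x, y, s)
```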

Coders would benefit from a succinct, intuitive syntax that would compile to more efficient and readable JavaScript than a list comprehension.

Loops

The merits of the old vs. new loop syntax have been debated quite a bit. Here are several examples of the two:

for i in [0..25]
for i from 0 to 25

for i in [0.1 .. 1.2] by 0.1
for i from 0.1 to 1.2 by 0.1

for i in [x...y] by 0.5
for i from x til y by 0.5  # proposed, not yet on master

Flaws

The square brackets add unnecessary symbology, and misleadingly suggest that an array is being constructed. The dots may reduce readability when numbers have decimal places.

Proposal

Restore the old syntax unchanged. While I can't argue that it's perfect, there are several advantages over the new syntax: It's more succinct, frees up 3-4 keywords, and more clearly distinguishes inclusive and exclusive ranges than to and til.

It's also consistent with the range literal syntax described above. Indeed, given the presence of range literals, the loop syntax is just an optimization, compiling to a C-style loop rather than sacrificing the time and memory that would be required to run arr = [x..y]; for i in arr .... This makes the syntax exceptionally easy to learn.
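To make the "just an optimization" argument concrete, here is a JavaScript sketch of two ways to run for i in [x..y]. The first materializes the array and then iterates over it. The second uses a C-style loop that picks its direction at run time, shaped like the 0.9.4 output. Both function names are hypothetical. The loop bodies only record i, so the two sequences can be compared directly.

```javascript
// Strategy 1 (hypothetical name): build [x..y] as an array, then loop over it.
function viaArray(x, y) {
  const step = x <= y ? 1 : -1;
  const arr = [];
  for (let i = x; step > 0 ? i <= y : i >= y; i += step) arr.push(i);
  const seen = [];
  for (const i of arr) seen.push(i); // loop body: just record i
  return seen;
}

// Strategy 2 (hypothetical name): a C-style loop, no intermediate range array.
function viaLoop(x, y) {
  const seen = [];
  for (let i = x; x <= y ? i <= y : i >= y; x <= y ? (i += 1) : (i -= 1)) {
    seen.push(i); // loop body: just record i
  }
  return seen;
}

console.log(viaLoop(3, 7).join(", "));  // 3, 4, 5, 6, 7
console.log(viaLoop(2, -1).join(", ")); // 2, 1, 0, -1
```

Both strategies visit the same values in the same order. Only the second avoids allocating the range array.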

Slices

As of 0.9.4, CoffeeScript had a thin sugar for the slice method that JavaScript strings and arrays come equipped with:

o[x...y] is o.slice(x, y)
o[x..y] is o.slice(x, y + 1)

Open ranges were also supported (with no exclusivity at the end of the array):

o[x...] is o.slice(x)
o[x..] is o.slice(x)
o[...y] is o.slice(0, y)
o[..y] is o.slice(0, y + 1)

Flaws

The sugar was popular, but one complaint came up again and again: When y was -1 in an inclusive slice, the result would be o.slice(x, y + 1) = o.slice(x, 0), rather than the desired o.slice(x).

Also, it was pointed out that while the slice syntax supported negative indices, property access (e.g. x[-1]) does not.
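Both flaws can be reproduced in plain JavaScript. The sketch below applies the 0.9.4 translation of o[x..y] (inclusiveSlice094 is a hypothetical name) and shows that y = -1 collapses to slice(x, 0).

```javascript
// 0.9.4-style translation of o[x..y] (the function name is hypothetical).
function inclusiveSlice094(o, x, y) {
  return o.slice(x, y + 1);
}

const letters = ["a", "b", "c", "d"];
const mid = inclusiveSlice094(letters, 1, 2);          // ["b", "c"]
const toSecondLast = inclusiveSlice094(letters, 1, -2); // ["b", "c"], i.e. slice(1, -1)
const toLast = inclusiveSlice094(letters, 1, -1);       // [], i.e. slice(1, 0): the reported bug
const last = letters[-1];                               // undefined: no negative property access
```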

Proposal

Restore the old syntax, but add a helper for inclusive slices in order to avoid the -1 problem:

__slice_i = function(o, start, end) {
  return end === -1 ? o.slice(start) : o.slice(start, end + 1);
}

Also, allow exclusivity for the last element in an open range. That way, the three-dot form (...) consistently generates an array one element shorter than the two-dot form (..). Compilation would be as follows:

o[x...y]   # o.slice(x, y)
o[x..y]    # __slice_i(o, x, y)
o[x...]    # o.slice(x, -1)
o[x..]     # o.slice(x)
o[...y]    # o.slice(0, y)
o[..y]     # __slice_i(o, 0, y)
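Here is the table above applied to a concrete array. It uses the __slice_i helper from this proposal, with the same logic restated as a plain function. The comments give the CoffeeScript form each line stands for.

```javascript
// __slice_i as proposed in this issue: an inclusive slice where -1 means "through the end".
function __slice_i(o, start, end) {
  return end === -1 ? o.slice(start) : o.slice(start, end + 1);
}

const o = ["a", "b", "c", "d"];
const excl = o.slice(1, 3);            // o[1...3] -> ["b", "c"]
const incl = __slice_i(o, 1, 3);       // o[1..3]  -> ["b", "c", "d"]
const toEnd = __slice_i(o, 1, -1);     // o[1..-1] -> ["b", "c", "d"], no longer []
const openExcl = o.slice(1, -1);       // o[1...]  -> ["b", "c"] under this proposal
const openIncl = o.slice(1);           // o[1..]   -> ["b", "c", "d"]
const head = __slice_i(o, 0, 1);       // o[..1]   -> ["a", "b"]
const tail = __slice_i("hello", 1, -1); // strings work too -> "ello"
```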

The negative index issue is an unfortunate inconsistency, but I don't see it outweighing the succinctness and readability of o[x...y] over o.slice(x, y).

The helper function is equivalent to the one proposed by nathany at issue 831, and has been proposed several times before.

Splices

0.9.4 also had a splice syntax, for when an array slice was the target of an assignment:

o[x...y] = ins  # o.splice(x, y - x, ins...)
o[x..y] = ins   # o.splice(x, y - x + 1, ins...)
o[x...] = ins   # o.splice(x, o.length, ins...)
o[x..] = ins    # o.splice(x, o.length, ins...)
o[...y] = ins   # o.splice(0, y, ins...)
o[..y] = ins    # o.splice(0, y + 1, ins...)

Flaws

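The splice syntax had all of the same flaws as the slice syntax, plus it lacked support for negative end indices. JavaScript's slice accepts a negative end index, but splice takes a deletion count rather than an end index, so the count computed from a negative y came out negative and splice treated it as zero.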

Also, the syntax might lead some to believe that strings can be spliced, when JavaScript strings are in fact immutable.
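Here is a small JavaScript reproduction of the negative-index problem with the 0.9.4 translation of inclusive splices. inclusiveSplice094 is a hypothetical name for the o.splice(x, y - x + 1, ins...) translation listed above.

```javascript
// 0.9.4-style translation of o[x..y] = ins (the function name is hypothetical).
function inclusiveSplice094(o, x, y, ins) {
  o.splice(x, y - x + 1, ...ins);
  return o;
}

const ok = inclusiveSplice094([1, 2, 3, 4], 1, 2, [9]);      // [1, 9, 4]
// With y = -1 the count is -1 - 1 + 1 = -1; splice treats a negative count as 0,
// so nothing is removed and 9 is merely inserted at index 1.
const broken = inclusiveSplice094([1, 2, 3, 4], 1, -1, [9]); // [1, 9, 2, 3, 4]
```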

Proposal

Use helpers, thereby adding negative index support and making the syntax as consistent as possible with the slice syntax:

__splice = function(o, start, end, ins) {
  if (start < 0) { start += o.length; }
  if (end < 0) { end += o.length; }
  Array.prototype.splice.apply(o, [start, end - start].concat(ins));
  return o;
}
// `__splice_i` is identical except for using `end - start + 1` rather than `end - start`

Compilation would be as follows:

o[x...y] = ins  # __splice(o, x, y, ins)
o[x..y] = ins   # __splice_i(o, x, y, ins)
o[x...] = ins   # __splice(o, x, o.length, ins)
o[x..] = ins    # __splice_i(o, x, o.length, ins)
o[...y] = ins   # __splice(o, 0, y, ins)
o[..y] = ins    # __splice_i(o, 0, y, ins)
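Here is a runnable sketch of both proposed splice helpers, with the splice call written out as Array.prototype.splice. Each usage line is annotated with the CoffeeScript assignment it would compile from under this proposal.

```javascript
// __splice / __splice_i as proposed in this issue; negative indices count from the end.
function __splice(o, start, end, ins) {
  if (start < 0) start += o.length;
  if (end < 0) end += o.length;
  Array.prototype.splice.apply(o, [start, end - start].concat(ins));
  return o;
}

function __splice_i(o, start, end, ins) {
  if (start < 0) start += o.length;
  if (end < 0) end += o.length;
  Array.prototype.splice.apply(o, [start, end - start + 1].concat(ins));
  return o;
}

const replacedHead = __splice_i([1, 2, 3, 4], 0, -2, [5, 6]); // o[0..-2] = [5, 6] -> [5, 6, 4]
const replacedMiddle = __splice([1, 2, 3, 4], 1, -1, [9]);    // o[1...-1] = [9]   -> [1, 9, 4]
const truncated = __splice_i([1, 2, 3, 4], 2, -1, []);        // o[2..-1] = []     -> [1, 2]
```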

It's true that this might mislead people into thinking that strings can be modified. Of course, JavaScript's own property access syntax has this same flaw:

str = 'abc'
str[1]       # 'b'
str[1] = 'Z' # runs without error, returns 'Z'
str          # still 'abc'

At least trying to call splice on a string will give you an error, rather than failing silently.
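Here are both behaviours checked in JavaScript. One detail goes beyond the example above: in strict mode, assigning to a string index throws a TypeError instead of being silently ignored. Either way, the string itself never changes.

```javascript
const str = "abc";
try {
  str[1] = "Z"; // silently ignored in sloppy mode; throws a TypeError in strict mode
} catch (e) {
  // strict mode ends up here; the string is unchanged either way
}
console.log(str); // abc

let spliceError = null;
try {
  str.splice(1, 1, "Z"); // strings have no splice method, so this calls undefined
} catch (e) {
  spliceError = e;
}
console.log(spliceError instanceof TypeError); // true
```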

Closing comments

All this talk of helpers may unnerve some, who ask why these functions can't be part of a standard library rather than part of the language proper, and wonder about the effects on CoffeeScript's generated code size and performance.

It must be noted that none of these helpers would appear in generated code unless the relevant functionality is used, just as the __extends helper doesn't appear unless you use CoffeeScript class inheritance. Range literals and the slice/splice syntaxes may best be avoided by the optimization-minded. But for most coders, most of the time, these are immensely convenient functions. Making them a part of the language allows for a much-improved, more approachable syntax.

These syntaxes have long been some of the most-celebrated parts of CoffeeScript. It's my hope that they'll be a part of 1.0, so that CoffeeScript will continue to be known as one of the most intuitive, succinct, and all-around awesome languages the world has ever known.

nathany commented 13 years ago

Wow, Trevor, awesome writeup. I really hope your efforts help keep slicing/splicing in the language.

If I'm to nitpick, there are two bits of syntax I find odd:

The Open Range

o[x..] just feels incomplete to me.

Instead, I would propose that o[x..$] is equivalent to o[x..-1], with the thought that negative indexing could be achieved in a manner similar to #827: o[$] or o[$-3].

Giving $ a special meaning (o.length - 1) inside ranges is an idea taken from D. I think it makes good sense, as $ already means "end" in regular expressions.

Step Up

for i in [s...e] by step looks a bit odd to me, in that the range and the by feel disconnected.

For reference, Ruby uses (0..20).step(5). Python just uses range(0, 20, 5), much like the function a library such as Underscore.js would provide.

In any case, it does work, and there is certainly merit to having a consistent syntax throughout, especially for for expressions with a default +1/-1 step.

In Closing

Syntax is important, but I think the main importance is that the features provided are useful and well tested.

The helpers you've worked on for slicing/splicing will get it behaving in the expected way. I think that's the key.

nathany commented 13 years ago

Well Trevor, it sure looks like this isn't happening. If it were really that important to me, I'd be inclined to create a fork with you, but alas I'd rather just wait and see how things play out once 0.9.9 is released. Maybe there will be a backlash the likes of which haven't been seen since the new Gap logo. :-P

TrevorBurnham commented 13 years ago

@nathany o[x..$] doesn't work because, for one thing, $ is a valid variable name in JavaScript... and [0..20].step(5) is ambiguous because you could define a step function on the Array prototype. I agree that the separate by is a little awkward, but I haven't seen a better alternative yet.

And yes, I think a backlash is likely if slice/splice, etc. are removed. I'm hoping to avoid that by getting the community to reach a near-consensus before that happens.

gfodor commented 13 years ago

Great writeup. My vote is for the range syntax to stay the way it is as proposed here. This patch resolves most of the criticisms I find valid with the current syntax. For the record, these are two examples of criticisms I find weak:

The for loop syntax implies an array is being constructed, so it should be changed.
Having helpers be generated is undesirable.

The first doesn't make sense to me because semantically the code is the same as if the array was being constructed. It's an optimization by the compiler that no array is materialized, as it would be in any other language. (There are no side effects of array construction in JavaScript, thankfully, so this makes it safe to perform this optimization.)

The second doesn't make sense to me because CoffeeScript is an abstraction and it should matter little to the end user what particular technique it takes to generate code. The strongest arguments regarding code generation are primarily performance ones and secondarily debugging ones -- the helper concern here seems too fuzzy to claim either one of these as a real reason other than some inherent notion of "purity."

michaelficarra commented 13 years ago

I'd have to agree with everything in this proposal except for the proposed differences between exclusive and inclusive open (to the right) ranges. But I could live with that, and that's a discussion for another time. Your summary at the end puts it perfectly:

Range literals and the slice/splice syntaxes may best be avoided by the optimization-minded. But for most coders, most of the time, these are immensely convenient functions. Making them a part of the language allows for a much-improved, more approachable syntax.

I'm all for bringing back each of these features. These are very important language features to me personally, and it would be a shame if CoffeeScript went 1.0 without them. If they get in soon, they'll have almost 2 months for hardening. Worst case scenario, they are poorly implemented or buggy and omitted for 1.0.

phook commented 13 years ago

Being something of a super generalist, I find it odd that "..." excludes the last element, but no such option is available for the first. So what about:
[4..10] is inclusive
[.4..10.] is exclusive, i.e. [5..9]
[2..10:2] is [2,4,6,8,10]
[.2..10.:2] is [3,5,7,9]

and why not create a helper function so strings CAN be modified?

Well, just my thoughts on how I'm planning to introduce them in my language.

michaelficarra commented 13 years ago

@phook: a major problem with your suggestion is that both .4 and 10. are valid number literals in javascript.

hen-x commented 13 years ago

and why not create a helper function so strings CAN be modified?

Strings don't share the same reference semantics as arrays, so they can't be modified in-place. At best, you can construct a new string value and then assign it to an existing variable. Even with a helper function, the copying-by-value would cause inconsistencies like this to arise:

a = "abcd"
b = a
b[1..2] = "xy"
b is "axyd" # true
a is "axyd" # FALSE
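In JavaScript terms, the best a string-"splicing" helper could do is build a new string and reassign it to the variable. That reproduces exactly the inconsistency above. spliceString is a hypothetical name, and its inclusive end index mirrors b[1..2].

```javascript
// Hypothetical helper: "splicing" a string can only produce a new string value.
function spliceString(s, start, end, ins) {
  return s.slice(0, start) + ins + s.slice(end + 1); // inclusive end, like b[1..2]
}

let a = "abcd";
let b = a;
b = spliceString(b, 1, 2, "xy"); // roughly what b[1..2] = "xy" would have to compile to
console.log(b === "axyd"); // true
console.log(a === "axyd"); // false: a still holds "abcd"
```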

phook commented 13 years ago

@michaelficarra: I don't think floating point values in a range are an issue, and it would be quite easy to specify a BNF that enforced integer values and identifiers only.

@sethausus: Interestingly enough, I don't think your example is inconsistent at all. I can see the argument that strings are by value and arrays are by reference, but then the correct solution would be to change the string behavior to be by reference; it's hardly more correct that nothing happens when you try to modify a string.

I guess nathany has a point about CoffeeScript just being lightweight sugar coating..

StanAngeloff commented 13 years ago

In 0.9.4 it was possible to use exclusive ranges to have something like this:

x = y = 10
alert i for i in [x...y]

This basically allows you to start a loop without bothering to check whether the start and end are the same, since you know that if they are, nothing will happen.

You may assume adding - 1 will do the trick:

x = y = 10
alert i for i from x to y - 1

but that makes it even worse: since this is a safe loop (it counts down when the end is below the start), you'll get 10, 9. Adding + 1 is no good either.

So the new syntax makes it absolutely impossible to skip over a loop without explicitly saying so. You need an explicit line if x isnt y at the top. You can't have nice short for loops as you used to.

I am using exclusive ranges a lot in code generation and trying to see how things will work with master scares me. A lot of the generation happens in herestrings and if I were to try and switch over to master all of this will have to get assigned to variables:

# before, 1 line for code generation; emphasis on being short and concise
"""
/* #{ query.columns[index] } #{ (' ' for i in [0...longest - query.columns[index].length]).join '' }= */
[..snip..]
"""

# after, too much logic; 4 lines and extra -1
indent = []
if longest isnt (length = query.columns[index].length)
    indent = (' ' for i from 0 to longest - length - 1)
"""
/* #{ query.columns[index] } #{ indent.join '' }= */
[..snip..]
"""

Makes me sad.
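Here is Stan's point in JavaScript. An exclusive range with equal endpoints yields nothing, but the inclusive, direction-detecting form cannot be coaxed into an empty loop. Both function names are hypothetical. Both ranges pick their direction from the endpoints, as described in this thread.

```javascript
// Exclusive [x...y]: direction follows x vs y, and y itself is excluded.
function exclusiveRange(x, y) {
  const out = [];
  const step = x <= y ? 1 : -1;
  for (let i = x; step > 0 ? i < y : i > y; i += step) out.push(i);
  return out;
}

// Inclusive "from x to y" with automatic direction.
function inclusiveRange(x, y) {
  const out = [];
  const step = x <= y ? 1 : -1;
  for (let i = x; step > 0 ? i <= y : i >= y; i += step) out.push(i);
  return out;
}

const skipped = exclusiveRange(10, 10);         // []: the loop body never runs
const countedDown = inclusiveRange(10, 10 - 1); // [10, 9]: counts down instead of skipping
const overshoot = inclusiveRange(10, 10 + 1);   // [10, 11]

// The padding use case: (longest - length) spaces, or "" when they are equal.
const pad = (longest, length) =>
  exclusiveRange(0, longest - length).map(() => " ").join("");
```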

jashkenas commented 13 years ago

Stan -- that's a really good point.

StanAngeloff commented 13 years ago

How about tweaking the default behaviour to always be exclusive and using all (as in for all .. of) to get this:

# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list = (i for i from 0 to 10)

# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list = (i for all i from 0 to 10)

michaelficarra commented 13 years ago

After Stan's post above, I just have to reiterate: for me, personally, a language would be very lacking without range literals or slicing/splicing by ranges. This was reinforced when using Ruby and its various range syntaxes for the first time in a few months yesterday. It's how I would naturally write these constructs (say, if I was writing pseudocode). Plus, there's no doubt it's more {scan,read}able.

Not having range literals wouldn't make me completely stop using coffeescript (as you can see, it hasn't), but it does make me less proud of the language and less willing to show it off to everyone I know as a solution to everything.

But this is purely subjective, take it with a grain of salt. I just felt I would regret not voicing these opinions.

gfodor commented 13 years ago

+1 from me too on keeping range literals. I think this is a case where the noble goal of adding readable English syntax does far more harm than good.

michaelficarra commented 13 years ago

How about tweaking the default behaviour to always be exclusive and using all (as in for all .. of) to get this

@stan: I like it in spirit, but I think the semantics become a little less obvious. Coming to this language, I'd bet I would think that the all was just optional. They feel like they mean the same thing to me. Also, I think this may be a separate topic and, if you feel it is, should be discussed elsewhere.

karl commented 13 years ago

+1 on keeping range literals from me too.

jashkenas commented 13 years ago

Thanks for casting all the votes in this ticket -- the ayes have it. We'll bring back the old range syntax before pushing out the next release. I'll edit my position in the 1.0 ticket. Leaving this open until the patches land back on master.

jashkenas commented 13 years ago

whew: After a tussle with the compiler, range literals are now back on master, in their 0.9.4 form. The same tests still pass. That said, I haven't implemented any of the changes that Trevor proposes at the top of the ticket. Those tweaks in behavior probably aren't worth defining helper functions for -- remember that helper functions violate the golden rule of CoffeeScript, and we have to be really certain that an exception to the rule is justified.

To go over them again:

x = [1..10] syntax can't use by. I don't think this is a big deal -- a range literal is a range literal, period. by is a qualifier on a loop.
Use a __range helper for x = [1..10] ... this is a code-size optimization that we can pursue later, but again, I'm not sure that range-into-array usage is common enough to justify a helper.
Adding a special-case __slice helper to treat -1 values specially breaks the semantics of slice; ditto for special treatment of negative indices with splice.

nathany commented 13 years ago

By "breaks the semantics" do you mean "works as expected"?! (esp. when coming from the language whose syntax it borrows). Using an inclusive range to -1 should definitely not return an empty array/string.

michaelficarra commented 13 years ago

I agree with nathany here. This has nothing to do with using a helper. Imagine the helper's code was inlined wherever we used slices. We would still want slices to have the suggested semantics, no? As a separate issue, we can reduce that generated inline code by making a (__slice) helper.

TrevorBurnham commented 13 years ago

In the case of ranges, the compelling argument for the helper is output readability. It's always nice to be able to jump between CoffeeScript and the corresponding JavaScript, and if you write

x = [a..b]

then the output

x = __range_i(a, b);

is much more human-friendly than the current

x = (function() {
  _result = [];
  for (var _i = a; a <= b ? _i <= b : _i >= b; a <= b ? _i += 1 : _i -= 1){ _result.push(_i); }
  return _result;
}).call(this);

That it minifies much more nicely (given multiple uses) is icing on the cake.

In the case of slice/splice, there are two compelling reasons. One is, of course, the a..-1 problem; as you know, several issues have been independently raised about it throughout CoffeeScript's history. The other reason is to make the syntax internally consistent. For instance, this is awfully surprising:

arr = [1, 2, 3, 4]
arr[0..-2]              # [1, 2, 3]
arr[0..-2] = [5, 6]
arr                     # [5, 6, 1, 2, 3, 4] ?!

As michaelficarra points out, we could inline the code needed to work around these pitfalls—but as with ranges, the result would be output that's both tough to read and tough to minify.

nathany commented 13 years ago

While I'm more concerned with inclusive slices and negative number splices working properly, I agree with both Trevor on the readability and michael that it is a separate issue from this one.

So, ignoring the mechanics of helper functions/libraries for a minute: is generated code that clearly isn't a 1-for-1 translation, inline or otherwise, against the "Golden Rule" of CoffeeScript?

icokk commented 13 years ago

I also like the old syntax [start .. end] for ranges but I believe that the automatic "count up if start < end else count down" is both inefficient and usually undesirable, especially when start and end are variables (in my experience, if you have variable end and it happens to be smaller than start, you always want that to denote an empty range).

I think the proper solution is:

keep the [start..end] and [start...end] syntax, but add the optional step parameter in the form [start..end:step]
count up if step is positive or omitted and down only if step is negative - programmers typically know whether they want to count up or down, so they should state that explicitly.

This also significantly simplifies the for loop, for instance: for i in [a..b] always compiles as for ( i = a; i <= b; i++ ) and for i in [a..b:-1] compiles as for ( i = a; i >= b; i += -1 ) The only time when checks are necessary is when step is a variable, e.g. for i in [a..b:step] compiles as for ( i = a; step >= 0 ? i <= b : i >= b; i += step )
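Here is a JavaScript sketch of icokk's semantics. Direction comes only from the sign of the step (default +1), so an "upside-down" range such as [5..1] is simply empty. steppedRange is a hypothetical name. Rejecting a zero step is an added assumption, to avoid an infinite loop.

```javascript
// Hypothetical sketch: count up for a positive step, down only for a negative one.
function steppedRange(start, end, step = 1, inclusive = true) {
  if (step === 0) throw new RangeError("step must be non-zero");
  const out = [];
  for (let i = start;
       step > 0 ? (inclusive ? i <= end : i < end) : (inclusive ? i >= end : i > end);
       i += step) {
    out.push(i);
  }
  return out;
}

steppedRange(1, 5);            // [1, 2, 3, 4, 5]
steppedRange(5, 1);            // []: no implicit count-down
steppedRange(5, 1, -1);        // [5, 4, 3, 2, 1]
steppedRange(0, 10, 3);        // [0, 3, 6, 9]
steppedRange(1, 5, 1, false);  // [1, 2, 3, 4], like [1...5]
```

For literal steps, the compiler could resolve the step > 0 test at compile time. Only a variable step needs it at run time, which matches the loop shapes listed above.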

As for the slices vs. splices with negative indices, I believe that default JavaScript negative index handling should be avoided and the arr[start...*] syntax from issue #827 should be used instead (consistently in slice and splice). A helper function wrapping slice could return [] when the upper bound is negative.
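Here is a minimal sketch of that slice wrapper, assuming "negative upper bound" means an empty result rather than counting from the end. sliceNoNegative is a hypothetical name. Using o.slice(0, 0) makes the empty result match the input's type, so it works for strings as well as arrays.

```javascript
// Hypothetical helper: ignore JavaScript's negative-index convention for the upper bound.
function sliceNoNegative(o, start, end) {
  if (end !== undefined && end < 0) return o.slice(0, 0); // [] for arrays, "" for strings
  return o.slice(start, end);
}

sliceNoNegative([1, 2, 3], 0, -1); // []
sliceNoNegative("abc", 0, -1);     // ""
sliceNoNegative([1, 2, 3], 1);     // [2, 3]
sliceNoNegative([1, 2, 3], 0, 2);  // [1, 2]
```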

satyr commented 13 years ago

@icokk: That's exactly what it was with for-from (#746).

icokk commented 13 years ago

@satyr: Yes, but it uses the old syntax and is usable not only in for loops but also for slices, splices and standalone (to generate lists). I think that very little existing code written for 0.9.4 or earlier would break because of this change.

michaelficarra commented 13 years ago

I agree with everything icokk has stated. This would make coffeescript range semantics much closer to Ruby's.

TrevorBurnham commented 13 years ago

I agree that [a..b:step] is preferable to [a..b] by step, though I'd like it even better if there were no ambiguity with the object syntax (since, anywhere else, b:step would mean {b:step}). What would you think of using a double-colon instead: [a..b::step]?

I'm indifferent on the question of whether [3..1] should give you [3, 2, 1] or []; I'd assumed that the CoffeeScript community favored the former, but consistency with Ruby is nice as well.

robofish commented 13 years ago

The Haskell language uses this syntax for steps:
[2,4..8] evaluates to [2,4,6,8]
[1,3..10] evaluates to [1,3,5,7,9]
[0,10..50] evaluates to [0,10,20,30,40,50]
Thus the step interval is not written down explicitly, but I like the syntax very much :)
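For readers who don't know Haskell, [first, next .. last] desugars to enumFromThenTo, which infers the step as next - first. The JavaScript function below borrows that name but is only a hypothetical sketch for integers. Rejecting a zero step is an added choice, since Haskell would produce an infinite list.

```javascript
// Hypothetical JavaScript rendering of Haskell's [first, next .. last] for integers.
function enumFromThenTo(first, next, last) {
  const step = next - first;
  if (step === 0) throw new RangeError("first and next must differ");
  const out = [];
  for (let i = first; step > 0 ? i <= last : i >= last; i += step) out.push(i);
  return out;
}

enumFromThenTo(2, 4, 8);   // [2, 4, 6, 8]
enumFromThenTo(1, 3, 10);  // [1, 3, 5, 7, 9]
enumFromThenTo(0, 10, 50); // [0, 10, 20, 30, 40, 50]
enumFromThenTo(10, 8, 1);  // [10, 8, 6, 4, 2]
```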

michaelficarra commented 13 years ago

@TrevorBurnham: I'm not a big fan of the : step syntax, but unfortunately that proposed syntax would also be ambiguous with b.prototype.step or b.prototype[2].

@robofish: how does that work for variable step or variable range start points? If I wanted a range from start to 10 by 2, would it look like this: [start,(start+2)..10]? Because that would be both gross and misleading.

I'm in favor of [start..end by step] for range step syntax.

gfodor commented 13 years ago

I don't have anything to add here other than subjective opinions, but I did want to send a big thank you to Jeremy for putting this back in. It's the right move and will maintain the uber-awesome factor that CoffeeScript has over vanilla JS.

robofish commented 13 years ago

@michaelficarra: You are right, that does look gross and is also redundant. I've somehow never come across a situation like this in Haskell. [Still thinking about an elegant way to do this in Haskell ... ]

Sorry to interrupt the discussion ... :)

nathany commented 13 years ago

@robofish That is a very nice syntax, with the exception of what @michaelficarra pointed out. In Haskell, does [1..10] assume a step of one, such that the next value is 2?

robofish commented 13 years ago

Yes, by default it uses a step of one, thus the next value will be 2 in your example.

TrevorBurnham commented 13 years ago

@michaelficarra Ah, quite right—while [a..b:step] looks a bit ambiguous, [a..b::step] actually would be ambiguous (as would my first thought, [a..b | step]). A comma would be unambiguous, but I wouldn't like for x = [a...b, step] to have such a different meaning than [a..., step] = x. by reads nicely, but I don't like special-case keywords (since by is unreserved, by step would mean by(step) in any other context).

Another option would be the as-yet-unused \, which kind of makes sense—in many languages, \ is used to express something division-related yet distinct from ordinary / division. In this case, [a..b \ step] would mean "divide the range a...b into parts of size step."

michaelficarra commented 13 years ago

@TrevorBurnham: not bad, I like it. My only concern is that the only place I've seen \ before, other than escapes of course, is in Haskell's lambda syntax (apparently \ looks sort of like λ to some people), so it's a little work to not think about it that way, but it should really be no problem for most people. It's good, though, that it doesn't share any other syntax similarities nor semantic ones. I also see how it represents a division-like operation. I'd definitely support that syntax.

I'd also still be okay with using by as long as we can deal with adding yet another reserved word to the language. I'm not too worried about reserving by because it should never be used as a function name outside of DSLs. That said, coffeescript is great for making DSLs....

satyr commented 13 years ago

@michaelficarra: by is reserved already.

$ coffee -e 'by = 1'
Error: Parse error on line 1: Unexpected 'BY'

michaelficarra commented 13 years ago

@satyr: Oh, perfect. Well then, I'm definitely on the side of the by syntax, though have no problems with TrevorBurnham's proposed \ syntax.

TrevorBurnham commented 13 years ago

Ah, now I see: It looks like

foo() for value in arr by 2

is supported, not just the better-known

foo() for value in [a..b] by 2

Given that, I suppose it makes sense to keep the word. Which means the issue comes down to the current

[a..b] by 2

vs.

[a..b by 2]

Of the two, I actually prefer the current syntax, and I'd like to see it extended to all ranges rather than just the for context:

arr = [a..b] by 2

Granted, it does feel a little inconsistent that you can't write arr1 = [a..b]; arr2 = [a..b] by 2, but you can think of [a..b] as being an abstract "range" until converted to an array by the process of assignment.
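For reference, foo() for value in arr by 2 steps the loop index by 2, visiting every second element. Here is a hand-written JavaScript equivalent that sketches the behaviour rather than the compiler's exact output. stepThrough is a hypothetical name. Rejecting non-positive steps is a simplification, since CoffeeScript's own handling of negative steps is not covered here.

```javascript
// Hypothetical sketch of `for value in arr by step`: advance the index by step each time.
function stepThrough(arr, step = 2) {
  if (step <= 0) throw new RangeError("this sketch only handles positive steps");
  const visited = [];
  for (let i = 0; i < arr.length; i += step) visited.push(arr[i]);
  return visited;
}

stepThrough(["a", "b", "c", "d", "e"]);    // ["a", "c", "e"]
stepThrough(["a", "b", "c", "d", "e"], 3); // ["a", "d"]
```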

jashkenas commented 13 years ago

If what this ticket comes down to at this point is the ability to write:

arr = [a..b] by 2

.. then I think we can safely leave it out. Think of the by 2 as a modification of the looping process over an abstract [a..b] list ... not a change to the contents of the [a..b] list itself.

michaelficarra commented 13 years ago

Well, there was still discussion about defining a syntax for stepped ranges. [a..b by step] is probably the most supported suggestion at this point.

Also, I believe we never came to an agreement about adding __slice and __splice helpers. I'm pretty strongly in favor of adding these to correct the problems Trevor originally mentioned in the related "cons" sections of his proposal.

akidee commented 13 years ago

It's important not to change too much or remove features (based on 0.9.4) that are simply useful. What I experienced is that array[1..-2] does not work anymore. But it should. +1 from me
