ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

change switch range syntax to be more clear and perhaps also allow exclusive ranges #359

Closed andrewrk closed 3 years ago

andrewrk commented 7 years ago

Right now, .. is the only slice operator available, and it is exclusive. Meanwhile ... (1 extra dot) is the only switch range operator available, and it is inclusive.

I believe that the difference in exclusivity of each kind of expression is appropriate based on typical use cases; however, the difference in syntax is subtle. It may be worth choosing clearer syntax to represent status quo, or perhaps adding the exclusive ability to switch statements.

Here's an example of a switch statement where exclusive ranges are better:

switch (rng.getRandomPercent()) {
    0...30 => std.debug.warn("Choice A\n"),
    30...70 => std.debug.warn("Choice B\n"),
    70...100 => std.debug.warn("Choice C\n"),
}

Right now this would give you an error because 30 and 70 are used twice. To fix it, the code would look like this:

switch (rng.getRandomPercent()) {
    0 ... 30 - 1 => std.debug.warn("Choice A\n"),
    30 ... 70 - 1 => std.debug.warn("Choice B\n"),
    70 ... 100 - 1 => std.debug.warn("Choice C\n"),
}

It's not so bad, especially considering the -1 happens at compile-time, but this is an example of where exclusive range is desired. Another example would be enum ranges. There is no reasonable way to do "enum value" minus 1. Another example would be if they were floats instead of integers. In this case -1 doesn't make sense and you absolutely need the exclusivity ability.
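For readers on a current compiler, here is a minimal sketch of the `- 1` workaround above. The names `a_end`, `b_end`, `c_end`, and `choose` are hypothetical and stand in for the random-percent example; the function returns a letter instead of printing so the result can be checked.

```zig
const std = @import("std");

// Hypothetical thresholds, each an exclusive upper bound.
const a_end = 30;
const b_end = 70;
const c_end = 100;

fn choose(percent: u8) u8 {
    return switch (percent) {
        // `...` is inclusive, so every upper bound is written as `end - 1`.
        // The subtraction is evaluated at compile time.
        0...a_end - 1 => 'A',
        a_end...b_end - 1 => 'B',
        b_end...c_end - 1 => 'C',
        else => '?', // 100 and above
    };
}

test "adjacent ranges meet without overlapping" {
    try std.testing.expectEqual(@as(u8, 'A'), choose(29));
    try std.testing.expectEqual(@as(u8, 'B'), choose(30));
}
```

Writing the bounds as named constants shows why the `- 1` becomes noisy: every boundary appears once as a lower bound and once as `end - 1`.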

Here are some proposals:

If we have a for range syntax (See #358) then that should be taken into consideration as well.

raulgrell commented 7 years ago

Since the for over a range is under consideration, I just want to think out loud a bit.... using the two different range operators allowed in both places:

var array = [3]u8{ 0, 1, 2 };
array[0..0] == []u8{}
array[0..1] == []u8{0}
array[0...0] == []u8{0}
array[0...1] == []u8{0,1}

// With chars, the range being exclusive can get weird
switch (c) {
    'a'...'b' => {}, // inclusive
    'c'..'f', 'g' => {}, // exclusive (no f)
    'f'..'g' => {}, // OK (no f above, no g here)
}

// Ints are ultimately the same, but easier to reason about, I guess
switch (c) {
    1 .. 10 => {},
    10 .. 100 => {},
    100 .. 1000 => {},
}

// If you could do a switch on floats, inclusive would be weird
switch (f) {
    0.0 ... 1.0 => {}, // inclusive
    1.0 .. 2.0 => {}, // Not OK, 1.0 in two branches
    2.0 .. 3.0=> {}, // OK
}

If we're considering Python-style array slicing, could we do negative indices?

var array = [5]u8{ 0, 1, 2, 3, 4 };
array[0...] == []u8 {0, 1, 2, 3, 4 }
array[0..3] == []u8 {0, 1, 2 }
array[0...-1] == []u8 {0, 1, 2, 3, 4 }
array[0...-2] == []u8 {0, 1, 2, 3}
array[0..-1] == []u8 {0, 1, 2, 3 }

Could we iterate backwards?

array[-1...] == []u8 {4, 3, 2, 1, 0 }
array[5...0] == []u8 {4, 3, 2, 1, 0 }
array[5..0] == []u8 { 4, 3, 2, 1 }

And finally, could we specify a stride/step?

array[0... : 2] == []u8 {0, 1, 2, 3, 4}
array[0..3 : 2 ] == []u8 {0, 2 }
array[1..5 : 3 ] == []u8 {1,4}
array[0...5 : -1] == []u8 {4, 3, 2, 1, 0} // Would this be a better way to iterate backwards?

This only really makes sense if we're able to do the same thing for the for over a range:

var values = []u8 {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var slice = values[0 ... 10 : 4];

for (0 ... 10 : 4 ) | x, i | {
    assert(x == slice[i]);
    printf("{}, {}, {}\n", x, i, slice[i]);
} 
>> 0, 0, 0
>> 4, 1, 4
>> 8, 2, 8

andrewrk commented 7 years ago

I think having both .. and ... is reasonable, although I do want to encourage programmers to use the exclusive one for slices.

We couldn't do the backwards or stride, because a slice only creates a pointer and a length; it does not copy data around.

As for negative... it seems simpler to require a usize for the end argument of a slice and leave it to the user to figure out how to compute indexes as an offset from the length.
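A minimal sketch of what computing the end index from the length looks like in practice. The helper name `dropLast` is hypothetical, and the caller is assumed to guarantee `n <= items.len`.

```zig
const std = @import("std");

// Rough equivalent of Python's items[0:-n], with the end index
// computed explicitly from the length as a usize.
fn dropLast(items: []const u8, n: usize) []const u8 {
    return items[0 .. items.len - n];
}

test "end index as an offset from the length" {
    const array = [_]u8{ 0, 1, 2, 3, 4 };
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0, 1, 2, 3 }, dropLast(&array, 1));
}
```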

thejoshwolfe commented 7 years ago

Another problem with negative indexes is that if the compiler doesn't know at comptime if an index is positive or negative, it would have to emit a conditional branch, which sounds like a bad idea. If there was going to be a way to index backwards, it would need to be comptime unambiguous.

raulgrell commented 7 years ago

Yeah, the python style slicing is not appropriate. Doing step/direction in the for wouldn't require the copying around as it would just be a while loop with a counter, but the point here is to make the syntaxes consistent and it makes things less simple not more... So yeah, disregard the above...

The only other thought on this subject I wanted to share is specifying a range not with start and end, but start and number of elements...

ie:

var array = [5]u8{ 0, 1, 2, 3, 4 };

// Exclusive
array[0..2] == []u8{0,1}

// Inclusive
array[0...2] == []u8{0,1, 2}

// Range
array[0 : 0] == []u8{}
array[0 : 2] == []u8{0, 1}
array[2 : 2] == []u8{2, 3}

// Each in the for
for (0 .. 2 ) | x, i | { }  // 0, 1
for (0 ... 2) | x, i | { }  // 0, 1, 2 
for  (2 : 2)  | x, i | { }  // 2, 3 

// Each in the switch
switch (c) {
    'a'...'b' => {}, // inclusive
    'c'..'A' => {}, // exclusive (no A)
    'A':26 => {}, // Range - All capital letters
}

andrewrk commented 7 years ago

I think it's reasonable to want to have a start and a length rather than start and end. But I think there's value in the language having a single convention.
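With only the start-and-end convention, a start-and-length slice can still be expressed by slicing twice. A minimal sketch, where the helper name `window` is hypothetical:

```zig
const std = @import("std");

// Start + length using only start..end slicing:
// first drop `start` elements, then keep `len` of what remains.
fn window(items: []const u8, start: usize, len: usize) []const u8 {
    return items[start..][0..len];
}

test "start and length via two slices" {
    const array = [_]u8{ 0, 1, 2, 3, 4 };
    try std.testing.expectEqualSlices(u8, &[_]u8{ 2, 3 }, window(&array, 2, 2));
}
```

The same result can be written as `items[start .. start + len]`; the double slice avoids repeating `start`.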

raulgrell commented 7 years ago

Yeah, this is purely syntactic sugar and completely unnecessary.

Consider the following:

fn printRange(a, b) {
    for (a ... b) | x, i | { }  // Fails if a > b
    for  (arr[a ... b])  | x, i | { }  // Fails if a > b OR b > arr.len OR a > arr.len
}

fn printN(a, n) {
    for  (a : n)  | x, i | { }  // Can't fail
    for  (arr[a : n])  | x, i | { }  // Fails if n > arr.len OR a + n > arr.len
}

andrewrk commented 7 years ago

I see your point here, but I question this assertion:

for (a ... b) | x, i | { }  // Fails if a > b

I think this would simply iterate 0 times, the same way that this would:

var i: usize = 100;
while (i < 10) : (i += 1) {}

As for the other one:

for  (arr[a ... b])  | x, i | { }  // Fails if a > b OR b > arr.len OR a > arr.len

Because of the transitive property, we only have to compare a <= b and then b <= arr.len.
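The two comparisons can be sketched as a standalone predicate. The function name `sliceInBounds` is hypothetical and this is not the compiler's actual safety check, only an illustration of why the third comparison is redundant.

```zig
const std = @import("std");

// `a <= b` and `b <= len` together imply `a <= len`,
// so a separate `a <= len` comparison is not needed.
fn sliceInBounds(a: usize, b: usize, len: usize) bool {
    return a <= b and b <= len;
}

test "two comparisons are enough" {
    try std.testing.expect(sliceInBounds(1, 3, 5));
    try std.testing.expect(!sliceInBounds(3, 1, 5)); // a > b
    try std.testing.expect(!sliceInBounds(0, 6, 5)); // b > len
}
```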

raulgrell commented 7 years ago

I think this would simply iterate 0 times, the same way that this would:

var i: usize = 100; while (i < 10) : (i += 1) {}

Good point, sounds reasonable. You'd need to check the second case anyway.

andrewrk commented 7 years ago

Now slicing syntax is .. instead of ..., so the syntax is at least not misleading.

raulgrell commented 7 years ago

Looks good!

skyfex commented 6 years ago

It should be said that this syntax is the exact opposite of what Ruby does: https://ruby-doc.org/core-2.1.5/Range.html

Not that Ruby should dictate Zig, but it's very unfortunate.

But is the syntax very clear, though? I think it's hard to see the difference clearly.

I had a related comment here: https://github.com/ziglang/zig/issues/358#issuecomment-408850822

bheads commented 6 years ago

I think having both .. and ... will lead to lots of bugs.

andrewrk commented 6 years ago

I updated the OP to clear up confusion.

Ruby's syntax is nuts. How could more dots mean fewer numbers in the range? The mnemonic is completely backwards!

ghost commented 6 years ago

If you visualise .. vs ...:

a .. b
a ... b

In the .. case, the distance between both letters is smaller, thus b is in the range.
In the ... case, the distance is bigger, thus b is not in the range.

Somehow this always made sense to me and I never mistyped it, but now that I'm actually explaining it, I reckon the other way around makes just as much sense, probably much more.

andrewrk commented 6 years ago

Huh, alright. That's as reasonable a mnemonic as any, so I'll take back my "nuts" comment :-) For the purposes of this proposal, I think it does make sense to avoid directly contradictory syntax with other popular languages, if possible.

thejoshwolfe commented 6 years ago

i don't think stride makes sense in a switch, and it definitely doesn't make sense with floats or enums in a switch. i think stride is really only meaningful in a looping context over in #358.

binary132 commented 6 years ago

Oh, right. Oops.

What about n to m?

thejoshwolfe commented 6 years ago

The more I think about it, the more this syntax is growing on me:

It's very clear what these mean without explanation, and we're not directly contradicting any other language's convention. (don't forget to consider Bash's {0..9} syntax too, which is inclusive.) I say we replace the ... in switch with ..<= and add ..< in switch.

But now the question is do we update the slicing syntax to match this? This feels a little weird to me:

The problem is that there's no < comparison going on in slicing (except for the safety check). Slicing is fundamentally an arithmetic operation, not a comparison. A switch case with a range is fundamentally a comparison and not arithmetic. So using this syntax for slicing doesn't make as much sense to me. And a[0..<] looks really stupid.

So I actually think we can leave slices the way they are. We're not explicitly saying whether slicing is inclusive or exclusive at either bounds, but come on. Everyone should know that upper bounds are exclusive for slices, just like everyone should know that indexes start at 0. Slicing is an arithmetic operation, and exclusive upper bounds is how you avoid doing +1/-1 nonsense.
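A minimal sketch of the arithmetic argument: with an exclusive upper bound, the same index ends one slice and starts the next, with no `+ 1` or `- 1`. The helper names `head` and `tail` are hypothetical.

```zig
const std = @import("std");

// Everything before `mid`.
fn head(items: []const u8, mid: usize) []const u8 {
    return items[0..mid];
}

// Everything from `mid` onward.
fn tail(items: []const u8, mid: usize) []const u8 {
    return items[mid..];
}

test "the same index splits a slice cleanly" {
    const data = [_]u8{ 10, 20, 30, 40, 50 };
    try std.testing.expectEqualSlices(u8, &[_]u8{ 10, 20 }, head(&data, 2));
    try std.testing.expectEqualSlices(u8, &[_]u8{ 30, 40, 50 }, tail(&data, 2));
}
```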

thejoshwolfe commented 6 years ago

wait, if we want to support switching on floats, we need to support exclusive lower bound too. ok new proposal for switch range syntax (slice syntax still unaffected):

and i'm thinking that for grammar purposes, the .. is a separate token from the comparison operators, so you could put spaces in like a <= .. <= b.

BarabasGitHub commented 6 years ago

That looks really ugly. And I would also question the need for using floating point in switch cases. It's really finicky, because the floating point value is often not exactly what you typed because it's in binary format and not decimal. Better not go there imo.

skyfex commented 6 years ago

I like @thejoshwolfe's suggestion. The a .. b syntax could be generalized. It could be syntactic sugar for Range {.from=a, .to=b}, or some kind of special built-in tuple. But this doesn't make sense if switch uses the same syntax. Like he says, switching is doing a comparison operation, not actually iterating over a range. I think it makes a lot of sense to make those operators look more like comparison operators.

This also resolves the question of whether a .. b is inclusive or exclusive. Then a and b are just two numbers really, and it should be considered obvious that arr[a..b] is exclusive on b. If it's used elsewhere it should be made sure that it's obvious from context as well.

Switch on floats could be nice. I think only a <..< b is safe in that case. Equality on float is tricky. But if x <= b is allowed on floats then a <..<= b should be too.

Maybe it looks a bit ugly to some, and it's a few more characters to type, but it's easier to read unambiguously.

daurnimator commented 5 years ago

Yesterday a different switch range use case came up: I wanted to switch on type and have a case for i0...i63 and then a different one for i64...i65535.

Rocknest commented 5 years ago

Yesterday a different switch range use case came up: I wanted to switch on type and have a case for i0...i63 and then a different one for i64...i65535.

@daurnimator you already can switch on size of integers, just like that:

switch (@typeInfo(arg).Int.bits) {
    0...63 => {}, // narrower than 64 bits
    64...65535 => {}, // 64 bits and wider
}

Rocknest commented 5 years ago

I propose this syntax:

switch (c) {
    5 -> 10 => {}, // exclusive, another variant: a ~~ b
    'a' ->+ 'z' => {}, // inclusive, another variant: a ~~+ b
}

Proposal for range syntax

Srekel commented 4 years ago

A tiny suggestion: If it's decided that both .. and ... are allowed in some context, maybe it's better to have .. and .... (four dots).

I feel like the difference between two and three dots is small enough that there will be hard-to-find typo bugs, similar to the classic if (mybool);

Four dots would stand out clearly.

momumi commented 4 years ago

if we want to support switching on floats, we need to support exclusive lower bound too.

In Python you can write 1 < x <= 20, which translates to (1 < x) and (x <= 20) (except x is only evaluated once). So you can write this:

if (0 <= x <= 10) {
    // ..
} else if (10 < x <= 20) {
    // ..
} else if (x > 20) {
    // ..
}

This is very intuitive to understand. It's more flexible too, because you can use it outside of switch statements as well. Also, in Python you can chain more than one comparison: 0 < x < y < 100 becomes (0 < x) and (x < y) and (y < 100), the expression a == b == c == d becomes (a == b) and (b == c) and (c == d), etc.
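For comparison, a minimal Zig sketch of the same three intervals with each chained comparison expanded by hand into two comparisons joined with `and`. The function name `bucket` and its return values are hypothetical.

```zig
const std = @import("std");

// Each Python chain `lo <= x <= hi` is written out as
// `lo <= x and x <= hi`.
fn bucket(x: i32) u8 {
    if (0 <= x and x <= 10) {
        return 0;
    } else if (10 < x and x <= 20) {
        return 1;
    } else if (x > 20) {
        return 2;
    }
    return 255; // x < 0 falls outside all three intervals
}

test "half-open intervals meet at 10 and 20" {
    try std.testing.expectEqual(@as(u8, 0), bucket(10));
    try std.testing.expectEqual(@as(u8, 1), bucket(11));
    try std.testing.expectEqual(@as(u8, 2), bucket(21));
}
```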

adontz commented 4 years ago

I think it is very important to remember that range bounds may be constants defined somewhere else, so all this +1/-1 may just confuse and make code less readable.

Adding my 2 cents to @thejoshwolfe's proposal.

switch (x) {
     0<= ... <  5 => {}, // 0, 1, 2, 3, 4
     5<= ... <=10 => {}, // 5, 6, 7, 8, 9, 10
    10<  ... < 15 => {}, // 11, 12, 13, 14
    14<  ... <=20 => {}, // 15, 16, 17, 18, 19, 20
}

maybe even

switch (getRandom(0, 20)) {
     0<= |x| <  5 => {}, // 0, 1, 2, 3, 4
     5<= |x| <=10 => {}, // 5, 6, 7, 8, 9, 10
    10<  |x| < 15 => {}, // 11, 12, 13, 14
    14<  |x| <=20 => {}, // 15, 16, 17, 18, 19, 20
}

gingerBill commented 4 years ago

In Odin, there were many options to go for if I wanted to unify slicing operations and ranges. However, I decided not to unify them and to keep them as different concepts, because they are fundamentally different. The act of slicing is different from indexing with a range; you can treat them as if they were the same, but they are actually different things conceptually.

array[lo:hi] // slicing syntax, [lo, hi)
case a ..< b:  // range syntax [a, b)
case a ..  b:  // range syntax [a, b]

If you wanted to unify these concepts, these are the possible solutions:

a .. b
a ... b

a .. b   or a ... b
a ..= b

a ..< b
a .. b or a ... b

The first approach is the most confusing for two reasons: the two forms are not that distinct in their appearance, and they can have the opposite meanings in different languages, e.g. Ruby vs Rust.

For Odin I settled on the third approach because it's probably the clearest view in my opinion.

ManDeJan commented 4 years ago

I also wanted to give my 2 cents. I like how Raku handles this (https://docs.raku.org/type/Range): adding a caret to either side of the .. indicates that the endpoint marked with it is excluded from the range.

switch (x) {
     0  ..^  5 => {}, // 0, 1, 2, 3, 4
     5  ..  10 => {}, // 5, 6, 7, 8, 9, 10
    10 ^..^ 15 => {}, // 11, 12, 13, 14
    14 ^..  20 => {}, // 15, 16, 17, 18, 19, 20
}

gingerBill commented 4 years ago

@ManDeJan In Nim, the caret is used as shorthand for "from the end".

This is the problem with choosing syntax. Every other language chooses it differently.

jakwings commented 4 years ago

My proposal?

elements[.[a, b)]
elements[.[a, b]]
elements[.(a, b]]
elements[.(a, b)]

switch (x) {
    .[ 0,  5) => {}, // 0, 1, 2, 3, 4
    .[ 5, 10] => {}, // 5, 6, 7, 8, 9, 10
    .(10, 15) => {}, // 11, 12, 13, 14
    .(14, 20] => {}, // 15, 16, 17, 18, 19, 20
}

my feeling now: (>_<) oh sissy me, why not just use .. and ...? off-by-one errors are not new and will never be old, you can also mistype a < b and get a <= b instead.

so my real proposal:

a .. b   // exclusive
a ... b  // inclusive

never mind those crazy ideas about float range, enum range and enum-indexed array/tuple...

Mouvedia commented 3 years ago

I am against using : because it's currently used for sentinel elements. That would be confusing.

Mouvedia commented 3 years ago

Can we have a reason as to why this has been closed?

thejoshwolfe commented 3 years ago

status quo:

the two symbols are different looking .. vs ..., so there's no problem with inconsistency. confusion will always be a subjective matter, but status quo is not horribly confusing at least.

closing this issue doesn't preclude the possibility of changes, but the wording of this issue's title and the lack of concrete proposal here mean that this discussion is not actionable. if a change is to be considered, it should be a separate issue with a concrete proposal.

jedisct1 commented 3 years ago

I quite agree that the status quo is fine. Having both .. and ... for slices may be more confusing than useful in actual applications. And it is certainly not required.

Using brackets for intervals would look neat, but once again:

.(10, 15) => {}, // 11, 12, 13, 14

I don't see any reason to ever use that over 11...14.

gonzus commented 3 years ago

I think it is actually more complicated to have to remember the rules ".. is for slices and excludes the upper bound" and "... is for switches and includes the upper bound" than to just have both of them operate consistently in all cases: .. excludes the upper bound, ... includes the upper bound.

It is unfortunate that there already exist inconsistencies in how other languages assign the semantics, but again, I would personally much rather remember which is which, and use any of them as I see fit, than remember only one is for slices and only one is for switches.

Summary: please take this as my personal opinion that this issue should be reopened. Cheers!

Rocknest commented 3 years ago

I have been appreciating zig's preference for keywords over operators (especially control flow keywords). Operators may be heavily overloaded with incompatible meanings in different programming languages, but keywords are less likely to cause confusion. So i have come up with this idea: x upto y for exclusive switch ranges and x uptoand y for inclusive switch ranges. I think the slice syntax is fine as it is.

switch (rng.getRandomPercent()) {
    0 upto 30 => std.debug.warn("Choice A"), // 0-29
    30 upto 70 => std.debug.warn("Choice B"), // 30-69
    70 upto 100 => std.debug.warn("Choice C"), // 70-99
}

switch (c) {
    'a' upto 'c' => {}, // a, b
    'c' uptoand 'e' => {}, // c, d, e
}