Proposal: Multi-object `for` loops

ghost commented 3 years ago

Say I have some large arrays of the same size, and I want to perform some element-wise operation on them. In status quo, that might look like this:

const doSomething = fn (c: *[1024]u32, a: [1024]u32, b: [1024]u32) void {
    for (c) |*ec, i| {
        const ea = a[i];
        const eb = b[i];
        // Long and complicated sequence
    }
};

What we really want is to capture the elements of a, b and c at the same index. The best we can do, however, is capture the index itself and use it to index into a and b. That is perhaps a bit cleaner syntactically than a straight C-style for loop, but it has many of the same problems: we need the index directly; we can't be sure that all the bounds line up and aren't overrun; the compiler may have a hard time optimising the code (it needs to be sure that indices are used directly and which arrays are or are not mutated, and it will most likely insert bounds checks at every access in what may be a performance-critical inner loop); and, crucially, intent is not communicated.

This could be made much cleaner and clearer, to both the author and the compiler, if we were allowed to do it like this:

const doSomething = fn (c: *[1024]u32, a: [1024]u32, b: [1024]u32) void {
    for (c, a, b) |*ec, ea, eb| {
        // Long and complicated sequence
    }
};

All arrays must be the same length: the compiler checks this when the lengths are comptime-known, and differing runtime-known lengths are safety-checked illegal behaviour. If we use the value of the index, we may still capture it, although for clarity this is done with a semicolon rather than a comma: |*ec, ea, eb; i|. (We also make this change to single-object for loops.) (Upon reflection, I don’t think index capture is necessary, or at least either useful or optimiser-friendly enough to justify inclusion. The continue expression currently used in while loops is sufficient for this purpose and maintains language consistency.)
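For reference, a minimal runnable sketch of the proposed loop, assuming Zig 0.11 or later (where a multi-object `for` of this shape was released). The function name `addArrays`, the array length of 4, and the addition in the body are illustrative assumptions, not part of the proposal. It is declared with the `fn name(...)` form because released compilers do not accept the `const doSomething = fn` expression form used above.

```zig
const std = @import("std");

// Hypothetical element-wise operation; the name and the small array
// length are chosen for illustration only.
fn addArrays(c: *[4]u32, a: [4]u32, b: [4]u32) void {
    // All three operands are walked in lockstep. `ec` is a pointer
    // capture, so the destination element can be written through it.
    for (c, a, b) |*ec, ea, eb| {
        ec.* = ea + eb;
    }
}

test "element-wise add over three arrays" {
    const a = [4]u32{ 1, 2, 3, 4 };
    const b = [4]u32{ 10, 20, 30, 40 };
    var c: [4]u32 = undefined;
    addArrays(&c, a, b);
    try std.testing.expectEqualSlices(u32, &[_]u32{ 11, 22, 33, 44 }, &c);
}
```

Because every operand here has a comptime-known length, a length mismatch in a loop like this would be rejected at compile time rather than at runtime.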

ghost commented 2 years ago

No dice, same issue. with (var i: usize = 0) is strictly more typing than plain var i: usize = 0, so the laziest solution is the same and has the same problem.

You seem to be seeking symmetry over practicality.

ominitay commented 2 years ago

Maybe I am, without knowing? I don't think I'm really so bothered anymore; I'm not happy about this, but I don't really care if it does get accepted.

Mouvedia commented 2 years ago

> You seem to be seeking symmetry over practicality.

I would say symmetry helps discoverability.

ominitay commented 2 years ago

Yeahh, I'd argue symmetry here would be beneficial, and that the solution is practical, but evidently this is a fairly controversial topic lol

topolarity commented 2 years ago

If we're desperate for symmetry, I'd prefer the ability to have a combined for-while loop:

for (0..) |i| while (it.next()) |item| {
    // ...
}

The loop would exit (a) when the while condition fails or (b) when the elements of the for are exhausted.

In general, there'd be one combined loop syntax:

for (...) |...| while (...) |...| : (continue_expr) {
   // ...
} else {
   // ...
}

To determine the element type of a pseudo-array like 0.., we could:

* default to `usize` if a more precise type cannot be inferred from the bounds' types, OR
* require explicit type ascriptions in the capture, such as `for (0..) |i: usize|`

InKryption commented 2 years ago

@topolarity

> If we're desperate for symmetry, I'd prefer the ability to have a combined for-while loop:
>
> for (0..) |i| while (it.next()) |item| {
>     // ...
> }
>
> The loop would exit (a) when the while condition fails or (b) when the elements of the for are exhausted.
>
> In general, there'd be one combined loop syntax:
>
> for (...) |...| while (...) |...| : (continue_expr) {
>    // ...
> } else {
>    // ...
> }
>
> To determine the element type of a pseudo-array like 0.., we could:
>
> * default to `usize` if a more precise type cannot be inferred from the bounds' types, OR
> * require explicit type ascriptions in the capture, such as `for (0..) |i: usize|`

That looks a bit too ambiguous with a control flow statement that does something entirely different in status quo. My immediate assumption about what that would do is execute the while loop for every index in the for loop - which I would also immediately think is an error, since it's a non-terminating statement.

topolarity commented 2 years ago

> That looks a bit too ambiguous with a control flow statement that does something entirely different in status quo. My immediate assumption about what that would do is execute the while loop for every index in the for loop - which I would also immediately think is an error, since it's a non-terminating statement.

Yeah, I share that concern. This is probably the wrong way to approach this.

In fact, if for iteration were supported on pseudoranges alone, this is basically just an enclosed if:

for (0..) |i| if (it.next()) |item| {
    // ...
} else break;
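For comparison, the status quo way to get an index alongside an iterator is a `while` with an optional capture and a continue expression. The following is only a sketch: the `Countdown` iterator and the `weightedSum` function are hypothetical names invented for illustration.

```zig
const std = @import("std");

// Hypothetical iterator used only for this sketch: yields n, n-1, ..., 1.
const Countdown = struct {
    remaining: usize,

    fn next(self: *Countdown) ?usize {
        if (self.remaining == 0) return null;
        const value = self.remaining;
        self.remaining -= 1;
        return value;
    }
};

// Sums index * item over the iterator, tracking the index by hand.
fn weightedSum(start: usize) usize {
    var it = Countdown{ .remaining = start };
    var total: usize = 0;
    var i: usize = 0; // declared in the enclosing scope, so it outlives the loop
    while (it.next()) |item| : (i += 1) {
        total += i * item;
    }
    return total;
}

test "while loop with a manually maintained index" {
    // Items 3, 2, 1 at indices 0, 1, 2 give 0*3 + 1*2 + 2*1 = 4.
    try std.testing.expectEqual(@as(usize, 4), weightedSum(3));
}
```

The counter `i` has to be declared, initialised, and incremented separately from the loop header, which is the ergonomic cost debated in this thread.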

jibal commented 2 years ago

I think there's only an illusion of unification here--I'm not sure that's enough reason not to use the range notation but people should be aware of the inconsistency.

for (a..b) |i| ranges i from a thru b-1.

for (a..) |i| starts i at a and continues until the loop is broken, or is illegal.

for (x, a..b) |v, i| ranges i from a thru b-1 and ranges v over x[a..b], or ranges v over x[0..(b-a)] ... or is illegal.

for (x, a..) |v, i| ranges i from a thru a + x.len - 1 and ranges v over x[0..x.len], or ranges i from a thru x.len - 1 and ranges v over x[a..x.len], or is illegal.

for (x, 0..) |v, i| ranges i from 0 thru x.len - 1 and ranges v over x[0..x.len].

If you ban all the ambiguous or problematic forms, you're left with just

for (a..b) |i| and for (x, 0..) |v, i|

and thus ".." is the only thing in common, with the syntax and semantics otherwise disjoint. Again, I don't know that this is enough reason not to do it, but I don't think it's nearly as neato as it's been made out to be.
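A short sketch of those two remaining forms, assuming Zig 0.11 or later, where both are accepted. The function names `sumRange` and `indexOf` are hypothetical and exist only for illustration.

```zig
const std = @import("std");

// Form 1: a bounded range on its own. `i` takes the values a, a+1, ..., b-1.
fn sumRange(a: usize, b: usize) usize {
    var total: usize = 0;
    for (a..b) |i| {
        total += i;
    }
    return total;
}

// Form 2: a slice paired with an open-ended range. `0..` has no upper
// bound of its own; the loop length comes from `x`.
fn indexOf(x: []const u8, needle: u8) ?usize {
    for (x, 0..) |v, i| {
        if (v == needle) return i;
    }
    return null;
}

test "the two range forms" {
    // 2 + 3 + 4 = 9
    try std.testing.expectEqual(@as(usize, 9), sumRange(2, 5));
    try std.testing.expectEqual(@as(?usize, 2), indexOf("zig", 'g'));
    try std.testing.expectEqual(@as(?usize, null), indexOf("zig", 'x'));
}
```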

It seems to me that iterating over a set of parallel slices--the original proposal here--and iterating over a sequence of integers (or iterating n times) are quite different problems that have been artificially attached to each other here by a special construct, 0.., that superficially resembles a range but was introduced just to deal with the fact that expanding the current capture syntax to multiple slices is problematic. I think there are lots of solutions to the latter problem that are better than what is being proposed here.

One that was mentioned is to use ; to separate the index capture from the slice capture (and if so, this should eventually become the only way to do it for single slices as well)--surely whatever parsing issues this raises can be overcome ... a search by the parser in error mode for a ; to terminate a statement could skip those bracketed by |. And if not, other characters could be used, e.g., |v @ i| and |xitem, yitem, zitem @ i| have a naturalness that makes clear which is the value and which is the index. (Again, this would mean migration to that form, eliminating |v, i| as an option. Frankly, I don't think |v, i| was ever a good idea because , suggests a sequence of like things. It is true though that 0.. makes them like things, sort of.) (And yes, I know that using @ means a change to the very stable tokenizer, but that's not enough reason not to do it.) [P.S. my brain finally kicked in and reminded me that @ is used to introduce builtin functions, so that's almost certainly a no-go. Oh well.]

FWIW, I think the worst of all possible worlds is to eliminate the index capture altogether and use a variable defined in the outer scope--this is already widely considered to be a problem with while that is bug-prone, ugly, verbose, and forces people to do weird things to avoid name clashes ... if my only choices were that or 0.., I would take the latter.

const-void commented 2 years ago

looks pretty good! if there was some way to expose the iterator, that would facilitate variable iteration. could we optionally name the for loop and have that token act as an iterator accessor?

//ingest a data stream 3 bytes at a time
for j(0..stream_len) |i| {
   const server_rv = stream[i] ++ stream[i+1] ++ stream[i+2];
   processResult(server_rv);
   j(3);  // or j += 3;
}

haze commented 2 years ago

I'm still on the team of using a variable in the outside scope because it makes the language simpler, but I won't lie and say the index capture syntax for for loops hasn't tripped me up before

jibal commented 2 years ago

Making the language simpler shouldn't be done at the expense of ergonomics ... that misses the point of making it simpler. (Which is why const foo = fn(...) replacing fn foo(...) is a really bad idea that will discourage using Zig.) There's considerably more cognitive load to both read and write code with a manually initialized and incremented variable that, in the presence of multiple loops, needs either to be given a distinct name or put into a scope block ... which is two different ways to do it. Which is why almost every other language has moved in the other direction. Zig has some great innovations that came from not slavishly sticking to what everyone else does, but sometimes there is accumulated wisdom driving such trends.

ominitay commented 2 years ago

@jibal Function definitions as expressions isn't to make the language simpler. It benefits actual usecases.

jibal commented 2 years ago

It's been explicitly stated that the reason for it is to make the language consistent. Since fn foo(...) is just syntactic sugar it doesn't affect use cases ... that syntax could be retained and still allow for the other.

ominitay commented 2 years ago

Yes it does hugely simplify many usecases; just because you don't know of them doesn't mean they don't exist...

jibal commented 2 years ago

Please don't be rude. I'm aware of the use cases and didn't say that they don't exist. As I said, disallowing some syntactic sugar obviously doesn't affect use cases since function definition expressions can be added to the language regardless.

This was just a parenthetical comment that doesn't warrant (further) debate. I won't respond again.

ominitay commented 2 years ago

It doesn't make sense to have two equal ways to do something in a language... either we have function definitions as expressions only, or we have the present syntax.

Also I wasn't being rude, please don't allege such things of me.

jibal commented 2 years ago

"Making the language simpler shouldn't be done at the expense of ergonomics"

"Function definitions as expressions isn't to make the language simpler. "

"It's been explicitly stated that the reason for it is to make the language consistent."

"It doesn't make sense to have two equal ways to do something in a language... either we have function definitions as expressions only, or we have the present syntax."

andrewrk commented 2 years ago

This proposal isn't really open for discussion anymore, it's settled at least in my mind of exactly what is going to come of it.

ziglang / zig

Proposal: Multi-object `for` loops #7257