ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.5k stars 2.52k forks source link

Proposal: Multi-object `for` loops #7257

Closed ghost closed 1 year ago

ghost commented 3 years ago

Say I have some large arrays of the same size, and I want to perform some element-wise operation on them. In status quo, that might look like this:

const doSomething = fn (c: *[1024]u32, a: [1024]u32, b: [1024]u32) void {
    for (c) |*ec, i| {
        const ea = a[i];
        const eb = b[i];
        // Long and complicated sequence
    }
};

What we really want to do is capture each element of a, b and c at the same index, however the best we can do is capture the index itself and use it to index into a and b -- a bit cleaner syntactically than straight C-style for loops maybe, but with many of the same problems: we need the index directly, we're not sure that all the bounds line up/aren't overrun, the compiler may have a hard time optimising the code (it needs to be sure that indices are used directly and which arrays are/are not mutated, and it will most likely insert bounds checks at every access in what may be a performance-critical inner loop), and crucially: intent is not communicated.

This could be made much cleaner and clearer, to both the author and the compiler, if we were allowed to do it like this:

const doSomething = fn (c: *[1024]u32, a: [1024]u32, b: [1024]u32) void {
    for (c, a, b) |*ec, ea, eb| {
        // Long and complicated sequence
    }
};

All arrays must be the same length -- this will be checked by the compiler if lengths are comptime-known, and differing runtime-known lengths will be checked illegal behaviour. If we use the value of the index, we may still capture it, although for clarity this is done with a semicolon rather than a comma: |*ec, ea, eb; i|. (We also make this change to single-object for loops.) (Upon reflection, I don’t think index capture is necessary, or at least either useful or optimiser-friendly enough to justify inclusion. The continue expression currently used in while loops is sufficient for this purpose and maintains language consistency.)

SpexGuy commented 3 years ago

I really like this for a couple reasons:


I'm torn on the ;. On one hand, it increases readability with multiple captures and prevents errors. On the other hand, it makes the capture syntax asymmetric and specializes it to loops. Having written this out though, it sounds silly. The benefits clearly outweigh the drawbacks.

Mouvedia commented 3 years ago

Shouldn't we start with if first?

e.g.

const one: ?u32 = 1;
const two: ?u32 = 2;
if (one) |foo| if (two) |bar| { // I want one if here
  // …
};
ghost commented 3 years ago

@Mouvedia Multi-object if would just be syntactic sugar over status quo. This proposal has real implications for performance and encouraged programming style, as noted above by Spex.

There might be a case for multi-object while, as that cannot be done cleanly in status quo, but that would be a separate proposal.

Mouvedia commented 3 years ago

I see. I was just advocating for consistency of multi captures.

ghost commented 3 years ago

Taking @Mouvedia's idea a bit further, here's a list of potential multi-objectification candidates:

// looping over two iterators:
while (iter1(), iter2()) |val1, val2| {
    // body
}

// status quo:
var done = false;
while (!done) {
    if (iter1()) |val1| {
        if (iter2()) |val2| {
            // body
        } else {
            done = true;
        }
    } else {
        done = true;
    }
}

In short, multi-object variants should be seriously considered for if and while on optionals, even though the same would not work for error unions.

ghost commented 3 years ago

I can see the case for multi-object while in particular, but I'm not sure if that pattern is used all that often in actual code -- and it has a burden of utility to overcome, distinguishing as it does collections of optionals and single error unions. (What if I want to check an optional and an error union? How would that be represented in the else capture? If I can't do that, that seems to me like an obscure language rule, and I'll be frustrated.)

ghost commented 3 years ago

I can see the case for multi-object while in particular, but I'm not sure if that pattern is used all that often in actual code.

I'm not sure either. But I suspect that this pattern is not used more often, precisely because it is so painful. BTW, this not only applies to Zig. Even in languages that are more friendly towards iterators, native loops usually support only one iterator. If you need to loop over two or more in parallel, you either drop back to explicit control-flow and the raw iterator protocol, or, if you are lucky, the language supports some kind of "zip" mechanism via generics, macros or lambdas. If Zig broke the mold in this regard, and made looping over multiple iterators seem natural, maybe this pattern would rise in popularity.

What if I want to check an optional and an error union? How would that be represented in the else capture? If I can't do that, that seems to me like an obscure language rule, and I'll be frustrated.

As you say, error unions can't be handled. If you need to work with multiple sequences, one of which may return an error, you are back to the much more verbose status-quo solution. I admit that this discontinuity is aesthetically unpleasant. However, on balance, I feel that the special case of looping over sequences that follow a standard iterator protocol (i.e. optionals) is important enough to be worth-while. Just my 2¢.

thejoshwolfe commented 3 years ago

This proposal is very compelling because of the assertion that the objects are all the same len. That is often desirable, and offering syntactic sugar will encourage programmers to include that assertion where they otherwise might be too lazy to include it. If a programmer leaves off the assertion and simply indexes into all the slices/arrays during the loop, that implicitly asserts that the other objects must only be at least as long as the one being looped over, and the implicit assertion gets even weaker if there are any early exits from the loop, such as break or return. Since the length assertion included in this proposal communicates intent, this proposal is clearly an improvement for the use case when this proposal is applicable.

There is a question of how common this use case would be, and that kind of thing is difficult to quantify. In a meeting with @EleanorNB , @SpexGuy , and @andrewrk we were able to find more than one use case from real actual code from more than one author present in the meeting. Seems common enough. Let's do it.

The discussion of generalizing this idea to other syntactic constructs is interesting, but those ideas should be their own issue/proposal if they are going to be accepted. This proposal is just for for loops.

There is a potential footgun when refactoring a loop to take an additional object where the optional index capture will be reinterpreted as the new final object capture, such as from line 1 to line 2 below:

for (a) |x, i| {} // line 1
for (a, b) |x, i| {} // line 2 - probably a bug
for (a, b) |x, y| {} // line 3 - probably not a bug
for (a, b) |x, y, i| {} // line 4

This potential problem is not bad enough to prevent accepting this proposal. There can be other proposals to mitigate this potential problem.

thejoshwolfe commented 3 years ago

proposal for the syntax (ref https://github.com/ziglang/zig-spec/blob/master/grammar/grammar.y search for ForStatement ):

proposal for the semantics (i'm experimenting with a formal specification syntax using <angle brackets> to denote placeholders for syntactic constructs, and $name to denote variables that cannot be observed in the zig program.):

for a for loop with N expressions (where N is at least 1):

for (<expr[0]>, ... <expr[N-1]>) |<capture[0]>, ... <capture[N-1]>| <body>

This is syntactic sugar for:

{
    // evaluate all the exprs in order before doing any asserts.
    const $exprs = .{ <expr[0]>, ... <expr[N-1]> };
    // get and assert len before the loop starts.
    const $len = $exprs[0].len;
    comptime var $e = 1;
    inline while (e < $exprs.len) : (e += 1) {
        assert($len = $exprs[e].len);
    }
    // now onto the actual loop
    var $i: usize = 0;
    while ($i < $len) : ($i += 1) {
        <declare_capture(<capture[0]>, $exprs[0][i])>
        ...
        <declare_capture(<capture[N-1]>, $exprs[N-1][i])>
        <body>
    }
}

where <declare_capture(<capture>, $expr)> is a macro for:

That was a fun exercise in formal specification. Now that I've written it, i don't think i like this notation. :shrug:

Note that I'm attempting to propose status quo for the N=1 case.

Mouvedia commented 3 years ago

for (a, b) |x, y, i| {}

@thejoshwolfe I wish the separator for the index were different—e.g. ; or |—as @EleanorNB was proposing initially:

although for clarity this is done with a semicolon rather than a comma: |*ec, ea, eb; i|.

ghost commented 3 years ago

We actually spent a good half hour bikeshedding that in the meeting. The primary issue with ; is that the parser, when recovering from an error, may see it and interpret it as a statement end, thus interpreting the following incorrectly.

We didn't agree on a syntax; as Josh has stated above, the footgun inherent in status quo was deemed not significant enough to block the proposal at large. From memory, these were our ideas (I'm doubtless missing some -- there were a lot; see the minutes for more when they go up):

Mouvedia commented 3 years ago

Ill add for (a, b) | x, y | i | { … } to your list.

We actually spent a good half hour bikeshedding that in the meeting.

@SpexGuy I hope you will find a good separator because, now that we have a list, there's a back and forth (for the developer) needed to match the index—in particular if there are many captures. We have 2 groups of captures hence they should be clearly delimited IMHO.

daurnimator commented 3 years ago

I think about this as some sort of destructuring. How about if you could destructure tuples inside of |.....|?

e.g.

kristoff-it commented 3 years ago

Just to diversify the discussion a bit since all the proposals until now have been about symbols, here's an idea that goes in the opposite direction:

for (a, b) |x, y, index=i| {

}

It's annoying that it requires that many extra keystrokes, but it's explicit and easy to parse visually. Maybe index should be idx, maybe a syntax that is not inspired by named arguments is preferable, but IMO if we start having a variable number of arguments in for loops, then we should consider cranking up the explicitness for the index much more than what a single symbol provides.

Also I think all symbol proposals look bad except the semicolon one, but apparently that's also the one with other downsides.

haze commented 3 years ago

I disagree with any of these solutions. I believe it's best to stick with the status quo. It doesn't require any extra work from zig and works as intended. I'm very wary of adding features to reduce keystrokes. Remember people, you type this code up once, in specialized editors that make this take no more than a couple seconds, This feature is tiny enough that it doesn't warrant all of the work and discussion allotted to it.

ghost commented 3 years ago

It's not about keystrokes -- it's about preventing bugs. If you had an index before, but then introduce another iterand with integer elements, the compiler will not detect that and that's a bug. There's a strong case for distingushing index capture.

I'm not sure if ; would necessarily be significantly harder to parse; that was just the only concrete point raised against it in the meeting. IIRC [] also might have had similar issues. @SpexGuy would be the one to ask about that.

kristoff-it commented 3 years ago

It's not about keystrokes -- it's about preventing bugs. If you had an index before, but then introduce another iterand with integer elements, the compiler will not detect that and that's a bug. There's a strong case for distingushing index capture.

Yes, that's pretty much what my comment was about.

haze commented 3 years ago

I'm not in favor of adding this functionality to for loop because it adds more functionality to something that's already simple enough. Give the for loop an indexable thing, and you will go over it's elements. add another parameter and you get the index as well. If anything, a special construct could be made to support SoA use cases without conflating the original for.

ghost commented 3 years ago

Dude. This was brougt to a design meeting. We spent a solid hour and a half discussing it. We are getting multi-object for. That's that. The only thing up in the air is index capture syntax.

haze commented 3 years ago

Okay, but I am also free to leave my opinion on the matter, even if it was also accepted...

canadaduane commented 3 years ago

I'm not in favor of adding this functionality to for loop because it adds more functionality to something that's already simple enough.

Perhaps a separate keyword? Taking inspiration from foreach, what about:

forzip (c, a, b) |*ec, ea, eb| {
    // Long and complicated sequence
}
canadaduane commented 3 years ago

All arrays must be the same length -- this will be checked by the compiler if lengths are comptime-known, and differing runtime-known lengths will be checked illegal behaviour.

What would happen if the runtime-known lengths differ? How would the error be handled?

haze commented 3 years ago

All arrays must be the same length -- this will be checked by the compiler if lengths are comptime-known, and differing runtime-known lengths will be checked illegal behaviour.

What would happen if the runtime-known lengths differ? How would the error be handled?

I’d imagine it would be a compiler error since the use case is if the objects all had the same length

SpexGuy commented 3 years ago

Yep. If the lengths are known at compile time and don't match, it's a compile error. Otherwise if they don't match at runtime it's a panic in safe modes and UB in unsafe modes. This allows the optimizer to pick only the cheapest length to compute, and delete the others as dead code.

On Mon, Jun 7, 2021, 5:21 AM haze @.***> wrote:

All arrays must be the same length -- this will be checked by the compiler if lengths are comptime-known, and differing runtime-known lengths will be checked illegal behaviour.

What would happen if the runtime-known lengths differ? How would the error be handled?

I’d imagine it would be a compiler error since the use case is if the objects all had the same length

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ziglang/zig/issues/7257#issuecomment-855804815, or unsubscribe https://github.com/notifications/unsubscribe-auth/AARNCWQ3VYMB2GGFMC2CCXDTRSMRJANCNFSM4UGSU2AQ .

canadaduane commented 3 years ago

I've been thinking about this proposal in comparison to Haskell's zip and Rust's zip.

I like the idea of capturing the intent to iterate over two+ arrays that are the same length; however, it seems there are several alternative but adjacent intentions that could be valuable to capture, too. For example:

I'm too new to know how zig thinks about these things, but I raise these in case there is value in making the new syntax versatile enough to capture these.

ghost commented 3 years ago

Thinking about indices, do we even need them? Previously they were very ergonomic for SoA, but that’s unnecessary now. What do we lose by requiring a separate variable to be incremented, like in while iteration?

ghost commented 3 years ago

What do we lose by requiring a separate variable to be incremented, like in while iteration?

We lose a guaranteed-to-be-in-bounds index variable, which, for single-object loops at least, is as relevant as it was before this proposal.

ghost commented 3 years ago

It’s only guaranteed to be in bounds if it’s used directly with no modification, which if used as:

In short, any case where the guarantee would permit optimisation is covered directly by this proposal without indices, and I’m struggling to think of any case where an optimisation missed by not doing this would be inside a hot loop, so I have to wonder if it’s worth worrying about. (In debug mode a one-time extraneous check is acceptable, and in release modes it will be eliminated anyway.)

ghost commented 3 years ago

Makes sense. There is still a loss of convenience, but probably no more than with while loops.

Would the : (i += 1) syntax be supported instead?

ghost commented 3 years ago

I don’t see why not, but I also don’t see why. My gut feeling would be to allow it, but the higher-ups may not agree.

ghost commented 2 years ago

Here's another idea for how to deal with indices:

for (array1, array2, 0..) |*a, b, i| {
     a.* = b + i;
}

No need to remember whether the index goes first or last, or to define and manually increment a counter. As a side effect, this would also solve 80% of #358.

andrewrk commented 2 years ago

That is brilliant. I love it. It's now an accepted amendment to this proposal.

jumpnbrownweasel commented 2 years ago

for (array1, array2, 0..) |*a, b, i| {
     a.* = b + i;
}

Is the 0.. meant to be a range, effectively giving you a slice of all the arrays? Or is it just a fixed notation for index capture?

andrewrk commented 2 years ago

It makes a pseudo array which can then be captured like an array element. So this will be possible as well:

for (0..10) |i| {
    // ...
}

Which as @zzyxyzz noted is effectively a reversal on the rejection of #358. It would also allow something like this for consistency's sake:

for (0..10, 10..20) |a, b| {
    // ...
}
ominitay commented 2 years ago

I'm left fairly confused as to why this is being accepted, when while loops make for (0..10) |i| { redundant, amounting to breaking the Zig zen of having only one way to do things. Additionally, would this mean we're removing for (array) |a, i|, to be replaced with for (array, 0..) |a, i|? If not, then this will make syntax inconsistent. I'm not a fan of this at all.

This proposal would also imply that var array: [10]u8 = 0..10; would be valid, or else would be another inconsistency. Also, this can be done in userspace (see zig-range).

jumpnbrownweasel commented 2 years ago

At first I thought this was too much unnecessary sugar. But after thinking more I think it makes sense in that it fully embraces array and slice iteration, including their indexes, and enables optimizations. And at least there is still just one for loop.

The additional benefit of simple integer ranges (without arrays or slices) is also useful but very limited: no inclusive end points, no counting down, no step size, and the type is always usize. If nothing else I guess it makes the zig-range trick unnecessary without additional syntax, but I don't see that as very important.

@nektro brought up a good point on discord however, which is that for consistency the existing for syntax should change to require the 0.. when an index is used. It would be good to discuss that before finalizing this.

jumpnbrownweasel commented 2 years ago

The additional benefit of simple integer ranges (without arrays or slices) is also useful but very limited: no inclusive end points, no counting down, no step size, and the type is always usize. If nothing else I guess it makes the zig-range trick unnecessary without additional syntax, but I don't see that as very important.

On second thought, coming to grips with the zig-range trick is important. Since many people are using it and newcomers ask for it and latch onto it, having such a basic thing in a separate repo is pretty funky. I think it should either be part of the language/std library, or we should have an alternative like the one in this proposal.

ghost commented 2 years ago

Here's some more detailed reasoning behind this change.

In terms of language complexity:

In terms of safety / cognitive overhead:

for (0..length) |i| { // ... }


- `for (0..) |i|` is an easily detected compile error. You have to combine an open-ended range with a fixed-length object. Fixed ranges (`a..b`) can stand on their own.

In terms of convenience:
- As a reminder, the previous plan was to remove index capture from for loops entirely, because it would clash with multi-object capture. I think it was a reasonable tradeoff, but still a shame that the convenience of working with multi-object loops was to be improved at the expense of single-object loops.
- A lot of people have been asking for ranged iteration in #358. This does not provide a complete solution, but is still an improvement.
ominitay commented 2 years ago

I don't see why we can't do indices in for loops like we already do in while loops. It would make sense to me for continue expressions to work on both forms of loop.

var i: usize = 0;
for (array) |value| : (i += 1) {
    ...
}

This has several benefits over status quo:

ghost commented 2 years ago

@ominitay What you say is true, but Zig isn't as minimalist as it sometimes pretends to be. Taking the One Way to Do Things principle to its logical conclusion, we could eliminate for loops, loops with continuation expressions, switches, the continue statement and probably quite a few other things. All control flow can be reasonably described with while, if and break. But Zig isn't above some syntax sugar if it improves readability or prevents footguns in sufficiently common use cases. I feel this is one of those cases, but I suppose in the end it's a matter of taste.

ominitay commented 2 years ago

@zzyxyzz I agree with that, I guess it's just about which solution we deem more effective :) I personally don't like this syntax, when we could use the continuation syntax in a for loop, but I recognise a lot of people disagree with me lol

marler8997 commented 2 years ago

@ominitay I would otherwise agree with you here, but you left out an important word in the "zen point" you quoted:

Only one "obvious" way to do things.

This amendment from @zzyxyzz makes for (start..end) |i| the "obvious" way to do ranged iteration. Compared to while it's more specialized (doesn't handle all the cases while can) but also more convenient since you don't have to introduce a new scope and declare a new counter variable with an explicit type. This means when people can use for they will because it's more convenient which is what makes it the "obvious" way to do it.

P.S. Also the fact that you're skeptical of this puts you in good company since Andrew already rejected this before :) I think the fact that this feature removes the need for a "hacky" implicit array index variable for (arr) |e, i| is what puts it in the win category for me.

ominitay commented 2 years ago

@marler8997 Only one "obvious" way to do things.

My mistake. I think the point I was trying to make was more that having for loops support ranges is, imo, unnecessary while while loops can do the job just as well. It doesn't make much sense to me to introduce a less powerful tool to do a subset of what can already be done with another, well-suited, tool.

P.S. There are multiple possible alternatives to implicit index variables. I don't think that that's a unique advantage of introducing ranges.

cryptocode commented 2 years ago

also more convenient since you don't have to introduce a new scope and declare a new counter variable with an explicit type.

And also, this goes beyond convenience: people are sloppy about introducing new scopes just for counter variables (or forget during refactoring etc) and thus introduce a potential for bugs by reusing counter variables by mistake. Another plus for the for-range proposal.

ominitay commented 2 years ago

And also, this goes beyond convenience: people are sloppy about introducing new scopes just for counter variables

This is only a partial solution to this issue though, as it doesn't address it at all for while loops. See #8019, which is (imo) a better solution to this.

cryptocode commented 2 years ago

And also, this goes beyond convenience: people are sloppy about introducing new scopes just for counter variables

This is only a partial solution to this issue though, as it doesn't address it at all for while loops. See #8019, which is (imo) a better solution to this.

The point stands, it solves the scoping issue for simple counted loops (which are very common) while also solving the other problems initiating the proposal. Especially important if 8019 doesn't get accepted (which seems likely imo, it's more natural/less annoying to introduce extra scopes for more complex cases) Ranged-for seems like a very pragmatic middle ground.

ghost commented 2 years ago

This is only a partial solution to this issue though, as it doesn't address it at all for while loops.

Your proposed solution doesn’t address it for for loops either. Surely an “incomplete” solution is better than no solution?

Also I don’t quite understand the hangup on while and for being different. They’re different constructs, with different purposes. That’s why we have both.

ominitay commented 2 years ago

Your proposed solution doesn’t address it for for loops either. Surely an “incomplete” solution is better than no solution?

How doesn't it?

Also I don’t quite understand the hangup on while and for being different.

They're different, but that doesn't mean that a solution can't be applicable to both. Here, I think that the same approach for indices is the best solution.

ghost commented 2 years ago

The specific problem raised with the while loop index solution is that braces are cumbersome so the laziest solution is to leak the index to the enclosing scope. Status quo for loops do not have this problem, so we do not want to introduce it. Your proposed solution would require the same braces as the while case, and so would introduce the same problem.

If you want this aspect of for and while to be the same, then propose a real solution to this problem that works for both. Don’t introduce a new problem so they both have the same flaw.

ominitay commented 2 years ago

The solution already exists in the form of 8019, as I mentioned.