Closed ghost closed 1 year ago
I really like this for a couple reasons:
len
, facilitating vectorization (maybe, this depends on some semantics of slices that aren't finalized yet).const Foo = struct { a: []u32, b: []const u32 };
fn initA(foo: *const Foo) void {
for (foo.a) |*a, i| {
a.* = process(foo.b[i]);
}
}
In this example, assume the compiler cannot fully inline process
(or something volatile happens, or there's another memory clobber of some sort). It must assume that foo.b.ptr
may have been modified, and reload it in the next iteration of the loop. This means that this loop is doing a double indirection before calling process
, which cannot be optimized away. But if the slice is specified in the loop expression, the compiler makes a semantic copy, so it can assume that the pointer is not modified between iterations.
I'm torn on the ;
. On one hand, it increases readability with multiple captures and prevents errors. On the other hand, it makes the capture syntax asymmetric and specializes it to loops. Having written this out though, it sounds silly. The benefits clearly outweigh the drawbacks.
Shouldn't we start with if
first?
e.g.
const one: ?u32 = 1;
const two: ?u32 = 2;
if (one) |foo| if (two) |bar| { // I want one if here
// …
};
@Mouvedia Multi-object if
would just be syntactic sugar over status quo. This proposal has real implications for performance and encouraged programming style, as noted above by Spex.
There might be a case for multi-object while
, as that cannot be done cleanly in status quo, but that would be a separate proposal.
I see. I was just advocating for consistency of multi captures.
Taking @Mouvedia's idea a bit further, here's a list of potential multi-objectification candidates:
for
-- Intuitive and useful; provides extra information to the optimizer.if
with optional -- Moderately useful syntax sugar.if
with error union -- Makes no sense, because of the else
case.while
with optional -- Pure syntax sugar, but very convenient for iterator composition (see example below).while
with error union -- Makes no sense, because of the else
case.switch
-- Makes no sense, unless we want to introduce switching on Cartesian products, which I'm not advocating.// looping over two iterators:
while (iter1(), iter2()) |val1, val2| {
// body
}
// status quo:
var done = false;
while (!done) {
if (iter1()) |val1| {
if (iter2()) |val2| {
// body
} else {
done = true;
}
} else {
done = true;
}
}
In short, multi-object variants should be seriously considered for if
and while
on optionals, even though the same would not work for error unions.
I can see the case for multi-object while
in particular, but I'm not sure if that pattern is used all that often in actual code -- and it has a burden of utility to overcome, distinguishing as it does collections of optionals and single error unions. (What if I want to check an optional and an error union? How would that be represented in the else
capture? If I can't do that, that seems to me like an obscure language rule, and I'll be frustrated.)
I can see the case for multi-object while in particular, but I'm not sure if that pattern is used all that often in actual code.
I'm not sure either. But I suspect that this pattern is not used more often, precisely because it is so painful. BTW, this not only applies to Zig. Even in languages that are more friendly towards iterators, native loops usually support only one iterator. If you need to loop over two or more in parallel, you either drop back to explicit control-flow and the raw iterator protocol, or, if you are lucky, the language supports some kind of "zip" mechanism via generics, macros or lambdas. If Zig broke the mold in this regard, and made looping over multiple iterators seem natural, maybe this pattern would rise in popularity.
What if I want to check an optional and an error union? How would that be represented in the else capture? If I can't do that, that seems to me like an obscure language rule, and I'll be frustrated.
As you say, error unions can't be handled. If you need to work with multiple sequences, one of which may return an error, you are back to the much more verbose status-quo solution. I admit that this discontinuity is aesthetically unpleasant. However, on balance, I feel that the special case of looping over sequences that follow a standard iterator protocol (i.e. optionals) is important enough to be worth-while. Just my 2¢.
This proposal is very compelling because of the assertion that the objects are all the same len
. That is often desirable, and offering syntactic sugar will encourage programmers to include that assertion where they otherwise might be too lazy to include it. If a programmer leaves off the assertion and simply indexes into all the slices/arrays during the loop, that implicitly asserts that the other objects must only be at least as long as the one being looped over, and the implicit assertion gets even weaker if there are any early exits from the loop, such as break
or return
. Since the length assertion included in this proposal communicates intent, this proposal is clearly an improvement for the use case when this proposal is applicable.
There is a question of how common this use case would be, and that kind of thing is difficult to quantify. In a meeting with @EleanorNB , @SpexGuy , and @andrewrk we were able to find more than one use case from real actual code from more than one author present in the meeting. Seems common enough. Let's do it.
The discussion of generalizing this idea to other syntactic constructs is interesting, but those ideas should be their own issue/proposal if they are going to be accepted. This proposal is just for for
loops.
There is a potential footgun when refactoring a loop to take an additional object where the optional index capture will be reinterpreted as the new final object capture, such as from line 1 to line 2 below:
for (a) |x, i| {} // line 1
for (a, b) |x, i| {} // line 2 - probably a bug
for (a, b) |x, y| {} // line 3 - probably not a bug
for (a, b) |x, y, i| {} // line 4
This potential problem is not bad enough to prevent accepting this proposal. There can be other proposals to mitigate this potential problem.
proposal for the syntax (ref https://github.com/ziglang/zig-spec/blob/master/grammar/grammar.y search for ForStatement
):
for
now takes an ExprList
between the parens, like function call parameters. (This allows a trailing comma after the last expression.)ExprList
construct).for
loop now takes any number of captures, any of which can be pointer captures (except the index capture; see below). The capture list also allows a trailing comma.proposal for the semantics (i'm experimenting with a formal specification syntax using <angle brackets>
to denote placeholders for syntactic constructs, and $name
to denote variables that cannot be observed in the zig program.):
for a for
loop with N
expressions (where N
is at least 1
):
for (<expr[0]>, ... <expr[N-1]>) |<capture[0]>, ... <capture[N-1]>| <body>
This is syntactic sugar for:
{
// evaluate all the exprs in order before doing any asserts.
const $exprs = .{ <expr[0]>, ... <expr[N-1]> };
// get and assert len before the loop starts.
const $len = $exprs[0].len;
comptime var $e = 1;
inline while (e < $exprs.len) : (e += 1) {
assert($len = $exprs[e].len);
}
// now onto the actual loop
var $i: usize = 0;
while ($i < $len) : ($i += 1) {
<declare_capture(<capture[0]>, $exprs[0][i])>
...
<declare_capture(<capture[N-1]>, $exprs[N-1][i])>
<body>
}
}
where <declare_capture(<capture>, $expr)>
is a macro for:
<capture>
is a pointer capture *<name>
, then const name = &$expr;
const <capture> = expr;
That was a fun exercise in formal specification. Now that I've written it, i don't think i like this notation. :shrug:
Note that I'm attempting to propose status quo for the N=1
case.
for (a, b) |x, y, i| {}
@thejoshwolfe I wish the separator for the index were different—e.g. ;
or |
—as @EleanorNB was proposing initially:
although for clarity this is done with a semicolon rather than a comma:
|*ec, ea, eb; i|
.
We actually spent a good half hour bikeshedding that in the meeting. The primary issue with ;
is that the parser, when recovering from an error, may see it and interpret it as a statement end, thus interpreting the following incorrectly.
We didn't agree on a syntax; as Josh has stated above, the footgun inherent in status quo was deemed not significant enough to block the proposal at large. From memory, these were our ideas (I'm doubtless missing some -- there were a lot; see the minutes for more when they go up):
for (a, b) |x, y, i|
(status quo; cleanest, but can mask errors)for (a, b) |x, y, _|
(required index, except in case of one iterand; prevents sneaky bugs, but defeats the point somewhat)for (a, b) |x, y; i|
(my initial suggestion; makes intuitive sense, but makes parser failure recovery harder)for (a, b) |x, y, [i]|
(matches index notation, more immediately obvious what it does; but a special case for the parser)for (a, b) |x, y| : i
(has precedent in continue expressions, can be left off non-intrusively -- I favour this one)for (a, b) |x, y$ i|
(at this point we were getting tired and started trolling)Ill add for (a, b) | x, y | i | { … }
to your list.
We actually spent a good half hour bikeshedding that in the meeting.
@SpexGuy I hope you will find a good separator because, now that we have a list, there's a back and forth (for the developer) needed to match the index—in particular if there are many captures. We have 2 groups of captures hence they should be clearly delimited IMHO.
I think about this as some sort of destructuring. How about if you could destructure tuples inside of |.....|
?
e.g.
for (x) |{a,b}| {
=> equivalent to for (x) |tmp| { const a = tmp[0]; const b = tmp[1];
for (x) |{a,*b}| {
=> equivalent to for (x) |*tmp| { const a = tmp[0]; const b = &tmp[1];
for (x) |{a,b}, i| {
=> equivalent to for (x) |tmp, i| { const a = tmp[0]; const b = tmp[1];
if (x) |{a,b}| {
=> equivalent to if (x) |tmp| { const a = tmp[0]; const b = tmp[1];
Just to diversify the discussion a bit since all the proposals until now have been about symbols, here's an idea that goes in the opposite direction:
for (a, b) |x, y, index=i| {
}
It's annoying that it requires that many extra keystrokes, but it's explicit and easy to parse visually.
Maybe index
should be idx
, maybe a syntax that is not inspired by named arguments is preferable, but IMO if we start having a variable number of arguments in for loops, then we should consider cranking up the explicitness for the index much more than what a single symbol provides.
Also I think all symbol proposals look bad except the semicolon one, but apparently that's also the one with other downsides.
I disagree with any of these solutions. I believe it's best to stick with the status quo. It doesn't require any extra work from zig and works as intended. I'm very wary of adding features to reduce keystrokes. Remember people, you type this code up once, in specialized editors that make this take no more than a couple seconds, This feature is tiny enough that it doesn't warrant all of the work and discussion allotted to it.
It's not about keystrokes -- it's about preventing bugs. If you had an index before, but then introduce another iterand with integer elements, the compiler will not detect that and that's a bug. There's a strong case for distingushing index capture.
I'm not sure if ;
would necessarily be significantly harder to parse; that was just the only concrete point raised against it in the meeting. IIRC []
also might have had similar issues. @SpexGuy would be the one to ask about that.
It's not about keystrokes -- it's about preventing bugs. If you had an index before, but then introduce another iterand with integer elements, the compiler will not detect that and that's a bug. There's a strong case for distingushing index capture.
Yes, that's pretty much what my comment was about.
I'm not in favor of adding this functionality to for loop because it adds more functionality to something that's already simple enough. Give the for loop an indexable thing, and you will go over it's elements. add another parameter and you get the index as well. If anything, a special construct could be made to support SoA use cases without conflating the original for.
Dude.
This was brougt to a design meeting.
We spent a solid hour and a half discussing it.
We are getting multi-object for
. That's that.
The only thing up in the air is index capture syntax.
Okay, but I am also free to leave my opinion on the matter, even if it was also accepted...
I'm not in favor of adding this functionality to for loop because it adds more functionality to something that's already simple enough.
Perhaps a separate keyword? Taking inspiration from foreach, what about:
forzip (c, a, b) |*ec, ea, eb| {
// Long and complicated sequence
}
All arrays must be the same length -- this will be checked by the compiler if lengths are comptime-known, and differing runtime-known lengths will be checked illegal behaviour.
What would happen if the runtime-known lengths differ? How would the error be handled?
All arrays must be the same length -- this will be checked by the compiler if lengths are comptime-known, and differing runtime-known lengths will be checked illegal behaviour.
What would happen if the runtime-known lengths differ? How would the error be handled?
I’d imagine it would be a compiler error since the use case is if the objects all had the same length
Yep. If the lengths are known at compile time and don't match, it's a compile error. Otherwise if they don't match at runtime it's a panic in safe modes and UB in unsafe modes. This allows the optimizer to pick only the cheapest length to compute, and delete the others as dead code.
On Mon, Jun 7, 2021, 5:21 AM haze @.***> wrote:
All arrays must be the same length -- this will be checked by the compiler if lengths are comptime-known, and differing runtime-known lengths will be checked illegal behaviour.
What would happen if the runtime-known lengths differ? How would the error be handled?
I’d imagine it would be a compiler error since the use case is if the objects all had the same length
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ziglang/zig/issues/7257#issuecomment-855804815, or unsubscribe https://github.com/notifications/unsubscribe-auth/AARNCWQ3VYMB2GGFMC2CCXDTRSMRJANCNFSM4UGSU2AQ .
I've been thinking about this proposal in comparison to Haskell's zip and Rust's zip.
I like the idea of capturing the intent to iterate over two+ arrays that are the same length; however, it seems there are several alternative but adjacent intentions that could be valuable to capture, too. For example:
zipWith
iterates even when there is no known length (e.g. streams)I'm too new to know how zig thinks about these things, but I raise these in case there is value in making the new syntax versatile enough to capture these.
Thinking about indices, do we even need them? Previously they were very ergonomic for SoA, but that’s unnecessary now. What do we lose by requiring a separate variable to be incremented, like in while
iteration?
What do we lose by requiring a separate variable to be incremented, like in while iteration?
We lose a guaranteed-to-be-in-bounds index variable, which, for single-object loops at least, is as relevant as it was before this proposal.
It’s only guaranteed to be in bounds if it’s used directly with no modification, which if used as:
In short, any case where the guarantee would permit optimisation is covered directly by this proposal without indices, and I’m struggling to think of any case where an optimisation missed by not doing this would be inside a hot loop, so I have to wonder if it’s worth worrying about. (In debug mode a one-time extraneous check is acceptable, and in release modes it will be eliminated anyway.)
Makes sense. There is still a loss of convenience, but probably no more than with while loops.
Would the : (i += 1)
syntax be supported instead?
I don’t see why not, but I also don’t see why. My gut feeling would be to allow it, but the higher-ups may not agree.
Here's another idea for how to deal with indices:
for (array1, array2, 0..) |*a, b, i| {
a.* = b + i;
}
No need to remember whether the index goes first or last, or to define and manually increment a counter. As a side effect, this would also solve 80% of #358.
That is brilliant. I love it. It's now an accepted amendment to this proposal.
for (array1, array2, 0..) |*a, b, i| { a.* = b + i; }
Is the 0..
meant to be a range, effectively giving you a slice of all the arrays? Or is it just a fixed notation for index capture?
It makes a pseudo array which can then be captured like an array element. So this will be possible as well:
for (0..10) |i| {
// ...
}
Which as @zzyxyzz noted is effectively a reversal on the rejection of #358. It would also allow something like this for consistency's sake:
for (0..10, 10..20) |a, b| {
// ...
}
I'm left fairly confused as to why this is being accepted, when while
loops make for (0..10) |i| {
redundant, amounting to breaking the Zig zen of having only one way to do things. Additionally, would this mean we're removing for (array) |a, i|
, to be replaced with for (array, 0..) |a, i|
? If not, then this will make syntax inconsistent. I'm not a fan of this at all.
This proposal would also imply that var array: [10]u8 = 0..10;
would be valid, or else would be another inconsistency. Also, this can be done in userspace (see zig-range).
At first I thought this was too much unnecessary sugar. But after thinking more I think it makes sense in that it fully embraces array and slice iteration, including their indexes, and enables optimizations. And at least there is still just one for
loop.
The additional benefit of simple integer ranges (without arrays or slices) is also useful but very limited: no inclusive end points, no counting down, no step size, and the type is always usize. If nothing else I guess it makes the zig-range trick unnecessary without additional syntax, but I don't see that as very important.
@nektro brought up a good point on discord however, which is that for consistency the existing for
syntax should change to require the 0..
when an index is used. It would be good to discuss that before finalizing this.
The additional benefit of simple integer ranges (without arrays or slices) is also useful but very limited: no inclusive end points, no counting down, no step size, and the type is always usize. If nothing else I guess it makes the zig-range trick unnecessary without additional syntax, but I don't see that as very important.
On second thought, coming to grips with the zig-range trick is important. Since many people are using it and newcomers ask for it and latch onto it, having such a basic thing in a separate repo is pretty funky. I think it should either be part of the language/std library, or we should have an alternative like the one in this proposal.
Here's some more detailed reasoning behind this change.
In terms of language complexity:
a..
) are already in the language. We simply allow them to be iterated over directly, without an intervening array.for (slice, 0..) |a, i| {}
.In terms of safety / cognitive overhead:
for (slice) |idx, item| {}
, and this might pass casual inspection. The new notation makes this mistake more obvious: for (slice, 0..) |idx, item| {}
.
var i: usize = 0;
while (i < length) : (i += 1) {
// ...
}
for (0..length) |i| { // ... }
- `for (0..) |i|` is an easily detected compile error. You have to combine an open-ended range with a fixed-length object. Fixed ranges (`a..b`) can stand on their own.
In terms of convenience:
- As a reminder, the previous plan was to remove index capture from for loops entirely, because it would clash with multi-object capture. I think it was a reasonable tradeoff, but still a shame that the convenience of working with multi-object loops was to be improved at the expense of single-object loops.
- A lot of people have been asking for ranged iteration in #358. This does not provide a complete solution, but is still an improvement.
I don't see why we can't do indices in for loops like we already do in while loops. It would make sense to me for continue expressions to work on both forms of loop.
var i: usize = 0;
for (array) |value| : (i += 1) {
...
}
This has several benefits over status quo:
for (array) |i, value| {
footgun.i
. while
and for
loops.@ominitay
What you say is true, but Zig isn't as minimalist as it sometimes pretends to be. Taking the One Way to Do Things principle to its logical conclusion, we could eliminate for loops, loops with continuation expressions, switches, the continue statement and probably quite a few other things. All control flow can be reasonably described with while
, if
and break
. But Zig isn't above some syntax sugar if it improves readability or prevents footguns in sufficiently common use cases. I feel this is one of those cases, but I suppose in the end it's a matter of taste.
@zzyxyzz I agree with that, I guess it's just about which solution we deem more effective :) I personally don't like this syntax, when we could use the continuation syntax in a for loop, but I recognise a lot of people disagree with me lol
@ominitay I would otherwise agree with you here, but you left out an important word in the "zen point" you quoted:
Only one "obvious" way to do things.
This amendment from @zzyxyzz makes for (start..end) |i|
the "obvious" way to do ranged iteration. Compared to while
it's more specialized (doesn't handle all the cases while
can) but also more convenient since you don't have to introduce a new scope and declare a new counter variable with an explicit type. This means when people can use for
they will because it's more convenient which is what makes it the "obvious" way to do it.
P.S. Also the fact that you're skeptical of this puts you in good company since Andrew already rejected this before :) I think the fact that this feature removes the need for a "hacky" implicit array index variable for (arr) |e, i|
is what puts it in the win category for me.
@marler8997 Only one "obvious" way to do things.
My mistake. I think the point I was trying to make was more that having for loops support ranges is, imo, unnecessary while while loops can do the job just as well. It doesn't make much sense to me to introduce a less powerful tool to do a subset of what can already be done with another, well-suited, tool.
P.S. There are multiple possible alternatives to implicit index variables. I don't think that that's a unique advantage of introducing ranges.
also more convenient since you don't have to introduce a new scope and declare a new counter variable with an explicit type.
And also, this goes beyond convenience: people are sloppy about introducing new scopes just for counter variables (or forget during refactoring etc) and thus introduce a potential for bugs by reusing counter variables by mistake. Another plus for the for-range proposal.
And also, this goes beyond convenience: people are sloppy about introducing new scopes just for counter variables
This is only a partial solution to this issue though, as it doesn't address it at all for while
loops. See #8019, which is (imo) a better solution to this.
And also, this goes beyond convenience: people are sloppy about introducing new scopes just for counter variables
This is only a partial solution to this issue though, as it doesn't address it at all for
while
loops. See #8019, which is (imo) a better solution to this.
The point stands, it solves the scoping issue for simple counted loops (which are very common) while also solving the other problems initiating the proposal. Especially important if 8019 doesn't get accepted (which seems likely imo, it's more natural/less annoying to introduce extra scopes for more complex cases) Ranged-for seems like a very pragmatic middle ground.
This is only a partial solution to this issue though, as it doesn't address it at all for
while
loops.
Your proposed solution doesn’t address it for for
loops either. Surely an “incomplete” solution is better than no solution?
Also I don’t quite understand the hangup on while
and for
being different. They’re different constructs, with different purposes. That’s why we have both.
Your proposed solution doesn’t address it for for loops either. Surely an “incomplete” solution is better than no solution?
How doesn't it?
Also I don’t quite understand the hangup on while and for being different.
They're different, but that doesn't mean that a solution can't be applicable to both. Here, I think that the same approach for indices is the best solution.
The specific problem raised with the while
loop index solution is that braces are cumbersome so the laziest solution is to leak the index to the enclosing scope. Status quo for
loops do not have this problem, so we do not want to introduce it. Your proposed solution would require the same braces as the while
case, and so would introduce the same problem.
If you want this aspect of for
and while
to be the same, then propose a real solution to this problem that works for both. Don’t introduce a new problem so they both have the same flaw.
The solution already exists in the form of 8019, as I mentioned.
Say I have some large arrays of the same size, and I want to perform some element-wise operation on them. In status quo, that might look like this:
What we really want to do is capture each element of
a
,b
andc
at the same index, however the best we can do is capture the index itself and use it to index intoa
andb
-- a bit cleaner syntactically than straight C-stylefor
loops maybe, but with many of the same problems: we need the index directly, we're not sure that all the bounds line up/aren't overrun, the compiler may have a hard time optimising the code (it needs to be sure that indices are used directly and which arrays are/are not mutated, and it will most likely insert bounds checks at every access in what may be a performance-critical inner loop), and crucially: intent is not communicated.This could be made much cleaner and clearer, to both the author and the compiler, if we were allowed to do it like this:
All arrays must be the same length -- this will be checked by the compiler if lengths are comptime-known, and differing runtime-known lengths will be checked illegal behaviour.
If we use the value of the index, we may still capture it, although for clarity this is done with a semicolon rather than a comma:(Upon reflection, I don’t think index capture is necessary, or at least either useful or optimiser-friendly enough to justify inclusion. The continue expression currently used in|*ec, ea, eb; i|
. (We also make this change to single-objectfor
loops.)while
loops is sufficient for this purpose and maintains language consistency.)