Open SpexGuy opened 2 years ago
Out of curiosity, how much does this overlap with Return Value Optimization under C-style semantics?
Are there cases where this goes beyond what the optimizer can do on its own today (w/ a naive C-style lowering that introduces temporaries)?
Out of curiosity, how much does this overlap with Return Value Optimization under C-style semantics?
This is a very similar problem to C++'s RVO and NRVO semantics. The core problem underlying both is that the result location pointer is observable to the program, and can be stored into memory somewhere. In C++ the pointer is exposed via the this
pointer in a constructor, in Zig it can be exposed through the @frame()
builtin in an async call or the semantics from #2765. This ability to observe the pointer forces it to be a semantic issue, and not just an optimization.
Are there cases where this goes beyond what the optimizer can do on its own today (w/ a naive C-style lowering that introduces temporaries)?
There are a few cases where it can bleed into semantics. In the case where the optimizer inlines a function, it will probably be able to remove the copy. But if the function is not inlined, it becomes very difficult. There might be a way to convey the set of operations that would expose the result location in a well defined way, and allow it to remove the copy if it knows a called function does not do those things, but I'm a bit skeptical that LLVM can do that out of the box.
Here are some examples where the RLS semantics might pin down the optimizer:
async fn foo() void {
const frame = @frame();
var ptr = &frame;
suspend {}
assert(ptr.* == @frame());
}
fn spawnAsync() @Frame(foo) {
return async foo();
}
test {
const frame = spawnAsync();
resume frame;
nosuspend await frame; // is this a lifetime bug or not?
}
fn makeEmptyList() RingList {
var result: RingList = undefined;
result.next = &result; // by #2765 semantics this is the result location
return result;
}
test {
const list = makeEmptyList();
assert(list.next == &list); // is this true, false, or undefined?
}
These examples require an answer to the result location question. The first is either a lifetime bug or not, but the second properly pins down the optimizer. If the assert definitely fails, then the optimizer may not remove the temporary. If the assert definitely succeeds, then the language must not use a temporary. If the comparison is undefined
, then the optimizer would be free to remove a copy, but it means that #2765 doesn't actually mean anything and should be un-accepted.
Here's another case, without lifetime problems:
fn checkResult(u32 const *ptr) u32 {
var result: u32 = 4;
assert(ptr == &result); // this should definitely be a well-defined true or false
return result;
}
test {
var r: u32 = 2;
r = checkResult(&r);
}
Thanks for those helpful examples. I was having trouble grokking the impact of RLS without aliasing, but the value is clear now.
Sounds like an equivalent way to describe it is "RLS, with temporaries" (to avoid aliasing), plus guaranteed temporary/copy elision when initializing variables.
Very nice approach - I'm a fan of the proposal :+1:
Very nice writeup.
Does this and if yes, how would this, affect C interop or semantics of translated C code? How are the result location semantics called in C/the C standard?
Missing documentation issue in #2809.
Thank you for the great write-up! I was wondering whether my own thoughts on the issue would be of any help eventually, and it seems your conclusions mostly line up with mine. Here are my remaining 2 cents anyway:
I think when the caller is required to guarantee limits to aliasing, having syntax to do/indicate this at the call site is the proper solution. IMHO this applies to RLS the same as to parameter passing though.
Does this and if yes, how would this, affect C interop or semantics of translated C code? How are the result location semantics called in C/the C standard?
Result locations are for the zig internal calling conventions only, callconv(.C) functions have C semantics where the parameters are copies and the result is always stored in a temporary (subject to the as-if rule which allows for elision as an optimization in some cases).
noalias
on arguments at call sites
I've thought about this and a lot of similar variants, and it might be worth doing but I'm not totally sure. One difficulty with any syntax on call sites is that it also has to work with @call
, and therefore be integrated into tuples. Additionally, in practice most arguments are noalias, so this would be a very common attribute. When I started looking into this I combed through a lot of code looking for these sorts of mistakes, and I found almost none. The few that I did find tended to be overwriting the mutable version with the const version, which probably doesn't violate aliasing rules? I think there's more that's worth investigating here, but I think it's better as a separate proposal with more motivation.
I personally don't see the difficulty with extending @call
since it's a builtin function. We can add an additional argument that is an array of argument attribute structs (noalias
plus whatever we introduce down the line).
It's not pretty, but it's not meant to be, and afaict for the primary use of generic code / meta-programming that should work.
Therefore, I think that annotating caller and callee for where stack copies are made on function call and elided are the sanest option.
I suspect that with comptime information this also would leave the door open for external static analysis as borrow checking of very error prone parts and, except for arbitrary global pointers, proper ecapsulation via stack copy.
Do you have other ideas? Or what do you think? @SpexGuy Please tell me, where and how my model/assumption is flawed.
this would then at worst duplicate function instances
@matu3ba Small nitpick, we technically have 2^n
possible combinations for n
function arguments (+ 1 return value), since every argument could be individually passed by-copy or by-reference.
Although often there may only be one or two large structs and for the other arguments the optimization of passing by reference isn't worth adding new instantiations.
Small nitpick, we technically have 2^n possible combinations for n function arguments (+ 1 return value), since every argument could be individually passed by-copy or by-reference.
Mhm. This could be eliminated, if the callee could force the caller to use 1 specific semantic for each argument (similar to what SpexGuy suggested. But this would prevent the caller to opt-in into more safety (at cost of lower performance).
However, I can not exactly follow why it matters, how each argument is passed over as function argument. I might be too tired (~2h sleep), but as far as I understand, the parameter type is only relevant for the type matching at the usage side and the optimizer has freedom over how it lowers it (copy by value or reference).
Do you have an (ideally) brief code example with desugaring like what SpexGuy is doing in the talk ("ATTACK of the KILLER FEATURES - Martin Wickham - Software You Can Love Vancouver 2023") that demonstrates per fn args annotation is necessary @rohlem ?
If (which I am not yet convinced) fn argument is necessary, then I dont see another way to provide the necessary flexibility compactness unless offering 2 ways to do things: SpexGuy's approach + per fn argument annotation on caller + callee.
Do you have a[ ...] example [...] that demonstrates per fn arg annotation is necessary?
@matu3ba I'm not saying it's necessary to decide for each argument (and the result) individually. I just wanted to point out there's no technical reason to stick to one strategy for all arguments. In fact, the status-quo strategy of deciding based on byte size already decides for each argument individually. (Unless you meant to always keep small-enough arguments by-copy and only apply the decision on larger arguments.)
the optimizer has freedom over how it lowers it (copy by value or reference)
Right, the main issue here is just that Zig currently tells the optimizer that none of the arguments and the result location alias each other, even if this is the case.
My opinion (as already stated previously) is still that the caller / call site should explicitly annotate noalias
guarantees; if not there may be aliasing and the compiler should ensure copies are made.
Note that we already have noalias
syntax for function parameters, but currently don't verify these assertions at all.
With call-site noalias
annotations, it's (relatively) trivial to see whether the arguments that are expected noalias
were passed noalias
- if not, compile error.
IMO for that reason alone the feature would be desirable.
EDIT 2: I mixed up semantics here; in status-quo noalias
currently applies to pointees of pointer arguments;
I assumed / would propose it to apply directly to the values instead. (Maybe making status-quo expressible as *noalias T
? Not sure.)
```zig const P = struct{x: u8, y: u8}; fn flip(in: P) P { return .{.x = in.y, .y = in.x}; } const a = P{.x = 3, .y = 4}; test { var b = a; // my proposed solution // We don't tell the compiler that anything is `noalias`, // so it should assume that everything may alias and make copies for us. b = flip(b); // Here we tell the compiler that `b` doesn't alias `b` // => code will miscompile, as it does in status-quo; // maybe not full illegal behavior though if we can limit the allowed effects? noalias b = flip(noalias b); // The result of an initialization is always `noalias` by default, // because we cannot reference `c` on the right hand side. // (Here `noalias` on the argument is not necessary, though it still documents // that no global mutated by `flip` aliases `b`.) var c = flip(noalias b); // Tricky: Values that the compiler provides backing for in-place // (literals, results of operators, functions) are trivially `noalias`, // because the compiler can prove that they are. // HOWEVER it then can't re-use that value as a result location. var d = flip(.{ a.x, c.y}); // We could allow this next example for explicitly // telling the compiler to re-use the temporary's location. // (We can imagine different functions like "double" which are correct // even without copies, that declare their argument `noalias` as well.) var e = flip(noalias .{ b.x, d.y}); // For regularity, the compiler shouldn't be allowed to "be smart" and // replace `.{d.x, d.y}` with value-equivalent `d` IMO: causes issues here. noalias d = flip(.{d.x, d.y}); } ```
---Note that function bodies can always be generated to assume by-reference noalias
,
we just have to wrap them with a prologue/epilogue that provides space and copies maybe-aliasing values to avoid aliasing issues (=clobbering).
The amount of extra code we generate for this is only linear to the number of maybe-aliasing values in the combinations used in code.
EDIT: Important detail I initially forgot (apparently it was late for me too):
Because (obviously) aliasing reads alone don't conflict, only mutable var
require copies.
Further, because arguments are never mutable, that means really only the result location and global variables would need to be considered.
(That actually brings us back down to only ever having a set of 4 options: {no copies, copy RLS, copy mutated global bytes, copy RLS + mutated global bytes}.
And for completeness, if the bytes only read from (arguments + globals) are fewer bytes than the bytes written to (i.e. of globals),
then instead copying these read-only bytes is more efficient and can be done instead.)
There are specific cases where we could detect this aliasing and make it Checked Illegal Behavior, but we would not be able to detect all cases. Consider as an example,
fn swizzle(v: *const [4]f32) [4]f32 {
return .{ v[1], v[2], v[3], v[0] };
}
test {
var arr = [5]f32{ 0, 1, 2, 3, 4 };
arr[0..4] = swizzle(&arr[1..5]);
}
This function call is aliasing the parameter and the return value, and therefore should be Checked Illegal Behavior, IMO. Could we add runtime assertion for a parameter aliasing with the return value?
Can the problem happen when using "single assignment"?
It seems to me that most problem will come from trying yo mutate a variable "in place" with: x = swizzle(x)
instead of const y = swizzle(x)
. Can the compiler distinguish between the two, and only insert checks for "in-place" mutations?
Could we add runtime assertion for a parameter aliasing with the return value?
Yes, but this isn't all cases, and may even have false positives. Here's another case where aliasing is hard to detect:
const Inner = [2]Big;
const Outer = struct { v: *Inner };
fn foo(o: Outer) Inner {
return .{ o.v[1], o.v[0] };
}
test {
var o: Outer = ...;
o.v.* = foo(o);
}
Here the parameter to foo is &o
and the result location is v
, which is a separate memory region that does not alias. However the body of foo
performs reads from v
that alias the result location, and are therefore illegal. You can imagine even more difficult to detect cases involving global variables pointing to linked lists containing chains of pointers, one of which aliases a parameter or result location.
Additionally, [*]T
parameters cannot get this check because we don't know how large they are.
Finally, depending on how exactly we define aliasing, there may be false positives. For example, imagine a function like this:
fn last(s: []const T) T {
return s[s.len-1];
}
test {
var v: [4]T = ...;
v[0] = last(&v);
}
Here the parameter and the result location overlap, but the function performs no reads that overlap the result location. So there is no aliasing in this example, despite the overlapping types.
Can the compiler distinguish between [single assignment and in-place mutation cases], and only insert checks for "in-place" mutations?
To get a correct check without false positives, we would have to verify within the function that no individual reads or writes overlap the result location. This is possible to do, but cannot be turned on or off by the caller. So these checks would be present in all calls.
Thanks for all the examples you're producing, it's very helpful to the discussion.
I'm suggesting runtime checks (yet imperfect) because I don't really see another way forward for Zig. Disabling result pointer optimization like in C, means user will have to deal with it. Handling it correctly for the compiler seems to require Rust or Val-like restrictions to the language.
What about setting the result pointer to undefined (0xaa) when entering the function in Debug mode? That way reads through any alias will be detected. This would also break your last
example for slices of 1 items.
But last
has the property that it's only returning a preexisting value and will need to do a memcopy to the result location anyway. If I'm not mistaken such functions aren't affected by the current result location semantic bug, because the result location will only be written to just before returning. If it's true and it's possible to detect such functions reliably, it should diminish the number of "false positive" you mentionned.
Can someone summarize what the issues are with the original proposal? It seems like the way forward to me; even if it may produce suboptimal code in some cases, it appears to solve the problem with async.
@SpexGuy I'm new to Zig and probably not qualified to be commenting, but here I go.
I came to this issue via your talk: https://youtu.be/dEIsJPpCZYg?si=GCUFI46nHnjQ_SaH
It was a bit off-putting to learn of issues with aliasing caused by hidden pointer magic. Thanks for working to make Zig better!
- to allow returning large structs without a large perf cost
It's interesting that to avoid a copy requires introducing a tmp variable and a copy, but if the optimizer can often remove the later, then it makes sense in the end.
For a language with "no hidden memory allocations" and "no hidden control flow", I didn't expect the hidden "value-to-const-reference transformation". But I'm not about to ask for "no hidden optimizations." 😂
- to allow async frames to be composed into structs without illegally copying them
- to allow pinned types to be composed into structs without illegally copying them
Swift 5.9 recently added the concept of noncopyable structs. Would it be worth considering something like that? Async frames aren't the only thing that shouldn't be copied, so why not allow users to mark things as noncopyable too? 🤔 (Obviously recognizing that Swift is a very different language with built in reference counting a lot of copying)
@nathany see https://github.com/ziglang/zig/issues/7769 for a proposal about explicitly non-copyable types
My grug brain thinks we should just define the semantics of x = .{ ... }
to as if x
was assigned all at once, aka transactionally. I think this semantic definition solves everything as this is only an issue when assigning an struct/tuple/array to a var.
// ORIGINAL
const V2 = struct { x: f32, y: f32 };
test {
var v = V2{ .x = 0, .y = 1 };
v = V2{ .x = v.y, .y = v.x };
}
// DE-SUGARED
const V2 = struct { x: f32, y: f32 };
test {
// BEGIN TX
var tmp: V2 = undefined;
tmp.x = 0;
tmp.y = 1;
// COMMIT
var v: V2 = undefined;
v.x = tmp.x;
v.y = tmp.y;
// END TX
// BEGIN TX
var tmp2: V2 = undefined;
tmp2.x = v.y;
tmp2.y = v.x;
// COMMIT
v = undefined;
v.x = tmp2.x;
v.y = tmp2.y;
// END TX
}
So yes temporaries for all assignments.
I feel that this is a large enough footgun that we should immediately fix this with temporaries and then debate other solutions after.
This is a correctness problem and we shouldn't worry about performance.
My grug brain thinks we should just define the semantics of
x = .{ ... }
to as ifx
was assigned all at once, aka transactionally.
That makes sense, but then how would we plague the language with ill-conceived experimental semantics for many years?
Especially when it boosts the performance in many cases while merely unleashing pure chaos in any code that is vaguely self-referential? Remember: Only code that works is benchmarked.
Sarcasm aside, the best time to add Forwarding Assignment Syntax to the language (thus returning to sane defaults) was when RLS was first added. The second best time is now.
Motivation
Result Location Semantics have three core goals in Zig
Unfortunately this feature also causes some unintended side effects, like #4021, #3234, and #12064.
The problem looks like this:
With the current semantics, the result location
&v
of the second assignment is forwarded to the struct initialization expression, causing it to be equivalent toThis means that
v.x
is overwritten in the first assignment, and then used in the second assignment.This is made worse by the transparent value-to-const-reference transformation which is performed on some struct parameters, which can cause aliasing to be easily observed, as in #3696 and #5973.
Here these two features conspire to preserve aliasing through the function call. The parameter is transformed into a const reference and the result location is forwarded to the function. Both of these are
&v
, so this has the same behavior as the first example.This behavior also has consequences for the optimizer. In order for an RLS transformation to be a strict win, the implicit return pointer needs to be marked
noalias
. However, this mark would turn the problems listed above into Unchecked Illegal Behavior, allowing the optimizer to mark them asunreachable
. But without thenoalias
mark, the optimizer may in some cases be constrained and unable to combine writes to the result location due to concerns about aliasing. Additionally, since RLS applies to all structs, it includes very small structs that could be more efficiently returned in registers.There are specific cases where we could detect this aliasing and make it Checked Illegal Behavior, but we would not be able to detect all cases. Consider as an example,
Aliasing occurs here, but the only way to find it is to keep track of every read and write, and perform a pairwise comparison on all of them. Once loops get involved, the memory requirement increases for every iteration of the loop, as does the work required for detection. So it's unreasonable to make this checked in the general case.
Since it has occurred naturally in nonobvious ways, it shouldn't be UIB. But as shown above, we can't really make it CIB either. Implementation defined behavior here would only create a needless and difficult to observe portability burden. The alternative is to make it well-defined, and allow result pointers to alias. But that defeats the first core goal of RLS, to be an optimization. It also causes unexpected behavior, as shown by the linked issues. So something needs to change.
The Fix
The obvious fix is to introduce temporaries on all assignments, but this would give up all of the three core goals. However, there are many cases where we can detect that a value must not alias, and safely preserve the current behavior. For example:
The only cases that need to change are of the form
For these expressions, the existing var may alias inside the expr. For the sake of avoiding accidental aliasing, we should change the interpretation of these statements to do the following:
&<existing_var_or_expr>
@TypeOf(<existing_var_or_expr>)
, or an anonymous temporary if it has an inferred type<expr>
using the temporary as the result locationThis change may have a performance cost in some cases, however it will also allow us to safely put the
noalias
attribute on result location pointers. Functions would be generated in the same way they are now, with an RLS pointer, but calls to those functions may need to introduce a temporary based on the form of the call.Forwarding Assignment Syntax
Sometimes you need to overwrite a large struct, and avoiding a copy is important for performance. After this change, you have two options:
I think this is probably good enough for most use cases, but if there are convincing arguments we could consider introducing a new syntax that avoids the temporary and asserts that no aliasing will occur. A "forwarding assignment" where the lhs is used as an input to the rhs. I'd like to avoid syntax bikeshedding in this issue, but here are some examples of how it could look:
Consequences for Async and Pinned
Aside from optimization, async and pinned types have correctness requirements based on avoiding copies. Since this change will introduce copies which could break existing async code, we should implement compile errors for copying async frames before or at the same time as making this change. Once that's in though, most of async will still work with these changes. For example,
But there are still places where you need to assign to existing memory. For example:
We have a sort of workaround for these cases, in the form of
@asyncCall
, but it's a pretty dramatic change in syntax for a small change in semantics. We don't have any parallel workaround for pinned structs. So for the sake of async and pinned, we may want to more strongly consider adding a Forwarding Assignment Syntax.Consequences for Parameter Passing
The value-to-const-reference conversion on parameters dovetails with the old RLS semantics to cause a lot of problems, as described at the beginning. However, with RLS "fixed" to avoid aliasing, I suspect that leaves it in a much better place. I think we should fix RLS first, and then see what problems still arise from parameter conversion and consider it separately if it is still causing problems.
Consequences for Result Location References
These new temporaries may be observable by code using #2765, if code is allowed to reference the result location directly. However #2765 is not implemented yet so it won't affect any existing code.