ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.77k stars 2.47k forks source link

Footgun: hidden pass-by-reference #5973

Open ghost opened 4 years ago

ghost commented 4 years ago

Zig 0.6.0 (not master). This is related to, actually maybe a subset of, https://github.com/ziglang/zig/issues/4021 / https://github.com/ziglang/zig/issues/3696 (this issue doesn't involve result copy elision).

I understand that this was intended to be a feature of zig: args passed as values "may" be silently translated to pass-by-reference by the compiler. I think the intent was to stop the user from passing const pointers as an "optimization". But it's also a footgun, sort of like https://github.com/ziglang/zig/issues/2915. The problem occurs when you have another non-const pointer aliasing the same memory as the argument value.

const std = @import("std");

const Thing = struct {
    value: u32,
};

const State = struct {
    thing: Thing,
};

fn inner(state: *State, thing: Thing) void {
    std.debug.warn("before: {}\n", .{thing.value}); // prints 10

    state.thing.value = 0;

    std.debug.warn("after: {}\n", .{thing.value}); // prints 0
}

pub fn main() void {
    var state: State = .{
        .thing = .{ .value = 10 },
    };
    inner(&state, state.thing);
}

The behavior here depends on the compiler implementation. It seems that right now, if thing is a struct value, it's passed by reference. But if it's a bare u32, it's passed by value. I don't know if it will always be this simple (I assume there are plans to pass "small" structs by value.)

The workaround for this situation is to make an explicit copy using a redundant-seeming optimization, probably accompanied by a comment explaining what's going on. Or else to restructure the code at a higher level, but then this footgun will still be lurking in the shadows.

I think that any optimistic "assume no aliases" optimization ought to be opt-in rather than opt-out. That would mean, either go back to the C way of things, or add a new syntax (some symbol that means "compiler can choose between value and const pointer"). Either way, a plain argument should always be passed by value. What do others think?

Vexu commented 4 years ago

Seems like the same thing is happening in #5455.

jmoorman commented 3 years ago

I think that any optimistic "assume no aliases" optimization ought to be opt-in rather than opt-out. That would mean, either go back to the C way of things, or add a new syntax (some symbol that means "compiler can choose between value and const pointer"). Either way, a plain argument should always be passed by value. What do others think?

I completely agree with the footgun sentiment, but defaulting to pass-by-value for arrays doesn't seem quite right. In C, an array is actually just a pointer so it can be passed by value without making a copy of the memory (and the compiler just straight-up ignores the specified size). It's a nice feature of Zig that you can pass an array argument and actually have the compiler enforce its size, but making a local copy of the memory is rarely the behavior that you'd want.

ghost commented 3 years ago

I disagree. Zig arrays are "by value" in other contexts, like assignment. Also, if structs are passed by value but arrays are passed by reference, what do we do with a struct containing an array? Simpler if they are treated the same.

You can pass a large struct, you can also pass a large array. You can also overflow the stack with a bunch of calls (although there's a plan to stop that at compile time). I think this is what you sign up for.

C's treatment of arrays/pointers is an idiosyncracy, one which we don't need to inherit.

ikskuh commented 3 years ago

Also, if structs are passed by value but arrays are passed by reference, what do we do with a struct containing an array? Simpler if they are treated the same.

Just for clarification: Zig doesn't define what values are passed by value or passed by reference which allows the compiler to choose either of those. This allows passing small values in registers and larger values via reference, no matter if the small value is a struct or a array[1] or a simple integer.

The core problem here is that zig doesn't safety-check for aliasing when doing so and prevent compilation or do other measures to prevent that footgun

ghost commented 1 year ago

This seems like a significant design flaw.

Is it even possible for the compiler to flawlessly deduce whether something is being aliased? If not, I definitely think that pass-by-pointer should go back to being explicit, as in C.

nektro commented 1 year ago

related: https://github.com/ziglang/zig/issues/12638 related: https://github.com/ziglang/zig/issues/12251 related: https://github.com/ziglang/zig/issues?q=is%3Aissue+is%3Aopen+provenance

matklad commented 1 year ago

Somewhat simplified example which shows the different treating of primitives and structs:

const std = @import("std");

fn f(a: i32, b: *i32) void {
   b.* = 92;
   std.debug.print("{}\n", .{a});
}

const S = struct { x: i32 };
fn g(a: S, b: *S) void {
   b.x = 92;
   std.debug.print("{}\n", .{a});
}

pub fn main() void {
   var i: i32 = 0;
   f(i, &i); // prints 0

   var s: S = S{ .x = 0 };
   g(s, &s); // prints 92
}

Prints different results as of 0.11.0-dev.38+b40fc7018

mlugg commented 1 year ago

Fairly obvious point, but it's worth noting that this issue is more subtle than aliasing of pointers directly in the function body:

const std = @import("std");
const S = struct { val: u32 };
var glob: *S = undefined;

fn f(x: S) void {
    g();
    std.log.info("{}", .{x.val});
}

fn g() void {
    glob.val = 42;
}

pub fn main() void {
    var x: S = .{ .val = 0 };
    glob = &x;
    f(x); // prints 42
}

I agree that it's not a good idea for this optimisation to change core and otherwise safe semantics like this, but I don't think the solution is to revert to C-style argument passing. Instead, I think functions should avoid passing by reference any value to which a mutable reference may exist; so any global var, or a local var which has &x evaluated at any point prior to the function call (except where that pointer is coerced to a const one), may be mutated from elsewhere, so should not be passed by value. This does raise the issue that the decision of whether to pass by reference now lies with the call site - we certainly don't want duplicate code in the binary for functions which sometimes take a parameter by reference and sometimes by value, and we also want to allow function pointers to pass by reference. The solution here is just to implement "pass by value" for large types by duplicating the value on the stack and passing a reference to that value. This is of course slightly less efficient than directly passing the large value, but the cost of the indirection is probably insignificant in practice given the relative rareness of this case.

IntegratedQuantum commented 1 year ago

I think functions should avoid passing by reference any value to which a mutable reference may exist; so any global var, or a local var which has &x evaluated at any point prior to the function call

I don't think this works good enough. For example if the given struct is on the heap then you would always make a costly copy.

This would undermine the promise that I don't need to worry about pass-by-value vs pass-by-reference.

​ To me the "assume no aliases" optimization with safety checks (related: #1108) would be a better solution.

mlugg commented 1 year ago

I don't think this works good enough. For example if the given struct is on the heap then you would always make a costly copy.

This would undermine the promise that I don't need to worry about pass-by-value vs pass-by-reference.

Just to check, are you referring to a case like this?

fn foo(ally: std.mem.Allocator) !void {
    const ptr = try ally.create(MyLargeType);
    defer ally.destroy(ptr);
    bar(ptr.*);
}

fn bar(x: MyLargeType) void {
    _ = x; // ...
}

Assuming you are: my intuition is that for types you're heap-allocating, you'd normally be passing them around by pointer anyway. That said, I agree this isn't ideal. One solution would be a way for Allocator to annotate its functions to specify that the memory is owned by the caller (kinda like a noalias for the return type), but that opens the floodgates to a whole ownership system, so may be out of scope for Zig.

​ To me the "assume no aliases" optimization with safety checks (related: #1108) would be a better solution.

The problem with this approach is that opting out of the pass-by-ref becomes actually quite annoying:

fn baz() void {
    var x = MyLargeType.init();
    global_for_later = &x;
    // we don't want to pass x by reference here!
    // but to avoid it, we need to make a copy, so a whole new const...
    const x1 = x;
    bar(x1);
    // plus if we wanted to scope this more nicely we'd need a scope around that const+call - our 1-line call has suddenly become 4!
}

This is a pretty gross usage for what should be a reasonable use case.

In my mind, if we want to go the explicit route (rather than trying to make this an unnoticeable optimisation), we need a better way of annotating that a value should be copied (or, conversely, that it can definitely be passed by reference). One solution would be a new builtin @copy, which takes a value and returns a guaranteed stack temporary with the same value, so the above call becomes bar(@copy(x)). The problem with this is that it requires you to be somewhat aware of the compiler's implementation; I don't have to annotate this for something like a single u8, but when do I have to start doing it? This goes against the simplicity of writing Zig code IMO.

A better route might be to go the other way: implement the optimisation in the way I discussed, but also add a @byRef which annotates a value as being able to be passed by reference even when the optimisation would say otherwise. That means the above call would just be bar(x), and in the rare case where the implicit optimisation isn't possible, you specify the call in foo as bar(@byRef(ptr.*)). This still isn't ideal - you'd likely end up having to spot the expensive case when profiling and fix it later (or just know the compiler implemention as with @copy, which kinda sucks) - but I think it's better since the default behaviour will be "correct" 90% of the time. In fact, the default behaviour will make the code work correctly 100% of the time, it just won't necessarily pass things in the most optimal way. I think a performance issue is better than a blatant semantics footgun in the default case.

kuon commented 1 year ago

To me, the most important thing here is to ensure this issue is fixed. If I have an argument like fn foo(bar: Bar) I want the guarantee that bar is my own and nothing will ever touch it. This is a really important guarantee to have.

If the type is large and I want to avoid a copy, I think it is good enough to just leave the responsibility to the programmer and let the programmer pass *Bar or *const Bar. Thus making @byRef IMO not really needed.

For those two reasons, I fully agree with @mlugg saying:

I think functions should avoid passing by reference any value to which a mutable reference may exist; so any global var, or a local var which has &x evaluated at any point prior to the function call

IntegratedQuantum commented 1 year ago

my intuition is that for types you're heap-allocating, you'd normally be passing them around by pointer anyway.

That is correct with the exception of calling member functions, which can implicitly reference/dereference struct pointers. And member functions are fairly common.

Overall the example would be

const MyLargeType = struct {
    ...
    fn bar(self: MyLargeType) void {
        _ = self; // ...
    }
};
fn foo(ally: std.mem.Allocator) !void {
    const ptr = try ally.create(MyLargeType);
    defer ally.destroy(ptr);
    ptr.bar();
}

One solution would be a way for Allocator to annotate its functions to specify that the memory is owned by the caller

That wouldn't help much in the case of member functions, because the allocation might be further up in the callstack and has been passed as a pointer until the member function got called. ​

There are also some further dificulties from member functions. Like for example ArrayList.deinit() it takes a *Self and then calls ArrayList.allocatedSlice() which takes a Self.

The problem with this [@copy] is that it requires you to be somewhat aware of the compiler's implementation; I don't have to annotate this for something like a single u8, but when do I have to start doing it?

Yeah this is a big problem. Especially given that the cutoff between pass-by-value and pass-by-reference might be target dependent. This could be solved by making all parameters pass by reference in debug mode. Then you would always need the explicit @copy if otherwise the pointers would alias. That could be a bit tedious though, considering that every time you pass a variable twice to the same function it would cause a runtime error.

I think a performance issue is better than a blatant semantics footgun in the default case.

I agree. However I think we should choose the solution with the best performance.

kuon commented 1 year ago

The member function problem that @IntegratedQuantum pointed is a real sack of knots, but if we try to untangle them we can take out the following:

My feeling is that passing an argument as bar: Bar should check the "I want to own the memory" 100% of the time. This is what this issue is about, so let's agree on that. To address this, the solution proposed by @mlugg namely, passing by value or by reference when we can be sure there is no pointer referencing the value, checks this box.

Now, the problem is when and argument is passed as bar: Bar and the programmer actually means "I don't care". This is the case in member functions as pointed out.

I think we can solve this only two ways, with a full ownership system, which we rule out or explicitness from the programmer.

We already have bar: *const Bar or bar: *Bar but this yields a *T and not T. I said that @byRef is not needed, but maybe being able to say "I want T but I don't care who owns the memoryis actually needed, thus making something like@byRef` required.

Something like bar: &Bar would yield T and means I don't care who owns the memory. But unlike @byRef it would be explicit where the variable is used and not where it is passed (it would be equivalent to mark all call with @byRef).

williamhCode commented 10 months ago

I was looking into Carbon language the other day, and they have the exact same way of passing in arguments. I want to share how they deal with this issue. In their design docs, this outlines parameters.

It states:

The bindings in the parameter list default to let bindings, and so the parameter names are treated as value expressions.

If we look into value expressions it says that:

Once an object is bound to a value expression in this way, any mutation to the object or its storage ends the lifetime of the value binding, and makes any use of the value expression an error.

Note: this is not intended to ever become "undefined behavior", but instead just "erroneous". We want to be able to detect and report such code as having a bug, but do not want unbounded UB and are not aware of important optimizations that this would inhibit.

I think that this solution would be good, as it makes sure the default argument passing method is always the most efficient, but also notifies the user when an unexpected behavior occurs. In this way, we can and should treat the default parameter similar to const T& in c++, which just guarantees the function will not mutate the passed in value. Then we can have an explicit syntax for actually making a copy of the argument, for cases when it's intended.

If we go with a "I want to own the memory" 100% of the time fix, that is simply not possible without copying data (or using an ownership system). So in this way, we sacrifice performance and everyone goes to using *const T whenever they can because they don't know when the compiler will copy their arguments. This will end up like in c++ where people just put const T& everywhere, which is the complexity that what we want to avoid.

So with the above implemented, the code would throw an error at std.debug.warn("after: {}\n", .{thing.value});, stating that the value has been modified at line state.thing.value = 0; or something similar.

g2mt commented 3 months ago

I think the automatic pass-by-reference semantics is not a good design decision in my opinion. Since Zig is a low level language, behaviors like this should be explicit and not dependent on compiler optimizations. Without introducing anything new to the language, I would personally propose that the compiler should always pass structs and arrays by value. If a struct is too large to be copied, the compiler should warn the user, and tell the user that they should pass those structs by pointers.

haudan commented 3 months ago

Two cents from a layman:

If a struct is too large to be copied, the compiler should warn the user, and tell the user that they should pass those structs by pointers.

I think this is also problematic, when we consider multiple platforms, since each platform will have its own pass-by-value threshold (I believe?).

For one platform, you might not get a warning, while for a different one you will. This is fine for your own exe-code, but warnings like that would be pointless for library code. Library code either always produces these warnings and we can't do anything about them ("unfixable" warnings), or we suppress them for external code, rendering this feature kinda useless.

Since Zig is a low level language, behaviors like this should be explicit and not dependent on compiler optimizations

I disagree. The compiler knows the target platform, and how the code is actually used, better than the programmer. It choosing the passing style is probably a good default, but there should be a way to opt-out and manually specify. We can already force pass-by-reference with *const, but there should be the opposite as well.

tau-dev commented 2 months ago

Why are we calling this a "footgun"? Copy elision is entirely solved by the LLVM optimizer, but Zig is trying to reinvent it in the compiler frontend, in an ad-hoc, unsound fashion that clearly violates the value semantics of the language. It's a bug.

To be clear: the result location semantics as explained in #12251 make sense for returning values, and can be sound with the right constraints. But this should just count as a miscompilation.

mlugg commented 2 months ago

Please don't just post inflammatory takes. This is an intentional part of the language design, and this issue documents the footgun it can cause. If you have run into this issue and can provide a useful data point, do so; otherwise, don't post comments which add nothing to the discussion.

This issue is not related to Result Location Semantics. Instead, the feature leading to this footgun is Parameter Reference Optimization. Copy elision is not, as you assert, "entirely solved" by LLVM, because the cases which may cause footguns here are precisely the cases which any optimizer cannot optimize since semantics are not preserved.

tau-dev commented 2 months ago

This issue is not related to Result Location Semantics

That's what I was trying to say, exactly because these two issues have been conflated in the past :P

the cases which may cause footguns here are precisely the cases which any optimizer cannot optimize

I'm totally on board with language features to help address such cases. But current PRO isn't solving them, it just turns otherwise slow code into broken code.

tau-dev commented 2 months ago

Copy elision is not, as you assert, "entirely solved" by LLVM

I should have measured before I spoke, sorry about that. While one could imagine a good optimizer exploiting fastcc for perfect copy elision, LLVM doesn't do that (turns out LLVM's codegen for passing large types is even worse than passing pointers with explicit alloca+memcpy, but that's besides the point). Still, slow sounds better than wrong until we can solve both.

nokotto commented 1 month ago

I was reading about the Zig language and ended up here. IMHO, as others commented above, it should be explicit:

inner( & state );        // pass-by-reference
inner( & state.thing );  // pass-by-reference
inner_arr( & array );    // pass-by-reference
inner_arr( & slice );    // pass-by-reference

inner( state );       // pass-by-value
inner( state.thing ); // pass-by-value
inner_arr( array );   // pass-by-value
inner_arr( slice );   // pass-by-value

Or the inverse behavior, reference by default and an operator for by-value/copy '>' for example ,

inner( state );         // pass-by-reference
inner_arr( array );     // pass-by-reference

inner( > state );       // pass-by-value
inner_arr( > array );   // pass-by-value

The point is just to make the behavior explicit. 
With this second example, if the programmer use several languages this is gonna break 
his/her logic, so may be should be recommended the classical first one, copy default and 
to be explicit with pass-by-reference '&'

Due to other comment about the compiler limitation with copy (is this what everything is about?), I guess would then need to define a copy method when the structures are declared; I guess done automatically if the structure is simple, and with manual definition if a threshold is exceeded (due to compiler limitation).

I think it is far more interesting to let the programmer know what is happening when reading/writing code, which cannot be done -intuitively, easily- with the current behavior of referencing if its a pointer, referencing if the compiler wants to, or copying if can be done, included when an array or other pointers' structure is passed (one just sees as passing by value if the type its not declared a few lines up). The same when the programmer is reading inside a function, or even method Self.

The current behavior makes difficult to know what is happening and what will happen within the code, including when it is safe to release the memory (because it was referenced and not copied, even vice-versa with Self). For async and threads, it sounds like the current behavior is going to be an unnecessary additional hell that can be avoided.

Turning it explicit will bring a great transparency and intuitiveness to Zig, what will avoid many hard problems to the users at future. I beg to think about it.

mocompute commented 1 week ago

Motivated by this talk about value semantics I thought to explore Zig's approach. In this example, the semantics of the language indicate that a struct is being passed by value, but the compiler instead passes the argument by const reference and inadvertently aliases the other argument, causing the program to generate an incorrect result. Using noalias produces no diagnostic, because only one argument is a pointer.

I think this is the correct issue to track this problem, but the solution does not appear straightforward. Either additional safety checks somehow, or no hidden reference semantics and leave it to the programmer.

const std = @import("std");

const Number = struct {
    x: i32,
};

fn add(a: *Number, b: Number) void {
    a.x += b.x;
}

fn add_twice(a: *Number, b: Number) void {
    add(a, b);
    add(a, b);
}

pub fn main() !void {
    var x: Number = .{ .x = 1 };

    // Expect value semantics with the second argument but it is
    // being passed by reference.
    add_twice(&x, x);

    // Expect x = 3
    std.debug.print("x = {}\n", .{x.x}); // prints x = 4
}

// Zig versions:
// 0.13.0
// 0.14.0-dev.1417+242d268a0

// same in Debug, ReleaseSafe, ReleaseFast
// no change if a declared noalias

// Related issues:
// Safety-check noalias: https://github.com/ziglang/zig/issues/476
// https://github.com/ziglang/zig/issues/5973