ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.14k stars 2.49k forks source link

Specification of behavior for safety-disabled bare unions #20232

Open mnemnion opened 4 months ago

mnemnion commented 4 months ago

An discussion on Ziggit about safety tagging and bare unions prompted me to open this issue, as a sounding board for specifying this corner of Zig's behavior.

Here's the motivating code:

// safety checked union
pub const CheckedUnion = union {
    int: i64,
    float: f64,
};

// unchecked union
pub const UncheckedUnion = blk: {
    @setRuntimeSafety(false);
    break :blk union {
        int: i64,
        float: f64,
    };
};

const unchecked_array: [1]UncheckedUnion = .{UncheckedUnion{ .float = 3.14139 }};
const checked_array: [1]CheckedUnion = .{CheckedUnion{ .float = 2.71828 }};

// evade compile-time safety checking
fn runtimePassUnchecked() []const UncheckedUnion {
    return &unchecked_array;
}
fn runtimePassChecked() []const CheckedUnion {
    return &checked_array;
}

test "safe and unsafe" {
    @setRuntimeSafety(true);
    std.debug.print("\nsize of CheckedUnion is {}\n", .{@sizeOf(CheckedUnion)});
    std.debug.print("size of UncheckedUnion is {}\n", .{@sizeOf(UncheckedUnion)});
    // illustrates that behavior is correct for correct access
    const union_int = UncheckedUnion{ .int = 64 };
    std.debug.print("value of union_int is {}\n", .{union_int.int});
    const union_float = UncheckedUnion{ .float = 3.14159 };
    std.debug.print("value of union_float is {}\n", .{union_float.float});
    const unchecked_union_slice = runtimePassUnchecked();
    // This is VERY ILLEGAL and is meant to illustrate the actual behavior,
    // which is that the program doesn't panic, and prints nonsense
    std.debug.print("UNDEFINED BEHAVIOR {}\n", .{unchecked_union_slice[0].int});
    const checked_union_slice = runtimePassChecked();
    // panic: access of union field 'int' while field 'float' is active
    std.debug.print("panic! at the runtime {}\n", .{checked_union_slice[0].int});
}

This is with 0.12.0, I'll check it with the new release within a few days.

What's going on here is that with @setRuntimeSafety(false), the type generated for the union doesn't have the safety tag, which is why it's smaller. This also illustrates that, at least for this test, the behavior of that type with @setRuntimeSafety(true) is unchecked.

Reading the documentation, I believe this is the only example of unchecked behavior escaping into a safety-checked context.

I feel strongly that this behavior is the correct one, and that the specification (when and as it is written) should make it clear that this combination of operations will have the result which it does: specifying both that using an unchecked union correctly in a safety checked scope will produce correct results, and also specifying that a bare union created in an unchecked scope will exhibit undefined behavior if misused in either a checked or unchecked scope.

As I discuss in the Discourse thread, I'm working on a VM which will rely on this working. I don't want to have to build the entire library in a Fast/Small mode in order to make the union smaller, and disable the runtime check, because it will happen on the dispatch of every instruction. VM dispatch loops have a way of confusing the branch predictor, so I can't count on that switch being cold, and leaving the safety tag on would mean that each instruction was usize * 2 in stride, so I get half as many instructions per cache line.

In other words, not only is this the only case where unchecked behavior can escape into a checked context, but it's good that it does, and its very important that later changes to the compiler don't change this behavior.

How this is expressed in the standard is less important: I would describe the current behavior as "the safety-checked status of a union type is a property of that type, which it inherits from the safety-check status of the scope in which it's defined".

rohlem commented 4 months ago

I didn't expect the constructed type's size to change at all based on the creation scope's runtime safety, interesting that it does.

I feel strongly that this behavior is the correct one

I understand your reasoning, and clearly you've put a lot of thought into this. I still think it might be unexpected in some cases, say you move a type out of a function and inadvertently add or remove runtime safety at every usage site. But I guess it's really intended this way by the langref. Someone wanting to force runtime safety should be using union(enum) anyway.

What's interesting though is that while union(enum) exists for forcing the tag, for untagged unions (regardless of safety setting) status-quo only provides extern union. I assume you prefer the block + @setRuntimeSafety(false) in the example because extern union has some drawbacks in your use case? (Probably something related to some system-C-ABI restrictions extern enforces?) Should we provide a mechanism for forcing a union to be untagged that isn't tied to extern?

(There's also packed union, but packed comes with its own restrictions, and will additionally require a wrapper packed struct to pad all types to the same size, see https://github.com/ziglang/zig/issues/19754 .)

mnemnion commented 4 months ago

Someone wanting to force runtime safety should be using union(enum) anyway.

This is a large part of my reasoning, yes. The use of a bare union is harmful unless the code is intended to be correct without the safety tag, and this implies an intention to turn off the safety check at some point. If you don't, you have a tag following every instance of the type around, getting checked on access, but you can't use the safety tag. The only way it can help you (and it's very helpful!) is by crashing if the code is wrong.

I assume you prefer the block + @setRuntimeSafety(false) in the example because extern union has some drawbacks in your use case?

The usual: you can't put 'fancy' Zig types into something extern, so it's limited in that way. Zig bare union types are even more opaque than extern union, because with no defined memory layout, there's no standard-compliant way of examining a specific part of the punned memory to see which type of union you're dealing with. You can sneak info out of a C union because every variant has a defined layout, you know where all the bits are. Any attempt to do the same with a Zig union would be undefined behavior, brittle, dangerous, and a terrible idea.

But there are occasions when you can know from program structure what's inside a union, without examining it, and I discovered the behavior which prompted this issue while writing one. One variant is a u64, and the other is a complex tagged union(enum) which fits in a machine word. It's for VM instructions, and proper function of the VM will never try and decode the bare variant, which is a bitmask provided for other instructions to use (which advance the instruction pointer past that bitmask).

for untagged unions (regardless of safety setting) status-quo only provides extern union.

I would put this differently I think. extern union can't be safety-tagged, because it has to follow the C ABI contract, Zig unions are safety-tagged in safety-checked because it's possible.

Should we provide a mechanism for forcing a union to be untagged that isn't tied to extern?

This is the mechanism! It works just fine, and my interest is in making sure that future compiler development doesn't undermine the behavior. There's no getting around the tradeoff: without the tag, you can't safety-check the union. Having written some gnarly union-using code in C, being able to wear a seatbelt while getting it right is extremely helpful. Once the code is correct, the performance advantage of removing the tag is compelling, at least for my use case.

For completeness, here's the other side of the equation: What happens if we generate an array of the safety-tagged bare union with runtime safety off, and then address the wrong variant, also with runtime safety off?

This should be run as a postscript to the code in the first post:


const checked_array_2 = blk: {
    @setRuntimeSafety(false);
    const temp: [2]CheckedUnion = .{ CheckedUnion{ .float = 2.7182 }, CheckedUnion{ .float = 3.14159 } };
    break :blk temp;
};

fn runtimePassChecked2() []const CheckedUnion {
    return &checked_array_2;
}

test "unsafe checked enum" {
    @setRuntimeSafety(false);
    const checked_union_slice = runtimePassChecked2();
    // demonstrate that the check tag is present:
    const ptr_0 = @intFromPtr(&checked_union_slice[0]);
    const ptr_1 = @intFromPtr(&checked_union_slice[1]);
    // size is still 16 bytes
    std.debug.print("\noffset width of array is {}\n", .{ptr_1 - ptr_0});
     // no runtime safety == undefined behavior:
    std.debug.print("UNDEFINED BEHAVIOR {}\n", .{checked_union_slice[0].int});
    // runtime safety == panic:
    {
        @setRuntimeSafety(true);
        std.debug.print("time to panic! {}", .{checked_union_slice[0].int});
    }
}

Answer: the tag is still there, but the safety-checking code isn't generated. This is what I expected would happen, and this seems basically correct to me, I don't see how the compiler could do anything else and be in any way consistent.

But this conclusively shows that the safety-tag itself is part of the type, and inherited from the runtime safety setting of the scope in which the type is defined. Without that, the compiler can't generate the safety check, so it doesn't, and conversely, it will generate the tag anyway with safety off, but not the code which checks that tag at runtime.

As an aside, the ability to control safety on a block-by-block level is a stellar language feature. Even with no granularity (all checks are either on, or they're off), it's a level of safety control which hasn't been seen since Ada. I wouldn't mind the ability to flip those features on and off individually, but the status quo is already very good.

rohlem commented 4 months ago

Should we provide a mechanism for forcing a union to be untagged that isn't tied to extern?

This is the mechanism!

Right, I agree that it works correctly and should continue to work this way.

I really just think that additionally having special syntax, like untagged union { ... } would be clearer to read and easier to write than status-quo u: {@setRuntimeSafety(false); break :u union { ... };}. Also less prone to accidentally overlook a @setRuntimeSafety when moving code around.

It was just an idea though, if nobody else wants it then status-quo is seemingly good enough.

EDIT: Per below comment, it's true this is really just about the safety, so maybe the keyword for this shouldn't be untagged but more akin to unchecked.

mnemnion commented 4 months ago

The issue I see with that is that the safety tag is, well, a safety feature, it doesn't affect the union's semantics unless there's an error in the code. So interacting with it through the safety system makes the most sense. Thanks for clarifying what you meant though.