ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.65k stars 2.53k forks source link

Incorrect behaviour when writing to a field of a packed union that is not a multiple of 8 bits #17360

Open kcbanner opened 1 year ago

kcbanner commented 1 year ago

Zig Version

0.12.0-dev.700+376242e58

Steps to Reproduce and Observed Behavior

const std = @import("std");
const testing = std.testing;

const U = packed union {
    foo: u12,
    bar: u29,
    baz: u64,
};

test {
    var u: U = undefined;
    u.baz = 0xbbbbbbbb_bbbbbbbb;
    u.foo = 0xeee;

    try testing.expectEqual(@as(u12, 0xeee), u.foo);
    try testing.expectEqual(@as(u29, 0x1bbbbeee), u.bar);
    try testing.expectEqual(@as(u64, 0xbbbbbbbb_bbbbbeee), u.baz);
}
$ zig test packed_union.zig
Test [1/1] test_0... expected 465288942, found 465243886
Test [1/1] test_0... FAIL (TestExpectedEqual)
/home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.700+376242e58/lib/std/testing.zig:84:17: 0x22462b in expectEqual__anon_1104 (test)
                return error.TestExpectedEqual;
                ^
/mnt/c/cygwin64/home/kcbanner/temp/packed_union.zig:16:5: 0x22478a in test_0 (test)
    try testing.expectEqual(@as(u29, 0x1bbbbeee), u.bar);

Expected is 0x1bbbbeee, but the actual value is 0x1bbb0eee. The failure also occurs in the x86_64 backend (-fno-llvm -fno-lld).

I confirmed with a debugger this is on the write side of things, after foo is written, the backing memory contains 0xbbbbbbbbbbbb0eee.

Expected Behavior

Test passes.

rohlem commented 1 year ago

The issue title says packed struct while the code contains a packed union. Which of those was this issue intended for?

Assuming that it's for packed union it seems like the compiler is translating the store to foo as storing a u16 value 0x0eee, which reminds me of https://github.com/ziglang/zig/issues/11263 (still opened against the old C++ compiler implementation).

EDIT: The one issue / ambiguity with this code I see is that all union-s are only meant to represent one of their states. Langref states: To change the active field of a union, assign the entire union [...] . packed union is defined to be eligible to be in a packed struct, but we currently don't document only to access the bits of one particular field, so touching bits occupied by other states is (as per current documentation) fair game.

IMO to be 100% conformant to current langref you have to wrap each field in packed struct, to tell the compiler these are bits you care about, and not just padding:

const std = @import("std");
const testing = std.testing;

const U = packed union {
    const Foo = packed struct {
        value: u12,
        keep_the_same: u52,
    };
    const Bar = packed struct {
        value: u29,
        keep_the_same: u35,
    };
    foo: Foo,
    bar: Bar,
    baz: u64,
};

fn expectations(u: U) !void {
    try testing.expectEqual(@as(u12, 0xeee), u.foo.value);
    try testing.expectEqual(@as(u29, 0x1bbbbeee), u.bar.value);
    try testing.expectEqual(@as(u64, 0xbbbbbbbb_bbbbbeee), u.baz);
}

test "works" {
    var u: U = undefined;
    u.baz = 0xbbbbbbbb_bbbbbbbb;
    u.foo.value = 0xeee;
    try expectations(u);
}

test "extra-pedantic" {
    var u: U = undefined;
    u = .{ .baz = 0xbbbbbbbb_bbbbbbbb };
    u = .{ .foo = @bitCast(u) };
    u.foo.value = 0xeee;
    try expectations(u);
}

These modified tests actually pass for me with version 0.12.0-dev.297+d2014fe97 in all build modes.

kcbanner commented 1 year ago

Ah, yes I misnamed the issue, fixed.

Indeed, that is what is happening, it's storing a u16 there.

I was confused about the intended behaviour of this as well, but Andrew clarified that it should just be overwriting the 12 least significant bits.

rohlem commented 1 year ago

Andrew clarified that it should just be overwriting the 12 least significant bits.

verbatim message for transparency without Discord access:

yeah so intended behavior is that u.foo = 0xeee overwrites the least significant 12 bits with 111011101110, leaving the rest intact

The current semantics value speed over preserving padding bits (which the user might not care about - packed struct fields that are union-s must be packed union-s). If it's okay I'd like explicit confirmation from @andrewrk regarding this trade-off, since the original context (to me) read like "should this work" and not "which default do we prefer".

Although, since there are actually two syntactic variants of writing this:

u.foo = 0xeee; //(1) overwrite an inactive tag of the union
u = .{.foo = 0xeee}; //(2) overwrite the union with a new value holding a different tag

Syntactically (2) assigns "the whole union", while (1) can be interpreted to "only touch the bits of field foo". So to me it would make sense if (1) preserves padding but (2) keeps current semantics (doesn't have to preserve those bits). That would be a way to optimally support both use cases.

Would that be acceptable? Or is the difference too easily overlooked, which could confuse readers?

kcbanner commented 1 year ago

Although, since there are actually two syntactic variants of writing this:

This reminds me, there is an additional odd behaviour I noticed while working on https://github.com/ziglang/zig/pull/17352, which was that the u = .{.foo = 0xeee}; method (which I assumed would overwrite the whole variable) actually acts like u.foo = 0xeee; (overwrite just the field) when used at comptime. Maybe this is something to do with RLS semantics?