ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

make `@offsetOf` and `@bitOffsetOf` runtime-known for types without well defined memory layout #8642

Open andrewrk opened 3 years ago

andrewrk commented 3 years ago
test {
    const S = struct {a: i32, b: i32};
    _ = comptime @offsetOf(S, "b"); // error: unable to resolve comptime value
}

This is marked accepted, because at least one of the two things will happen:

Currently this is marked as accepted with the latter.

andrewrk commented 3 years ago

@fieldParentPtr is still planned to work fine for types without well-defined memory layout. @marler8997 pointed out this allows implementation of offsetof in userland, at runtime, by inspecting pointer addresses.

#7888 proposes to make offset-of always happen at runtime, but that's not correct for types with well-defined memory layouts.

This issue is amended to have offset-of return a compile-time value for types with a well-defined memory layout, and a runtime value for types without one. This gives compiler implementations the ability to delay field layout until the end of compilation.
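
The userland, runtime approach mentioned above can be sketched roughly as follows. This is only an illustration, not code from the thread: the struct `S` mirrors the one in the issue description, `runtimeOffsetOfB` is a hypothetical helper name, and the sketch assumes Zig 0.11 or later, where `@ptrToInt` is spelled `@intFromPtr`.

```zig
const std = @import("std");

// Default (auto) layout: the compiler is free to choose the field order.
const S = struct { a: i32, b: i32 };

/// Hypothetical helper: derives the byte offset of `b` at runtime
/// by subtracting the struct's address from the field's address.
fn runtimeOffsetOfB(s: *const S) usize {
    return @intFromPtr(&s.b) - @intFromPtr(s);
}

pub fn main() void {
    const s = S{ .a = 1, .b = 2 };
    // The printed value depends on the layout the compiler picked.
    std.debug.print("offset of b: {d}\n", .{runtimeOffsetOfB(&s)});
}
```

Because the result comes from pointer addresses, it is an ordinary runtime value and places no constraint on when the compiler decides the field layout.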

N00byEdge commented 3 years ago

So saying "I don't care about the memory layout of this struct, but please give me the offset of this member for my inline assembly" isn't a supported use case anymore?

marler8997 commented 3 years ago

@N00byEdge I think you could solve that by marking the struct as extern and/or packed. However, this use case and the proposal bring to light a now-missing feature: a user may need to know the layout of a struct at comptime without needing it to be C ABI compatible or packed.
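
A minimal sketch of the extern/packed workaround, for illustration only. The type names `Frame` and `Flags` and their fields are hypothetical and do not come from FlorenceOS; the point is that both layouts are well defined, so the offsets are usable in a comptime context.

```zig
const std = @import("std");

// extern struct: fields are laid out in declaration order following the C ABI.
const Frame = extern struct { x0: u64, x1: u64 };

// packed struct: fields are laid out bit by bit in declaration order.
const Flags = packed struct { mode: u3, level: u5 };

comptime {
    // Both offsets are comptime-known because the layouts are well defined.
    std.debug.assert(@offsetOf(Frame, "x1") == 8);
    std.debug.assert(@bitOffsetOf(Flags, "level") == 3);
}
```

The cost is the one described above: choosing extern or packed gives up the compiler's freedom to reorder fields.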

andrewrk commented 3 years ago

However, this use case and the proposal bring to light a now-missing feature: a user may need to know the layout of a struct at comptime without needing it to be C ABI compatible or packed.

this issue remains open for that use case: https://github.com/ziglang/zig/issues/6700#issuecomment-710715422

N00byEdge commented 3 years ago

Okay, so will this be blocked by that then? So that we always have a way to do this in the meantime?

andrewrk commented 3 years ago

Your inline assembly has to run at runtime anyway, so this issue does not affect this use case. If you have existing code that passes the result of @byteOffsetOf to inline asm it will continue working the same.

N00byEdge commented 3 years ago

No, this specific offset value is used as an immediate in an instruction, as I cannot clobber any registers at that point. https://github.com/FlorenceOS/Florence/blob/master/src/platform/aarch64/interrupts.zig#L152-L190

andrewrk commented 3 years ago

Thanks for bringing up specific code so we can talk about a concrete example. Even when using the "i" constraint for assembly, although the value must be known in the machine code generation phase of the compiler, it does not need to be known in a comptime sense. This is similar to the addresses of globals: they are technically known by the compiler at compile time, but they are not available to, e.g., Zig code running in a comptime block.

We may want to introduce the concept of "late-stage constant values" in the documentation to represent this conceptually.

Regardless of what happens I will check with you before merging a change that could possibly break FlorenceOS, to make sure your use case is still handled properly. Similarly please do reach out if a different master branch commit breaks your workflow and let's make sure there is a clear path forward for your use cases.

N00byEdge commented 3 years ago

Oh yeah, that would be completely fine, they just can't be runtime values. Any point before the code is sent to LLVM should do.

marler8997 commented 3 years ago

We may want to introduce the concept of "late-stage constant values" in the documentation to represent this conceptually.

Yeah I was unclear about how this would work given I was only considering comptime and runtime, but it sounds like there are more "times" now? Since I don't understand the times I can't come up with a name, but if you were to come up with names what would they be? Something like "zig comptime", "post-analyze-time?", "linktime", "runtime", etc.

SpexGuy commented 3 years ago

There aren't really more "times" necessarily, but there are values that are considered to be comptime known but aren't fully realized until runtime. The result of @ptrToInt(comptime_ptr) is an existing example of this concept. At runtime it's a full integer, but at comptime it is more limited. Despite this, it is still considered to be comptime known and there are some comptime operations that you can still perform on it to get back full comptime values.

marler8997 commented 3 years ago

@SpexGuy I could be misunderstanding you here, but in this particular case, the integer value of @offsetOf may not need to be known at "zig comptime", but it needs to be known before "code generation" because its value is encoded into the instructions as immediate values. The "time" before code generation is distinct from runtime and precedes it since you can't actually get to runtime until after you've generated code in the first place.

Although, to be fair, I suppose this could be done after the initial code generation, at link time instead, by fixing up the instructions with the offsetOf value after the fact. However, this isn't what I thought the plan was. It sounded like this "late-stage constant values" concept applied before code-generation time, which is why I'm not clear on what's going on here: I don't understand the design.

SpexGuy commented 3 years ago

it needs to be known before "code generation" because its value is encoded into the instructions as immediate values

This is true but more of an implementation detail. From the spec perspective (and from the perspective of a programmer using the language, and the documentation), there is only comptime and runtime. You can never run arbitrary code at "post-analyze-time" or "linktime", so it doesn't matter to the spec where in the compiler pipeline these values are resolved. It's just "somewhere between the two observable times".

marler8997 commented 3 years ago

... from the perspective of a programmer using the language, and the documentation), there is only comptime and runtime.

It looks like in this case of FlorenceOS, the distinction between "late stage constant value time" vs "runtime" is critical to making the code work, since in the first case the value can be made available to the assembler but not in the second case. Thus, this distinction is not an "implementation detail" but a requirement on the language/spec.

So the language spec can't say that this value is only available at "runtime" since it needs to be available before final code generation, but it also can't say it's available at "comptime" since you can't use it inside comptime blocks. So what would the spec say about this value?

SpexGuy commented 3 years ago

The spec would say that this is a late-binding constant. It supports a limited set of operations at comptime and a larger set of operations at runtime. We don't need to add any new intermediate times to say this. The answer to "is it comptime known" is yes.

marler8997 commented 3 years ago

But that's not the full story is it? It's true that the value supports a limited set of operations at "comptime", but it also supports a larger set of operations at a later stage of comptime as well, in this case, before assembly is performed, which again is distinct from "run time". So how would the spec define this "later stage of comptime"?

P.S. Also are you amending your statement that this is just an "implementation detail" since you've shown that this is a detail for the spec to define?

P.P.S. I also want to be clear here that if the spec doesn't share the "full story here", then you could implement a compiler that conforms to the spec but does not support the use case FlorenceOS shows us here. The spec as you wrote it does not support that use case.

SpexGuy commented 3 years ago

you could implement a compiler that conforms to the spec but does not support the use case FlorenceOS shows us here

I see what you're getting at, but I don't agree. Using one of these values as an immediate in an asm block is either one of the supported comptime operations or one of the unsupported comptime operations. If the spec doesn't make a decision here, it's incomplete. (though in the case of inline assembly the decision might be "it's target-dependent".) When the value is resolved influences the set of valid operations that the spec describes, but we don't need to actually include information about resolving the value in the spec.

This is what I meant by the "implementation detail" comment. Whether a late-binding value is bound after analysis or after code generation doesn't need to be defined by the spec; instead, the spec just says "you may (not) use this value as an immediate in assembly" and compiler authors can do what they want as long as they conform to it.

it also supports a larger set of operations at a later stage of comptime

There are no "stages" of comptime. No comptime code can ever run after these parameters have been fully resolved. The set of operations it supports is constant throughout the entire comptime analysis phase of the compiler. Sometimes that means that in reality the compiler implementation is tracking the operations you perform, so that it can replay them later once it has actually resolved the value. But in terms of the programming model, comptime only happens once. This tracking limits the actual set of operations you can perform at comptime, but the spec only actually defines the limits, not the tracking.

marler8997 commented 3 years ago

@SpexGuy when I say "comptime", I mean "compile-time". So I'm saying that the value is not available during "zig comptime" but has to be available at a later stage in "compile-time", specifically, before assembly in this case. So I was saying that it might help to give a name to other stages of "compile-time".

SpexGuy commented 3 years ago

I understand, but what I'm saying is that I don't think those concepts will help the documentation. They may be useful to the compiler implementation, but to a programmer using the language they are unnecessary. All I care about when writing code is what operations are supported or unsupported. The technical reasons related to resolving things at link time or post-optimization but before codegen are not very important. Going into detail about when things are resolved as part of the definition of the language creates more complexity, and may lead to bad decisions if we want to keep the set of resolution times small. (e.g. "we could implement this operation on late-binding values derived from X, but that would mean we have to add 'pre-optimization time', so let's not, to keep things simple")

marler8997 commented 3 years ago

We had a bit of a chat on Discord, and I see where @SpexGuy is coming from here. He is basically saying that introducing a detail such as a new stage during compile-time in the spec is not warranted by this single new use case. It adds complexity to the language and should only be added if necessary and/or proven to be helpful. For now, in this case it's enough to say that the value of @offsetOf is not available to zig comptime, but is available to assembly. If we find more cases where it's available and more values that would fall into this same boat, then we can re-evaluate whether introducing a new conceptual model would be helpful.

A1Liu commented 1 year ago

Hello, I have some code that uses @offsetOf at compile time to encode type information for use later at runtime. Would this be a supported use case if this proposal were accepted? Is there another API that I'm missing or is this not a supported use case?

I've pasted some code below with parts omitted; the complete, working code is here.

// This function runs at comptime to generate the information and store it in the binary
fn makeInfo(comptime T: type, comptime Base: type) []const EncoderInfo {
    comptime {
        const info = switch (@typeInfo(T)) {
            .Struct => |info| info,
            .Int => |info| {
                const type_info: TypeInfo = switch (info.bits) {
                    8 => if (info.signedness == .signed) .pi8 else .pu8,
                    16 => if (info.signedness == .signed) .pi16 else .pu16,
                    32 => if (info.signedness == .signed) .pi32 else .pu32,
                    64 => if (info.signedness == .signed) .pi64 else .pu64,
                    else => @compileError("doesn't handle non-standard integer sizes"),
                };

                return &.{EncoderInfo{ .type_info = type_info }};
            },

            // ... omitting other cases for brevity ...

            else => @compileError("Unsupported type: " ++ @typeName(T)),
        };

        // ... omitting error checking ...

        const spec_info: [2]TypeInfo = switch (@alignOf(T)) {
            1 => .{ .ustruct_open_1, .ustruct_close_1 },
            2 => .{ .ustruct_open_2, .ustruct_close_2 },
            4 => .{ .ustruct_open_4, .ustruct_close_4 },
            8 => .{ .ustruct_open_8, .ustruct_close_8 },
            else => unreachable,
        };

        const val = EncoderInfo{ .type_info = spec_info[0] };
        var spec: []const EncoderInfo = &[_]EncoderInfo{val};
        var b: Base = undefined;

        for (info.fields) |field| {
            const BField = @TypeOf(@field(b, field.name));
            const encode_info = makeInfo(field.type, BField);

            // ... omitting some early exit code ...

            var iter = EncoderInfoIter{ .info = encode_info };
            while (iter.peek()) |sa| : (iter.index = sa.next_index) {
                const offset = sa.offset + @offsetOf(Base, field.name);
                const new_info = EncoderInfo{
                    .type_info = sa.type_info,
                    .offset = offset,
                };

                spec = spec ++ &[_]EncoderInfo{new_info};
                if (sa.slice_info) |slice_info| {
                    spec = spec ++ slice_info.spec;
                }
            }
        }

        return spec ++ &[_]EncoderInfo{.{ .type_info = spec_info[1] }};
    }
}

nektro commented 1 year ago

You can make the layout comptime-known by using extern struct. The default .Auto layout allows the compiler/optimizer to reorder the fields at will, e.g. https://github.com/ziglang/zig/issues/168
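
As a rough sketch of how this suggestion could apply to the use case above, the following builds a table of field offsets at comptime for an extern struct. The names `Point` and `fieldOffsets` are hypothetical and are not taken from the linked code; the sketch assumes Zig 0.11 or later for the indexed `inline for` syntax.

```zig
const std = @import("std");

// extern struct: declaration-order, C ABI layout, so offsets are comptime-known.
const Point = extern struct { x: u32, y: u32, z: u32 };

/// Hypothetical helper: returns the byte offset of every field of `T`.
fn fieldOffsets(comptime T: type) [std.meta.fields(T).len]usize {
    const fields = std.meta.fields(T);
    var offsets: [fields.len]usize = undefined;
    inline for (fields, 0..) |field, i| {
        offsets[i] = @offsetOf(T, field.name);
    }
    return offsets;
}

// A container-level constant is evaluated at comptime,
// so the table ends up stored in the binary.
const point_offsets = fieldOffsets(Point);
```

With three `u32` fields and no padding between them, the table holds 0, 4 and 8.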

N00byEdge commented 1 year ago

You can make the layout comptime-known by using extern struct. The default .Auto layout allows the compiler/optimizer to reorder the fields at will, e.g. #168

Yes, but sometimes you're okay with reordering, reducing memory usage, using Zig types and such, and still need the offsets for passing to external things.