ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.98k stars 2.48k forks source link

Proposal: Comptime Memory Islands #7770

Open SpexGuy opened 3 years ago

SpexGuy commented 3 years ago

This proposal is a solution to #382, and allows aliasing to be detected at compile time without forcing the compiler to decide exactly how it will lay out globals in memory before the optimizer runs. This was discussed in a design meeting.

At compile time, you can only create new memory by declaring a global value or a local variable/constant. Each of these values is independent from the others. At compile time, you can use @ptrToInt on one of these values, but the resulting value isn't actually comptime known. It will cause a compile error if you try to do anything meaningful with it at comptime.

Similarly, offsetting an array pointer outside of the bounds of a variable and reading the value will never read anything meaningful (and will always be a compile error). So there is no way to offset a pointer to one value to point into a separate value.

In this way, each unique comptime value is its own memory island, completely detached from the others. It doesn't make sense to compare pointers to separate islands, so the language currently disallows any form of pointer comparison at compile time. However, aliasing within an island is possible, and without pointer comparison it cannot be detected.

Once we get to runtime, all globals get put into a single "memory island" (ignoring for the moment the memory spaces proposal, which would make multiple islands at runtime also).

So let's get to the proposal:

Introduce a new builtin, @arePointersComparable(a, b). This builtin accepts two pointer values, and returns a comptime known boolean value. If either value is runtime known, it returns true, because comparison would happen at runtime. But if both a and b are comptime known, it returns true only if they are both pointers to the same memory island.

If this returns true, then @ptrToInt values derived from these pointers may use <, >, <=, >=, and -. So this is a safe aliasing check that works at compile time:

inline fn slicesAlias(comptime T: type, a: []T, b: []T) bool {
    // slices count as a pointer type
    if (@arePointersComparable(a, b)) {
        // the condition above is comptime known, so
        // this is only compiled if it returns true
        return @ptrToInt(a.ptr + a.len) <= @ptrToInt(b.ptr) or
               @ptrToInt(b.ptr + b.len) <= @ptrToInt(a.ptr);
    }
    return false;
}

And this code will safely convert a pointer to an element of a slice to an index at comptime:

inline fn ptrIndex(comptime T: type, slice: []T, ptr: *T) usize {
    // At comptime, if these two values come from separate islands,
    // this is an error.  At runtime it may be UB, depending on how
    // our aliasing rules end up working out.
    // But as long as the pointer is actually in the slice, this is
    // well defined and even works at comptime.
    const offset = @ptrToInt(ptr) - @ptrToInt(slice.ptr);
    const index = @divExact(offset, @sizeOf(T));
    assert(index < slice.len);
    return index;
}
kyle-github commented 3 years ago

This kind of reminds me of the guarantees that the C specification places on pointers. I.e. the whole concept of "provenance" in the spec. In this case every comptime variable has a different provenance.

Note that with multiple address spaces, pointers are not necessarily comparable there either. So some of this would apply at runtime if this is adopted.

lemaitre commented 3 years ago

To me, if you go at the very bottom of this, you end up with a new type: comptime_address. A comptime_address value would be a pair: a base (or provenance, or island) and an offset.

If we go even further, we could have a run-time equivalent of this type: address. The base would remain a compile-time known identifier, but the offset would be run-time known. This would allow to easily bring more of the compile-time safety into run-time. At this point, @ptrToInt and @intToPtr would be changed to @ptrToAddr and @addrToPtr.

Of course, there is still a need to fully run-time known address (like dynamic allocators), and plain integer would still be good for that.