ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

comptime string literal values #9056

Open marler8997 opened 3 years ago

marler8997 commented 3 years ago

I propose that Zig add support for accepting "comptime variable-length string literal values" with the following syntax:

pub fn foo(comptime s: [_]u8)

comptime s: [_]u8 would be a "comptime array" as opposed to the current convention which is to use "comptime slices" (i.e. comptime s: []const u8). There are important semantic differences between "comptime slices" and "comptime arrays". Comptime slices carry with them extra information, namely, the "memory region" they are pointing to. Two "comptime slices" that contain the same content but come from different memory regions are not the same. One reason for this is that code can access memory outside the bounds of a slice so long as it stays within its containing "memory region".

The "extra information" that comes with a "comptime slice" can be problematic because of the nature of comptime. Unlike a runtime function, which is only instantiated once, a comptime function must be re-instantiated for every unique set of comptime parameters it is passed. This means that the memory-region information carried by a "comptime slice" causes a new function to be instantiated even if that information is never used. This caused an infinite recursive instantiation loop in std.fmt (see https://github.com/ziglang/zig/issues/7948).

There is a proposal to mitigate this problem by "de-duplicating" const comptime slices:

https://github.com/ziglang/zig/issues/7948#issuecomment-844635939

However, it does not solve the cases where the slices do actually come from unique memory regions or where they are mutable. This solution must handle odd corner cases and puts some complicated constraints on the language such as ensuring that all unique string literals have their own distinct memory region. IMO, it's a complicated solution that is hard to justify given that it still doesn't solve the problem in many cases.

I believe the simplest solution is clear when we consider what the developer's original intent is. In most cases, the intent is for the function to only be instantiated once for each unique string based on its content, not the memory region it comes from. Zig already has a way to represent this intention, namely, with "arrays". The problem is the ergonomics of accepting arrays.
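To illustrate the point about arrays, here is a minimal sketch (the `Tagged` helper is hypothetical and not part of std). It assumes only that comptime array parameters are compared by value, so two arrays with equal content select the same instantiation. It also shows the ergonomic cost: the length has to be passed as a separate comptime parameter.

```zig
const std = @import("std");

// Hypothetical helper for illustration: returns a struct type keyed on the
// comptime array value `s`. The length must be supplied separately.
fn Tagged(comptime len: usize, comptime s: [len]u8) type {
    return struct {
        pub const name: [len]u8 = s;
    };
}

test "comptime arrays are keyed by content" {
    // Two separate arrays that happen to hold the same bytes.
    const a = [_]u8{ 'a', 'b', 'c' };
    const b = [_]u8{ 'a', 'b', 'c' };

    // Equal array values give the same instantiation, hence the same type.
    try std.testing.expect(Tagged(a.len, a) == Tagged(b.len, b));
}
```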

One set of functions that fall into this category are the functions in std.fmt. I have created two alternative PRs that modify the formatType function to take fmt as a "comptime slice" and convert it to a "comptime array" before analyzing the rest of the function.

https://github.com/ziglang/zig/pull/8839
https://github.com/ziglang/zig/pull/8846

In the first PR, I create a wrapper function that just takes the comptime slice and forwards it to the real function as a comptime array.

pub fn formatType(
    value: anytype,
    comptime fmt: []const u8,
    options: FormatOptions,
    writer: anytype,
    max_depth: usize,
) @TypeOf(writer).Error!void {
    // NOTE: can't pass comptime sliceToArray directly, needs to be set to a local variable
    //       this might be an issue with the compiler
    const fmt_array = comptime mem.sliceToArray(u8, fmt);
    return formatTypeImpl(value, fmt.len, fmt_array, options, writer, max_depth);
}
pub fn formatTypeImpl(
    value: anytype,
    comptime fmt_len: usize,
    comptime fmt: [fmt_len]u8,
    options: FormatOptions,
    writer: anytype,
    max_depth: usize,
) @TypeOf(writer).Error!void {
...
}
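The wrapper above depends on a slice-to-array helper. The following is only a sketch of what such a helper might look like, not the code from the PR; the name and signature are assumptions made for illustration.

```zig
const std = @import("std");

// Hypothetical helper: copies a comptime-known slice into an array value.
// Because `s` is comptime, `s.len` is comptime-known and can appear in the
// return type.
fn sliceToArray(comptime T: type, comptime s: []const T) [s.len]T {
    // Slicing with comptime-known bounds yields a pointer to an array,
    // which is then dereferenced to get the array value.
    return s[0..s.len].*;
}

test "sliceToArray copies the content" {
    const arr = comptime sliceToArray(u8, "abc");
    try std.testing.expectEqualSlices(u8, "abc", &arr);
}
```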

In the second, formatType accepts anytype and detects whether it got a slice; if so, it converts that slice to an array and calls itself recursively:

pub fn formatType(
    value: anytype,
    comptime fmt: anytype,
    options: FormatOptions,
    writer: anytype,
    max_depth: usize,
) @TypeOf(writer).Error!void {
    if (comptime !std.meta.isArray(@TypeOf(fmt))) {
        const fmt_array = comptime std.meta.asArray(u8, fmt);
        return formatType(value, fmt_array, options, writer, max_depth);
    }
    ...
}

Each solution has its pros and cons. The first solution requires every function to have a wrapper function around its real implementation. This violates the principle that we want to make it "easy to write the correct code and hard to write the incorrect code". Since it's easier not to create a wrapper function, and the code still works in some cases without one, it's likely the wrapper will be omitted a lot of the time. The second solution is smaller but falls victim to the same problem and introduces an additional drawback: the fmt argument uses the underspecified anytype in its signature instead of an explicit comptime string type.

With the proposed feature, we can avoid both of these problems. The correct code is now easy to write and we can still specify what type we are expecting in our signature:

pub fn formatType(
    value: anytype,
    comptime fmt: [_]u8,
    options: FormatOptions,
    writer: anytype,
    max_depth: usize,
) @TypeOf(writer).Error!void {
    ...
}

TypeInfo

For now, the [_]u8 type will behave like anytype when it comes to TypeInfo. Its arg type will be null unless we find reason to enhance TypeInfo to represent it.

P.S. The syntax [_]u8 was chosen in case we come up with a use case for a more general [_]T syntax. If we determine that such a general case is unwanted, then something like comptime_string would also be fine. This also leaves open the possibility of [_:0]u8.

zigazeljko commented 3 years ago

"The syntax [_]u8 was chosen in case we come up with a use case for a more general [_]T syntax."

There is an obvious use case for such syntax: representing arbitrarily-sized comptime arrays of arbitrary types. There is no reason to limit ourselves to just [_]u8, since the same logic would apply to e.g. [_]u32 or [_]comptime_int.

marler8997 commented 3 years ago

@zigazeljko yes, that's why I chose the [_]u8 syntax, in case we want to extend it. However, I haven't thought of an actual use case for this yet. Do you know of any?

InKryption commented 2 years ago

To bring up a use case for arbitrary T in the proposed [_]T syntax: there is a spot in the vulkan-zig project where I see it being beneficial. The generated {}Wrapper type functions take as a parameter a comptime slice of an enum with a matching name, {}Command. Because of the comptime characteristics described above, this instantiates the type multiple times for equivalent slice content. With a change like the one described, the situation there would be greatly improved.