ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Proposal: Inline Parameters #7772

Open SpexGuy opened 3 years ago

SpexGuy commented 3 years ago

This issue came from a discussion of #425 at a recent design meeting. This proposal is accepted as an experiment. We think it's probably a good idea, but think that the issues with it won't appear until after it's really in use. So we're planning to add it and see how it affects the balance of the language, and then reevaluate.

Motivation

In stage 2, inline functions are planned to have different semantics from standard functions. If arguments are comptime known at the call site of an inline function, those parameters will be comptime known within that instantiation of the function. Similarly, if the returned value is comptime known, that is known at the call site as well.

This means that you can write inline functions which accept either comptime known or runtime values as input, and produce a value which may be comptime known as a result. A good example of where this is useful is std.builtin.Target.isDarwin(..). Not all targets are comptime known, but if a target is comptime known, the result of calling isDarwin() should also be comptime known.
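
For reference, a minimal sketch of that behavior with today's inline functions (the name isDarwinTag is illustrative, not the real std function; treat this as the intended stage-2 semantics rather than guaranteed current behavior):

const std = @import("std");

// If `tag` is comptime known at the call site, the stage-2 semantics
// described above make the result of this inline call comptime known too.
inline fn isDarwinTag(tag: std.Target.Os.Tag) bool {
    return switch (tag) {
        .ios, .macos, .watchos, .tvos => true,
        else => false,
    };
}

test "comptime-known argument gives a comptime-known result" {
    const tag: std.Target.Os.Tag = .macos; // comptime known
    try std.testing.expect(comptime isDarwinTag(tag));
}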

The downside of inline for this is that it also forces the machine code to be inlined into every call site, bloating the generated machine code. This is undesirable for large functions.

Proposal

Inline Parameters

Allow the inline keyword on individual parameters of a non-inline function. If a parameter is inline, then multiple instantiations of the function may be generated - one for each unique comptime value that is passed to it, and one extra one for "any runtime value".

Functions accepting inline parameters are not necessarily considered to be generic. If you take a runtime function pointer to such a function, it will use the version for which all inline parameters are runtime values.
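
A minimal sketch of how that might look (hypothetical: inline parameters are the subject of this proposal and do not compile today; scale is a made-up example):

fn scale(inline factor: u32, x: u32) u32 {
    return factor * x;
}

test "usage" {
    var x: u32 = 7;
    x += 1;             // runtime known
    _ = scale(2, x);    // may use an instantiation specialized for factor == 2
    _ = scale(3, x);    // may use an instantiation specialized for factor == 3
    _ = scale(x, x);    // uses the "any runtime value" instantiation
    const ptr = &scale; // a runtime function pointer refers to the all-runtime version
    _ = ptr;
}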

Inline Return Values

Similarly, we could consider having inline return values, where the function's return value is comptime known at the call site if the return expression in the function is comptime known. For the experiment, we've decided that inline returns should not require a keyword. All direct function calls will have inlined returns if possible.

A downside of inline returns is that the called function must be semantically analyzed before the caller can be semantically analyzed, in order to know if the return value is comptime known. This is especially difficult in the case of recursion. For the purpose of the experiment, we will allow the compiler to break these loops by deciding that the return type is runtime known, but if we decide to keep this behavior we will need to develop more concrete rules for how this works.

Calls through function pointers will not be able to inline their return values. To prevent having to generate multiple versions of a function, with or without return code, we can just generate one function which has return code, and have call sites ignore the return value if it is comptime known. In optimized builds, we could do more aggressive analysis to determine if a function needs return code, or consider emitting a separate version of the function for direct calls.

Another implication is that this allows functions which return a comptime-known value but have runtime side effects. This is kind of weird, but it already exists with inline functions.
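
A sketch of that existing inline-function behavior (assuming the stage-2 semantics described above; noisyVersion is a made-up example):

const std = @import("std");

inline fn noisyVersion() u32 {
    std.debug.print("computing version...\n", .{}); // runtime side effect
    return 3; // constant, so comptime known at the call site
}

test "usage" {
    const v = noisyVersion(); // the print still happens at runtime, but `v` is comptime known
    _ = v;
}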

kyle-github commented 3 years ago

Hmm, interesting. I feel that the meaning of inline is getting overloaded.

Let me see if I can fumble my way through this...

So the difference is that with comptime you must have the parameter be known at comptime, period. With inline, if it is comptime-known, the compiler can generate specific code for that case, and if it is runtime-known, it calls the regular function. So in this case inline means "optimise with a local inlined copy if we know the value at comptime, otherwise call it like any other normal function." Is that close?

SpexGuy commented 3 years ago

Yeah, but it's not just optimization, it's also semantic. The value from the call site is inlined into the function, and then the function is semantically analyzed. So it's possible to have code that behaves differently depending on whether the parameter is comptime known.

I feel that the meaning of inline is getting overloaded.

I think inline is precisely the correct term here -- the comptime-known value from the call site is moved into the function, as if it had been written on the parameter line.

ghost commented 3 years ago

Edit: I misunderstood the intent of the proposal, so this comment is mostly off-topic.

Couldn't the same thing be done with callsite inlining? For example, given const y = inline foo(x) the compiler guarantees that:

  1. the body of foo is inlined
  2. comptime calculations and constant propagation are performed.

Thus, if the input x is comptime-known and nothing within the body of foo prevents the result from being evaluated at comptime, then y will be comptime-known as well. foo may also contain a runtime portion and even side-effects, in addition to the comptime result. All of this is standard inlining semantics. If inline is left out, the discretion to inline and optimize passes back to the compiler, as usual.
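
A sketch of what that could look like (hypothetical: inline in front of a call expression is not valid syntax today, and the body of foo is made up):

fn foo(x: i32) i32 {
    return x * 2; // placeholder body; the original comment leaves foo unspecified
}

test "usage" {
    const a = 21;            // comptime known
    const y = inline foo(a); // body inlined and constant-folded, so `y` is comptime known
    var b: i32 = 5;
    b += 1;                  // runtime known
    const z = inline foo(b); // still inlined, but `z` is runtime known
    _ = .{ y, z };
}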

This approach has the disadvantage of having to type inline in more places, but has the advantage of greater control and avoidance of unnecessary code bloat. I think it also makes the semantics of this feature simpler to explain and more predictable.

ghost commented 3 years ago

@zzyxyzz Obligatory not OP. It is stated in the issue why this will not work.

ghost commented 3 years ago

Ah, yes. I may have misread the issue. Let me try again:

fn foo(inline x: i32) i32 {
    // ... do something with x

    return 0; // "inline" return value, because constant
}

fn bar(y: i32) void {
    _ = foo(10); // non-inlined call to specialized function `foo_10()`,
        // shared with other use-sites of `foo(10)`.

    const z = foo(y); // Ordinary call to non-specialized function.
        // Result is known at comptime, even when `foo()` is neither inlined nor eliminated.
    _ = z;
}

Is that correct? @SpexGuy, @EleanorNB

ghost commented 3 years ago

Unless I misunderstood something again, the inline parameter proposal is in essence a return to C inlining semantics: instead of being forced to inline a function or throw an error, the compiler is given discretion to inline, specialize or call the function normally. Only partly related to this is the inline return value proposal, which requires the compiler to analyze function bodies and see if the result is comptime-known independently of the arguments. Which would ordinarily almost never be the case, but in curried functions it might be.

AFAICT, it is that last part that actually solves the original issue with isDarwin() and other functions that you would want to precompute at compile time if possible, but leave alone otherwise. The problem with precomputation in general is that it is a recipe for long and maybe even non-terminating compiles, often without much benefit. It is even worse in Zig, because comptime makes the difference between successful and failed precomputation potentially semantic, so it must be enabled even in debug builds. This is why disallowing constant propagation across function boundaries is at the same time annoyingly limiting and eminently sensible.

It is also true that globally inlining a function is a much too heavy-handed way of peeling off a function boundary for the benefit of precomputation. What we really want is a way to tell the compiler that precomputation of a particular function is both important and likely to be successful. This is useful for ensuring that build configuration constants get propagated and all unnecessary code statically eliminated. It is also useful for math functions, simple string parsers and surely many other areas.

In short, I'm proposing to scrap the inline parameter idea, because its behavior is too vague and there is not enough evidence for its usefulness. I'm also proposing to scrap the implicit "inline return" idea, because it is too expensive relative to the benefit it is likely to provide on an average function. Instead, I propose to have explicit precomputability annotations:

fn isDarwin() inline bool { ... }

fn cos(x: f64) inline f64 { ... }

fn parseDate(str: []const u8) inline ?Date { ... }

A better keyword would be welcome, though.

SpexGuy commented 3 years ago

You're close, but the isDarwin case is not quite solved by inline returns alone. Sorry, I feel like I should have expanded more on this in the original post and included some examples. Let's take a detailed look at the behavior here. I'll use an explicit inline on returns to make it more clear what I'm talking about:

pub fn isDarwin(tag: Tag) inline bool {
    return switch (tag) {
        .ios, .macos, .watchos, .tvos => true,
        else => false,
    };
}

Since the tag parameter is not marked comptime or inline, the compiler is only allowed to create one instantiation of this function. In that instantiation, it must assume for analysis purposes that tag is runtime known. So there is still a boundary by default between call sites and functions. In order for the return value here to be inferred as comptime known, tag itself must be able to pick up comptime known-ness from the call site. We need a way to do this without forcing the value to be comptime known, since runtime targets are also common, especially in build scripts. This prompts the need for inline parameters:

pub fn isDarwin(inline tag: Tag) inline bool {
    return switch (tag) {
        .ios, .macos, .watchos, .tvos => true,
        else => false,
    };
} 

Because of this boundary, inline return values are not very useful without inline parameters. This also limits the amount of work the compiler needs to do. The inline parameter can be specified on only some parameters, which also helps limit the number of permutations the compiler needs to generate and analyze. For example:

pub fn applyTriangleOp(inline op: SomeBigEnum, start: u32, end: u32) u64 {
    return switch (op) {
        // using #7224 syntax to make this a smaller example, but this could be written out.
        inline else => |ct_op| applyTriangleOpInner(ct_op, start, end),
    };
}

pub fn applyTriangleOpInner(comptime op: SomeBigEnum, start: u32, end: u32) u64 {
    var i = start;
    while (i+1 < end) : (i += 1) {
        var j = i+1;
        while (j < end) : (j += 1) {
            switch (op) {
                // do something dependent on op
                // for performance it's very important that op is comptime known
            }
        }
    }
}

Here, applyTriangleOp(.add, 4, 6) will use the same function instantiation as applyTriangleOp(.add, 6, 8), because the two integer parameters are not inlined. If the op parameter is runtime known, the applyTriangleOp function will generate a large number of cases and probably should not be inlined, so this is a place where standard inline functions are not the right choice.

The inline keyword is calling for the (stage 2) semantics of inline functions to be applied to this parameter only, so I think it's the right word.


Personally I also still feel like the extra work needed to make all returns implicitly inline may be a problem for compilation time, and maybe also in other ways. However, Andrew believes that this can be implemented in stage 2 without running into significant problems, so we've decided to go ahead with the experiment. If there are significant problems here, we are prepared to back it down to an explicit keyword or even remove it entirely. For more info on the potential problems we discussed, please check out the design meeting notes.

ghost commented 3 years ago

@SpexGuy, Thanks for the explanation. Though if I may bother you with more questions...

Since the tag parameter is not marked comptime or inline, the compiler is only allowed to create one instantiation of this function. In that instantiation, it must assume for analysis purposes that tag is runtime known. So there is still a boundary by default between call sites and functions.

If we have already given the compiler licence to try to compute a comptime result, why do we need to separately give it permission to determine which of the arguments are comptime at a particular call-site? How exactly the compiler goes about its business (re-analyze the function at every call site; produce multiple instantiations; memoize known results, etc.) should be an implementation detail. Or so it seems to me.

I'm still trying to wrap my head around your second example. Not so much what it does, but where I would realistically want to use something like that. The use cases where neither a normal, nor an inline, nor a generic function is appropriate, but a lazily instantiated one is, seem rather narrowly targeted at unrolled switches like that.

ghost commented 3 years ago

Regardless of whether inline parameters are a good idea, I think that a lot of potential would be left on the table if the inline return feature is implicitly tied to it. For example, trig functions and other math are good candidates for having inline returns, so that const x = cos(0.7) could be reasonably expected to be comptime. But if it isn't, then the call should go to the ordinary cos function. Under no circumstances would I want a cos_0_7() cluttering the binary. This is one more reason to require explicit inline return values and not implicitly limit them to functions with inline parameters.

SpexGuy commented 3 years ago

If we have already given the compiler licence to try to compute a comptime result, why do we need to separately give it permission to determine which of the arguments are comptime at a particular call-site?

By default, we have only given it permission to try to compute a comptime result assuming all arguments are not comptime-known (unless marked comptime). Doing more than that runs into the problems you posed above, that every function must be analyzed many times and compile time will certainly suffer.

How exactly the compiler goes about its business (re-analyze the function at every call site; produce multiple instantiations; memoize known results, etc.) should be an implementation detail. Or so it seems to me.

The optimizer is certainly allowed to do this. The limit here is specifically for semantic analysis. It limits when the compiler is allowed to formally label the returned value as "comptime known", and e.g. use it as the length of an array type. If you instantiate a type inside the function, it determines how many versions of that type there are. These things are not implementation details, they are important to know when writing code because they change the meaning.

For example, trig functions and other math are good candidates for having inline returns, so that const x = cos(0.7) could be reasonably expected to be comptime. But if it isn't, then the call should go to the ordinary cos function. Under no circumstances would I want a cos_0_7() cluttering the binary.

This is reasonable and should work. In this case the compiler should be able to recognize that the cos_0_7 function has no runtime code and not emit it, even in debug mode. Though if the implementation is just return @cos(param); a standard inline function may make more sense.

This is one more reason to require explicit inline return values and not implicitly limit them to functions with inline parameters.

I may have been unclear here as well. All functions are planned to have inline return values, not just functions with inline parameters. It's just that in most cases return values will not be comptime known unless some parameters are marked inline. The optimizer is free to inline the function and do constant propagation as it sees fit, but semantically the returned value is not comptime known if it depends on a runtime parameter.
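
A small sketch of that distinction, under the proposed implicit inline returns (illustrative names):

fn addOne(x: u32) u32 {
    return x + 1; // depends on a (semantically) runtime parameter
}

fn defaultCapacity() u32 {
    return 16; // no dependency on runtime state
}

test "usage" {
    const a = addOne(3);         // not formally comptime known: `x` is an ordinary parameter
    const b = defaultCapacity(); // comptime known at the call site under the proposal
    _ = .{ a, b };
}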

SpexGuy commented 3 years ago

Not so much what it does, but where I would realistically want to use something like that.

Here's another example from some code I wrote recently:

fn doWholeArray(comptime mode: Mode, block_len: u32, array_len: u32) void {
    // junk_len is either comptime or runtime known based on mode
    const junk_len = if (mode == .junk_buffer) block_len else 0;

    var block_start = junk_len;
    while (block_start + block_len < array_len) : (block_start += block_len) {
        doBlock(mode, block_len, block_start);
    }
}

fn doBlock(comptime mode: Mode, block_len: u32, base_index: u32) void {
    const junk_len = if (mode == .junk_buffer) block_len else 0;

    // ... do more stuff that's not important.
    // we don't use mode or block_len here.

    if (junk_len != 0) {
        swapJunk(base_index, junk_len);
    }
}

Here, there is currently no way to pass the junk_len value from doWholeArray to doBlock without converting it to a fully runtime value. If we could do that, we could get rid of the mode and block_len parameters. A better version of the code looks like this:


fn doWholeArray(comptime mode: Mode, block_len: u32, array_len: u32) void {
    // junk_len is either comptime or runtime known based on mode
    const junk_len = if (mode == .junk_buffer) block_len else 0;

    var block_start = junk_len;
    while (block_start < array_len) : (block_start += block_len)  {
        doBlock(junk_len, block_start);
    }
}

fn doBlock(inline junk_len: u32, base_index: u32) void {
    // ... do more stuff that's not important

    if (junk_len != 0) {
        swapJunk(base_index, junk_len);
    }
}

ghost commented 3 years ago

The optimizer is certainly allowed to do this. The limit here is specifically for semantic analysis. It limits when the compiler is allowed to formally label the returned value as "comptime known", and e.g. use it as the length of an array type. If you instantiate a type inside the function, it determines how many versions of that type there are. These things are not implementation details, they are important to know when writing code because they change the meaning.

Sorry for dragging out this discussion, but is it really more efficient or more "semantic" to require functions to be instantiated with specific comptime arguments before trying to compute their comptime result? In terms of implementation, the calls with specific comptime parameters need to be analyzed lazily in both cases, at which point it shouldn't make any difference whether the function is "instantiated" with the given arguments or merely "computed". Below is an example to illustrate the issue.

Status quo:

fn twice(x: usize) usize { return x * 2; }

inline fn thrice(x: usize) usize { return x * 3; }

fn foo() void {
    var buffer1: [twice(100)]i32 = undefined; // error! twice() is not formally comptime
    var buffer2: [thrice(100)]i32 = undefined; // Works, but what if we don't want to make thrice inline?
    ...
}

New behavior:

fn twice(inline x: usize) usize { return x * 2; } // inline argument version

fn thrice(x: usize) inline usize { return x * 3; }  // inline return version

fn foo() void {
    var buffer1: [twice(100)]i32 = undefined;
    // compiler actions:
    // 1. twice() has an inline argument and 100 is comptime
    // 2. Does twice_100() already exist?
    // 3. No, generate twice_100() = fn() usize { const x = 100; return x * 2; }
    // 4. Analyze twice_100(); note the fact that the result is always 200.
    // 5. Since twice_100() is comptime, rewrite statement as `var buffer1: [200]i32 = undefined;`

    var buffer2: [twice(100)]i32 = undefined;
    // Same as above, except that steps 3 and 4 can be skipped

    var buffer3: [twice(200)]i32 = undefined;
    // Do steps 1 to 5 and instantiate twice_200() in the process.

    // ----------

    var buffer4: [thrice(100)]i32 = undefined;
    // compiler actions:
    // 1. Is the array length expression comptime?
    // 2. Well, 100 is comptime and thrice() has an inline return value. Try to compute that.
    // 3. thrice(100) == 300. Memoize this result for future reference. Or not.
    // 4. Rewrite expression as `var buffer4: [300]i32 = undefined`

    var buffer5: [thrice(100)]i32 = undefined;
    // Retrieve memoized result for thrice(100) or recompute if none exists.

    var buffer6: [thrice(200)]i32 = undefined;
    // Recompute in any case
    ...
}

daurnimator commented 3 years ago

If a parameter is inline, then multiple instantiations of the function may be generated - one for each unique comptime value that is passed to it, and one extra one for "any runtime value".

Why should this not be what happens by default?

The value from the call site is inlined into the function, and then the function is semantically analyzed. So it's possible to have code that behaves differently depending on whether the parameter is comptime known.

I think we should discourage this.

ghost commented 3 years ago

Concerning the doWholeArray/doBlock example, the intent seems to be that doBlock should exist in two versions: one that performs the swap at the end and one that doesn't. The former needs an extra length parameter to do that, while the latter doesn't. The improved implementation of doBlock takes an inline length parameter with the understanding that it will be a comptime zero if the swap does not need to be made and a runtime length otherwise.

Personally, I think this is a solution that is too subtle and brittle. Unlike the original implementation with the comptime mode parameter, it is not apparent that doBlock is intended to exist in just two different versions. In fact, if a comptime-known nonzero argument is passed to it by accident, the compiler will happily instantiate a third version of doBlock. And a fourth. And so on.

For cases like this, it would be safer and more expressive to have manual argument inlining, akin to Jai's #bake. In this particular case, callsite inlining would work too:

fn doBlockSwap(junk_len: u32, base_index: u32) void {
    // ... do more stuff that's not important

    if (junk_len != 0) {
        swapJunk(base_index, junk_len);
    }
}

fn doBlock(base_index: u32) void {
    doBlockSwap(inline 0, base_index);
    // or: inline doBlockSwap(0, base_index);
}

A bit longer, but clear as day and without any unnecessary instantiations. (Edit: you would also have to insert a comptime if into doWholeArray to dispatch to doBlock or doBlockSwap as appropriate. This is the price for eliminating the length parameter from doBlock. May or may not be worth it.)

So color me unconvinced.

ghost commented 3 years ago

If a parameter is inline, then multiple instantiations of the function may be generated - one for each unique comptime value that is passed to it, and one extra one for "any runtime value".

Why should this not be what happens by default?

Er, I hope you are not suggesting that, for all ordinary functions f, calls with comptime arguments, e.g. f(0), should lead to separate instantiations? :smile:

daurnimator commented 3 years ago

Er, I hope you are not suggesting that, for all ordinary functions f, calls with comptime arguments, e.g. f(0), should lead to separate instantiations? smile

yes actually.

To reduce the work done, I've proposed various forms of de-duping based on the characteristics of the argument used. e.g. if you only use @sizeOf() on it, then we can re-use that instantiation for all arguments of the same size.
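
For illustration, the kind of function that idea targets might look like this (a sketch; byteLen is a made-up example, and whether such de-duplication is feasible is exactly the open question):

// Only @sizeOf(T) is used, so under the de-duplication idea above the
// instantiations for, say, u32 and f32 (same size) could be shared.
fn byteLen(comptime T: type, count: usize) usize {
    return @sizeOf(T) * count;
}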

SpexGuy commented 3 years ago

That's an interesting idea but I think it needs to be backed up by some proof that the exponential explosion caused by the cartesian product of parameter values is controlled by these de-duping traits. It's not obvious to me that this is the case, and if it's not then Zig will lose the ability to produce small code. This would also completely break things like std.once, which depend on being called from a single instantiation point, as well as any function which uses static variables (by declaring a type with global vars). In general, functions that declare globals can never be deduplicated.
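
A sketch of the "declares a global" pattern referred to above (counter is a made-up example); merging instantiations here would merge what are semantically distinct count variables:

fn counter(comptime id: u32) *u32 {
    _ = id; // unused except to force a separate instantiation per comptime value
    const S = struct {
        var count: u32 = 0; // one distinct global per instantiation
    };
    return &S.count;
}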

ghost commented 3 years ago

To reduce the work done, I've proposed various forms of de-duping based on the characteristics of the argument used. e.g. if you only use @sizeOf() on it, then we can re-use that instantiation for all arguments of the same size.

That sounds like one of those supercompiler projects. To my knowledge, such attempts have never really succeeded even in pure functional languages, with uncontrollable compile times and code bloat being the main problems.

ghost commented 3 years ago

Maybe I just don't know enough about Zig compiler internals. The separate instantiation thing is just a hack to be able to forward the computation from the comptime stage to the semantic analysis stage, isn't it? I sort of assumed that the two can be interleaved.

Cons-Cat commented 3 years ago

This would make Zig come a bit closer to the expressive capabilities in C++. Since constexpr was added, the language has been able to express a function that may evaluate at compile time, if that is actually possible. Users were miffed that this was added before C++ could express a function that is actually guaranteed to evaluate at compile time, but the uncertainty in constexpr is leveraged for the benefit of C++ codebases all over the place nowadays. Personally, I like the extra control this inline proposal gives. It reminds me of a similar proposal for constexpr function parameters in the other lang.

16hournaps commented 7 months ago

This proposal does not make sense when you read the code and requires a lot of mental power to understand what is happening.

Maybe it's a cool compiler feature and maybe it has a niche use-case, but the proposed syntax will confuse 99% of people when they see it for the first time, especially since I think Zig has a lot of ex-C developers.

Maybe another syntax/keyword can be used.

Update: I understand that seeing something for the first time is a weak argument. I guess the main problem is that it re-uses the industry-wide established "inline" terminology in a somewhat different/new way.

Banchee16931 commented 5 months ago

This proposal seems pretty cool. I am very, very new to Zig, but from a fairly outside perspective (with a background in other languages that use the inline keyword), I agree with previous comments that the word inline is really confusing (especially to a new user such as myself).

I think the use of the suffix "-time" is pretty well understood, so it would be an improvement over inline. Something like "trytime" (try comptime and if not, default to runtime) or "autotime" (automatically decide which to pick depending on the situation) is more understandable. There are probably better "-time" suffix words, but hopefully my point comes across. This also leaves the inline keyword open for future use in actual inline-based functionality.