Open andrewrk opened 6 years ago
With the new coroutine semantics, what reason is there to prefix calls with async
? There is no longer a need to provide an allocator at callsite so the prefix async is redundant. Related #1676
It would not be strictly necessary. However it makes it obvious that the function call does not return the return type of the function but instead returns the coroutine frame.
It would also be possible for async functions to call other async functions without any special syntax at all; the question becomes, is it better to force the suspend point to be explicitly pointed out, or not?
The computed stack size requirements are included in generated .h files, in comments, since C has no way to annotate stack size requirements for functions.
It should be a new attribute: __attribute__((__stack_size__(256)))
This would enhance the existing no_split_stack
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
One must use
@ptrCall(worst_case_stack_size, function, ...)
, whereworst_case_stack_size
is a post-comptime value.
This would now (since #3856) be a new option to @call
.
What would we use as the stack size for vdso functions? If the kernel doesn't already, we could propose that they include a valid PT_GNU_STACK
p_memsz
field?
would it be required to heap allocate stack frames of functions that are tail call optimized/stack size is bounded at comptime?
Re. post-comptime: I think this is an idea worth exploring on its own. In trying to work out how to better optimise struct layout, I've stumbled upon another use case; see #7888.
I thought you guys might want to know there is a huge discussion about automatic transformation of recursive algos to heap allocated rather than stack allocated, on Hacker News.
I think Zig, being so progressive, and being more of a "safe" language than C could benefit by fixing the stack overflow flaw in C.
https://news.ycombinator.com/item?id=28684098
Cheers @andrewrk . Thanks for pioneering Zig
The syntax for recursive functions via coroutines seems both confusing and extremely annoying to write, making recursion incredibly tedious to use. Using async/await to write recursive functions seems very confusing (and would also hamper performance). The minimal idea I have to make it better would be to have a builtin alternative to @call
like @alloc_call
which takes an allocator along with the usual arguments and can return an error (this could be a standard library function if the generics system Zig has allows it, I haven't written enough to know off the top of my head if you can forward variadic arguments). So the Fibonacci function would look like
async fn fib(allocator: *Allocator, x: i32) !i32 {
if (x <= 1) return x;
return try @alloc_call(allocator, fib, x - 1) + try @alloc_call(allocator, fib, x - 2);
}
It would still be messy, have the overhead of heap allocation, and mark the function as async for some reason even though it practically isn't, but at least it would provide a nice way to write it.
My alternative suggestion, however, is to provide some form of explicit stack recursion support. If it's okay for the function to say it's out of memory, then it's okay for the function to fail because it hit the stack limit and return error.WouldOverflowStack
or whatever, as long as this can be efficiently detected at runtime.
I have two main ideas for the syntax for this, both with their tradeoffs. One is a rec
keyword applied to functions, like async
. Functions which are declared rec
need to also be prefixed rec
at the callsite and a rec
call can return an error.WouldOverflowStack
and otherwise evaluates to the value of the function called.
rec fn fib(x: i32) !i32 {
if (x <= 1) return x;
return try rec fib(x - 1) + try rec fib(x - 2)
}
From an implementation perspective, I think rec
functions would have a @max_frame_size
instead of an @stack_size
. This describes the max stack depth they can reach without making another checked, recursive call, which has size 0 in the normal stack analysis because it's runtime checked instead. This would factor in any non-recursive functions they call. When you make a call with rec
, it checks that current_stack_pointer - @max_frame_size(func)
is still above min_stack_pointer
and returns error.WouldOverflowStack
if this check fails, otherwise it calls the function.
An alternative option is to move the burden of dynamically allocating a stack frame inside the recursive function. This might remove the need for a rec
keyword, I'm not sure exactly how you'd implement it. I'm imagining it would look something like this:
fn fib(x: i32) !i32 {
try @build_next_frame();
if (x <= 1) return x;
return try fib(x - 1) + try fib(x - 2);
}
The @build_next_frame()
builtin could be placed anywhere in a function, and even multiple times if you wanted to, and it would segment the stack frame into several sections, and it would extend to the next one at each call, returning error.WouldOverflowStack
on failure. The compile time stack size of a function with any number of @build_next_frame()
calls in it is the size up to the first one. Notably, this means that the above function would have a compile time stack size of 1 or 2 words (depending on the CPU architecture) for pushing the return pointer to the stack. This means that the size of the rest of the frame can include any space used (at compile time) by the recursive calls, which is again just 1 or 2 words. This has several advantages over the previous suggestion, such as making recursive functions into just normal functions that can return an error. It makes recursive functions arguably cleaner to write because they only have to acknowledge that they're recursive in one place (and otherwise just handle that they can return an error) instead of in the definition and in every place where they make a recursive call. It would also allow you to make conditional stack allocations that don't factor into the compile time stack size of the function, which could be useful for large buffers that aren't always used.
fn example(x: i32) i32 {
// do stuff
if (x = 0) {
@build_next_frame() catch @panic("");
const y: [100000]i32 = undefined;
}
}
The obvious disadvantage over the previous suggestion is that a rec
keyword makes relative sense and is fairly intuitive while this try @build_next_frame()
BS is something nobody has ever seen. Additionally, something would need to be done to ensure that the return value of @build_next_frame
is actually handled, because obviously just continuing into the rest of the function would be invalid. A potential alternative formulation would be some kind of block, like
fn fib(x: i32) !i32 {
try @with_dynamic_frame {
if (x <= 1) return x;
return try fib(x - 1) + try fib(x - 2);
};
}
This variant would probably need to be a keyword (like dynamic_scope
) or whatever rather than a builtin, since builtins are things with function-like syntax, but nonetheless.
The rec
keyword would also be a better hint to the compiler when performing static analysis; ideally, the compiler would be able to determine that fib(x)
has a call depth of x
and then it can elide all the checks except at the final callsite, where it just checks that the space used by the entire thing is fine. This would be inconsistent at best (e.g., handling recursive list traversal this way would probably be impossible), but it's worth considering and rec
might be better suited to it.
Either way, even without static analysis like that, I think either of these is more readable, easier to write, and more performant than coroutine-based recursion on the heap, while being exactly as safe.
It would also I think be reasonable to supplement this with an @tailcall
built in, which tells the compiler that if it can't compile this recursion to a loop, it should throw a compiler error. This allows you to write a recursive function while guaranteeing that it actually has a known stack size. The most basic version of this would be extremely simple on the implementation side in that it would not try to do anything clever. If it's literally a tail call already and you can just replace call .label; ret
with jmp .label
, you do that and succeed, and otherwise you fail. So
fn fact(x: i32) i32 {
fn helper(x: i32, acc: i32) i32 {
if (x <= 1) return acc;
return @tailcall(fact, x - 1, acc * x);
}
return helper(x, 1);
}
Would compile just fine, because it's a literal tail call and trivially optimizable to a loop while
fn fact(x: i32) i32 {
if (x <= 1) return 1;
return @tailcall(fact, x - 1) * x;
}
would not be supported in a most basic version because an accumulator variable would need to be added by the compiler. That said, it would be nice to try to support some stuff like the latter, it seems like a rather low priority. Either way, the crux would be that it lets you write a "recursive-like" function which uses a compile time known stack space.
The final issue I see is that the current handling of extern functions would make it extremely difficult to work with C interop, while also not actually being safe anyway. One of the big draws of Zig is being able to automatically generate C bindings, but this would make it impossible to call a C function on any system where the stack size is limited to the architecture default or less, I believe? I'm not sure what happens if you run an executable that asks for a specific stack size on a system with ulimits set below what it asks for. Providing accurate stack size annotations to C functions would be extremely difficult. On the flip side, the default stack size is not actually guaranteed to be safe for C functions anyway, because those can have unchecked recursion whose depth depends on the arguments. It's unclear to me how best to handle this because you can't actually check a non-cooperative function's stack usage. At best, you can protect the page at the end of the stack and set a signal handler to make a stack overflow runtime checked undefined behavior instead of just "undefined behavior, good luck." Unlike with the recursive functions, I wasn't able to come up with anything that feels like a good solution, but it felt worth noting that extern functions, since they can be recursive, are still inherently unsafe even if you allocate the amount of stack size they're supposed to run within.
From an implementation perspective, I think rec functions would have a @max_frame_size instead of an @stack_size. This describes the max stack depth they can reach without making another checked, recursive call, which has size 0 in the normal stack analysis because it's runtime checked instead. This would factor in any non-recursive functions they call. When you make a call with rec, it checks that current_stack_pointer - @max_frame_size(func) is still above min_stack_pointer and returns error.WouldOverflowStack if this check fails, otherwise it calls the function.
Something similar to this^. Everything should be gucci if you are able to tell a function how many times it can loop around itself, and if it goes beyond that then throw a runtime error that we can check against. The compiler should guarantee that there is enough stack allocated for any defered operations so that the developer defined recursing index can be reached without a stack overflow.
This is to mimic what we could be doing manually, i.e allocate arrays of maxNestSize for each required variable in the function, add an extra array of maxNestSize to store identifiers of different branches of entry with different deferred operations, confirm that the arrays don't overflow, and create a reversed loop to process the defferred operations, but in some cases it feels clunky, so would be nice if the compiler were finally able to automize the fix for the out of date design of deferred operations.
My alternative suggestion, however, is to provide some form of explicit stack recursion support. If it's okay for the function to say it's out of memory, then it's okay for the function to fail because it hit the stack limit and return error.WouldOverflowStack or whatever, as long as this can be efficiently detected at runtime.
I was having exactly the same idea. Furthermore, maybe this would allow @alloca
to be reintroduced, since it will be able to fail in the same fashion.
TBH I'm rather sceptical about the feasibility of static stack analysis. There are definitely cases where it could work, but I doubt it can cover a sufficiently wide spectrum of situations in a reasonable/usable way, so it'd be nice to be able to not count on it.
segmented stacks/stack splitting seems like a simpler approach to the recursion issue vs converting to a coroutine and then doing heap allocations.
This is a proposal to solve #157.
First we must introduce a new kind of value, one that is neither
comptime
nor runtime. It is a value that is known at compile-time, but only after all semantic analysis is complete, and so it cannot participate incomptime
code. For the purposes of this proposal, this kind of value is called a "post-comptime" value. An actual comptime value can be used where a post-comptime value is expected. Simple arithmetic such as+
,-
, and*
can be used on post-comptime values, to produce another post-comptime value. There also will be some way to do amax()
operation on post-comptime values.External Functions
extern
function declarations have a default stack size requirement, which is dependent on the target. For example, on x86_64 Linux, the default is8388608 bytes
(8 MiB). This matches the default stack size on Linux for executables as well as default pthread stack sizes. Zig treats everyextern
function as if it needs at least this much stack space.When an external function is known to require a larger or smaller amount of stack space, it can be annotated:
Here,
foo
is known to require only 912 bytes of stack space. In this example, we provided a comptime value, but the expression can be a post-comptime value.Function Pointers
One does not simply call a function pointer.
One must use
@ptrCall(worst_case_stack_size, function, ...)
, whereworst_case_stack_size
is a post-comptime value.This inserts a runtime safety check - once compilation completes, all function stack size requirements are known, and inserted into the binary. A function pointer call checks if the provided
worst_case_stack_size
value is correct.This ensures that during semantic analysis, the compiler can rely on this value when calculating worst case stack size.
Most APIs will want to accept
comptime
functions rather than function pointers, as this avoids@ptrCall
. For example,std.mem.Allocator
,std.io.InStream
,std.io.OutStream
, andstd.rand.Random
.Recursion
Recursion, whether direct or indirect will be solved with the coroutines rewrite. See #1260.
Call graph analysis detects loops:
This will be fixed with coroutines:
Calls to this function grow heap memory, not stack memory. The function can return
error.OutOfMemory
. This is a leaf node in the call graph analysis; it does not store any information on the stack.Annotating Additional Stack Usage (Inline Assembly)
In some cases, it may be necessary to annotate that more stack is used, even when it does not come from a function call. One such example is inline assembly that directly accesses the stack.
For these cases there will be a function
@declareStackUsage(byte_count)
, wherebyte_count
is a post-comptime value.Because Zig is not able to trace how this builtin call moves with LLVM's function inlining optimization passes, the value is globally added to the root node of the call graph.
Threads
Currently in
std.os.spawnThread
, Zig allocates a hard coded 8 MiB per thread.With this proposal, it will use a new builtin function
@stackSize(function)
. This builtin returns a post-comptime number of bytes of the worst case stack size required to callfunction
.Main Function
Zig knows the start symbol for the target. This is how it avoids including
std/special/bootstrap.zig
when a program manually exports the start symbol.During semantic analysis, Zig uses
@stackSize(start_function)
internally to determine stack size requirements, and passes this to the linker.Exported Functions
Zig internally runs
@stackSize
on all exported functions, in order to run the call graph analysis and emit compile errors for recursion, etc. The computed stack size requirements are included in generated .h files, in comments, since C has no way to annotate stack size requirements for functions.