ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

async/await/suspend/resume #6025

Open andrewrk opened 4 years ago

andrewrk commented 4 years ago

This is a sub-task of #89.

linkpy commented 1 year ago

I'm willing to try implementing async/await/suspend/resume for stage2, as I require them for a project I'm working on.

The issue is that I don't really know where to start. It seems like AstGen supports them. Sema doesn't (it calls failWithUseOfAsync, so I know where I need to work), so I'll start with that.

AIR only has the async_call, async_call_alloc, suspend_begin, and suspend_end instructions. Looking at stage1, it seems like the await/suspend/resume instructions are missing. Should I try to just add the instructions and replace the calls to failWithUseOfAsync by looking at how stage1 implements them?

Furthermore, is async implemented in a similar way in stage2 as in stage1? (That is, is stage1 a good representation of how stage2 implements and uses frames, async calls, suspends, resumes, etc.?)

Edit: I've been using the stage2-async branch, assuming that's where the async development is being done.

kuon commented 1 year ago

I've been following the WASI development and it seems to be going great! That being said, I am currently working on a new project and I am using some specific stage2 features. I am not using async yet, but I'd love to introduce it soon. Can you provide a very rough estimate of when this is planned to be merged in master? It is just for general planning (no pressure). Cheers!

dotnwat commented 1 year ago

Looking forward to this.

andrewrk commented 1 year ago

https://ziglang.org/news/0.11.0-postponed-again/

ethindp commented 1 year ago

Can I use async primitives in 0.11.0 or will they be hard errors? I'd like to use them for interrupt handling.

lem0nify commented 1 year ago

@andrewrk Could you give a realistic estimate (not in releases, but in time) when asynchronous functions will return to the language?

maxzhao commented 1 year ago

0.12

lem0nify commented 1 year ago

@maxzhao 0.12 is kinda in one more year (or maybe half-year in best case). It will come much earlier into master I suppose. That's why I asked for estimate in time, not in releases.

KenjiTakahashi commented 1 year ago

I suppose then it would be useful to update the docs? In the Async Functions section they still state: "Async functions are being temporarily regressed and will be [restored before Zig 0.11.0 is tagged](https://github.com/ziglang/zig/issues/6025)."

mlugg commented 1 year ago

I was querying some implementation details of this with @andrewrk and he suggested moving the conversation here so it's recorded on the issue tracker. For reference, my questions are as follows:

Q1. Frames

What exactly does the @Frame of a function store? I've heard Andrew mention before that it contains all values spilled across suspend points, as well as a value indicating the index of the last suspend, with the idea being that the function effectively becomes a switch on this index to continue the function where we left off (accessing spilled locals also from the struct). However, this doesn't align with the idea that async can be used to avoid stack overflow in the case of recursion (see #1260 and #1639), for which we want all stack values to end up in the @Frame. As far as I'm aware, LLVM doesn't provide us with a way to move the stack allocations it creates during codegen, nor to even know how big they are. So, short of adjusting the stack pointer in a platform-dependent way (which still wouldn't explain how we determine the frame's size) how is this goal achieved?
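For illustration only, the "switch on the index of the last suspend" lowering described above can be written out by hand in plain Zig, without the async keywords. This is a sketch under assumptions, not what the compiler emits: `CounterFrame`, `counterResume`, and all field names are hypothetical. Only `total`, which is live across suspend points, is placed in the frame; any temporaries used inside a single switch arm would still live on the ordinary stack, which is exactly the gap this question points at.

```zig
const std = @import("std");

/// Hand-written stand-in for a compiler-generated frame (hypothetical names).
const CounterFrame = struct {
    /// Index of the suspend point to continue from.
    resume_index: u8 = 0,
    /// A local that is live across suspend points, so it is spilled here.
    total: u32 = 0,
    /// Set once the function body has run to completion.
    done: bool = false,
};

/// Runs the function body until the next suspend point or until it returns.
fn counterResume(frame: *CounterFrame) void {
    switch (frame.resume_index) {
        0 => {
            frame.total = 1; // code before the first suspend
            frame.resume_index = 1; // suspend point #1
        },
        1 => {
            frame.total += 10; // code between the two suspends
            frame.resume_index = 2; // suspend point #2
        },
        2 => {
            frame.total += 100; // code after the last suspend
            frame.resume_index = 3;
            frame.done = true; // the function has returned
        },
        else => unreachable, // resumed after completion
    }
}

pub fn main() void {
    var frame = CounterFrame{};
    while (!frame.done) counterResume(&frame);
    std.debug.print("total = {d}\n", .{frame.total});
}
```

The caller decides where `frame` lives (static, heap, or stack), which is the property the later comments in this thread rely on.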

Q2. Function Pointers

This question is about the language specification. When taking a pointer to a function, we may not yet know whether or not it is async (as this is determined after semantic analysis). More to the point, even if we did know this, it's not a part of the function's type, so we can't differentiate between an async and non-async function when calling a runtime-known function pointer. Since the code generated for calling async and non-async functions is necessarily different, how is this handled? As far as I can tell from looking through code from the old C++ compiler implementation, this just wasn't handled at all before, and you'd probably just get a crash if you attempted to call an async function through a pointer. The best solution I can think of to this is just that you can't take a reference to an async function (a compile error is emitted if you try) - we could emit these errors retroactively after determining a function is async - but perhaps there's a better solution I'm not seeing.

andrewrk commented 1 year ago

I agree with myself that @Frame stores all values spilled across suspend points. Safe recursion (#1006) was never accomplished in practice, and it's still something I want to pursue but I don't think it needs to be a requirement of landing async functions back into the compiler. I think I probably just didn't consider that fact about everything needing to be spilled. Or maybe I did and figured out some trick to make it work, like detecting that the function is recursive and forcing spillage. That might be a bad idea though. Still, there are other ways to tackle safe recursion, and that can be solved separately.

As for your second question, you hit the nail on the head. The compiler did not handle that case at all, and it caused all kinds of terrible problems. Async functions were certainly of experimental quality. I suggest to make it a compile error for now and we can go from there.

mlugg commented 1 year ago

Thanks! Some minor follow-up questions:

andrewrk commented 1 year ago

Aside from spilled locals, what needs to be stored in an async frame?

I do recommend to follow the codegen.cpp source here. There wasn't anything unused there. Off the top of my head it's:

I think there was a function near the bottom of analyze.cpp that spelled it out pretty linearly. resolve_async_frame or something along those lines.

The first part of the @Frame(foo) type has the same memory layout as anyframe->T which can be awaited without knowing exactly which function is being awaited, and the first part of that has the same memory layout as anyframe, which can be used with resume, without knowing the return type of the function being resumed.
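A rough way to picture that prefix relationship, using ordinary `extern struct`s so the layout is well defined. Every struct and field name below is an assumption made for illustration; the comment above only states that each smaller view sits at the start of the larger one, not which fields each part contains.

```zig
const std = @import("std");

/// Stand-in for what `anyframe` exposes: enough to resume a function
/// without knowing its return type. (Hypothetical fields.)
const AnyFrameHeader = extern struct {
    resume_fn_addr: usize,
    resume_index: usize,
};

/// Stand-in for `anyframe->u32`: the header first, then what `await` needs.
const AnyFrameU32 = extern struct {
    header: AnyFrameHeader,
    awaiter_addr: usize,
    result: u32,
};

/// Stand-in for `@Frame(foo)`: the typed part first, then foo's spilled locals.
const FooFrame = extern struct {
    typed: AnyFrameU32,
    spilled_a: u32,
    spilled_b: u32,
};

comptime {
    // Each smaller view sits at offset 0 of the larger one.
    std.debug.assert(@offsetOf(AnyFrameU32, "header") == 0);
    std.debug.assert(@offsetOf(FooFrame, "typed") == 0);
}

/// Works on any frame, because it only touches the shared header.
fn resumeIndexOf(header: *const AnyFrameHeader) usize {
    return header.resume_index;
}

/// Works on any frame whose function returns u32.
fn resultOf(typed: *const AnyFrameU32) u32 {
    return typed.result;
}

pub fn main() void {
    const foo = FooFrame{
        .typed = .{
            .header = .{ .resume_fn_addr = 0, .resume_index = 2 },
            .awaiter_addr = 0,
            .result = 7,
        },
        .spilled_a = 1,
        .spilled_b = 2,
    };
    std.debug.print("index={d} result={d}\n", .{
        resumeIndexOf(&foo.typed.header),
        resultOf(&foo.typed),
    });
}
```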

I suppose make the return address null until the await?

You should look at how codegen.cpp lowered return and await, there is an atomicrmw that makes this stuff threadsafe.

What's the deal with @Frame of a non-async function?

The idea is that you can use async and await keywords on functions that do not have any suspend points, and they still work correctly even though the function has already fully returned at the async site.

Recommend to look at test/behavior/async_fn.zig.

What's the deal with std.builtin.CallingConvention.Async?

This is needed to differentiate runtime-known function pointers to async functions vs non-async functions. Functions which have nonzero suspend points require the async calling convention. Functions with zero suspend points may be lowered with many different calling conventions, including the async calling convention.

Every function call could potentially be an async function call, and the compiler does not find this out until the function call graph is fully analyzed. Thus every function call must have access to a result location. e66190025ffab39527da601980b7e3211069b6f5 basically must be reverted in order to implement async functions.

mlugg commented 1 year ago

This is needed to differentiate runtime-known function pointers to async functions vs non-async functions.

I'm a little confused on what you mean here, since to my initial Q2 you said that async function pointers weren't really a thing that was handled at all in stage1, and as you say, we don't know until after semantic analysis whether a function we took a reference to was async or not. I understand that the actual lowering of an async function must use a consistent calling convention (so that it can be resumed by e.g. an event loop after being type-erased), but why is this concept relevant to the frontend?

[...] Thus every function call must have access to a result location.

Hm, but a plain call foo() to an async function doesn't actually use its result location - the frame is (I presume) implicitly allocated into the caller's frame. So, surely we would only need a result location if explicitly providing the frame pointer via async foo() syntax? I think I'm missing something here. Perhaps it'll become clearer to me after reading the stage1 logic (I'm just about to pull it up).

andrewrk commented 1 year ago

It was possible to use async function pointers with @asyncCall, however, coercing an async function pointer to a non-async function pointer was incorrectly allowed.

the frame is (I presume) implicitly allocated into the caller's frame

Sorry, I was thinking of async calls, not regular calls. This was the main motivation for result location semantics:

var static_frame: @Frame(foo) = undefined;

test {
    static_frame = async foo();

    const heap_frame = try std.testing.allocator.create(@Frame(foo));
    heap_frame.* = async foo();

    var stack_frame = async foo();
}

You are correct that plain calls to foo() will implicitly allocate in the caller's frame. So my commit message there was correct in identifying #2765 as the collateral damage there rather than async functions.

Async calling convention does not make foo() behave like async foo(); it makes the function have a compatible runtime function pointer type with functions that have suspend points. There is certainly an async calling convention, I mean think about how async function calls are lowered very differently than normal function calls.

mlugg commented 1 year ago

Ah, okay, that all makes sense. So, to define these semantics a little more formally, here's my understanding: any comptime-known function [pointer] with default (Unspecified) callconv can coerce to a function [pointer] with Async callconv - for async functions this is the only valid way to call the function through a pointer (and we can hopefully make having a runtime-known async function pointer without Async callconv a compile error), while for non-async functions, I assume it generates a trivial wrapper matching the Async callconv? i.e. which takes a trivial frame and just "unwraps" the call to the normal function, then puts the result into the frame.

andrewrk commented 1 year ago

All correct.

To be clear: async foo() on a function with unspecified calling convention and no suspend points does not need a wrapper; instead it returns @Frame(foo), a trivial wrapper around its return value which is then unwrapped with await.

mlugg commented 1 year ago

Sure, that makes sense. But if we instead did this:

var runtime: *const fn () callconv(.Async) void = &someNonAsyncFunction;
const frame = @asyncCall(frame_buf, null, runtime, .{});
await frame;

This would require a wrapper function, yes? Since we must make someNonAsyncFunction comply to the Async callconv where it previously did not.

andrewrk commented 1 year ago

Agreed, that would require a wrapper function.
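As a sketch of what such a wrapper could look like, here is a hand-written emulation in plain Zig that does not use `callconv(.Async)` or `@asyncCall`. It assumes a made-up convention in which every callee receives a frame pointer and writes its result into it; `TrivialFrame`, `AsyncLikeFn`, and the wrapper name are hypothetical, and the example function returns a `u32` rather than `void` so the "result goes into the frame" step is visible.

```zig
const std = @import("std");

/// Hypothetical trivial frame: just a result slot and a completion flag.
const TrivialFrame = struct {
    result: u32 = 0,
    done: bool = false,
};

/// Hypothetical stand-in for an Async-callconv function pointer type:
/// the callee takes a frame pointer and stores its result there.
const AsyncLikeFn = *const fn (frame: *TrivialFrame) void;

/// An ordinary function with no suspend points.
fn someNonAsyncFunction() u32 {
    return 42;
}

/// The wrapper: adapts the ordinary function to the frame-based convention
/// by calling it normally and then putting the result into the frame.
fn someNonAsyncFunctionWrapper(frame: *TrivialFrame) void {
    frame.result = someNonAsyncFunction();
    frame.done = true;
}

pub fn main() void {
    const runtime: AsyncLikeFn = &someNonAsyncFunctionWrapper;
    var frame = TrivialFrame{};
    runtime(&frame); // plays the role of @asyncCall
    std.debug.print("done={} result={d}\n", .{ frame.done, frame.result }); // plays the role of await
}
```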

mlugg commented 1 year ago

Okay, I think I know everything I need to to get started on this now - thanks for the help!

paeifbnaeufbpae commented 9 months ago

Is the whole async thing something that's strictly necessary or just some syntax/language sugar? I.e. is there a workaround or are there certain things that simply aren't possible in Zig currently without proper async support?

Keith-Cancel commented 8 months ago

Q2. Function Pointers

This question is about the language specification. When taking a pointer to a function, we may not yet know whether or not it is async (as this is determined after semantic analysis). More to the point, even if we did know this, it's not a part of the function's type, so we can't differentiate between an async and non-async function when calling a runtime-known function pointer. Since the code generated for calling async and non-async functions is necessarily different, how is this handled? As far as I can tell from looking through code from the old C++ compiler implementation, this just wasn't handled at all before, and you'd probably just get a crash if you attempted to call an async function through a pointer. The best solution I can think of to this is just that you can't take a reference to an async function (a compile error is emitted if you try) - we could emit these errors retroactively after determining a function is async - but perhaps there's a better solution I'm not seeing.

I believe there are likely better options than simply crashing or disallowing pointers to an async function. An async function could have two entry points. For instance, the first entry point could adhere to the normal function calling convention and determine where to jump to continue execution, along with how to restore the frame state. Essentially, it would be akin to wrapping an async function in a normal function, which is not unusual since Zig, to the best of my knowledge, aims to be color-free. The second entry point would be invoked if the function was directly called as an async function.

If you fix the code size of the first entry point, it should still be possible to cast a function pointer back to an async function, as you could always calculate the position of the second entry point from that. This doesn't pose a significant issue; for example, the first entry point could always be a backward jump to a larger chunk of machine code specific to that function. Then, the second entry point would consistently proceed after that jump instruction.

Furthermore, having such a jump instruction at the beginning of a function is not unheard of. For instance, hot-patchable functions often just have nop instructions or a jump instruction jumping x bytes forward at the beginning. These could then later be modified into a jump to the new patched function.

ethernetsellout commented 8 months ago

Is the whole async thing something that's strictly necessary or just some syntax/language sugar? I.e. is there a workaround or are there certain things that simply aren't possible in Zig currently without proper async support?

On paper async does enable things you couldn't otherwise achieve. Coroutines allow the 'pausing' and 'resuming' of functions, something you really can't achieve without inline assembly; you can only emulate this in various ways. Async allows control of the memory location of a function's frame. This is important to the goals of the project, as one can use this to prevent a recursive function from taking up an arbitrary amount of stack by allocating its frames on the heap.

It's possible to use zig's async for generators. I am not sure if this is the intended use case though, who knows how efficient or 'nice' this may be.

In practice many things are possible without async. Async for event loops and thread pools can be replaced by function pointers w/ state. I'm struggling to think of any practical examples where async code is strictly necessary.
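To make "function pointers w/ state" concrete, here is a minimal sketch in plain Zig: each task carries a pointer to the state a coroutine would otherwise keep in its frame, plus a step function that a tiny round-robin loop calls until the task reports completion. `Download`, `Task`, `downloadStep`, and `runAll` are hypothetical names invented for this example, not an existing API.

```zig
const std = @import("std");

/// State that a coroutine would otherwise keep in its frame.
const Download = struct {
    bytes_left: u32,
    chunks_done: u32 = 0,
};

/// A unit of work: explicit state plus the function that advances it.
const Task = struct {
    state: *Download,
    /// Performs one step; returns true once the task is finished.
    step: *const fn (state: *Download) bool,
    finished: bool = false,
};

/// Processes at most 4 bytes per call, then hands control back to the loop.
fn downloadStep(state: *Download) bool {
    const chunk: u32 = if (state.bytes_left < 4) state.bytes_left else 4;
    state.bytes_left -= chunk;
    state.chunks_done += 1;
    return state.bytes_left == 0;
}

/// A very small round-robin "event loop" over the tasks.
fn runAll(tasks: []Task) void {
    var remaining: usize = tasks.len;
    while (remaining > 0) {
        for (tasks) |*task| {
            if (task.finished) continue;
            if (task.step(task.state)) {
                task.finished = true;
                remaining -= 1;
            }
        }
    }
}

pub fn main() void {
    var a = Download{ .bytes_left = 10 };
    var b = Download{ .bytes_left = 3 };
    var tasks = [_]Task{
        .{ .state = &a, .step = &downloadStep },
        .{ .state = &b, .step = &downloadStep },
    };
    runAll(&tasks);
    std.debug.print("a: {d} chunks, b: {d} chunks\n", .{ a.chunks_done, b.chunks_done });
}
```

The cost compared with async is that every value needed after a "pause" has to be moved into the state struct by hand.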

applejag commented 8 months ago

Async is super useful when you have lots of smaller tasks.

Such as if you lint/validate a bunch of files. Doing each file in a separate thread is wasteful. Writing your own scheduler with semaphores is tedious. Being able to just rely on a language feature for this is super useful. (Edit: oh and especially when you want parts to depend on other parts via await, such as await lintFilesInDir("foo"))

Similarly, we don't strictly need for loops when we have while loops, but they're such a nice quality-of-life feature.

kuon commented 8 months ago

Is the whole async thing something that's strictly necessary or just some syntax/language sugar? I.e. is there a workaround or are there certain things that simply aren't possible in Zig currently without proper async support?

I will not enter the "philosophical" part of it, as there are many benefits, but I will give you a concrete example.

I use Zig in WASM in the browser, and sometimes I need to use JS APIs that are only available as promises.

With async, it would look like this (pseudocode):

// from zig
suspend {
    current_frame = @frame();
    my_js_function_called_from_zig(js_function_arguments, current_frame);
}
// from JS
function my_js_function_called_from_zig(args, frame) {
  mypromise_func(args).then(() => zig_resume(frame));
}
// back in zig
fn zig_resume(frame) {
    resume frame;
}

At present, I implement this with a JS worker that does a blocking loop with Atomics, but it is far from ideal.

noonien commented 7 months ago

Please, don't go the horrendous path of async. It'll be a massive time sink, the ABI for function calling will never be the same, resulting in function colors, and the entire ecosystem will either be split, or all async.

Could you look into stackful continuations and effects instead?

kuon commented 7 months ago

Please, don't go the horrendous path of async. It'll be a massive time sink, the ABI for function calling will never be the same, resulting in function colors, and the entire ecosystem will either be split, or all async.

Could you look into stackful continuations and effects instead?

Do you mean something like setjmp and longjmp in C? And for effect what do you have in mind?

I think that in general this is a good discussion to have. I did plenty of rust, and I hate their futures as a user even if many of the design parts make sense.

To me, it comes down to use cases. We must find what problem we (the developers that use the language) want to solve. In https://github.com/ziglang/zig/issues/6025#issuecomment-1914725896 I explain a use case that would be covered by a non-local jump. But there are more uses of async/await.

As a note, in practice, if I look at how I work with JavaScript, since that is where I do the most async now, I realize that 99% of the time I only want to write sequential imperative code. But I must have await everywhere because one function down the stack calls a JS API that must be async because the result is not immediate.

I think Apple's API such as dispatch and run loop can be of inspiration when thinking around concurrent API design.

mlugg commented 7 months ago

[...] resulting in function colours [...]

Why does a function's ABI introduce function colours in any practical way? The only thing it really means is that you can't get a default-callconv function pointer to an async function; if you need to support async functions somewhere that you're using function pointers, then you can use callconv(.Async) pointers which also support non-async functions. There is technically some colouring here, but given how rare function pointers are... not in any practical sense.

Per my understanding, the goal of Zig's colourless async can be framed as that if I have a non-async project, I can throw an async call somewhere into it - potentially causing hundreds or thousands of functions to in turn become async - and it'll basically Just Work.

[...] the entire ecosystem will either be split, or all async.

A huge benefit of colourless async is that it should help avoid this problem. If Alice writes an async version of a package, and Bob a synchronous version, they should both be able to be plugged straight in to any project - regardless of whether or not it is already async - and everything should work with only minor changes.

noonien commented 7 months ago

@mlugg

Why does a function's ABI introduce function colours in any practical way? The only thing it really means is that you can't get a default-callconv function pointer to an async function;

It seems you have answered your own question

A huge benefit of colourless async is that it should help avoid this problem. If Alice writes an async version of a package, and Bob a synchronous version

Again, you've proved my point, the ecosystem would be divided into async and normal code.

@kuon

Do you mean something like setjmp and longjmp in C?

Continuations can be implemented by using setjmp and longjmp, yes.

The Wikipedia page for Continuations does a good job of explaining them, as well as providing a list of languages that support them.

It's important to understand what async is and what problems it solves. I would say there are two parts of what people consider async to be:

Delimiting computation means splitting up a function into multiple parts, providing the ability for each part to be executed in multiple ways, and allowing for control flow to be more flexible.

Async/await is what we usually call an implementation of a subset of stackless coroutines.

Coroutines are an abstraction that helps with delimiting computation, however, compared to just async/await they have the additional benefit of being able to yield multiple times. These are also usually stackless.

Continuations are a much lower level of abstraction for delimiting computation, however, they are much more powerful, providing the ability to capture the control state of the current computation as a first-class value. This can be used to implement coroutines, generators, and even effects.

Continuations are usually implemented by capturing the stack, or by having a split stack system. This means that there is just one ABI for calling functions, the same one everyone else uses, which allows calling into C, or any other language, without any issue.

And for effect what do you have in mind?

There is also a good Wikipedia page for Effect systems.

Effects are in essence a control flow method.

They can be used to implement cooperative scheduling (as with async + Future/Promise), but also much more than that, like function purity because they can be used to abstract over "colors".

Here are some examples of effects: concurrency, I/O, error handling, allocation. Because effects can be abstracted upon, the user can plug in their own functionality for any of those, and more.

Continuations and effects complement each other very well, both have been very well studied for a long time, and continuations are also implemented in quite a few mainstream languages, while effects are just now gaining more attention in languages like OCaml, JavaScript has a proposal for algebraic effects, and there are new languages that experiment with them, like Koka, Eff, Unison.

omentic commented 7 months ago

Please, don't go the horrendous path of async. It'll be a massive time sink, the ABI for function calling will never be the same, resulting in function colors, and the entire ecosystem will either be split, or all async.

Is not the whole point of Zig's async/await implementation and its focus on @Frames to not introduce this split?

mrschyte commented 7 months ago

Please, don't go the horrendous path of async. It'll be a massive time sink, the ABI for function calling will never be the same, resulting in function colors, and the entire ecosystem will either be split, or all async.

Is not the whole point of Zig's async/await implementation and its focus on @Frames to not introduce this split?

You would still have to write basically two versions (usually interspersed into a single file / function) to account for the different behavior between async and non-async code. As an example, running a producer/consumer type async function in blocking mode would result in a deadlock if you don't have if/else statements to handle this case.

It would be really nice to have continuations built into the language as it is a simple, but very powerful primitive. I've played around with call/cc before in scheme and you can build all sorts of effects on it as @noonien points out.

ethernetsellout commented 7 months ago

It would be nice to see what is being proposed explicitly w.r.t. effects and continuations, rather than links to wikipedia articles and such. If it's anything that would show up in a function signature, that would clearly constitute coloring. It seems that effects, for instance, would have to show up in the return type of a function.

Why does a function's ABI introduce function colours in any practical way? The only thing it really means is that you can't get a default-callconv function pointer to an async function;

It seems you have answered your own question

The question is whether this shows up in practice. It seems this is the one area where async might introduce coloring, unless taking a function pointer to async code isn't all that common anyway.

mlugg commented 7 months ago

Yeah, a concrete proposal would be nice. If you (anyone here!) have a specific idea which you believe to be superior to async, please open a separate issue with a proposal (use the "blank issue" button), which should detail:

Without a proposal, a plea to eliminate async will not be taken seriously, since it is a suggestion to remove a useful language feature with no good alternative. If you think there are fundamental problems with async but do not have a specific alternative in mind, you can open a proposal to remove async, but it will require heavy justification.

Regardless, let's cease discussion of this here - if a proposal is made, its merits and problems can be talked about there. This is a tracking issue for the re-implementation of async, and hence not really the right place for this conversation.

andrewrk commented 7 months ago

anyone here

If you don't have any open source project written in zig with 100+ commits, don't bother

mobiuscog commented 7 months ago

I'm not smart enough to write a proposal, I'm too old-school that I much prefer green/virtual threads over async where it makes sense, and I certainly don't have the influencer levels that @andrewrk demands on projects ;)

However, I do value the vision that Zig holds so far and want to mention one thing that has been alluded to already and that I think is really important in the 'real' world:

A language should never be async all the way down.

One reason I moved away from a certain other 'modern' language is that the majority of 3rd-party libraries became async-only. They required you to use async, as the authors only cared about that and "why wouldn't you write async".

I didn't want to use async code, and didn't have a need to but suddenly I was having to make changes to use a popular library. Sure, it was possible to workaround but then the code became more complex to maintain. Not great for something that promised 'zero cost'.

I trust Andrew to make the right decisions - it's his language after all and that's why zig is so easy to read whilst being as powerful as it is. I just think it's also important to think about the 'average' user in all of this, not just the compiler or language experts, or the college theorists.. but the people that may want to use the language and not find that their code becomes less maintainable over time because one feature now means multiple ways to write code.

One day, when I've fully gotten up to speed with zig, I'd hope to contribute on a more formal level, but for now I just want to see it become the language I'm growing to love. There is so much potential, and doing things 'right' is so important.

I learned about zig zen the other day. May it forever guide.

folago commented 4 months ago

I also don't have enough knowledge on Zig (or on levels this low) to write a proposal, but I saw a parallel between tasks scheduling and memory allocation.

To allocate you first need an allocator of some kind, like an arena or bump allocator. To run async tasks you first need a "scheduler" (for lack of a better name) of some kind, like an event loop or something more N:M like in Go. Either the allocator/scheduler is passed down to you as a parameter or you can create your own of the type you need.

Then async becomes a method of a scheduler, or a subtype, that returns a handle to the task that can be used to await() for the result or cancel() the task or something else depending on the scheduler (maybe you can yield to another handle like a coroutine).

Here is an extremely rough, short, and not well thought-out example.

const std = @import("std");

pub fn main() void {

    var scheduler = std.concurrent.Async.init(std.concurrent.event_loop);
    defer scheduler.deinit(); // waits for all tasks to be terminated or canceled, maybe can take a timeout param

    var task_runner = scheduler.runner(); // can run async functions but not, for example, deinit(), so it is safer to pass around

    var task_handle = task_runner.async(do_things());

    // do other things 

    var result = task_handle.wait();
    // could be task_handle.Cancel()
    // maybe different methods depending on the type of scheduler
}

fn do_things() !result {
    // do all the concurrent things
}

I am not sure if mirroring the allocation paradigm/pattern will result in function coloring by explicitly passing "schedulers" as is done for allocators. But I assume the functions you call via scheduler.async(fn) can be regular functions, so no coloring there.

At first I thought this was too wild and did not post it, but then I stumbled upon this post about nurseries as a concept for structured concurrency, which I feel somewhat validates this idea. If you squint, the "scheduler" in the example above looks like a nursery in the linked post.

I like how deterministic the nursery concept is and I think it would be a good fit for Zig.

I am sure there are thousands of small devilish details to consider that I am not even aware of, but maybe this path is worth investigating unless it has been already discarded.

brymer-meneses commented 4 months ago

I really like the async_func().await syntax in rust. It's so much cleaner than having to do something like

var some_value = (await (await async_func()).some_other_thing()).finally()
var some_value = async_func().await.some_other_thing().await.finally()

Would it be possible to adopt this kind of syntax?

applejag commented 4 months ago

I really like the async_func().await syntax in rust. It's so much cleaner than having to do something like

var some_value = (await (await async_func()).some_other_thing()).finally()
var some_value = async_func().await.some_other_thing().await.finally()

Would it be possible to adopt this kind of syntax?

The original async syntax zig had would not be affected by this, as explained in this blog post from 2020: https://kristoff.it/blog/zig-colorblind-async-await/

I don't know if the syntax idea has changed, but I really liked that Zig just flipped the async/await function call usage, so that in the common case, calling a non-async function and calling an async function have the same syntax.

const some_value = async_func().some_other_thing().finally()

// Equivalent to:
const frame = async async_func()
const other_frame = async (await frame).some_other_thing()
const some_value = (await other_frame).finally()

// Equivalent to:
const some_value = (await async (await async async_func()).some_other_thing()).finally()

Though, in this reversed case where you want to grab the async frame, maybe then it could be a field named async on the async function and an await field on the frame, just to get rid of the wrapping parentheses

const frame = async_func.async()
const other_frame = frame.await().some_other_thing.async()
const some_value = other_frame.await().finally()

// Equivalent to:
const some_value = async_func.async().await().some_other_thing.async().await().finally()

mlugg commented 4 months ago

That syntax has not changed, and (if async is re-implemented) will not change, because it's required for colorless async. So, yes, we don't need await to be postfix for convenient chaining, because you just write async_func().async_method().synchronous_method().another_async_method() and everything works.

revskill10 commented 3 months ago

I think you could make async as the actual function, to transform a sync function into async one.

const asyncfn = std.async(syncfn);
const result = asyncfn();

xphoenix commented 1 month ago

Good day, any ETA for this?

wooster0 commented 1 month ago

https://github.com/ziglang/zig/wiki/FAQ#what-is-the-status-of-async-in-zig