disruptek opened 3 years ago
he simply wants the implementation to be unfettered by limitations in macro spec or compilation bugs.
Yes, that's why. While I have many (good?) ideas about how to improve Nim's macro system to be easier to use, more stable, etc., I think async/CPS/concurrency is so important that you might as well put it into the language directly, or pretend that it is "just a macro" that delegates most of its work to a compiler transformation.
However, that didn't work well for `spawn` at all and there are rumors that Nim's closure iterators can be better as a library (@timotheecour) so it's probably a good idea to keep it as a macro. For a prototype implementation that's even more true.
While I feel CPS is quite powerful, I'd like the typed implementation to work,
It's not only your feeling, it's true. Nim's `untyped` macros do not compose, `typed` ones do. We're moving to `typed` macros but currently these are less developed.
Except that the position that "it's ORC's job to be leak-free under async" is silly.
With 1.4.2 all known leaks and crashes have been fixed but it remains true that:
`GC_fullcollect` in a loop, memory consumption: 7MB. I expect it's rather easy to improve this though. However, these facts change nothing: Ideally our concurrency story doesn't depend on a cycle collector.
Ideally our concurrency story doesn't depend on a cycle collector.
I think you're agreeing with me, but to be clear, I'm not speaking of CPS, which was designed from the outset to exploit `gc:arc`. I'm speaking of async/await, which, perhaps due to its position in the standard library, promotes the expectation that it should work without fault, especially when `gc:orc` becomes the default. For all their "complexity", threads don't present anywhere near the same challenges.
Ideally, our memory story doesn't depend on a concurrency implementation.
I don't see anyone volunteering to bring async/await to the same standard of quality under `gc:orc` as the rest of the standard library. If faced with deprecating it or delaying `gc:orc` as the default, well... I'd rather enable easier integration of Nim into foreign codebases than carry a flawed concurrency model that turns off newcomers before they even attempt FFI. But I'm open to a third way.
FWIW, `gc:arc` narrows the performance delta between threads/async/cps in the above examples. Threads and async/await are slower under `gc:refc` while cps seems unaffected.
For all their "complexity", threads don't present anywhere near the same challenges.
I agree completely, threads have 3 problems:
1. Overhead of stack memory management (cannot have millions of them easily).
2. Sharing introduces data races.
3. Locks can produce deadlocks.

I know how to solve 2-3, but 1 is a nasty problem that made us look into async back then.
Threads do not offer the choice of doing asynchronous I/O on a single thread - making sure that no threads are involved can make code much easier to reason about.
I don't see anyone volunteering to bring async/await to the same standard of quality under gc:orc as the rest of the standard library. If faced with deprecating it or delaying gc:orc as the default, well...
Getting our async to work with --gc:orc was a painful exercise but I consider it completely done now. And it doesn't delay making gc:orc the default, not anymore.
Threads do not offer the choice of doing asynchronous I/O on a single thread - making sure that no threads are involved can make code much easier to reason about.
Threads, in fact, do offer the choice of doing async I/O on a single thread. If you don't want to reason about multiple threads, sure, don't use multiple threads. But I certainly take issue with the assertion that reasoning about our async/await dispatcher is somehow easier. There's lots of evidence that this is not the case at all.
To put that another way, no one doing advanced concurrency is going to choose the async/await dispatcher over threads because it's easier, and that's also not merely a feeling, it's truth. :grin:
There is a sweet spot for async/await, and it's in memory mutability. Not having to use locks is a win in some cases. But note that we didn't need the locks in the above example, either, and I would argue that locks generally make threads easier to reason about, not harder.
To put that another way, no one doing advanced concurrency is going to choose the async/await dispatcher over threads because it's easier, and that's also not merely a feeling, it's truth. 😁
What does advanced concurrency here mean?
As far as I'm concerned async/await is much easier than threads in Nim right now for practically all use cases involving IO, but that can be attributed to the thread-local heaps requirement and all the associated mechanisms to make that work (gcsafety). The evidence of this is in terms of usage, if threads were really easier then people would be using them for this purpose, but I know of very few that do.
What does advanced concurrency here mean?
It means concurrency that is a design requirement and not merely a convenience.
As far as I'm concerned async/await is much easier than threads in Nim right now for practically all use cases involving IO, but that can be attributed to the thread-local heaps requirement and all the associated mechanisms to make that work (gcsafety).
I did say going -- we're talking about the `gc:orc` future we all aspire to -- but it sounds like you're agreeing with me. I await the much-needed `gcsafe` RFC.
The evidence of this is in terms of usage, if threads were really easier then people would be using them for this purpose, but I know of very few that do.
How are you measuring this?
CPS uses Concepts, Concepts are Experimental, Async/await is not Experimental, therefore Concepts must be made Stable before CPS replaces Async/await.
CPS does not strictly need concepts; my initial implementation worked fine without. It's just part of disruptek's implementation but not essential.
ORC/ARC leaks are not the fault of Async/await.
CPS will not replace Threads, people may want to use Threads anyway.
Do you own a Porsche?
What does a `{.multisync.}` function look like with CPS?
Whether something is asynchronous or not is a function of the dispatcher you define, so you either supply a dispatcher like the one above -- effectively `.sync.` -- or we will supply (or generate) one for you.
Then `{.multisync.}` must start its deprecation process soon(?). Or there will be a macro that adapts `{.multisync.}` code to CPS somehow(?).
Why do you think async/await should be deprecated?
To put that more accurately: you can run the same CPS code with any dispatcher you want, at any time. When you invoke your continuation, you choose which dispatcher dispatches it -- in the same way that you may choose between `waitFor` and `await` -- so you could have a dozen different shades of asynchronicity if you really wanted. You could even use async/await for dispatch, or something that is purely thread-based. So there's no technical reason that CPS cannot compose with the existing concurrency methods.
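A minimal sketch of that idea (illustrative only, not from the original comment), assuming the continuation shape quoted later in this thread -- a `ref` carrying an `fn` field:

```nim
# Illustrative sketch: a "dispatcher" is just code that calls continuations.
type
  C = ref object of RootObj
    fn: proc(c: C): C {.nimcall.}   # the next leg of the computation, or nil when done

proc trampoline(c: C) =
  ## Run a continuation to completion on the current thread -- the moral
  ## equivalent of `waitFor`. An async dispatcher would instead park the
  ## continuation somewhere and call `c.fn(c)` when its event fires.
  var c = c
  while c != nil and c.fn != nil:
    c = c.fn(c)
```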
@Araq
However, that didn't work well for `spawn` at all and there are rumors that Nim's closure iterators can be better as a library (@timotheecour) so it's probably a good idea to keep it as a macro. For a prototype implementation that's even more true.
While I do agree, it is because of other, non-technical reasons; I don't think we can conclude that from technical reasons alone.
Nim is a very young language; compiler developers and developers in general cannot know everything, and in particular what kind of new patterns will emerge when a language introduces features that were never seen before -- in Nim, the macro and pragma system. Case in point: even in Rust the compiler devs used to provide an encode/decode serialization library in the standard library, but the traits+macro system of Rust enabled a high-performance, very convenient serialization library called Serde that became the de facto standard. I've already said this, but as the community grows, the standard library should have a process to deprecate previously core libraries to external packages and ultimately, if necessary, only provide an API so that API "producers" and API "consumers" are compatible with each other.
There are many domains where there is no unique way to address an issue https://github.com/nim-lang/RFCs/issues/247#issuecomment-680997258 @mratsim
- Marshal/serialization/deserialization
- Crypto API (initContext, hash, sign, verify)
- Streams
- Async
- Threads/threadpools/executors
- Coroutines
- Unittests
- Logging
- BigInt
- Database
This might need more fleshed out concepts/interfaces though we can use duck typing with generics https://github.com/nim-lang/RFCs/issues/243, https://github.com/nim-lang/RFCs/issues/168
@Araq
It's not only your feeling, it's true. Nim's `untyped` macros do not compose, `typed` ones do. We're moving to `typed` macros but currently these are less developed.
Are people using typed macros? My strategy when being handed over typed AST despite asking for untyped (thanks generics) is to rebuild the untyped AST because it's way easier to handle.
For me the one thing that makes macro hard to compose is the type system: https://github.com/nim-lang/RFCs/issues/44, https://github.com/nim-lang/Nim/issues/8677
@Araq
For all their "complexity", threads don't present anywhere near the same challenges.
I agree completely, threads have 3 problems:
1. Overhead of stack memory management (cannot have millions of them easily).
2. Sharing introduces data races.
3. Locks can produce deadlocks.
I know how to solve 2-3, but 1 is a nasty problem that made us look into async back then.
I solved that in Weave and you can have trillions of tasks each with their own stack https://github.com/mratsim/weave/blob/e5a3701/weave/datatypes/sync_types.nim#L18-L54
To solve this in a portable way I introduced the following limitations:
@disruptek
To put that another way, no one doing advanced concurrency is going to choose the async/await dispatcher over threads because it's easier, and that's also not merely a feeling, it's truth.
I don't agree, debugging threads is really painful in particular because of non-determinism coupled with memory/ownership issues. Until now, threads = no strings, seq or ref objects which makes it a non-starter for plenty of use-cases.
@dom96
To put that another way, no one doing advanced concurrency is going to choose the async/await dispatcher over threads because it's easier, and that's also not merely a feeling, it's truth. grin
What does advanced concurrency here mean?
As far as I'm concerned async/await is much easier than threads in Nim right now for practically all use cases involving IO, but that can be attributed to the thread-local heaps requirement and all the associated mechanisms to make that work (gcsafety). The evidence of this is in terms of usage, if threads were really easier then people would be using them for this purpose, but I know of very few that do.
I don't agree either, threads in Nim have even less documentation than async/await. Also the facilities are extremely brittle: threadpool, spawn and the parallel statement only work for some cases but trying to go beyond is fraught with peril. I read the channels.nim implementation and I don't understand what's happening with RTTI as well.
What actually works is using a raw `createThread` call and doing your threading and threadpool C-style. In that case I must admit that Nim's handling of the base OS primitives is excellent and it's much more comfortable to use than in C where you need to deal with Posix/Windows differences.
Regarding the documentation, there is a lot more code, for example jester or httpbeast and even your book that people can use as a reference for async/await. There is no equivalent for threads.
I am quite impressed and very excited about your work on CPS, that is really awesome.
Looking into the type:
```nim
type
  Payload = tuple[a, b: int]
  C = ref object of RootObj         # a continuation
    fn: proc(c: C): C {.nimcall.}
```
I assume those are `ref` objects only now, but in the future they will become `Isolated` (https://github.com/nim-lang/RFCs/issues/244), otherwise they can't be moved across threads -- and a continuation should be owned by a single thread anyway, right?
Also, one of the main issues with Asyncdispatch and Chronos right now is that they require all Futures to be waited on and manually handled. It is very easy to introduce future leaks and resource leaks by introducing a timeout and then forgetting to dispose properly of the cancelled future, i.e. we need to do manual future management. Is there a sketch of how would cancellation look like with CPS?
@mratsim : "Is there a sketch of how would cancellation look like with CPS?"
CPS itself does not know about cancellations or timeouts, it only provides the building blocks to make continuations from flat and plain Nim code. You can simply choose not to call a continuation, but instead throw it away, which will effectively "cancel" your flow.
Cancellation and timeouts would be a concept of the asynchronous I/O abstraction implementations built on top of CPS.
@dom96
To put that another way, no one doing advanced concurrency is going to choose the async/await dispatcher over threads because it's easier, and that's also not merely a feeling, it's truth. grin
What does advanced concurrency here mean?
As far as I'm concerned async/await is much easier than threads in Nim right now for practically all use cases involving IO, but that can be attributed to the thread-local heaps requirement and all the associated mechanisms to make that work (gcsafety). The evidence of this is in terms of usage, if threads were really easier then people would be using them for this purpose, but I know of very few that do.
I don't agree either, threads in Nim have even less documentation than async/await. Also the facilities are extremely brittle: threadpool, spawn and the parallel statement only work for some cases but trying to go beyond is fraught with peril. I read the channels.nim implementation and I don't understand what's happening with RTTI as well. What actually works is using a raw `createThread` call and doing your threading and threadpool C-style. In that case I must admit that Nim's handling of the base OS primitives is excellent and it's much more comfortable to use than in C where you need to deal with Posix/Windows differences. Regarding the documentation, there is a lot more code, for example jester or httpbeast and even your book that people can use as a reference for async/await. There is no equivalent for threads.
@mratsim
It sounds to me like you are agreeing that async/await is easier in Nim right now than threads. Just not for the same reasons I outlined. Am I understanding you correctly?
I think you're factually wrong here about documentation for threads: `httpbeast` uses Nim threads, so that's at least one example library using them effectively. I would also argue that threads are much easier to understand for newcomers; they are, after all, a fundamental aspect of programming. Nim threads being so low-level means that the barrier for entry isn't super high for newcomers.
Also, one of the main issues with Asyncdispatch and Chronos right now is that they require all Futures to be waited on and manually handled. It is very easy to introduce future leaks and resource leaks by introducing a timeout and then forgetting to dispose properly of the cancelled future, i.e. we need to do manual future management.
@mratsim
Couldn't the same be said about threads and/or lightweight green threads/tasks? You can very easily have a thread that has been spawned and never finishes, and get a resource leak this way. I'm curious whether Weave allows cancellation of the threads it spawns, I know that it's not typically possible to destroy a running OS thread, so how does Weave handle this?
I agree however that this is indeed an issue, it has bitten me in the past which is why I have implemented hooks in the async dispatcher to see which futures are "pending". Combined with prometheus it works insanely well for finding resource leaks.
Cancellations of futures would go a long way to help, and are definitely a necessity for some scenarios. If I had the time I would love to look into implementing them. I recall that there were plans for Chronos to get them? Did this not happen?
It sounds to me like you are agreeing that async/await is easier in Nim right now than threads. Just not for the same reasons I outlined. Am I understanding you correctly?
Yes, I think that async/await are more likely to be used than threads not because one is easier/harder than the other but because more code/examples/tutorials exist with the former in the first place.
`httpbeast` does use threads but that's more the exception than the rule. As soon as you use threads, at least with the old GC, suddenly most of the standard library cannot be used as easily, because strings, seqs or references should be used with way more care, and few examples exist (while, say, `waitFor` vs `await` can be found) to demonstrate how. And then some critical structures people expect, like threadsafe hashtables, are missing (from Facebook Folly for example).
I'm curious whether Weave allows cancellation of the threads it spawns, I know that it's not typically possible to destroy a running OS thread, so how does Weave handle this?
Weave distinguishes between threads and tasks. `init(Weave)` and `exit(Weave)` will create and destroy Weave threads. Destroying Weave threads can only be done from the root thread and will block until all enqueued tasks are finished.
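A rough usage sketch of that lifecycle, based on Weave's README (the doubling task is illustrative):

```nim
import weave

proc double(x: int): int = 2 * x

init(Weave)                  # create the worker threads (root thread only)
let fut = spawn double(21)   # enqueue a task on the threadpool
echo sync(fut)               # block until the Flowvar has a result: 42
exit(Weave)                  # blocks until all enqueued tasks are finished
```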
Weave does not support cancellation of tasks. I have yet to come across that use case in CPU-bound tasks. If something needs to be interrupted, it comes either from a kernel signal, a timeout or some kind of IO/message; in that case the IO scheduler or async/await dispatcher is the better place to handle this. If it interfaces with a threadpool or Weave, it either should enqueue the task or should let it finish and properly dispose of the resources afterwards.
I don't agree, debugging threads is really painful in particular because of non-determinism coupled with memory/ownership issues. Until now, threads = no strings, seq or ref objects which makes it a non-starter for plenty of use-cases.
Well, it is now now. :grin:
But, yeah, if you understand what threads are and what they are not, then it's sorta on you if you try to shoehorn them into a role they are not suitable for.
I assume those are `ref` objects only now but in the future they will become `Isolated` (#244) otherwise they can't be moved across threads and a continuation should be owned by a single thread anyway right?
You're correct that we'll want to switch to `Isolated`, but honestly, I'm not convinced there's a good reason that a continuation should be owned by a single thread. To put that another way, the limitations of the memory model are orthogonal to the limitations of CPS itself. We don't want to adopt these limits until it's strictly necessary.
As @zevv says, what you do with a continuation is up to you. That's the beauty of the approach; you can trivially use as your dispatcher anything that can invoke a function. And even the continuation type itself is yours to define. All we're doing is rewriting the control-flow for you and maybe giving you a little syntax sugar. The bits you actually care about are yours to implement however you want.
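An illustrative sketch of that point (names are mine, not from the thread): the base continuation shape stays tiny, and a particular dispatcher can extend it with whatever bookkeeping it needs:

```nim
type
  C = ref object of RootObj          # base continuation, as quoted earlier in the thread
    fn: proc(c: C): C {.nimcall.}
  SockCont = ref object of C         # hypothetical: a socket dispatcher's continuation
    fd: int                          # descriptor to wait on before calling fn again
```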
Cancellations of futures would go a long way to help, and are definitely a necessity for some scenarios. If I had the time I would love to look into implementing them. I recall that there were plans for Chronos to get them? Did this not happen?
I thought Chronos got cancellations like, I dunno, a year ago.
I would also argue that threads are much easier to understand for newcomers, they are after all a fundamental aspect of programming. Nim threads being so low-level means that the barrier for entry isn't super high for newcomers.
So we agree. I'm happy to move past that debate. :grin:
FWIW, I don't use `threadpool`, `spawn`, or `parallel`. Maybe that's why I don't have threading issues. I understand that Yuri's `threadpool` is "better" and "should be made stdlib" but I couldn't tell you why. I looked at `channels` and unless I'm missing something, they are vestigial and useless in a `gc:arc` world. If these extensions to threading need work or deprecation, so be it. They are hardly the only parts of the language that need attention. Having fewer holes to step in helps newcomers and old hands alike.
And, look, this isn't an effort to codeshame contributions. It's an effort to coalesce goals on where to take CPS from here so that we know where to work. It's been stagnant for months simply because it wasn't clear whether we'd be throwing effort down the drain to continue fixing `typed` macro bugs in the compiler versus just cheating via a transform pass.
Here is my proposal: @disruptek, @mratsim, @zevv and everybody else who feels left out, please join forces and make CPS the missing part of Weave so that:
- …async frameworks. But I am not up to date with its current status -- I don't know what is missing from Weave.
- …`--gc:orc --threads:on` mode.

In addition to that, mark macro/concept/ORC bugs that are blocking the progress with a new "CPS" tag; myself and other compiler devs promise to prioritize these bugs, or at least guide you how to fix them. Well, that's my personal promise, I don't speak for other compiler developers.
If you don't need `static T` support, the concepts that I implemented in my branch could already be very helpful; I'm willing to merge it sooner rather than later if it helps you.
Talk is cheap, confronting all the unanticipated "critical" feature requests of end users is what makes software engineering hard (=messy). I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhäuser Gate. Anyway count me in, where can I recruit? And if dom96 would play the devil's advocate, nothing is impossible.
Now that the holidays are here, I finally got a chance to sit down and re-read this RFC. So there will be a bit more talk (sorry Araq), but hopefully not on this RFC (doubt it :)), rather a brand new one I have created: https://github.com/nim-lang/RFCs/issues/304.
It even takes inspiration straight from this RFC:
You may even be motivated to improve async/await.
Now to get onto the remainder of the essence of what's in this RFC. There is nothing here that shows the power of CPS, all that's here is a bunch of synthetic benchmarks comparing a highly stripped down implementation to a full-blown async IO dispatcher. This is very similar to the V programming language benchmarks. In particular their many claims of being able to compile V code faster than any language out there, when their compiler can only compile the simplest of constructs.
This causes unnecessary concern inside the community. Let's be clear, there are plenty of valid criticisms to be put against async/await and this post does list some. Many of these however are relatively minor and can be readily fixed (see https://github.com/nim-lang/RFCs/issues/304 and let us know what else you would like to see improved).
Some highlights that I wanted to pick out:
/home/adavidoff/nims/lib/pure/asyncmacro.nim(144, 3) Error: 'yield' only allowed in an iterator
Yes, documentation for async/await is definitely lacking, and my book is no replacement. These kinds of errors are also annoying, especially since this particular one is a regression, as before `await` was turned into a template you would have gotten a much better error. I'd love to fix this but I'm not even sure if it's possible, a template cannot know where it is invoked, can it?
And a valgrind, which probably means nothing.
Yes, async/await leaks with `--gc:arc`, as expected. This is why `--gc:orc` is a thing. It's not fair to use this as a criticism of async/await in the way that you did.
It is fair to say that `--gc:arc` works with CPS, but then you need to demonstrate why that is important. Why not just use `--gc:orc` when it's working well?
So at the end I strongly urge yourself and others to commit to improving async/await. For all its faults it is still a very good way to write IO-heavy code, and probably one of the best tested libraries in Nim's standard library. Exploring new routes and doing research on these things is important, and I would encourage you to continue if you find it fun and interesting. But there is no reason to be bringing down async/await in the process and trying to make the community feel like async/await has no future.
We are a small community, we cannot be reimplementing these things constantly and asking our users to rewrite their code on every whim unless there is a very good reason to do so. It's easy to think that your solution will be perfect, but it will undoubtedly have its own faults which will take years to iron out, and by that point someone else might come around and want to rewrite things yet again.
Finding my previous comments a bit cryptic, let me recapitulate:
I have some use cases in mind that should enable comparison of existing and proposed implementations in terms of both ergonomics and performance:
In other words, I'm agnostic (and incompetent) in terms of how these things should be implemented, but I have ideas of what should be possible, and I'm willing to act as a tester.
And, now that I'm here, about the social aspects of software engineering: It is both fun and instructive to invent new algorithms, but to fix somebody else's inventions that are in production and where you might break things - it's more like a job than a hobby. Therefore I understand that people are more willing to try out their own alternatives than to fix existing stuff. But this is a small community - everyone can learn in the process and in the end everyone is just genuinely trying to make the end product better.
Cool, dude. We've mostly been discussing this stuff in real time on the #nim irc channel. Sometimes we even deign to address discord users. :wink:
To make it clear, I see CPS as a long-term replacement for our closure iterators, our async can build on top of CPS once it's mature. I have no idea how far away CPS is from this state. At the same time I welcome contributions to our current async/await implementation, never let the perfect be the enemy of the good. Async/await is our standard and people rely on it even if others moved on to implement their own frameworks.
Can you explain what you feel is wrong for you about closure iterators? I have my own gripes with them, mostly about syntax, but I'm curious about the implementer's point of view since syntax can be sugar-coated.
A closure iterator is yet another language builtin feature which is not required: `async` is implemented as a complex macro transformation anyway, it might as well do the remaining state machine generation and lifting of locals. Instead we should embrace `liftLocals` and `goto state` as the fundamental primitives. Closures didn't age all that well IMHO; they really are a poor man's object system implementation.
I am currently dumping my research and design thoughts here:
I am evaluating approaches in other languages, namely:
I have a couple of questions to restrict the design space:
1,2,3: I think getting this to work in JS and the VM is pretty essential - if we make something that doesn't work "everywhere", we are pretty limited in other usable things built on top of this. So this would also exclude the use of arch-specific assembly.
4: No-alloc is a nice-to-have, but probably not essential. I guess that for your purposes it would be alloc-agnostic.
6: If the primary use case is doing I/O, "a bit of overhead" is acceptable. If this is going to be used to implement things like iterators, no more overhead than a function call would be acceptable, to be competitive with normal iterators.
That totally depends on the use case. For I/O hundreds of thousands would likely be more than enough, but for more generic use cases like CSP I think having millions would make a very nice story.
For me a cool use case would be Lua-style coroutines showing some nice producer/consumer pipe flow. Disruptek is likely to pursue his CSP ideas. And of course being able to run continuations on an effective scheduler properly making use of all CPUs in a machine.
9, 10, 12: I don't know
11: No, the concept of cancellation does not exist at this level. You just "don't call" your continuation.
That totally depends on the use case. For I/O hundreds of thousands would likely be more than enough, but for more generic use cases like CSP I think having millions would make a very nice story.
4e9 / 10e6 (assuming 10 million on 32-bit) gives us a budget of 400 bytes of overhead per 'routine', which is a comfy size to work with.
Note that the overhead will depend significantly on the scheduling: if it's done depth-first we will have significantly fewer simultaneous 'routines' scheduled, while if it's done breadth-first many 'routines' might be started.
It seems silly to mess around with assembly when we can have a native Nim implementation that runs everywhere and compiles to nearly the same code. Maybe after it's stable we can optimize the C version, but I think the interesting development is pretty obviously above CPS and not below it.
There's zero additional overhead to calling a continuation, since that is in fact the same thing as calling a function. Anything more is the fault of the dispatcher. I personally don't care much about performance, since it would be very hard to make this stuff slower than async/await or threads. There's just nothing there to slow down. We're talking about AST transforms, here.
Regarding alloc-free operation, I pushed a demonstration version today that doesn't use the heap. CPS won't care, but this obviously does impact dispatch and environment construction. I'd really like to see an example use case beyond toy iterators, etc. because otherwise, it just feels like a gratuitous development constraint.
9, 10, 11, 12 ... Yes, we already have these features and we don't want to lose them. :wink:
Remember that it's not merely the case that you can write a custom dispatcher for your continuations. You will. It will be a common step in implementing your control-flow. So, you don't have to make these choices about BFS/DFS or overhead of the scheduler right now. A typical program may have 100 different dispatchers implemented inside it; this is as common a composition as, say, a destructor. Think of the design of the dispatcher as similarly malleable and intentional to the role of the continuation...
The past couple days, I have been playing with CPS and researched a lot what other languages do and alternative approaches, namely coroutines and fibers. For the really motivated explorers, you can have a glimpse of my research here: https://github.com/weavers-guild/weave-io/tree/master/research.
Now, for those who want to understand what's going on under the hood, this is the mandatory talk:
In short, continuations allow us to implement resumable functions. When we reach a suspension point, control flow is split into the current instruction and its continuation (the rest of the statements). First-class continuation support allows us to package that in a variable and resume it when we want. Other names include "callbacks" or "named gotos".
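A tiny hand-written illustration of that idea (mine, not from the original comment): the continuation is just a proc value holding "the rest of the work", and whoever performs the suspension decides when to call it:

```nim
type Cont = proc(line: string) {.closure.}

proc askName(k: Cont) =
  # Pretend this suspends on I/O; here we just invoke the continuation directly.
  k(readLine(stdin))

askName(proc(line: string) =
  echo "hello, ", line)
```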
Kotlin doesn't expose continuations directly and prefer to leave devs at the level of coroutines but I think there is value, especially to wrap low-level IO to have first class continuations.
Before going full-on CPS, I tried to design how a competing approach, revamped closure iterators, might look; i.e. they would likely look like coroutines, which are resumable functions. They are more general than functions because, for control flow, functions do:
- `call` = suspend the caller, jump into the subroutine.
- `return` = terminate the subroutine, jump back to the caller.

Coroutines (of the asymmetric kind) support that and add:

- `yield` = suspend yourself and return to the caller.
- `resume` = the caller jumps into the coroutine after the previous yield point.

The design draft is at https://github.com/weavers-guild/weave-io/blob/master/design/design_1_coroutines.md; I've merged some of the ideas from the Coroutine TS from C++ (which were merged in C++20) before getting stuck at:
```nim
proc process(path: string,
             buf: var array[4 * 1024, byte],
             expensiveWork: proc(buf: var array[4 * 1024, byte], len: int) {.nimcall.}): int =
  let f = openFile(path)
  var curIdx = 0
  while true:
    let numBytes = f.readBytes(curIdx, buf)
    curIdx += numBytes
    expensiveWork(buf, numBytes)
    if numBytes == 0:
      break
  return result
```
Then I looked into continuations, which are lower-level primitives than coroutines, to see how things would look. Suffice to say they show more than great promise. Language theorists have proven that continuations of the delimited kind are able to represent any kind of monad.
A compelling argument for why delimited continuations are elegant is the paper Representing Monads. Two very short one liners are enough to make any monad native to your programming language. For example your language doesn't support exceptions? Define a maybe monad and use monadic reflection to make them feel like native exceptions. Your language doesn't have mutable state? Define a state monad and use monadic reflection to make it feel like native mutable state. Your language doesn't have coroutines and requires you to write callback manually as in Node.js? Define a monad similar to F#'s async workflows (=~ continuation monad), and now it feels like your language has native asynchronicity. The same goes for monadic parsers. In short continuations can even be used in Haskell to represent mutable state.
What does it mean? For Nim there is no point to implement mutable state or exceptions with continuations (use native C or C++ primitives), however we would be able to shape all our control flow story to any model. Furthermore just like monads in Haskell, they would be composable.
Concretely, we can reimplement:
…`dup` and `sink` to reuse the buffers, but you still make multiple passes over the data…
…with unified `suspend`/`resume` internals that would compose cleanly.
I plan to produce a design document to shape up the syntax and showcase how it can be used for those various use-case.
Now the very interesting thing is that this can be the primitive and interface that anyone using those can use: want to change the standard library parser? if it has a suspend/resume anyone can use it.
Regarding async/await, which is the main driver behind those: it will not disappear. The `suspend`/`resume` primitives are scheduler-agnostic; `async` will become a very short piece of syntactic sugar for "suspend this, register a file descriptor, store the continuation in a scheduler" and `await` for "pop the matching continuation, run it". As a transition I will show how to reimplement closure iterators on top of continuations, for a smooth transition period. I expect though that this extra overhead will be discarded and async/await dispatchers will use continuations directly.
For user code, that means that there won't be asyncdispatch vs Chronos anymore, but one scheduler vs another (still with its Future/Timer/Transport abstraction). Note that you can implement Transport and Future as thin wrappers on top of continuations as well.
Also for those into performance, I am obviously looking to make continuations:
I have full confidence they are suitable for Weave and would even bring significant benefits: bounded memory usage and I might even be able to drop my very clever memory management scheme/memory pools thanks to new scheduling techniques they allow me (more at: https://github.com/disruptek/cps/discussions/40#discussioncomment-241263)
So my next steps:
FWIW, I gave up on `untyped` CPS for my cloud-based continuation scheduler. To me, it's not worth pursuing -- I would rather put the time into `typed`, because it's clear that `typed` is possible (and quite a bit more exciting and ergonomic) if we can nail down the spec.
I scanned through the Kotlin design and it's not really as elegant as our implementation, but it does seem to offer most of the semantics. Just, y'know, on the JVM and without the ability to reach in and implement your own machinery. So if you like Kotlin's version, you'll love Nim's.
This RFC would be a good place to talk about syntax and composition with other language features like isolation, effects, expr pragmas, and so on.
@mratsim you can also look at my work on dodo with regards to continuations
What will happen to CPS now that @disruptek has been banned from the community?
What will happen to CPS now that @disruptek has been banned from the community?
I don't plan dropping CPS work, it's just that there is so little time to do everything.
@mratsim looking at the CPS implementation in C and realizing that the continuation structure seems bigger than what Weave uses now (at least for tasks). The difference seems rather non-negligible so I'm not sure whether using CPS as the underlying "assembly" for nearly everything related to parallelism & concurrency is the best idea.
The continuation structure should just be (function pointer, environment capture), see https://github.com/disruptek/cps/blob/e8b168f8997924e651b198588e0630d95c061918/talk-talk/manual1_stack.nim#L93-L95
```nim
type
  C = object
    fn: proc(c: var C) {.nimcall.}
    storage: Env_foo_storage
```
And for coroutines there is at least an extra promise and maybe an extra bool for termination (I think we can avoid it; it does solve a very unintuitive behavior of current closure iterators).
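A sketch of that coroutine-flavoured layout (the field names are illustrative; `Env_foo_storage` stands in for the lifted locals, as in the snippet above):

```nim
type
  Env_foo_storage = object   # stands in for whatever locals survive the suspension point
  Coro[T] = object
    fn: proc(c: var Coro[T]) {.nimcall.}
    storage: Env_foo_storage
    promise: T               # value handed over at the last yield
    finished: bool           # explicit termination flag
```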
Yep - and I'm afraid of the size of `Env_foo_storage` (or `jmp_buf`). But maybe I misunderstood something in Weave.
Yep - and I'm afraid of the size of `Env_foo_storage` (or `jmp_buf`). But maybe I misunderstood something in Weave.
In Weave this corresponds to TaskDataSize which is an arbitrary 144 bytes (https://github.com/mratsim/weave/blob/30baced/weave/datatypes/sync_types.nim#L19).
As a comparison, according to https://tel.archives-ouvertes.fr/tel-01549158/document, Akka uses 600 bytes per actor, Erlang 2500, and Pony 240 bytes.
Note that theoretically Tasks can be anything as the scheduler implementation relies on the following concept/interface: https://github.com/mratsim/weave/blob/30baced/weave/datatypes/prell_deques.nim#L10-L20
```nim
type
  StealableTask* = concept task, var mutTask, type T
    ## task is a ptr object and has a next/prev field
    ## for intrusive doubly-linked list based deque
    task is ptr
    task.prev is T
    task.next is T
    # A task has a parent field
    task.parent is T
    # task has a "fn" field with the proc to run
    task.fn is proc (param: pointer) {.nimcall.}
```
A fixed size was chosen because, for efficiency and overhead reasons, a memory pool trumps `malloc`/`free`.
Unless continuations are capturing large hashes or big integers on the stack, the environment should be way smaller than Weave's. There aren't many situations where you need to pass 18+ local variables across suspension points.
Unless continuations are capturing large hashes or big integers on the stack, the environment should be way smaller than Weave's. There aren't many situations where you need to pass 18+ local variables across suspension points.
And this is exactly what I absolutely did not anticipate. I'd expect continuations to have the environment about the same size as Akka (or bigger) on modern CPU architectures. That's why I thought continuations can't be faster than Weave. But I'm glad it's the other way around :wink:.
This RFC seemed to show a lot of promise and benefits of CPS. I have recently started using it, and now understand that a rift in the community seems to have delayed its integration into Nim? At least that's what it seems to be on the surface.
AFAIK CPS would be greatly improved with compiler support; I am just a lowly user so forgive my stupidity in asking, but I am really interested in trying to advocate for any progress in CPS after using it: what is the progress/timeline for advancing CPS integration?
It does seem really daunting at first, and I have made some stupid things on it as pointed out by others, but it is an absolute joy to use and I find it to be much easier to reason about the code that I produce with it. It's also pretty bloody fast. I truly feel like I am more in control using CPS. Mostly because I can so easily tie it together with other dispatchers etc.
Thanks.
It's an open secret that I've some criticisms of async/await:
Fear of Threads
From the perspective of a newcomer who is looking to transfer existing knowledge or code from another language to Nim, or someone who has zero experience with concurrency in any language, I recommend threads.
Here follows the first example from the `threads.nim` documentation. I tweaked the syntax a touch -- it was written in 2011 -- but that is all.

Here's the compilation output:
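(Neither the program nor its outputs survived in this copy of the post. For orientation, the first example in the threads module documentation -- which the post says it reproduces nearly verbatim -- looks roughly like this; treat it as an approximation, not the benchmarked code.)

```nim
import std/locks

var
  thr: array[0..4, Thread[tuple[a, b: int]]]
  L: Lock

proc threadFunc(interval: tuple[a, b: int]) {.thread.} =
  for i in interval.a .. interval.b:
    acquire(L)    # serialize access to stdout
    echo i
    release(L)

initLock(L)
for i in 0 .. high(thr):
  createThread(thr[i], threadFunc, (i * 10, i * 10 + 5))
joinThreads(thr)
```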
Here's the codegen:
The resulting binary is 52,472 bytes in size and links against:
Here's an invocation:
A quick run of golden:
And a valgrind, which probably means nothing.
Macro Complexity
or, if you think you're smart,
Poor Documentation
This is my port of the above program to async/await:
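(The snippet itself didn't survive in this copy either; the following is an illustrative reconstruction of the shape the surrounding text describes, not the original code. The post's first attempt used `await` at module level here, which produces the 'yield' error quoted below; `waitFor` is the fix that eventually compiles.)

```nim
import std/asyncdispatch

type Payload = tuple[a, b: int]

proc count(interval: Payload) {.async.} =
  for i in interval.a .. interval.b:
    echo i

var futures: seq[Future[void]]
for i in 0 .. 4:
  futures.add count((i * 10, i * 10 + 5))

waitFor all(futures)   # the first attempt used `await all(futures)` here and failed
```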
Here's the compilation output:
Oops. I didn't `import asyncmacro` and I never used `yield`. Curious.

A quick look at the documentation for `asyncdispatch` yields a single occurrence of "asyncmacro"; in fact, it's the last sentence of the documentation, under a section helpfully titled "Limitations/Bugs":

asyncdispatch module depends on the asyncmacro module to work properly.

I have no idea what this means. As far as I can tell, the `asyncdispatch` module depends on the `asyncmacro` for compilation failures. 😁

But I know the problem must be in `await`, right? There's no documentation for the `await` template and the documentation that does exist for using `await` is remarkably unhelpful.

Take a critical read of the documentation for `asyncdispatch` some time. It's worthy of another long thread, so to speak, because it's pretty hard to understand.

Thankfully, there are experienced users on IRC that point out that I should use `waitFor` instead of `await`. I remember reading about `waitFor` but since it merely said it blocks the current thread, I figured it wasn't what I wanted at all. I feel stupid, now.

But at least I can compile my program:
Here's the codegen:
The resulting binary is 125,704 in size and links against:
Here's an invocation:
A quick run of golden:
And a valgrind, which probably means nothing.
CPS
This example uses a so-called `untyped` implementation from August; it's a bit more verbose because it demonstrates the machinery explicitly, in case you are new to continuation passing style.

The newer `typed` implementation is better, in terms of both syntax and semantics (read: performance and performance), but it is currently blocked by at least one compiler bug.

Here's the compilation output:
Here's the codegen:
The resulting binary is 46,000 bytes in size and links against no novel libraries. Here's an invocation:
A quick run of golden:
And a valgrind, which probably means nothing.
Performance
Obviously, only the threaded version is actually asynchronous. 😁
And, I mean, sure, this isn't a real program. So why is it so large?
And CPS...
- no `yield`, `void`, or `discard` novelties, no `gcsafe`, no `var` or `closure` limits, etc.
- the `typed` implementation of CPS has fewer allocations still. In fact, more optimization is possible but we currently favor supporting fewer continuation types instead -- laziness, really.

or this version with a callsite pragma:
or this version using an effect:
...which brings me to...
Composition
threads:on
I can't say that thread support under async/await is arcane, exactly, but it's clearly not a natural extension of the design the way it is with CPS.
color:on
If you aren't familiar with function color, you'll enjoy the following:
https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
leaks:on
Look, let's just assume for the sake of argument that async/await is leak-free under ORC; though by the time you read this, there may be another leak. Or maybe there isn't. It doesn't matter. I don't care, and neither should you.
gc:orc
Except that the position that "it's ORC's job to be leak-free under async" is silly. We don't want to have to choose between deploying new change-the-world features that improve virtually every aspect of the language... and supporting a concurrency implementation (even if the code were good).
We want everything to work, all the time, everywhere, right? Right?
asyncInside™
This async implementation, which is either stable (except for bug fixes) or stagnant ("feature-complete"), depending on who you ask, should not be the gatekeeper for compiler development. This is an argument against tying the implementation into the compiler. In fact, this tie was broken with `chronos`.
asyncOutside™
So, yeah, now the community has to support two different async implementations which largely suffer the same flaws, are incompatible in subtle ways, and neither of which is reliable under the best memory manager for Nim.
except:
Exceptions are handled in separate and unequal ways... `handler = proc()` callback.
What CPS is
CPS is research-grade; I would say it's above alpha-quality.
CPS is pure Nim. It probably compiles as slowly as it does because it uses concepts, which allow you to specify the details of your continuation implementation and those of your dispatcher. Everything is exposed to you, and the deep access you have allows for novel forms of control flow, concurrency, scope, and abstraction.
What CPS is not
I'm coming up on the 2 year anniversary of my async/await divorce 💔: https://irclogs.nim-lang.org/29-04-2019.html#16:31:55
CPS is not CSP (communicating sequential processes), but it starts to give us the bones with which we might build such an elegant beast.
To wit, I think I'm prepared to help with a CSP implementation to rival that of Go's channels and Clojure's core.async, but as @Zevv likes to point out, it'd be a lot easier to implement CSP if we had a solid CPS working.
git blame
Threads
I'm really happy with threads. They work, they are cheap from a cognitive perspective, the memory semantics are clear, and they are easier to use in Nim than I expected. I regret not using threads for all my concurrent code.
The implementation is simple, it seems pretty stable, and threads will clearly need to compose with future concurrency implementations for both hardware and software reasons.
Async/Await
As @Araq says, "I don't care."
This is not the Nim concurrency story I want to tell to newcomers, and I think if you're honest, it's not the one you want to share, either. Never has been. It does not do justice to the elegance of the language, its platform support, or its performance. It doesn't even have useful documentation.
It's kind of like the Porsche gearbox -- charming or even cute, from a technical perspective, but to be pragmatic: easily the worst part of the car.
CPS
If it makes you feel better, you can blame me for believing CPS is a game-changer for Nim and you can blame me for criticizing async/await. Please first spend a couple years helping (or even just watching) innumerable people using software that is an objectively poor choice from the perspective of performance, documentation, simplicity, and compromise. Now you can also blame me for sending too many people down the wrong path.
Action Items
I've tried to convey a tone of authority but don't take it too seriously -- that's just how I talk about facts. 😉
I know how we implemented CPS (and in particular, what sucks about it) but I really don't know much about how it should be exploited. I've tried to capture some ideas in the repository, but @Zevv has done most of the design work and has a better grasp of the sorts of novel behaviors that CPS enables.
Most of my experience with coroutines is limited to Python, where they were reliably a huge PITA. Go was great. Anyway, please take a look at the code:
https://github.com/disruptek/cps
Look at the examples (in `stash`). Look at the tests. Read the papers (the `1011` one is a thriller, I promise). Get excited about this: it's not science fiction; it already works, today, and it can work even better tomorrow.

You are going to invent new ideas on how we can exploit this style of programming. You are going to see problems with our work (especially @Zevv's) that you can report or fix. You may even be motivated to improve async/await. All to the good.
You are going to complain about design choices and demand that they are fixed. Bring it on. This software is ready to be taken further while there are no users to worry about breaking. @Zevv has even promised to do some typing for you. At least, I think that's what he said. 🤔
I believe @Araq would prefer that CPS was implemented in the compiler. I don't know if he is motivated by the fear that it will be stolen if it is exposed in a library, or whether he simply wants the implementation to be unfettered by limitations in macro spec or compilation bugs. I personally think these are valid concerns, and of course there are probably other good reasons to target the compiler directly, as well.
Problem is, more users are intimidated by compiler code, and they have every justification to feel this way. Most of the compiler is undocumented and new code rarely receives comments. It's hard to know where to begin and embedding CPS in the compiler will cause it to require at least some compiler expertise to service -- a commodity in poor supply.
Limiting CPS to the compiler may also make it less accessible to users of earlier versions of Nim. It may even fracture CPS support across different versions of the compiler. If you agree that CPS should not be implemented in the compiler, please give the RFC a 👎.
While I feel CPS is quite powerful, I'd like the `typed` implementation to work, as it promises a much better experience. I believe this is a practical prerequisite of future work, so building a similarly powerful abstraction for CSP is blocked until we choose the next step for CPS.