mratsim opened 4 years ago
Rust also coupled IO and task parallelism in the past and decided to decouple them:
https://github.com/aturon/rfcs/blob/remove-runtime/active/0000-remove-runtime.md
https://github.com/rust-lang/rfcs/blob/master/text/0230-remove-runtime.md
An idea on how to play well with asyncdispatch, Chronos, or any future async/await library: they all offer a `poll()` function that runs their event loop.

We can add a field `pollHook*: proc() {.nimcall, gcsafe.}` on each worker. It would be set up by `setPollingFunction(_: typedesc[Weave], poll: proc() {.nimcall, gcsafe.})` before Weave initialization (at first; this restriction can be relaxed later).
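A minimal sketch of what that registration could look like (the `Weave` typedesc and `PollProc` alias here are illustrative placeholders, not Weave's actual internals):

```nim
# Sketch only, not Weave's real internals: a nimcall/gcsafe hook slot
# that every worker checks.
type
  Weave* = object                        # placeholder for the library's exported typedesc
  PollProc = proc() {.nimcall, gcsafe.}

var pollHook*: PollProc                  # nil until a user registers an event loop

proc setPollingFunction*(_: typedesc[Weave], poll: PollProc) =
  ## Must be called before Weave initialization (at first;
  ## this restriction could be relaxed later).
  pollHook = poll

# Hypothetical registration with asyncdispatch's event loop
# (whether asyncdispatch's `poll` can be wrapped gcsafe is an open question):
# import std/asyncdispatch
# Weave.setPollingFunction(proc() {.nimcall, gcsafe.} =
#   if hasPendingOperations(): poll(0))
```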
Then we modify `loadBalance()`, `sync()`, and `syncScope()` to interleave `pollHook` calls before and after executing a task. Note that `loadBalance()` is called in between each `parallelFor` iteration; the branch `if not pollHook.isNil:` is very predictable and should be costless.

Note that worker threads will sleep if they have no tasks, but it does not make sense for them to try to handle IO events without a task.
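Interleaved into the scheduler, the call sites could look like this sketch (`loadBalance` and the `runTask` helper are simplified placeholders for Weave's actual scheduling code, and `pollHook` is the field described above):

```nim
var pollHook: proc() {.nimcall, gcsafe.}   # set via setPollingFunction, nil otherwise

proc loadBalance() =
  # ... existing work: answer steal requests, split loops, share tasks ...
  if not pollHook.isNil:   # branch is highly predictable, essentially free when unset
    pollHook()             # drain ready IO events; must not block

proc runTask(task: proc() {.nimcall, gcsafe.}) =
  if not pollHook.isNil: pollHook()   # poll before the task
  task()
  if not pollHook.isNil: pollHook()   # and after it
```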
A potential issue is that a task can be migrated, or, for a parallel loop, even split and executed on two different threads. Do the async libraries use `{.threadvar.}` to manage some global state? Because that will not work.
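For example, per-thread event-loop state kept like this would break under task migration (the names here are hypothetical, sketching the asyncdispatch-style lazily-created dispatcher pattern):

```nim
# Hypothetical per-thread event-loop state, as async libraries
# commonly keep via {.threadvar.}:
var dispatcher {.threadvar.}: ref RootObj   # lazily created, one per thread

proc currentDispatcher(): ref RootObj =
  if dispatcher.isNil:
    new(dispatcher)
  dispatcher

# A future registered on thread A's dispatcher cannot be completed
# from thread B: if Weave steals the continuation, or splits a
# parallelFor range across threads, the migrated half sees a
# different (freshly created) `dispatcher` instance.
```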
RFC #132 and its implementation in #136, which runs Weave as an independent background service, are probably a better path forward.
Weave / Project Picasso focuses on CPU-bound tasks, i.e. tasks that do not block and where throwing more CPU at the problem yields the result faster.
For IO-bound tasks the idea was to defer to specialized libraries like asyncdispatch and Chronos that use OS primitives (epoll/IOCP/kqueue) to handle IO efficiently.
However, even for compute-bound tasks we will have to deal with IO latencies, for example in a distributed system or cluster. So we need a solution for doing useful work during the downtime without blocking a whole thread.
That means:
Research
Reduced I/O latencies with Futures
Kyle Singer, Kunal Agrawal, I-Ting Angelina Lee
https://arxiv.org/abs/1906.08239
The paper explores coupling a Cilk-like work-stealing runtime with an IO runtime based on Linux epoll and eventfd.
A practical solution to the Cactus Stack Problem
Chaoran Yang, John Mellor-Crummey
http://chaoran.me/assets/pdf/ws-spaa16.pdf
Fibril: https://github.com/chaoran/fibril
While not explicitly mentioning async I/O, the paper and the corresponding Fibril library use coroutine/fiber-like tasks to achieve fast, extremely low-overhead context switching. Coroutines are very efficient building blocks for async IO.
Implementations
The Go scheduler mixes IO and compute, though Go is not particularly known for its compute throughput (probably because goroutines optimize for fairness/latency rather than throughput):
https://assets.ctfassets.net/oxjq45e8ilak/48lwQdnyDJr2O64KUsUB5V/5d8343da0119045c4b26eb65a83e786f/100545_516729073_DMITRII_VIUKOV_Go_scheduler_Implementing_language_with_lightweight_concurrency.pdf
Julia's PARTR mixes IO, via a libuv event loop per thread, with a parallel depth-first scheduler: https://github.com/JuliaLang/julia/blob/f814301bd9503e243276b356d0cdbfcaa5ae0b8a/src/partr.c#L260-L295
boost::fiber has work-stealable fibers
https://www.boost.org/doc/libs/1_71_0/libs/fiber/doc/html/index.html