Open mratsim opened 2 years ago
This will probably only work with ARC/ORC but then so be it.
I think std/tasks already requires either non-thread-local GC (so Boehm would work as well) or value types.
Unless you use tasks in a non-multithreading context, but libraries that might want to do that (asyncdispatch/chronos) use closure iterators.
In the current implementation, task is allocated in the stack and is cleaned up before toTask
so that the address of task.buffer
is lost.
We should probably pass task.buffer
in invoke procs, which requires to change the signature of invoke procs (breaking changes)
proc invoke*(task: var Task) {.inline, gcsafe.} =
## Invokes the `task`.
assert task.callback != nil
if task.args == nil and task.destroy != nil:
task.callback(addr task.argsBuffer)
else:
task.callback(task.args)
(though Task is always passed by reference internally)
See discussion on Discord https://discord.com/channels/371759389889003530/768367394547957761/933307741462753300
The changes indeed work https://github.com/status-im/nim-taskpools/commit/76a8593a80a756814205952d331c55883f6f7d02 but only if the spawner doesn't exit its scope before returning.
AKA, structured parallelism: spec https://github.com/nim-lang/RFCs/issues/347#structured-parallelism-optional
Structured parallelism [OPTIONAL]
Structured parallelism ensures that all tasks and their descendants created within a scope are completed before exiting that scope.
template syncScope(ex: MyExecutor, scope: untyped): untyped template syncScope(scope: untyped): untyped
References:
- X10 Parallel Programming Language (2004), the Async-Finish construct
- Habanero multithreading runtime for Java and Scala, the Async-Finish construct
- Habanero C, the Async-Finish construct
More recently, concurrency frameworks also converged to similar "structured concurrency" (https://en.wikipedia.org/wiki/Structured_concurrency).
One way to make it work is in =move
: if the task being moved is pointing to its argsBuffer, change to point to the new argsbuffer
proc `=sink`(dst: var Task, src: Task) =
`=destroy`(dst)
dst.callback = src.callback
dst.destroy = src.destroy
if src.args == cast[pointer](src.argsBuffer.addr):
# The arguments are stored inline the task
dst.argsBuffer = src.argsBuffer
dst.args = cast[pointer](dst.argsBuffer.addr)
else:
# The arguments are stored on the heap
dst.args = src.args
This also implicitly assumes that Tasks are {.byref.}
types.
block:
var num = 0
proc hello(a: int) = inc num, a
let b = toTask hello(13)
b.invoke()
echo num
assert num == 13
doesn't work with POC
0
Error: execution of an external program failed
It generates c code like this:
task is zero memoryed after toTask is called. Then task.argsBuffer become all zeros And b.args still points to a zeros array (task.argsBuffer) that seems why echo num gives 0
Currently std/tasks is implemented the following way:
Any function call with an argument will trigger an allocation. This is something that runtimes that use std/task can't avoid. Given that runtimes also have to allocate a future/flowvar, scheduling a task always incur allocation overhead even if there are just 1 or 2 arguments that needs to be passed.[1]
GCC and Clang have an optimization where std:function can store up to 8 (?) bytes intrusive to the data structure and uses the heap otherwise.
It would significantly reduce memory overhead and memory fragmentation for long-running tasking runtimes if this was implemented:
--
[1]: Runtime actually don't have to allocate a future/flowvar if those don't escape, they can use the caller stackframe if the compiler helps with heap allocation elision: (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2477r0.html#design-and-example, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html)
Proposal
For a to be decided size N, change the type to:
args
would point to either the buffer or the heap, change would be minimal and would mainly require changing this:to something in that vein.
at https://github.com/nim-lang/Nim/blob/d102b2f54c41f19a0545f28109d0a93550b5d886/lib/std/tasks.nim#L187
Discussion of the buffer size
The current task object has a size of 24 bytes.
proc(a, b): result
orproc(r, a, b)
, this would cover those efficiently.map
or a pointer to a result location) Working on large openarrays is often a very compelling reason to divide work in tasks.