Open arnetheduck opened 8 months ago
I wanted to do that a while ago, and still think I might, but such change will potentially break existing code in 2 ways:
:state
field will change from int to function pointer. It's a low level abi change, that anyone will hardly notice, and whoever relies on it (like me) will likely be ok with the breakage.Well, 1. is not defined behavior, is it? ie the internal fields are internal for a reason, no?
re 2., this feels like it falls into UB territory, and is dubious at best to begin with - is it documented or just an implementation detail that some have started using? basically, the address of a local is never guaranteed to be unique or not to be overwritten, why would one expect that in a closure environment? consider that if you have a local whose liveness is bounded, one is typically allowed to reuse that memory location for a different local as long as the liveness analysis shows no overlap - there is no expectation in general that memory addresses that are not on the heap will be stable - just like you wouldn't return the address of a local from a function. Also, the addr
thing, why is it hacky? It simply expands the liveness to "undefined" which pessimistically means it can be moved to the env..
addr
analysis is hacky because it will be done only within the closure iter, while addr
op itself can be in the callees, and again addr
doesn't automatically mean that it is expected to live through the yield, but all addrs
will be unfairly unoptimized. But in the end I agree that probability of old code failing is pretty low, and I would gladly neglect it.EDIT: Memory-unsafe example is safe. So now I'm not aware of any live code that will fail.
while addr op itself can be in the callees,
I think in nim, we generally assume that if something is passed by var
into a function (and not by ptr
), there is no lifetime expectation of the address of the var
beyond that function call, ie storing addr
inside a function of a var
parameter is UB beyond that one call.
Correct.
When experimenting with manual CPS, I've looked into the storage problem, it's a raw but this works, except with JS backend: https://github.com/nim-works/cps/blob/7e6ed65/talk-talk/manual1_stack.nim
Every yield section of a CPS-ed functions or closure iterator gets it's own env object
Env_foo_0 = object
n_gensymmed: int
i_gensymmed: int
Env_foo_1 = object
n_gensymmed: int
i_gensymmed: int
Env_foo_2 = object
n_gensymmed: int
i_gensymmed: int
j_gensymmed: int
The full state is stored in an {.union}
to guarantee that space is the biggest state used. For compat with JS this can be changed to a case object, but the states and transitions are known at compile-time so it's not really needed.
# This is an object which is large enough to hold any of the above, and is
# used for the initial allocation.
Env_foo_storage {.union.} = object
stor_Env_foo_0: Env_foo_0
stor_Env_foo_1: Env_foo_1
stor_Env_foo_2: Env_foo_2
stor_Env_foo_3: Env_foo_3
stor_Env_foo_4: Env_foo_4
stor_Env_foo_5: Env_foo_5
stor_Env_foo_6: Env_foo_6
stor_Env_foo_7: Env_foo_7
C = object
fn: proc(c: var C) {.nimcall.}
storage: Env_foo_storage
# Original function
proc foo(n: int) =
var i = 0
while i < n:
# sleep(1)
var j = 0
while j < n:
echo i, " ", j
# sleep(1)
inc j
inc i
var j = "done"
# sleep()
echo j
template injectVar(T, id: untyped) =
template id(): untyped = (c.storage.`stor _ T`.`id gensymmed`)
proc foo_0(c: var C) =
injectVar(Env_foo_0, n)
injectVar(Env_foo_0, i)
i = 0
c.fn = foo_1
proc foo_1(c: var C) =
injectVar(Env_foo_1, n)
injectVar(Env_foo_1, i)
if i < n:
c.fn = foo_2
noop(c)
return
c.fn = foo_6
proc foo_2(c: var C) =
injectVar(Env_foo_2, n)
injectVar(Env_foo_2, i)
injectVar(Env_foo_2, j)
j = 0
c.fn = foo_3
And interesting part is that if the continuation/closure:
You can allocated it fully on the stack (called halo - heap allocation elision in Clang++). And if you allocate it fully on the stack, the C compiler can do constant propagation and folding:
Summary
The current implementation of closure iterators sees the implementation use a duff's device to implement the state machine between each
yield
.A more efficient approach would be to put each state in its own function and change states by assigning a function pointer (instead of an index).
Description
A nim function like:
currently gets transformed into a giant switch like so:
A more efficient approach would be to put the code of each state into a separate function, roughly:
Because each state is independent, this would allow the C compiler to better analyze the function and for example perform more aggressive inlining which is more difficult when all states share a single function.
This would also open up another feature, namely that of moving variables out of the heap-based environment and putting them in the "small" local functions, where applicable (ie where a variable does not cross state boundaries).
Such small functions are also candidates for dead-code-elimination - ie if it turns out that a particular state never is used due to inlining, constant propagation and other optimization, that state can be removed entirely instead of still taking up space - this happens commonly in generic code and/or macro-generated code.
In the example, there is of course little benefit, but closure iterators are the mechanism that power chronos and other
async
implementations, where code is often much more complex, leading to very large state transition functions.Alternatives
No response
Examples
No response
Backwards Compatibility
No response
Links
No response