Open RaasAhsan opened 4 years ago
Sorry for taking forever to loop back to this…
Broadly speaking, I agree with you wholeheartedly. Having some notion of fiber identity is important for a lot of reasons, but particularly for things like fiber locals and such. Doing this in a sane way is immensely complicated though, and I haven't had time to think deeply about what it should look like. There's also a whole series of real risks when you do something like that (see: all the abuse ThreadLocal gets). There's also the question of how to do it in a way which is amenable to abstraction.
But in general I do think this is very much a problem worth tackling. Some of the things I'm thinking about:
join to be an inbound edge, in which case it could actually be a fully cyclic graph) The answer to this question also dictates how a hypothetical FiberLocal would behave in the presence of start
What kind of control and information does self fiber introspection give? Should you be able to get a reference to the Fiber instance? If so, you would be able to do things like join or cancel on yourself (the former being a deadlock, the latter likely leading to deadlocks indirectly in CE2).
IO specifically (and I would argue, a useful thing), but it's quite another to talk about what it means in the broader context of abstractions. I'd really like to be able to tackle that problem, but then we have to try to reason about this kind of state and its surrounding laws. It's tricky.
start is through the lens of a supervisory fiber controlling a child fiber. This metaphor holds pretty consistently, since join and cancel are in the hands of the supervisor. In that metaphor, I think fiber identity corresponds to the self actor ref. I'm not sure if this is a useful line of thought, but it's a line of thought.
Anyway, I very much think this falls under the scope of CE; we just have to figure out what it looks like and what properties it should have. I believe that both Monix and ZIO support fiber locals, so looking at what their answers are to the above (where relevant) is probably a good place to start in terms of figuring out what this should be. I also noticed you have a PR open, which is awesome, and I'll try to look at it as soon as I can.
Overall, thank you for bringing this up! It's a really really important thing and well worth pushing on.
How does fiber identity relate to its ancestry tree?
Another concern I had here: if a fiber tracks its ancestry, or the runtime tracks the entire graph, how deep should it go? We should be careful to avoid memory leaks that could arise from patterns like fork loops.
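One way to keep ancestry tracking from leaking memory in a fork loop is to cap how many ancestors each fiber remembers. The following is a minimal sketch in plain Scala, not taken from the PR or from any library; FiberId, fork, root, and maxDepth are hypothetical names used only for illustration.

```scala
// Hypothetical model: a fiber ID plus a bounded list of ancestor IDs (nearest first).
final case class FiberId(id: Long, ancestors: List[Long])

object FiberId {
  // Assumed cap on retained ancestry; a real runtime would choose this deliberately.
  val maxDepth: Int = 3

  val root: FiberId = FiberId(0L, Nil)

  // Forking records the parent but drops ancestors beyond maxDepth,
  // so a long fork chain keeps at most maxDepth entries per fiber.
  def fork(parent: FiberId, childId: Long): FiberId =
    FiberId(childId, (parent.id :: parent.ancestors).take(maxDepth))
}
```

With this cap, a chain of a thousand successive forks still leaves the last fiber holding only its three nearest ancestors. The trade-off is that deeper ancestry is no longer available for tracing.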
What kind of control and information does self fiber introspection give?
In the PR I've got open right now, you only get access to the local state of the fiber. I could see the fiber ID, ancestry tree, and trace storage being exposed here as well. Placing that data on the Fiber handle is interesting: holders of the Fiber handle could also access it (which also has implications for synchronization and thread safety).
This metaphor holds pretty consistently, since join and cancel are in the hands of the supervisor.
Is it improper usage for a fiber other than the child's parent to invoke join or cancel? I'm thinking of the case where the join/cancel tokens are placed in a Ref and somebody else invokes them.
I certainly agree with you here, it might be difficult to find suitable abstractions for these behaviors. I think figuring out what Monix and ZIO do in this realm is a good start, so I'll try to start scoping that out.
What is the ancestry of a fiber created by combining via the semigroup? That's a way to break the tree model without join.
What is the ancestry of a fiber created by combining via the semigroup? That's a way to break the tree model without join.
That's a good point, although the Semigroup instance does use join internally.
Ah, right, that goes through map2, which joins.
I assume pure Fibers would have identity? I wouldn't think of those as participating in a supervisory metaphor, but I also can't think why not.
In #836, identity/state is associated with an execution of IORunLoop rather than a Fiber. The ancestry tree is pretty straightforward here: whoever calls start is the parent. The join graph feels more complicated, since the parent/supervisor isn't necessarily the only one who can call join/cancel. You could easily delegate supervision to another fiber:
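for {
  f1 <- IO.delay(fibonacci(12)).start // f1 is forked by the current fiber
  f2 <- f1.join.start                 // a second fiber now joins f1
  a  <- f2.join                       // the current fiber only joins f2
} yield a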
A pure Fiber is an interesting case: it isn't logically associated with an execution of the run loop, so it could never accumulate things like state or trace stacks. But if it needs identity, maybe Fiber.apply has to be suspended in F to get access to run-time/global state.
I've done some research on the locality features of ZIO and Monix. Here's what I've come up with from an initial pass.
ThreadLocal that is updated across asynchronous boundaries
Long. Also exposes fiber creation unix time.
FiberRef supports copy-on-fork and merge-on-join semantics via a binary function (A, A) => A
ThreadLocal to support interop with unsafe code. References are updated on asynchronous boundaries accordingly.
So there are some stark differences in how local state is propagated between parent and child fibers; I'm unsure yet how we will unify that.
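To make the copy-on-fork and merge-on-join idea concrete, here is a toy model in plain Scala. It is only a sketch of the semantics described above, not ZIO's or Monix's implementation; LocalModel, Locals, fork, and join are hypothetical names, and the local state is simplified to an immutable Map[String, Int].

```scala
// Toy model (no effect system): a fiber's local state is an immutable map.
object LocalModel {
  type Locals = Map[String, Int]

  // Copy-on-fork: the child starts from a snapshot of the parent's locals.
  // The map is immutable, so later child updates never affect the parent.
  def fork(parent: Locals): Locals = parent

  // Merge-on-join: for keys present on both sides, combine the parent and
  // child values with a binary function (A, A) => A, here specialised to Int.
  // Keys only the child has are carried over as-is.
  def join(parent: Locals, child: Locals)(merge: (Int, Int) => Int): Locals =
    child.foldLeft(parent) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).fold(v)(merge(_, v)))
    }
}
```

Choosing the merge function decides who wins on join: `(p, _) => p` keeps the parent's value for shared keys, while something like `_ max _` combines both.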
There is a law that both Monix and ZIO (as well as the CE PoC) satisfy today: identity/state must be preserved across asynchronous boundaries for the same fiber. So something like:
fiberId.get <-> F.shift *> fiberId.get
fiberRef.get <-> F.shift *> fiberRef.get
Some initial ideas for type classes (need better names):
LocalIdentity[F[_], A] - captures identity from fibers. In addition to the law above, I could see a law that says a fiber and a fiber it forks can't share the same ID.
LocalState[F[_], A] - somehow unify the local state implementations for Monix Task and ZIO, or maybe use different classes for different kinds of fork/join semantics.
LocalUnsafe[F[_], A] - something to capture ThreadLocal integration?
I just realized that Monix behavior does resemble copy-on-fork semantics, so maybe another law here:
fiberRef.get <-> F.start(fiberRef.get).flatMap(_.join)
This along with the law preserving state across asynchronous boundaries described above would suffice to support most local context propagation needs.
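As a rough illustration of the fork law, the sketch below models an effect as a plain function over a single fiber-local value and checks that reading the local directly agrees with reading it in a forked-then-joined child. This is an assumption-heavy toy, not cats-effect code: LawSketch, Eff, and startAndJoin are hypothetical names, and forking is modeled as running the child synchronously against a copy of the parent's value.

```scala
object LawSketch {
  // Toy effect: takes the fiber's local value, returns (new local value, result).
  type Eff[S, A] = S => (S, A)

  def get[S]: Eff[S, S] = s => (s, s)
  def set[S](a: S): Eff[S, Unit] = _ => (a, ())

  // Copy-on-fork: run the child against a copy of the parent's local value,
  // return the child's result, and leave the parent's local value untouched.
  def startAndJoin[S, A](child: Eff[S, A]): Eff[S, A] =
    s => (s, child(s)._2)
}
```

In this model `get` and `startAndJoin(get)` produce the same result for any starting value, and a `set` performed by the child is not visible to the parent afterwards. A merge-on-join variant would differ only in how `startAndJoin` computes the parent's new local value.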
I've tried looking into this but very quickly hit a bump.
From what I can tell, a fiber identity should be passed to at least the following functions: Fiber.join, Ref.get, and Deferred.get.
Adding this to Fiber.join seems possible with something like:
private[effect] case class WithFiberIdentity[+A](identity: UUID, ioe: IO[A]) extends IO[A] {
  def tag = 24
}
For Ref.get, it seems to be pretty tricky. Since it is written in terms of the Sync interface, I don't see any way to refactor it to properly pass the identity. I'm stuck here: how can this be achieved in cats-effect?
It seems like Cats Effect today doesn't support any notion of fiber identity, at least via the public API. This has several interesting use cases:
Some other effect types already support fiber identity to some degree, so maybe there's room for some effect class here, but I don't really know how much abstraction can be captured.
Does any of this make sense and does it even fall under the scope of Cats Effect IO or its classes? Would be interested in fleshing it out more if so.