typelevel / cats-effect

The pure asynchronous runtime for Scala
https://typelevel.org/cats-effect/
Apache License 2.0

Fiber identity #828

Open RaasAhsan opened 4 years ago

RaasAhsan commented 4 years ago

It seems like Cats Effect today doesn't support any notion of fiber identity, at least via the public API. Fiber identity has several interesting use cases:

  1. Fiber logging: There is a disconnect between existing JVM logging libraries and fiber-based runtimes (or more generally, asynchronous code). Loggers typically output the name of the thread on which they are running, but that name is often useless since a single fiber can execute on many threads. Being able to aggregate the logs produced by a fiber could be pretty powerful.
  2. Fiber-local state: Useful in situations where you want to pass along state for dealing with cross-cutting concerns, like passing along a trace ID that is exchanged in a chain of microservice calls. There is an argument that it suffers from problems similar to those of thread-local storage.
  3. Fiber ancestry: fiber forking generates a directed acyclic graph where there are directed edges between a parent and its children. Unsure if there are any compelling use cases for this but it could provide for some interesting analysis.
  4. Fiber tracing: There's already an issue open for this and it might not strictly fall under fiber identity, but I could see some overlap here in tracking state.

Some other effect types already support fiber identity to some degree, so maybe there's room for some effect class here, but I don't really know how much abstraction can be captured.

Does any of this make sense and does it even fall under the scope of Cats Effect IO or its classes? Would be interested in fleshing it out more if so.

djspiewak commented 4 years ago

Sorry for taking forever to loop back to this…

Broadly speaking, I agree with you wholeheartedly. Having some notion of fiber identity is important for a lot of reasons, but particularly for things like fiber locals and such. Doing this in a sane way is immensely complicated though, and I haven't had time to think deeply about what it should look like. There's also a whole series of real risks when you do something like that (see: all the abuse ThreadLocal gets). There's also the question of how to do it in a way that is amenable to abstraction.

But in general I do think this is very much a problem worth tackling. Some of the things I'm thinking about:

Anyway, I very much think this falls under the scope of CE, we just have to figure out what it looks like and what properties it should have. I believe that both Monix and ZIO support fiber locals, so looking at what their answers are to the above (where relevant) is probably a good place to start in terms of figuring out what this should be. I also noticed you have a PR open, which is awesome and I'll try to look at it as soon as I can.

Overall, thank you for bringing this up! It's a really really important thing and well worth pushing on.

RaasAhsan commented 4 years ago

> How does fiber identity relate to its ancestry tree?

Another concern I had here: if a fiber tracks its ancestry, or the runtime tracks the entire graph, how deep should it go? We should be careful to avoid memory leaks that could arise from patterns like fork loops.

> What kind of control and information does self fiber introspection give?

In the PR I've got open right now, you only get access to the local state of the fiber. I could see the fiber ID, ancestry tree, and trace storage being exposed here as well. Placing that data on the Fiber handle is interesting: holders of the Fiber handle could also access it (which also has implications for synchronization and thread safety).

> This metaphor holds pretty consistently, since join and cancel are in the hands of the supervisor.

Is it improper usage for a fiber other than the parent of a child to invoke join or cancel? I'm thinking of a case where the join/cancel tokens are placed in a Ref and somebody else invokes them.

I certainly agree with you here, it might be difficult to find suitable abstractions for these behaviors. I think figuring out what Monix and ZIO do in this realm is a good start, so I'll try to start scoping that out.

rossabaker commented 4 years ago

What is the ancestry of a fiber created by combining via the semigroup? That's a way to break the tree model without join.

SystemFw commented 4 years ago

> What is the ancestry of a fiber created by combining via the semigroup? That's a way to break the tree model without join.

That's a good point, although the Semigroup instance does use join internally.

rossabaker commented 4 years ago

Ah, right, that goes through map2, which joins.

I assume pure Fibers would have identity? I wouldn't think of those as participating in a supervisory metaphor, but I also can't think why not.

RaasAhsan commented 4 years ago

In #836, identity/state is associated with an execution of IORunLoop rather than a Fiber. The ancestry tree is pretty straightforward here: whoever calls start is the parent. The join graph feels more complicated since the parent/supervisor isn't necessarily the only one who can call join/cancel. You could easily delegate supervision to another fiber:

for {
  f1 <- IO.delay(fibonacci(12)).start
  f2 <- f1.join.start
  a  <- f2.join
} yield a

A pure Fiber is an interesting case: it isn't logically associated with an execution of the run loop, so it could never accumulate things like state or trace stacks. But if it needs identity, maybe Fiber.apply has to be suspended in F to get access to run-time/global state.

RaasAhsan commented 4 years ago

I've done some research on the locality features of ZIO and Monix. Here's what I've come up with from an initial pass.

Monix

ZIO

So there are some stark differences in how local state is propagated between parent and children fibers, unsure yet how we will unify that.

There is a law that both Monix and ZIO (as well as the CE PoC) satisfy today: identity/state must be preserved across asynchronous boundaries for the same fiber. So something like:

fiberId.get <-> F.shift *> fiberId.get
fiberRef.get <-> F.shift *> fiberRef.get
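One way to see why this law needs support from the runtime: a JVM ThreadLocal is keyed by thread, so its value is not visible once execution resumes on a different thread, which is what can happen to a fiber after F.shift. The sketch below is plain Scala with no Cats Effect dependency. The names `ThreadLocalBoundary`, `traceId`, and `readAfterThreadHop` are hypothetical, and the explicit single-thread executor only stands in for a thread hop; it is not how any of the runtimes discussed here are implemented.

```scala
import java.util.concurrent.{Callable, Executors}

object ThreadLocalBoundary {
  // Hypothetical trace ID holder, keyed by thread (not by fiber).
  val traceId: ThreadLocal[String] = new ThreadLocal[String] {
    override def initialValue(): String = "unset"
  }

  // Reads the thread-local from another thread, simulating what a fiber
  // could observe after being rescheduled across an asynchronous boundary.
  def readAfterThreadHop(): String = {
    val other = Executors.newSingleThreadExecutor()
    try other.submit(new Callable[String] { def call(): String = traceId.get() }).get()
    finally other.shutdown()
  }

  // Returns (value seen on the original thread, value seen after the hop).
  def demo(): (String, String) = {
    traceId.set("trace-42")
    try (traceId.get(), readAfterThreadHop())
    finally traceId.remove()
  }
}
```

The two components of the result differ, so a plain ThreadLocal does not satisfy the law above. State that must survive a shift has to travel with the fiber rather than with the thread.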

Some initial ideas for type classes (need better names):

RaasAhsan commented 4 years ago

I just realized that Monix behavior does resemble copy-on-fork semantics, so maybe another law here:

fiberRef.get <-> F.start(fiberRef.get).flatMap(_.join)

This along with the law preserving state across asynchronous boundaries described above would suffice to support most local context propagation needs.
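As a rough illustration of copy-on-fork, here is a toy model in plain Scala. It is only a sketch under simplifying assumptions: `ToyFiber` and its methods are hypothetical, there is no concurrency, and it does not reflect the actual Monix, ZIO, or Cats Effect implementations. A child starts from a snapshot of its parent's locals and records who forked it.

```scala
object CopyOnFork {
  // Hypothetical toy fiber: an ID, the ID of the fiber that forked it,
  // and its own private map of local values.
  final class ToyFiber(
      val id: Int,
      val parentId: Option[Int],
      initial: Map[String, String]
  ) {
    private var locals: Map[String, String] = initial

    def get(key: String): Option[String] = locals.get(key)

    def set(key: String, value: String): Unit =
      locals = locals.updated(key, value)

    // Fork: the child receives a snapshot of the parent's locals. The map is
    // immutable, so later writes on either side are invisible to the other.
    def fork(childId: Int): ToyFiber =
      new ToyFiber(childId, Some(id), locals)
  }

  def demo(): (Option[String], Option[String], Option[String], Option[Int]) = {
    val parent = new ToyFiber(1, None, Map.empty)
    parent.set("traceId", "abc")

    val child = parent.fork(2)
    val seenByChild = child.get("traceId") // inherited at fork time

    child.set("traceId", "xyz")            // local to the child only
    (seenByChild, child.get("traceId"), parent.get("traceId"), child.parentId)
  }
}
```

In this model the child initially reads the same value as the parent, which mirrors the proposed law, while the child's later write does not leak back. Storing only the parent's ID rather than a reference to the parent is one possible way to keep ancestry from retaining whole chains of finished fibers.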

milanvdm commented 2 years ago

I've tried looking into this but very quickly hit a bump.

From what I can tell, a fiber identity should be passed to at least the following functions: Fiber.join, Ref.get, and Deferred.get.

Adding this to Fiber.join seems to be possible with something like:

private[effect] case class WithFiberIdentity[+A](identity: UUID, ioe: IO[A]) extends IO[A] {
  def tag = 24
}

For Ref.get, it seems to be pretty tricky. Since it is written in terms of the Sync interface, I don't see any way to refactor it to properly pass the identity. This is where I'm stuck: how can this be achieved in cats-effect?