typelevel / cats

Lightweight, modular, and extensible library for functional programming.
https://typelevel.org/cats/

Proposal: A cats-effect project #1617

Closed djspiewak closed 7 years ago

djspiewak commented 7 years ago

Behold… cats-effect!

As a few people were already aware, I – along with @mpilquist, @rossabaker, and @tpolecat – have been collaborating on a practical IO type for cats. We believe that the lack of a "one true and holy" IO has been a significant unfortunate factor in the cats adoption curve. Additionally, it has caused no small amount of grief for the middleware ecosystem, herein represented by the dynamic duo of Ross and Rob. With cats 1.0 fast approaching, we all felt the time was right to address this issue once and for all.

I'm quite proud of this implementation of IO, and I would take it into production, but there are some problems solved by other implementations in this space which are simply not addressed by IO. Monix in particular has a significantly more featureful Task, and even fs2's Task addresses some points (notably relating to parallelism) which are intentionally ignored by IO. This type is really just intended to hit the sweet spot of a practical effect type for the JVM and JavaScript, while providing sufficient generic machinery to enable a smooth compatibility story and instantiation of generic types where relevant (e.g. fs2.Stream[IO, A] is a real thing and actually works just as well as Stream[Task, A] does).

To that end, IO supports lazy, nonmemoized suspension of both synchronous and asynchronous effects (similar to scalaz's, and really everyone's, Task). It supports running those effects as side-effects using various unsafe functions. It provides functions for converting to and from Scala's Future (unsafely, since Future both memoizes and runs eagerly). And it provides a hierarchy of typeclasses for characterizing generic effects and effect stacks (a feature aimed squarely at those for whom 🤘 is pronounced "M-T-L"), as well as a very thorough set of associated laws.
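To make "lazy, nonmemoized suspension" concrete, here is a minimal sketch that uses only the Scala standard library. ToyIO is a hypothetical stand-in invented for this illustration, not cats.effect.IO and not how the real type is implemented; it only shows that a suspended effect does nothing until it is run and runs again each time, whereas a Future starts eagerly and caches its result.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SuspensionSketch {
  // Hypothetical toy type: it only wraps a thunk. Not cats.effect.IO.
  final class ToyIO[A](thunk: () => A) {
    // "Unsafe" because calling it actually performs the side effect.
    def unsafeRunSync(): A = thunk()
    def map[B](f: A => B): ToyIO[B] = new ToyIO(() => f(thunk()))
  }

  object ToyIO {
    // The by-name parameter delays evaluation of `body` until the run.
    def apply[A](body: => A): ToyIO[A] = new ToyIO(() => body)
  }

  def main(args: Array[String]): Unit = {
    var ioRuns = 0
    val io = ToyIO { ioRuns += 1; ioRuns }
    assert(ioRuns == 0) // lazy: constructing the value ran nothing
    io.unsafeRunSync()
    io.unsafeRunSync()
    assert(ioRuns == 2) // not memoized: each run repeats the effect

    var futureRuns = 0
    val fut = Future { futureRuns += 1; futureRuns }
    Await.result(fut, 1.second)
    Await.result(fut, 1.second)
    assert(futureRuns == 1) // eager and memoized: the body ran exactly once
  }
}
```

This difference is why the conversions to and from Future are described as unsafe: by the time a Future value exists, its effect is already underway.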

Why the Proposal?

A valid response to this issue would be, "Why is this an issue on cats?" Good question, hypothetical reader. The reason I'm filing this as an issue is to solicit feedback from the cats contributors. You'll notice that the fully-qualified name of this type is cats.effect.IO, and I have somewhat-speculatively set the maven groupId to org.typelevel. I wouldn't do something like that without asking for permission first. So… I'm asking permission!

I intend for cats-effect to be a repository separate from cats, with its own release cycle. It obviously has an upstream dependency on cats-core (and cats-effect-laws similarly depends on cats-laws), but this side-steps the possibility of IO causing version-breaking changes in cats itself. Also there are plenty of people who want to use cats but probably don't care about pure effects. No need to over-burden their classpath and incompatibility space!

I intend to apply the same rigor as cats in the areas of versioning, binary compatibility, code reviews, documentation, etc. Just because it's a separate repo doesn't mean it gets to play fast-and-loose with releases. If everyone is amenable to this proposal, I would consider cats-effect to be part of cats, just in a separate repository.

Hence, the more level-headed approach to project creation.

What Next?

Please, everyone weigh in with comments, remarks, insults, drama, and cat gifs. Ideally more of the last one than its precedent. I really want people's feedback, even if it's just "this name sucks" or "I hate this type signature".

I'm also particularly interested in further feedback from @alexandru, who is traveling over the next few days, but who I expect will have particular insight here. This project is very close in spirit to his recent effects4s effort, though the cats-core dependency as well as some differences in laws sets this project on a somewhat different trajectory.

If everyone is more or less amenable to cats-effect in its eventual state, I will make a formal proposal on typelevel/general to move the project under that umbrella. But, as de facto owners of the cats package namespace, y'all get first veto!

longcao commented 7 years ago

I really want people's feedback, even if it's just "this name sucks" or "I hate this type signature".

If you say so!

From an admittedly naive but interested party's perspective: what do you think is the relationship between cats-effect and effects4s going forward re: the aforementioned project trajectories? Would cats-effect integrate with effects4s? Or a consolidation in efforts?

djspiewak commented 7 years ago

@longcao It depends a bit on what the goals of effects4s are. At present, effects4s has a stated goal of abstracting over several third-party effect types, including scalaz.concurrent.Task and even potentially scala.util.Try. That goal is somewhat incompatible with the cats-core dependency in cats-effect. And indeed, that dependency is itself inextricable from cats-effect, since everything derives from MonadError!

If we moderate the goals of effects4s a bit and just look at the subset of the ecosystem which is likely to accept a cats dependency, then I think effects4s and cats-effect are actually pretty close, and could perhaps be brought together fully. Both cats-effect and effects4s provide a set of typeclasses for characterizing effects. One of the many benefits of those typeclasses, as effects4s points out, is the ability to implement seamless, dependency-free conversion functions between effect types (e.g. from fs2.Task to monix.Task). These are obviously quite cool! At present, the effects4s typeclass hierarchy is structurally quite different from the cats-effect hierarchy, but I don't feel that the differences are fundamental to the space.

I think Alex would be able to speak to this point more fully than I can, but as I said, he's traveling for Scala Days and probably won't be able to weigh in on this issue for a little while.

wedens commented 7 years ago

I think I'd generalize LiftIO to MonadBase:

trait MonadBase[B[_], M[_]] {
  def liftBase[A](base: B[A]): M[A]
}

But it will make this typeclass better suited for cats or the new transmogrifier/cats-mtl library.

djspiewak commented 7 years ago

@wedens Thought about it. Two problems with that. First, it doesn't extend Monad, so while it would be more appropriate for the new cats-mtl library, it becomes much less appropriate for something defining typeclasses for use in "non-MTL style" (e.g. MonadError). (edit: uh… apparently my memory sucks and LiftIO doesn't extend Monad either; so there's only the "second" problem)

The second problem is a bit more subjective: it's a lot less parametric. What you have there is literally FunctionK. There's not a lot implied by that type, directly, and there are certainly no laws we can apply to it other than associativity (in composition). LiftIO explicitly states "your type has to manage side effects" because it's forcing you, by way of parametricity, to either handle those side-effects or produce an error, because IO[A] literally means "here be side-effects!"

So I think the specific type is actually more useful than the general one.
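For readers following the shape comparison, here is a small self-contained sketch. Every definition in it (FunctionK, MonadBase, LiftIO, ToyIO, and the Option instance) is a hypothetical minimal version written only for this illustration; none of them is the cats or cats-effect definition. It shows that the three abstractions carry the same single method, so an instance of one can be turned into another mechanically, and that an instance has to decide what to do with a failing effect.

```scala
object LiftShapes {
  // Hypothetical minimal definitions, written out only to compare shapes.
  trait FunctionK[F[_], G[_]] { def apply[A](fa: F[A]): G[A] }
  trait MonadBase[B[_], M[_]] { def liftBase[A](base: B[A]): M[A] }

  // Toy stand-in for IO: a suspended computation.
  final case class ToyIO[A](run: () => A)
  trait LiftIO[F[_]] { def liftIO[A](ioa: ToyIO[A]): F[A] }

  // Structurally the three are interchangeable once the base is fixed to ToyIO.
  def fromMonadBase[M[_]](mb: MonadBase[ToyIO, M]): LiftIO[M] =
    new LiftIO[M] { def liftIO[A](ioa: ToyIO[A]): M[A] = mb.liftBase(ioa) }

  def fromFunctionK[M[_]](fk: FunctionK[ToyIO, M]): LiftIO[M] =
    new LiftIO[M] { def liftIO[A](ioa: ToyIO[A]): M[A] = fk(ioa) }

  // A toy instance that must decide what to do with failures:
  // it runs the effect and maps any exception to None.
  val optionBase: MonadBase[ToyIO, Option] = new MonadBase[ToyIO, Option] {
    def liftBase[A](base: ToyIO[A]): Option[A] =
      scala.util.Try(base.run()).toOption
  }

  def main(args: Array[String]): Unit = {
    val lift = fromMonadBase(optionBase)
    assert(lift.liftIO(ToyIO(() => 42)) == Some(42))
    assert(lift.liftIO(ToyIO[Int](() => throw new RuntimeException("boom"))) == None)
  }
}
```

Since the conversions are trivial, the disagreement in this part of the thread is about which name and which laws best communicate intent, not about expressive power.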

wedens commented 7 years ago

Yeah, but type LiftIO[M[_]] = MonadBase[IO, M] gives you the same parametricity, doesn't it?

What will fs2.Task.start look like with IO?

djspiewak commented 7 years ago

@wedens But what does MonadBase give you? By that standard, type LiftIO[M[_]] = FunctionK[IO, M]

def start[F[_]: Effect, A](fa: F[A])(implicit EC: ExecutionContext): F[Unit] =
  fa.shift.runAsync(_ => IO.pure(())).liftIO[F]

djspiewak commented 7 years ago

For those interested in the argument against adding concurrency functions directly to IO (or written solely in terms of IO), I wrote a thing: https://gist.github.com/a775b73804c581f4028fea2e98482b3c

wedens commented 7 years ago

@djspiewak I may've misunderstood something, but fs2 uses Stream.bracket for resource safety, why can't IO.bracket be used?

djspiewak commented 7 years ago

@wedens That only works if you're avoiding the situation I describe, where resource acquisition is done in parallel with something else, and the resource scope is small, sequential and lexically-bounded. Also you would end up observing differences between left- and right-associated monadic binds, which is terrifying. Stream implements bracket using a much more powerful algebra under the surface, which avoids all of these issues. IO doesn't have a more powerful algebra under the surface, which is the whole problem.

alexandru commented 7 years ago

Hi folks,

The cats-effect proposal is interesting. Some notes ...

Copyright Header

/*
 * Copyright 2017 Daniel Spiewak
 *

I would change this copyright header. I know it's partly your work Daniel and I know that a couple of important Scala projects do this, however there are several problems:

  1. you might not actually own the whole copyright
    • IANAL, but such things get tricky and it's better to not claim it only in your name, especially because this type is inspired by other implementations
    • if you require that header to be added to all files in the project, which you should, what will happen if another contributor starts a new file? He will clearly own the copyright for that file, unless he assigned the copyright to you by some contract, which I'm fairly sure won't happen
  2. I'm generally against adding your name on a source file, because for other contributors it's like a marked territory

As a piece of advice, I would say something along these lines:

/*
 * Copyright 2017 by its authors. Some rights reserved.
 *

Stack Safety

IO is stack-safe… to a point ... any IO constructed with async will not be stack-safe.

That's actually regrettable and no matter how you look at it, or what rationale you can come up with for it, this is a genuine booby trap for users. Scalaz's Task is filled with booby traps and one reason for that is the behavior of Task.async, because even the authors of such operators as Nondeterminism[Task].mapBoth have forgotten about stack safety.

This is why in my proposal I've defined an Async type that has both create and unsafeCreate, one with stack safety mandatory, one without: https://github.com/effects4s/effects4s/blob/master/core/shared/src/main/scala/effects4s/Async.scala

This is then required by Async's laws: https://github.com/effects4s/effects4s/blob/master/laws/shared/src/main/scala/effects4s/laws/AsyncLaws.scala#L41

And if you have an unsafeCreate you can actually build a safe version out of it by using a little hack, see TrampolinedContext and async.safeCreate.
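The general idea behind a trampolined context can be sketched with the standard library alone. The Trampoline class below is a hypothetical, single-threaded toy written for this illustration; it is not the TrampolinedContext from effects4s or Monix, and it ignores thread safety entirely. It shows only the core trick: a callback invoked while another callback is already running is queued instead of being called directly, so the call stack does not grow with the number of chained callbacks.

```scala
import scala.collection.mutable

object TrampolineSketch {
  // Hypothetical toy trampoline; single-threaded, for illustration only.
  final class Trampoline {
    private val queue = mutable.Queue.empty[() => Unit]
    private var running = false

    def execute(task: () => Unit): Unit =
      if (running) {
        // Nested submission: defer it instead of recursing deeper.
        queue.enqueue(task)
      } else {
        running = true
        try {
          task()
          // Drain everything the running tasks scheduled, from a flat loop.
          while (queue.nonEmpty) {
            val next = queue.dequeue()
            next()
          }
        } finally {
          queue.clear()
          running = false
        }
      }
  }

  def main(args: Array[String]): Unit = {
    val trampoline = new Trampoline
    var count = 0

    // Each step schedules the next one from inside its own callback.
    // Calling the next step directly would nest one frame per step.
    def loop(n: Int): Unit =
      if (n > 0) trampoline.execute(() => { count += 1; loop(n - 1) })

    loop(1000000)
    assert(count == 1000000) // completed without a StackOverflowError
  }
}
```

The cost of this approach is the extra queueing on every callback, which is part of the trade-off debated in this thread.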

Resource Safety & Concurrency

@wedens @djspiewak

For those interested in the argument against adding concurrency functions directly to IO (or written solely in terms of IO), I wrote a thing: https://gist.github.com/a775b73804c581f4028fea2e98482b3c

Note that this argument does not apply to Monix's Task because of its Cancelable nature. Here's what a race condition would look like:

val fa = Task.pure("annoyingly fast computation")
val fb = Task.deferFuture(openSocket())

Task.chooseFirstOf(fa, fb).map {
  case Left((a, futureB)) => 
    futureB.cancel()
    a
  case Right((futureA, b)) =>
    futureA.cancel()
    b
}

What happens here is that you get a choice - whether to cancel the losing task, or to use its result sometime later. And here we are canceling it.

And this is in fact how Task.timeout is implemented. See the documentation.

Of course, this debate can go on forever. I've had an encounter with Daniel at NEScala and he really believes that for resource safety streaming is a better approach, while at the same time dismissing the cancelable trait of this Task implementation as being unsafe, like the one in Scalaz ... well Monix doesn't do what Scalaz does, and this debate can go on forever.

Personally I don't mind the IO in Cats remaining simple and not including concurrency features. In fact I welcome it, because it gives an incentive for other implementations to still exist.

Conflict with Existing Implementations

As mentioned above, I like the idea of this IO implementation to remain simple. That's not going to be easy, especially because with these projects we are doing a sort of "design by committee" and users will demand more.

When I first pushed for Monix to be a Typelevel project, I also started its Task implementation, because at that time there were no other implementations. FS2 went its separate way because of a difference in philosophy, but also because of a zero dependencies approach, also breaking up with Scalaz.

Now we find ourselves asking whether an official implementation should exist, however the obvious is this:

To put this in perspective, we currently have a healthy set of projects: http://typelevel.org/projects/ But where does one draw the line on standardization?

Will Http4s be the standard Typelevel HTTP toolkit? What if another HTTP toolkit is pushed as a Typelevel project? Will it get rejected, or will we need cats-http?

Just to be clear: I'm totally fine with a cats-effect exposing a standard IO type.

And I like the boundaries that you're trying to define (e.g. no concurrency stuff because it is unsafe and it's better left for alternative implementations). But just as when developing micro-services and shit like that, boundaries are incredibly hard to define and get eroded over time.

The Actual Conflict

Due to received feedback Monix can evolve to depend on Cats directly, but this comes with sacrifice because libraries like Quill, which might want to depend on Monix, won't necessarily want a Cats dependency. So a Typelevel project being integrated with Cats, while at the same time a standardization of one of its core features happens - means that Monix now has to wait to see what actually happens with cats-effect, because a Cats dependency might be like throwing the baby out with the bathwater.

This is just food for thought. I like the effort and will try to provide more helpful feedback the next time (currently writing this on airplane Wifi :))

djspiewak commented 7 years ago

I would change this copyright header. I know it's partly your work Daniel and I know that a couple of important Scala projects do this, however there are several problems

The problem with changing the copyright header to be something like "by its authors" is that US copyright law dictates two things. First, copyright cannot be assigned without a "work for hire" agreement, which literally requires that money has changed hands. This doesn't happen in open-source, basically by definition. Second, in the event of a legal challenge on the IP, by law a minimum of 50% of the license holders must be formally notified. Mozilla ran into this buzzsaw a few years ago, and they literally had to track down thousands of people, since they hadn't arranged for licensing aggregation.

Now to be clear, the copyright header as it stands is also wrong, since it incorrectly claims that I own the copyright on all of it. I don't.

The correct thing to do is to assign licensing (though remember, copyright itself cannot be assigned) to a legal entity which is not an individual, such as Typelevel, which everyone agrees will represent all constituent copyright owners in the event of a legal challenge, without actually owning the copyright on anything within. But, I as a third party cannot just arbitrarily assign Typelevel this responsibility, since it is in fact a legal commitment that literally indemnifies.

So this is quite complex, largely because US copyright law is backwards (ironically, in a good way) from how most of the world does it. It's not as simple as an authorship or even a copyright question. For now, I'm erring on the side of "pretty much everyone does this", even if it's wrong. (and it is wrong!) When/if cats-effect becomes a Typelevel project, the copyright headers will be changed to remove my name.

That's actually regrettable and no matter how you look at it, or what rationale you can come up with for it, this is a genuine booby trap for users.

Please see the readme for cats-effect. I explain two very realistic and common scenarios in which a default-forked async would be the booby trap, either from a performance or from a literal correctness standpoint. At the end of the day, if you're creating a bunch of instances using async on the same thread, you're very much doing it wrong. There is no reason you should do that: use apply (synchronous construction) instead! In any case where you are not affine to a single thread, stack safety is guaranteed by virtue of the fact that you're bouncing the stack when you shift.

So in other words, I'm choosing to put the booby trap in the "you shouldn't be doing this anyway" use-case, as opposed to putting it in the "you might plausibly do this" case. IMO, it's a saner default.

Note that this argument does not apply to Monix's Task because of its Cancelable nature.

This is absolutely correct. From a subjective standpoint, I don't want my effect primitive to have that functionality, but if it does have it (as Monix's does), then it most certainly can implement concurrency safely.

IMO, this difference is a feature, not a bug. Monix's Task isn't the same as IO, and their coexistence is not only justifiable, it is well motivated.

As mentioned above, I like the idea of this IO implementation to remain simple. That's not going to be easy, especially because with these projects we are doing a sort of "design by committee" and users will demand more.

I have absolutely no problem being an opinionated simplicity nazi. :-) Agreed that it's always going to be a battle.

mpilquist commented 7 years ago

FYI, FS2 1.0 will adopt cats and cats-effect. See https://github.com/functional-streams-for-scala/fs2/issues/848 for more details.

fanf commented 7 years ago

For reference, that comment (and the one below and the one after) helped me A LOT to understand the trade-off and the scope of cats-effect (and IO) relative to Task (and esp. monix Task): https://github.com/typelevel/general/issues/66#issuecomment-292581402 With that knowledge, I much better understand the whole proposal.

alexandru commented 7 years ago

FYI, I gave a 👍 on this other thread: https://github.com/typelevel/general/issues/72#issuecomment-296229535

At the end of the day, if you're creating a bunch of instances using async on the same thread, you're very much doing it wrong.

The problem is that you also need an async boundary when calling the provided callback (so 2 async boundaries in total). And this is what catches users by surprise.

I disagree with this opinion, I still think this type-class should provide 2 versions of async because the "safe" version is possible to implement for any unsafe version and it would be cool to allow implementations to optimise that safe version.

However this isn't a blocker.

djspiewak commented 7 years ago

The problem is that you also need an async boundary when calling the provided callback (so 2 async boundaries in total). And this is what catches users by surprise.

This is very true, and I've made this argument myself! It's a tradeoff. IMO, the shift function does a good job of walking the line and making it possible and indeed easy for users to get the semantics they want in either case. It's a far, far sight better than the old, broken fork function on scalaz's Task.

I disagree with this opinion, I still think this type-class should provide 2 versions of async because the "safe" version is possible to implement for any unsafe version and it would be cool to allow implementations to optimise that safe version.

However this isn't a blocker.

Yeah, let's thread-shift this (pun intended) over to cats-effect and work in terms of issues and PRs. Definitely a design decision that we can and should dig into more deeply. We still might never actually agree, but exploring the space is never a bad thing. :-)

tpolecat commented 7 years ago

It seems to me that there is sufficient support to move forward with this. @alexandru has stated that his concrete concerns are not blockers, and he is now a project contributor so I think we're in great shape. cc: https://github.com/typelevel/general/issues/72

johnynek commented 7 years ago

This seems great to me. One note: maybe it would be nice to add effects as a module in this repo. The reason is, the deeper the dependency chain becomes for upstream folks, the more pain you encounter syncing all the versions and getting through a publish cycle.

For things that depend on cats and maybe no other dependencies, I think adding them to this repo in a submodule is something we should strongly consider.

alexandru commented 7 years ago

@johnynek I would also like cats-effect to be a sub-module of Cats, but the concern here is that it is going to delay the Cats 1.0.0 release ... although I'm not sure if this would be a bad thing.

Maybe we should have cats-effect for Cats 1.0.0

djspiewak commented 7 years ago

@johnynek If we do that, then minor changes to IO affect the compatibility surface of cats-core, even when the actually core part of cats-core isn't changing. Clearly the inverse is always going to be true, but the more we couple things together, the worse the versioning story becomes. As a separate repo, we clearly have to do a breaking revision whenever cats-core does a breaking revision, but the reverse is not true.

Though I do agree that the publish cycle is the main downside to this. It's effectively poly- vs monorepo, and I'm very very opinionated as to which side of that debate I fall. :-)

tpolecat commented 7 years ago

My preference would be to add it as a module post-1.0 so we don't get tangled up just yet. Just as a matter of near-term risk management.

alexandru commented 7 years ago

cats-effect also has to try really hard to preserve binary compatibility - to make an analogy, what makes Scala's Future really valuable is that it is standard, doesn't break compatibility and this IO is the same thing: a standard type that users can rely on to be there and to be stable.

So tying it to cats-core is IMO desirable, but I agree that maybe it should be done after it gets stabilised, with all bugs and quirks solved and after Cats 1.0.0.

mpilquist commented 7 years ago

Totally agree with @alexandru that cats-effect must be very, very stable from a binary compatibility perspective. I want a minimum of 1 year between breaking releases of cats-core and cats-effect. Ideally more like 2 years. More frequent changes will severely hurt the fs2 ecosystem.

djspiewak commented 7 years ago

Since everyone seems 👍 on this in general, we're forging ahead! @larsrh has done the legwork and the project has been transferred over to typelevel. Further discussion on design decisions and proposed changes should take place over there.

We'll be using the typelevel/cats gitter channel for discussion, since I don't think it makes sense to fork off separately. Additionally, I'll be pushing a hash snapshot build as soon as Sonatype grants me publish permissions (generally doesn't take them too long).