typelevel / fs2

Compositional, streaming I/O library for Scala
https://fs2.io

Remove dependency on scalaz.concurrent #321

Closed pchlupacek closed 8 years ago

pchlupacek commented 9 years ago

Hi, this is just an initial idea. I would like to explore whether we can remove the dependency on scalaz. This is driven by the fact that I would like to have full control of the concurrent primitives (like Task, Future, and perhaps Actor and Strategy) in our code, and not be dependent on the release cycles of scalaz for these.

What do you think, guys? I would like to see scalaz-concurrent in our code, and perhaps the scalaz stuff in a separate module of scalaz-stream.

djspiewak commented 9 years ago

There are some things from Scalaz that we need above and beyond the concurrency primitives. Most notably, abstractions like Monad and Functor, and utilities like Either. We can get those from Scalaz or we can get them from Cats if you prefer, but we do need them.

Having full control over our own primitives would be great though. Task is fantastic and beautiful and the fact that it's sort of tucked away inside of Scalaz has always been a shame. We lose some nice interop properties by having our own Task, but at the same time, if we're looking to cut the dependency, then interop isn't really a goal anymore.

rossabaker commented 9 years ago

I've been considering this as well, and have scalaz-stream compiling on cats (requires publishLocal of cats). Todo.scala lays out what is needed to make it real.

It's not a small job, but it's viable. The big decision, as I see it, is whether the base abstractions are sourced from cats-core or scalaz-core.

See also:

bryce-anderson commented 9 years ago

I'm also interested in a scalaz-stream that is unhindered when using cats. I don't have any immediate technical comments that haven't already been discussed.

pchiusano commented 9 years ago

I'd rather just cut the dependency entirely, like @mpilquist has done for scodec. I do not want to move scalaz-stream from depending on scalaz to depending on cats (see below). Instead, I'd like to investigate:

The reason I'd rather just cut the dependency entirely is that I don't really want to pick sides in this whole mess of multiple projects competing to provide the same functionality. To the extent it is possible, I'd like whatever library "wins" to do so on its technical merits, not because of network effects by random projects like scalaz-stream choosing this or that library as their dependency. The only reason I've kept scalaz-stream depending on scalaz is when I last looked into it, it seemed pretty annoying to change. But, my head probably wasn't in it given how fried I was from dealing with scalaz drama, so I do think it's quite possible.

That said if @pchlupacek and/or @rossabaker would like to investigate breaking the dependency entirely, I would heartily endorse that effort! :) I myself do not have the bandwidth to work on it right now, though.

Now, the hardest part will be figuring out what to rename the project... :)

Just to clarify, I am totally fine continuing to depend on scodec-bits. That is a rock solid and stable dependency.

mpilquist commented 9 years ago

@pchiusano I am very happy to see this -- multiple repositories, one core along with one for each integration, sounds great. I'm happy to help with the conversion / dependency breaking.

pchlupacek commented 9 years ago

@pchiusano I would like to consider whether we can have a core with zero dependencies. I like scodec, but perhaps having the bytes-xxx project as a sort of module may be more consistent. I understand that this is used only in io, so perhaps we could have an io project that depends on scodec.

pchiusano commented 9 years ago

Yes, I should say, anyone with an interest in this is welcome to help out, not just @pchlupacek and @rossabaker. :)

As a next step, I'd recommend that someone volunteer to take the lead in creating a new branch which removes scalaz as a dependency, and get a complete inventory of all the stuff missing. I'm guessing this branch will be in a noncompiling state for a while, but I'd still push the WIP in case it is possible to parallelize the work. (Like, we need sequence defined for Either, and these six other utility functions...)

pchiusano commented 9 years ago

I think I'd be okay with a multimodule project, with all the io stuff separated, and it could depend on scodec-bits, with core having literally zero deps. But I don't have strong feelings either way. I feel pretty comfortable with the scodec-bits dependency just because it is so stable and slow-moving. If we were to do the separate io module, I'd do that as a separate effort from removing the scalaz dependency - they are orthogonal.

pchlupacek commented 9 years ago

Yeah, it was really just a proposition; I am OK with this as it is as well. It's almost like removing the dependency on Scala :-)

rossabaker commented 9 years ago

I'm definitely up for exploring the typeclass-lib-agnostic approach. It sounds wonderful on paper, but I envision many important functions will be exiled to, and duplicated across, support modules (including Process.run!). Still, sharing any part of the core is better than a fork.

I will begin spiking at https://github.com/rossabaker/scalaz-stream/tree/topic/lean-core. Watch for either a PR or an admission of defeat soon. :)

rossabaker commented 9 years ago

The further I go toward removing scalaz-core in #322, the less appealing it becomes. It already requires a few specializations, and looks to require a few more, including interpreters for Process[Task, _]. One is quickly reminded why we have core type class dependencies. Also, the addition of new functionality that depends on type classes (like a new Process1) will not easily be enjoyed by those on the other side of the fence.

The library already supports Scalaz 7.0 and Scalaz 7.1 with git branches. My topic/cats branch also doesn't diverge much, and could be made more source compatible with syntax to reconcile differences such as pure vs. point. If we cut the scalaz-concurrency dependency, we could support any core library for which someone steps up to maintain a branch. It's essentially a second dimension of cross build, which sucks, but we already do something like it. We still have to "pick a winner" for master, but new additions that don't use exotic type classes will be useful in all branches.

A third approach would be to define our own core typeclasses and then have scataz-like modules to bridge to Scalaz, Cats, etc. The last thing I want is another monad trait in Scala. Instead of underabstracting like #322, it's overabstracting, but I'll put it on the table.

djspiewak commented 9 years ago

@rossabaker @pchiusano As noble of a goal as it is to have a completely dependency-free core and to avoid "picking a winner" in the Cats vs Scalaz deathmatch, I think in this case it might be a bit of a fool's errand. As Ross said, there's a reason why we have core typeclass dependencies in the first place.

Now, I can think of a couple of ways that we can make it manageable to publish a scalaz-stream artifact against both cats and scalaz, even without the current git branching scheme (which I'm not a fan of). I'm almost positive I can contort SBT into building multiple artifacts with different source directories. The majority of our sources can be in src/main/scala, and all of our cats/scalaz dependencies can be done through type aliases which are implemented in src/main/scala-scalaz7, src/main/scala-scalaz71 and src/main/scala-cats, respectively. It's not going to be the prettiest thing in the world from a build specification standpoint, but I'm pretty certain that it's possible.

Beyond that… I'm not sure that it's possible long-term to avoid "picking a winner" in the cats vs scalaz thing. Network effect is everything for any open source project, but especially an upstream framework. Frameworks don't win on technical merits; they win on community. That's just the nature of software, because it is in fact the nature of the people who write the software. As much as I'd like to see Cats succeed, I don't mind scalaz-stream having a hard dependency on scalaz. I would certainly rather have that than have to deal with crazy contortions in dependency resolution and/or specialized function implementations to avoid said dependencies.

So my preferences, in order, would be the following:

  1. Implement SBT voodoo to depend on ALL THE THINGS via source directory splits
  2. Stick with the hard dependency on scalaz, but extract Task into our own subproject so that we can fix stuff (e.g. interrupt semantics)
  3. Ditto the above, except for cats

The main reason that 3 comes below 2 is because we're already hard depending on scalaz, Task is part of scalaz, and in general the status quo is safer and lower risk.

My point is really that I don't think a dependency-free core is feasible. We can either pick a winner, or we can perform SBT magic to side-step that entire question, but I don't think we can shave our heads and withdraw from the World of the Abstracted.
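
A rough illustration of the source-directory split described in this comment, written as a sketch rather than as the project's actual layout. The names `libA`, `compat`, and `shared` are hypothetical, `libA` is a stand-in for whichever core library is selected, and the sbt wiring that would pick one library-specific directory per build is not shown.

```scala
// Stand-in for one core library; hypothetical, defined here only so the
// sketch compiles without external dependencies.
object libA {
  trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }
  object Functor {
    implicit val optionFunctor: Functor[Option] = new Functor[Option] {
      def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
    }
  }
}

// Would live in one library-specific directory, e.g. src/main/scala-scalaz71.
// A sibling directory would define the same object aliased to another library.
object compat {
  type Functor[F[_]] = libA.Functor[F]
}

// Would live in the shared src/main/scala tree: it names only the alias,
// never the underlying library.
object shared {
  def double[F[_]](fa: F[Int])(implicit F: compat.Functor[F]): F[Int] =
    F.map(fa)(_ * 2)
}
```

With this arrangement, `shared.double(Option(21))` resolves its instance through the alias. The sketch only covers type names; method-name differences such as pure vs. point, mentioned earlier in the thread, would need additional syntax shims.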

mpilquist commented 9 years ago

Cross building could actually be worse for the community unless each cross-built JAR puts the types in discrete packages. Otherwise, we risk incompatibilities with downstream libraries -- imagine, for instance, http4s using scalaz-stream-scalaz and scodec-stream using scalaz-stream-cats, and an app that uses both.

djspiewak commented 9 years ago

I raised this point on the scalaz mailing list back when forking was proposed by Kmett. Ultimately, either Scalaz or Cats must win. Completely and utterly. If they both maintain a following but neither reaches "critical mass", then the community has the worst of all possible worlds.

tonymorris commented 9 years ago

No, really, neither "must win." In fact, they are not even competing. It is ludicrous to continue suggesting so.

pchiusano commented 9 years ago

Tony, tone it down please. We're having a discussion. Calling people's opinions ludicrous is unhelpful.

Anyway this is meant to be a discussion about what the scalaz stream project should do, and I'd like to keep it focused on that.

My feeling is that if the dependency can't be broken easily I'd rather stay with a scalaz dependency for the time being. Ross, thanks for your work, I'd like to review this week and see if there's maybe some other decent path forward. Also if other people have ideas please do pipe in!

Michael, your point about cross builds is a good one.

Honestly I can't really see myself wanting to build against multiple dependencies. I'd rather have zero dependencies, or just pick one. If someone would like to maintain a fork against a different dependency, then that is of course their right to do so.

rossabaker commented 9 years ago

If we factor out scalaz-core dep at an accepted price now, I still see that cost steepening over time. The more anemic core makes it harder to build higher level modules. We see this effect already in text and tcp, struggling with the exile of repartition and translate from core for lack of foundational type classes.

The strategy that @djspiewak lays out is not uncommon in macro projects: src/main/scala is conditionally compiled with scala_2.10 and scala_2.11. It imposes a structural quarantine of the variable code, which is less flexible but easier to maintain than the git model. I'm not sure how to get the packaging @mpilquist suggests without extra hacks.

This extra dimension of cross building is suboptimal and frustrating, but this is where we are in early 2015. I see brilliant people bunkered down on both sides and still others straddling the fence. These strategies aren't desirable, but in this environment, I see them costing far less than a bifurcated community.

jedws commented 9 years ago

I'm not sure that this complete win is either particularly desirable or achievable. The two projects are not even really comparable (yet), and with Cats yet to have any released artefacts, the discussion of it maybe winning is currently hypothetical at best.

As for the community totally adopting one or the other: the events of last year were enormously divisive, and as a result there is very little likelihood of that happening any time soon.

If there is a contest, as Paul said earlier it needs to be made on technical grounds as well as convenience. Currently the benefit of Cats seems to be that no-one else could possibly be using it, so we won't get version conflicts. While version conflicts are extremely painful in Scala, this is a short-term argument; presumably other projects will start using it, and being a younger library it is likely to have a more rapid release schedule, so this benefit recedes as its popularity grows.

mpilquist commented 9 years ago

@rossabaker To be clear, I'm not advocating for cross building. I'd much prefer to see this library with zero dependencies and compatibility modules.

rossabaker commented 9 years ago

Also to be clear, I am not advocating an exclusive or immediate switch. My branch way up in comment three is exploratory, so we downstream library authors and application developers understand and can plan to deal with the upstream situation. Besides the great schism, we have the production Scalaz 7.0 and 7.1, the imminent and binary incompatible Scalaz 7.2, and an active prototype of a source incompatible Scalaz 8.0.

I would also strongly prefer zero dependencies. Ideally, similar techniques could then be used in downstream libraries like http4s, doobie, and remotely, to build an interoperable, minimally opinionated stack. But if that came without costly tradeoffs, I'm not sure why we'd have type classes at all. Now, scodec-bits did it. My question is how was it achieved there, and why does it apparently hurt here? Are we overlooking useful techniques, or was that just a simpler problem?

pchlupacek commented 9 years ago

Folks, can we make a list of the must-have type classes etc. in the core library? I mean those that the core implementation depends on. I think the concurrent stuff is pretty easy to define, but I am kind of struggling to see whether we really have so much usage of scalaz stuff that we cannot put it in a scalaz module.

djspiewak commented 9 years ago

All of the interpreters either need to be built against a specific type (e.g. Task), or must have an array of typeclasses to provide operations on the otherwise abstract type constructor. Catchable and Functor seem like the obvious ones, but I think Monad might be needed in some cases. Monoid is needed as well with the current implementation.
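
To make the point about interpreters concrete, here is a minimal sketch of a runner over an abstract `F`. Everything in it is an assumption for illustration: `Proc` is a deliberately tiny stream type and not the real Process, `Monad` and `Catchable` are cut-down stand-ins for the library type classes, and `Thunk` plays the role of Task.

```scala
import scala.util.Try

object InterpreterSketch {
  // Minimal, hypothetical versions of two of the type classes named above.
  trait Monad[F[_]] {
    def pure[A](a: A): F[A]
    def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  }
  trait Catchable[F[_]] {
    def attempt[A](fa: F[A]): F[Either[Throwable, A]]
  }

  // A deliberately tiny stream type, not the real Process.
  sealed trait Proc[F[_], O]
  final case class Halt[F[_], O]() extends Proc[F, O]
  final case class Emit[F[_], O](head: O, tail: Proc[F, O]) extends Proc[F, O]
  final case class Await[F[_], A, O](
      req: F[A],
      recv: Either[Throwable, A] => Proc[F, O]
  ) extends Proc[F, O]

  // With F abstract, the only way to sequence `req` and observe its failure
  // is through the Monad and Catchable instances. No attempt at stack safety.
  def runLog[F[_], O](p: Proc[F, O])(implicit M: Monad[F], C: Catchable[F]): F[Vector[O]] = {
    def go(cur: Proc[F, O], acc: Vector[O]): F[Vector[O]] = cur match {
      case Halt()           => M.pure(acc)
      case Emit(h, t)       => go(t, acc :+ h)
      case Await(req, recv) => M.flatMap(C.attempt(req))(r => go(recv(r), acc))
    }
    go(p, Vector.empty)
  }

  // Stand-in effect type playing the role of Task: a deferred computation.
  final case class Thunk[A](run: () => A)

  implicit val thunkMonad: Monad[Thunk] = new Monad[Thunk] {
    def pure[A](a: A): Thunk[A] = Thunk(() => a)
    def flatMap[A, B](fa: Thunk[A])(f: A => Thunk[B]): Thunk[B] =
      Thunk(() => f(fa.run()).run())
  }

  implicit val thunkCatchable: Catchable[Thunk] = new Catchable[Thunk] {
    def attempt[A](fa: Thunk[A]): Thunk[Either[Throwable, A]] =
      Thunk(() => Try(fa.run()).toEither)
  }
}
```

Remove the two implicit parameters and `runLog` can no longer be written for an arbitrary `F`; it would have to name a concrete type such as `Thunk`, which is the specialization trade-off under discussion.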

rossabaker commented 9 years ago

The interpreters are the big one. There are a couple traverse_s in core. tcp benefits from ~>, and text benefits from Semigroup.

pchiusano commented 9 years ago

What do folks think about just specializing all the interpreters to Task? Obviously, it's less flexible, but it would mean we could avoid having to duplicate a bunch of typeclasses, and it seems like it might be the only way to get code dependency free. Honestly, I cannot recall a time where I've had to run a Process[F,_] for any F other than Task (or Nothing).

We would definitely still need translate, and ~>, since Task will be acting as the 'final object' that everything gets compiled to. But duplicating one 3 line class doesn't seem like a big deal. It's a shame Scala doesn't support rank 2 types natively... but anyway.

We could also, if we really want, just use ~> to accept unit and attempt as first-class values, again without having to bring in any typeclasses. unit : Id ~> F, etc. bind would need some two-type-parameter version of ~>, I guess. This would be hideous, but it can be wrapped nicely for the common case of running Task. And if you want to run something other than a Task stream, you have to do something ugly, but at least it is possible.
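
A small sketch of what passing `unit` and `attempt` as plain `~>` values might look like. It is an assumption-laden illustration: `~>` is redefined locally as the small class mentioned above, `Option` stands in for `F`, and the `Attempt` alias is a hypothetical helper for spelling the result type.

```scala
object NatSketch {
  // A function between type constructors, polymorphic in A.
  trait ~>[F[_], G[_]] {
    def apply[A](fa: F[A]): G[A]
  }

  type Id[A] = A

  // `unit` as a first-class value instead of a type class method.
  // Option stands in for F here.
  val unit: Id ~> Option = new (Id ~> Option) {
    def apply[A](a: Id[A]): Option[A] = Some(a)
  }

  // `attempt` in the same style; the alias spells out the result type.
  type Attempt[A] = Option[Either[Throwable, A]]
  val attempt: Option ~> Attempt = new (Option ~> Attempt) {
    def apply[A](fa: Option[A]): Attempt[A] =
      fa.map[Either[Throwable, A]](a => Right(a))
  }
}
```

As the comment notes, `bind` does not fit this shape because it involves two type parameters, so it would need a separate two-parameter variant that is not sketched here.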

mpilquist commented 9 years ago

@pchiusano +1 on specializing interpreters to Task.

djspiewak commented 9 years ago

While I'm generally in favor of abstraction, so much of the useful stuff in scalaz-stream is already specialized on Task (in particular, everything associated with concurrency), so it's not really much of a loss. In my experience, if you're using Process, you're almost certainly using Process[Task, _]. So… specializing on Task would not be the end of the world, especially if we can gain other (ideally significant) benefits from doing so.

rossabaker commented 9 years ago

It's not just interpreters, but it is mostly Task:

  • runFoldMap requires a Monoid. The others can all be specialized for Task and IndexedSeq, which is not a tremendous loss.
  • handle and partialAttempt also require specialization due to Catchable.
  • gatherMap/gather/sequence require specialization due to Nondeterminism.

We'd also lose generic Channel.mapOut and Sink.toChannel syntax for lack of a Functor, but those could also be specialized on Task, I suppose.

pchiusano commented 9 years ago

I think handle and partialAttempt are unnecessary. They were introduced before onHalt / onFailure. I'm guessing they can be implemented in terms of onHalt, or just removed.

Re Channel.mapOut and Sink.toChannel, I'd like to change the representation of Channel and Sink at some point. It should have been type Channel[F,A,B] = Process[F, A => Process[F,B]], which eliminates the need for the Functor. It's also somewhat awkward that channels have to return exactly one value for each input.

I'd probably just make runFoldMap take the binary operation and identity as regular arguments. Totally reasonable, and if the caller has a monoid, m, they can still call it easily enough.

I consider Nondeterminism to be a failed experiment, so I don't mind specializing there.
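
The runFoldMap suggestion can be sketched as follows. This is only an illustration under stated assumptions: `foldMapWith` is a hypothetical name, it folds an ordinary `Seq` of already-emitted values rather than running a Process, and `Monoid` is a local stand-in for a library type class.

```scala
object FoldSketch {
  // Hypothetical stand-in: folds values that have already been emitted,
  // taking the identity and the binary operation as ordinary arguments.
  def foldMapWith[A, B](as: Seq[A])(f: A => B)(zero: B, op: (B, B) => B): B =
    as.foldLeft(zero)((acc, a) => op(acc, f(a)))

  // Local stand-in for a library Monoid.
  trait Monoid[B] {
    def zero: B
    def append(x: B, y: B): B
  }

  // A caller holding a monoid just passes its two members along.
  def foldMap[A, B](as: Seq[A])(f: A => B)(implicit m: Monoid[B]): B =
    foldMapWith(as)(f)(m.zero, m.append)
}
```

The core function needs no type class at all, and the Monoid-based variant becomes a one-line wrapper that could live in a bindings module.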

runarorama commented 9 years ago

No thank you! We use Process extensively with other free monads (that may or may not eventually compile to Task). Specializing to Task would mean we would have to fork this library.

tonymorris commented 9 years ago

@jedws The Scalaz project is motivated by very different aspirations and goals to the cats library. It boggles my mind that we are talking about "competition." A library includes a Functor trait and now it is competing? Is that it? How weird.

I don't mind rewriting a stream library; if only to get away from the bloody nonsense!

/rant

rossabaker commented 9 years ago

Specializing to Task does not preclude other interpreters. I don't see why the existing monad/catchable interpreters couldn't still exist in Scalaz support.

pchiusano commented 9 years ago

Hang on, let's make sure we are talking about the same things here. Just to clarify, we will never specialize Process[F,A] to Process[Task,A]. So we won't change:

trait Process[F[_],A]

to:

trait Process[A]

That would be a huge step backward. Tons of code relies on the ability to use different F, including code internal to scalaz-stream itself, scodec-stream, and I'm sure tons of user code. So that will not change, @runarorama not sure if you were concerned about that.

We are just contemplating whether the runner(s) of Process, like runLog, could be specialized, at least in core. So rather than runLog working for any F with a Monad[F] and Catchable[F], it would be defined just for a Process[Task,A]. Also, as @rossabaker points out, there could be Monad/Catchable-generic versions of the various runners in the scalaz binding.

The reason I suspected specializing the runners to Task might not be much of a limitation in expressiveness is that if you have a monad, G, that you are using for the F in Process[F,A], you can sometimes (often? always?) either run the Process[G,A] to get a G[Blah], and then convert the G to a Task, or you can call translate on the Process[G,A] to get a Process[Task,A], and then run that. @runarorama or anyone else, do you have a concrete G where that doesn't work out, or a general class of examples? If so that would be really useful to think about. Since G also has to be Catchable for all the runner functions, it's going to have to be something Task or IO-like.

The only examples I could think of are basically things that are isomorphic to Env => Task[A], which can be handled via translate (this is the strategy used in scodec-stream and in the tcp module), which can bind Env. But perhaps I am just not very creative at coming up with examples. :)
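
A minimal sketch of the translate strategy for effects shaped like Env => Task[A]. All names are hypothetical: `Thunk` stands in for Task, `Reader` for the Env-dependent effect, and the stream is modelled as a plain list of effects rather than a real Process.

```scala
object TranslateSketch {
  // A function between type constructors, polymorphic in A.
  trait ~>[F[_], G[_]] { def apply[A](fa: F[A]): G[A] }

  // Hypothetical stand-in for Task: a deferred computation.
  final case class Thunk[A](run: () => A)

  // Hypothetical environment, and an effect isomorphic to Env => Task[A].
  final case class Env(prefix: String)
  final case class Reader[A](run: Env => Thunk[A])

  // Stand-in stream: a finite list of effects. The real translate works on
  // Process; this version only shows how the effect type is swapped.
  def translate[F[_], G[_], O](p: List[F[O]])(f: F ~> G): List[G[O]] =
    p.map(fo => f(fo))

  // Binding a concrete Env turns every Reader step into a plain Thunk step.
  def bindEnv(env: Env): Reader ~> Thunk = new (Reader ~> Thunk) {
    def apply[A](fa: Reader[A]): Thunk[A] = fa.run(env)
  }
}
```

After `translate(steps)(bindEnv(env))` the stream is expressed purely in the Task stand-in, so a runner specialized to that type is enough to execute it.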

pchlupacek commented 9 years ago

Well, I think for runners we can introduce a type class, ProcessRunner, and provide a Task instance in the core library, whereas the others can live in scalaz/xxx bindings?

i.e.

def runLog(implicit runner: ProcessRunner[F, O]): F[IndexedSeq[O]] = runner.runLog

object Task {
  implicit def runner[O]: ProcessRunner[Task, O] = ???
}

pchiusano commented 9 years ago

Isn't ProcessRunner just going to be basically Monad + Catchable, though? Either that or all the ProcessRunner implementations duplicate the same logic... which is rather error prone.

rossabaker commented 9 years ago

One might summon a ProcessRunner from a Monad and a Catchable. I admit to not having explored this technique outside a trivial REPL example: https://gist.github.com/rossabaker/bf76b4d3449636a18c12
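
One possible shape for this, sketched independently of the linked gist and under several assumptions: `Proc` is a simplified stand-in (a list of effectful steps plus a cleanup action), `ProcessRunner` is indexed by `F` only, `Thunk` plays the role of Task, and `Monad`, `Catchable`, and `derive` are hypothetical names for what a bindings module might provide.

```scala
object RunnerSketch {
  // Hypothetical stand-in for Task: a deferred computation.
  final case class Thunk[A](run: () => A)

  // Stand-in stream: effectful steps plus a cleanup action that must always run.
  final case class Proc[F[_], O](steps: List[F[O]], cleanup: F[Unit])

  // The proposed type class (indexed by F only, a simplification).
  trait ProcessRunner[F[_]] {
    def runLog[O](p: Proc[F, O]): F[Vector[O]]
  }

  // Core would ship just this instance, written directly against Thunk.
  implicit val thunkRunner: ProcessRunner[Thunk] = new ProcessRunner[Thunk] {
    def runLog[O](p: Proc[Thunk, O]): Thunk[Vector[O]] =
      Thunk(() => try p.steps.map(_.run()).toVector finally p.cleanup.run())
  }

  // Entry point in the style proposed above: it only asks for a runner.
  def runLog[F[_], O](p: Proc[F, O])(implicit R: ProcessRunner[F]): F[Vector[O]] =
    R.runLog(p)

  // Type classes a bindings module could own or alias to a library's.
  trait Monad[F[_]] {
    def pure[A](a: A): F[A]
    def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  }
  trait Catchable[F[_]] {
    def attempt[A](fa: F[A]): F[Either[Throwable, A]]
    def fail[A](t: Throwable): F[A]
  }

  // Summon a runner from Monad + Catchable so the generic logic is written once.
  def derive[F[_]](implicit M: Monad[F], C: Catchable[F]): ProcessRunner[F] =
    new ProcessRunner[F] {
      def runLog[O](p: Proc[F, O]): F[Vector[O]] = {
        val body: F[Vector[O]] =
          p.steps.foldLeft(M.pure(Vector.empty[O])) { (accF, step) =>
            M.flatMap(accF)(acc => M.flatMap(step)(o => M.pure(acc :+ o)))
          }
        // Capture any failure, run cleanup, then re-raise or return.
        M.flatMap(C.attempt(body)) { result =>
          M.flatMap(p.cleanup) { _ =>
            result match {
              case Right(out) => M.pure(out)
              case Left(err)  => C.fail[Vector[O]](err)
            }
          }
        }
      }
    }
}
```

The duplication concern raised above shows up here in miniature: the hand-written `thunkRunner` and the derived runner both encode the "always run cleanup" rule, so the two have to be kept in agreement.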

pchlupacek commented 9 years ago

@pchiusano yes, exactly. However, we do not have Monad + Catchable in the stream core, which is why we can introduce this. I don't think we need Monad in the stream core, but perhaps Catchable is a reasonable type class to include there.

pchiusano commented 8 years ago

Closing. This is done in new design.