Closed pchlupacek closed 8 years ago
There are some things from Scalaz that we need above and beyond the concurrency primitives. Most notably, abstractions like Monad and Functor, and utilities like Either. We can get those from Scalaz or we can get them from Cats if you prefer, but we do need them.
Having full control over our own primitives would be great though. Task is fantastic and beautiful and the fact that it's sort of tucked away inside of Scalaz has always been a shame. We lose some nice interop properties by having our own Task, but at the same time, if we're looking to cut the dependency, then interop isn't really a goal anymore.
I've been considering this as well, and have scalaz-stream compiling on cats (requires publishLocal of cats). Todo.scala lays out what is needed to make it real.
- Catchable or Nondeterminism don't yet.
- Either3 doesn't, and probably won't, exist in cats. Shapeless coproducts solve this, but that's another dependency.
- maximum.
- \/ and friends are aliases, pending final decision on non/cats#189.
It's not a small job, but it's viable. The big decision, as I see it, is whether the base abstractions are sourced from cats-core or scalaz-core.
See also:
I'm also interested in a scalaz-stream that is unhindered when using cats. I don't have any immediate technical comments that haven't already been discussed.
I'd rather just cut the dependency entirely, like @mpilquist has done for scodec. I do not want to move scalaz-stream from depending on scalaz to depending on cats (see below). Instead, I'd like to investigate:
- Task into scalaz-stream, where we fully control it. I think this is beneficial anyway.
- Monad etc are handy but aren't necessary internal to the library. There may be some annoying stuff though, like having to convert from \/ to using Either and reimplement some functionality currently in scalaz/cats.
The reason I'd rather just cut the dependency entirely is that I don't really want to pick sides in this whole mess of multiple projects competing to provide the same functionality. To the extent it is possible, I'd like whatever library "wins" to do so on its technical merits, not because of network effects by random projects like scalaz-stream choosing this or that library as their dependency. The only reason I've kept scalaz-stream depending on scalaz is when I last looked into it, it seemed pretty annoying to change. But, my head probably wasn't in it given how fried I was from dealing with scalaz drama, so I do think it's quite possible.
That said if @pchlupacek and/or @rossabaker would like to investigate breaking the dependency entirely, I would heartily endorse that effort! :) I myself do not have the bandwidth to work on it right now, though.
Now, the hardest part will be figuring out what to rename the project... :)
Just to clarify, I am totally fine continuing to depend on scodec-bits. That is a rock solid and stable dependency.
@pchiusano I am very happy to see this -- multiple repositories, one core along with one for each integration, sounds great. I'm happy to help with the conversion / dependency breaking.
@pchiusano I would like to consider whether we can have core with zero dependencies. I like scodec, but perhaps having the bytes-xxx project as a sort of module may be more consistent. I understand that this is used only in io, so perhaps we could have an io project that depends on scodec.
Yes, I should say, anyone with an interest in this is welcome to help out, not just @pchlupacek and @rossabaker. :)
As a next step, I'd recommend that someone volunteer to take the lead in creating a new branch which removes scalaz as a dependency, and get a complete inventory of all the stuff missing. I'm guessing this branch will be in a noncompiling state for a while, but I'd still push the WIP in case it is possible to parallelize the work. (Like, we need sequence defined for Either, and these six other utility functions...)
I think I'd be okay with a multimodule project, with all the io stuff separated, and it could depend on scodec-bits, with core having literally zero deps. But I don't have strong feelings either way. I feel pretty comfortable with the scodec-bits dependency just because it is so stable and slow-moving. If we were to do the separate io module, I'd do that as a separate effort from removing the scalaz dependency - they are orthogonal.
Yeah, it was really just a proposal; I am OK with this as it is as well. It's almost like removing the dependency on Scala :-)
I'm definitely up for exploring the typeclass-lib-agnostic approach. It sounds wonderful on paper, but I envision many important functions will be exiled to, and duplicated across, support modules (including Process.run!). Still, sharing any part of the core is better than a fork.
I will begin spiking at https://github.com/rossabaker/scalaz-stream/tree/topic/lean-core. Watch for either a PR or an admission of defeat soon. :)
The further I go toward removing scalaz-core in #322, the less appealing it becomes. It already requires a few specializations, and looks to require a few more, including interpreters for Process[Task, _]. One is quickly reminded why we have core type class dependencies. Also, the addition of new functionality that depends on type classes (like a new Process1) will not easily be enjoyed by those on the other side of the fence.
The library already supports Scalaz 7.0 and Scalaz 7.1 with git branches. My topic/cats branch also doesn't diverge much, and could be made more source compatible with syntax to reconcile differences such as pure vs. point. If we cut the scalaz-concurrency dependency, we could support any core library for which someone steps up to maintain a branch. It's essentially a second dimension of cross build, which sucks, but we already do something like it. We still have to "pick a winner" for master, but new additions that don't use exotic type classes will be useful in all branches.
A third approach would be to define our own core typeclasses and then have scataz-like modules to bridge to Scalaz, Cats, etc. The last thing I want is another monad trait in Scala. Instead of underabstracting like #322, it's overabstracting, but I'll put it on the table.
@rossabaker @pchiusano As noble of a goal as it is to have a completely dependency-free core and to avoid "picking a winner" in the Cats vs Scalaz deathmatch, I think in this case it might be a bit of a fool's errand. As Ross said, there's a reason why we have core typeclass dependencies in the first place.
Now, I can think of a couple of ways that we can make it manageable to publish a scalaz-stream artifact against both cats and scalaz, even without the current git branching scheme (which I'm not a fan of). I'm almost positive I can contort SBT into building multiple artifacts with different source directories. The majority of our sources can be in src/main/scala, and all of our cats/scalaz dependencies can be done through type aliases which are implemented in src/main/scala-scalaz7, src/main/scala-scalaz71 and src/main/scala-cats, respectively. It's not going to be the prettiest thing in the world from a build specification standpoint, but I'm pretty certain that it's possible.
Beyond that… I'm not sure that it's possible long-term to avoid "picking a winner" in the cats vs scalaz thing. Network effect is everything for any open source project, but especially an upstream framework. Frameworks don't win on technical merits; they win on community. That's just the nature of software, because it is in fact the nature of the people who write the software. As much as I'd like to see Cats succeed, I don't mind scalaz-stream having a hard dependency on scalaz. I would certainly rather have that than have to deal with crazy contortions in dependency resolution and/or specialized function implementations to avoid said dependencies.
So my preferences, in order, would be the following:
- Task into our own subproject so that we can fix stuff (e.g. interrupt semantics)
The main reason that 3 comes below 2 is that we're already hard depending on scalaz, Task is part of scalaz, and in general the status quo is safer and lower risk.
My point is really that I don't think a dependency-free core is feasible. We can either pick a winner, or we can perform SBT magic to side-step that entire question, but I don't think we can shave our heads and withdraw from the World of the Abstracted.
Cross building could actually be worse for the community unless each cross-built JAR puts the types in discrete packages. Otherwise, we risk incompatibilities with downstream libraries -- imagine, for instance, http4s using scalaz-stream-scalaz and scodec-stream using scalaz-stream-cats, and an app that uses both.
Cross building could actually be worse for the community unless each cross-built JAR puts the types in discrete packages. Otherwise, we risk incompatibilities with downstream libraries -- imagine, for instance, http4s using scalaz-stream-scalaz and scodec-stream using scalaz-stream-cats, and an app that uses both.
I raised this point on the scalaz mailing list back when forking was proposed by Kmett. Ultimately, either Scalaz or Cats must win. Completely and utterly. If they both maintain a following but neither reaches "critical mass", then the community has the worst of all possible worlds.
No, really, neither "must win." In fact, they are not even competing. It is ludicrous to continue suggesting so.
Tony, tone it down please. We're having a discussion. Calling people's opinions ludicrous is unhelpful.
Anyway this is meant to be a discussion about what the scalaz stream project should do, and I'd like to keep it focused on that.
My feeling is that if the dependency can't be broken easily I'd rather stay with a scalaz dependency for the time being. Ross, thanks for your work, I'd like to review this week and see if there's maybe some other decent path forward. Also if other people have ideas please do pipe in!
Michael, your point about cross builds is a good one.
Honestly I can't really see myself wanting to build against multiple dependencies. I'd rather have zero dependencies, or just pick one. If someone would like to maintain a fork against a different dependency, then that is of course their right to do so.
If we factor out scalaz-core dep at an accepted price now, I still see that cost steepening over time. The more anemic core makes it harder to build higher level modules. We see this effect already in text and tcp, struggling with the exile of repartition and translate from core for lack of foundational type classes.
The strategy that @djspiewak lays out is not uncommon in macro projects: src/main/scala is conditionally compiled with scala_2.10 and scala_2.11. It imposes a structural quarantine of the variable code, which is less flexible but easier to maintain than the git model. I'm not sure how to get the packaging @mpilquist suggests without extra hacks.
This extra dimension of cross building is suboptimal and frustrating, but this is where we are in early 2015. I see brilliant people bunkered down on both sides and still others straddling the fence. These strategies aren't desirable, but in this environment, I see them costing far less than a bifurcated community.
I'm not sure that this complete win is either particularly desirable or achievable. The two projects are not even really comparable (yet), and with Cats yet to have any released artefacts, the discussion of it maybe winning is currently hypothetical at best.
As far as the community totally adopting one or the other, the events of last year were enormously divisive, and some of the result of that would mean there is very little likelihood of that happening any time soon.
If there is a contest, as Paul said earlier it needs to be made on technical grounds as well as convenience. Currently the benefit of Cats seems to be that no-one else could possibly be using it, so we won't get version conflicts. While version conflicts are extremely painful in Scala, this is a short-term argument; presumably other projects will start using it, and being a younger library it is more likely to have a more rapid release schedule, so this benefit recedes in inverse proportion to its popularity.
@rossabaker To be clear, I'm not advocating for cross building. I'd much prefer to see this library with zero dependencies and compatibility modules.
Also to be clear, I am not advocating an exclusive or immediate switch. My branch way up in comment three is exploratory, so we downstream library authors and application developers understand and can plan to deal with the upstream situation. Besides the great schism, we have the production Scalaz 7.0 and 7.1, the imminent and binary incompatible Scalaz 7.2, and an active prototype of a source incompatible Scalaz 8.0.
I would also strongly prefer zero dependencies. Ideally, similar techniques could then be used in downstream libraries like http4s and doobie and remotely, and build an interoperable, minimally opinionated stack. But if that came without costly tradeoffs, I'm not sure why we'd have type classes at all. Now, scodec-bits did it. My question is how was it achieved there, and why does it apparently hurt here? Are we overlooking useful techniques, or was that just a simpler problem?
Folks, can we make a list of the MUST-have type classes etc. in the core library? I mean those that the core implementation depends on. I think the concurrent stuff is pretty easy to define, but I am kind of struggling to see whether we really have that much usage of scalaz stuff that we really cannot put in a scalaz module.
folks can we make a list of MUST to have TypeClases etc. in core library?
All of the interpreters either need to be built against a specific type (e.g. Task), or must have an array of typeclasses to provide operations on the otherwise abstract type constructor. Catchable and Functor seem like the obvious ones, but I think Monad might be needed in some cases. Monoid is needed as well with the current implementation.
The interpreters are the big one. There are a couple traverse_s in core. tcp benefits from ~>, and text benefits from Semigroup.
What do folks think about just specializing all the interpreters to Task? Obviously, it's less flexible, but it would mean we could avoid having to duplicate a bunch of typeclasses, and it seems like it might be the only way to get core dependency free. Honestly, I cannot recall a time where I've had to run a Process[F,_] for any F other than Task (or Nothing).
We would definitely still need translate, and ~>, since Task will be acting as the 'final object' that everything gets compiled to. But duplicating one 3 line class doesn't seem like a big deal. It's a shame Scala doesn't support rank 2 types natively... but anyway.
We could also, if we really want, just use ~> to accept unit and attempt as first class values, again without having to bring in any typeclasses. unit : Id ~> F, etc. bind would need some two type parameter version of ~> I guess. This would be hideous, but it can be wrapped nicely for the common case of running Task. And if you want to run something other than a Task stream, you have to do something ugly, but at least it is possible.
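To make the "3 line class" concrete, here is a minimal sketch of a locally defined ~> with unit and attempt passed around as plain values. It uses scala.util.Try as a stand-in effect so that it needs no library; the names NatSketch, unitOption, AttemptTry and attemptTry are invented for the illustration and are not part of scalaz-stream.

```scala
import scala.language.higherKinds
import scala.util.{Failure, Success, Try}

object NatSketch {
  // Local copy of a natural transformation; this is the "3 line class".
  trait ~>[F[_], G[_]] { def apply[A](fa: F[A]): G[A] }

  type Id[A] = A

  // unit as a first class value: Id ~> F, shown here for F = Option.
  val unitOption: Id ~> Option = new (Id ~> Option) {
    def apply[A](a: Id[A]): Option[A] = Some(a)
  }

  // attempt as a first class value, shown here for Try:
  // a failure is moved into the result as a Left.
  type AttemptTry[A] = Try[Either[Throwable, A]]
  val attemptTry: Try ~> AttemptTry = new (Try ~> AttemptTry) {
    def apply[A](fa: Try[A]): Try[Either[Throwable, A]] = fa match {
      case Success(a) => Success(Right(a))
      case Failure(t) => Success(Left(t))
    }
  }
}
```

A runner could then take such values as ordinary arguments instead of requiring type class instances, at the cost of a clumsier signature.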
@pchiusano +1 on specializing interpreters to Task.
While I'm generally in favor of abstraction, so much of the useful stuff in scalaz-stream is already specialized on Task (in particular, everything associated with concurrency), so it's not really much of a loss. In my experience, if you're using Process, you're almost certainly using Process[Task, _]. So… specializing on Task would not be the end of the world, especially if we can gain other (ideally significant) benefits from doing so.
It's not just interpreters, but it is mostly Task:
- runFoldMap requires a Monoid. The others can all be specialized for Task and IndexedSeq, which is not a tremendous loss.
- handle and partialAttempt also require specialization due to Catchable.
- gatherMap/gather/sequence require specialization due to Nondeterminism.
We'd also lose generic Channel.mapOut and Sink.toChannel syntax for lack of a Functor, but those could also be specialized on Task, I suppose.
I think handle and partialAttempt are unnecessary. They were introduced before onHalt / onFailure. I'm guessing they can be implemented in terms of onHalt, or just removed.
re Channel.mapOut and Sink.toChannel, I'd like to change the representation of Channel and Sink at some point. It should have been type Channel[F,A,B] = Process[F, A => Process[F,B]], which eliminates the need for the Functor. It's also somewhat awkward that channels have to return exactly one value for each input.
I'd probably just make runFoldMap take the binary operation and identity as regular arguments. Totally reasonable, and if the caller has a monoid, m, they can still call it easily enough.
I consider Nondeterminism to be a failed experiment, so I don't mind specializing there.
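The runFoldMap suggestion above can be sketched on a plain Seq rather than a Process. The foldMap function and the local Monoid trait below are stand-ins invented for the illustration, not the scalaz-stream or scalaz definitions.

```scala
object FoldMapSketch {
  // Hypothetical stand-in for a Monoid type class; not scalaz's.
  trait Monoid[B] {
    def zero: B
    def append(x: B, y: B): B
  }

  // The identity and the binary operation are ordinary arguments,
  // so no type class is needed in the signature.
  def foldMap[A, B](as: Seq[A])(f: A => B)(zero: B, op: (B, B) => B): B =
    as.foldLeft(zero)((acc, a) => op(acc, f(a)))

  val intSum: Monoid[Int] = new Monoid[Int] {
    val zero = 0
    def append(x: Int, y: Int): Int = x + y
  }

  // Calling it directly with an identity and an operation...
  val direct: Int = foldMap(List("a", "bb", "ccc"))(_.length)(0, _ + _)

  // ...or from a monoid m the caller already has.
  val viaMonoid: Int =
    foldMap(List("a", "bb", "ccc"))(_.length)(intSum.zero, intSum.append)
}
```

A caller holding a monoid only has to pass its two members, which is the "easily enough" case mentioned in the comment.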
No thank you! We use Process extensively with other free monads (that may or may not eventually compile to Task). Specializing to Task would mean we would have to fork this library.
@jedws The Scalaz project is motivated by very different aspirations and goals to the cats library. It boggles my mind that we are talking about "competition." A library includes a Functor trait and now it is competing? Is that it? How weird.
I don't mind rewriting a stream library; if only to get away from the bloody nonsense!
/rant
Specializing to Task does not preclude other interpreters. I don't see why the existing monad/catchable interpreters couldn't still exist in Scalaz support.
Hang on, let's make sure we are talking about the same things here. Just to clarify, we will never specialize Process[F,A] to Process[Task,A]. So we won't change:
trait Process[F[_],A]
to:
trait Process[A]
That would be a huge step backward. Tons of code relies on the ability to use different F, including code internal to scalaz-stream itself, scodec-stream, and I'm sure tons of user code. So that will not change, @runarorama not sure if you were concerned about that.
We are just contemplating whether the runner(s) of Process, like runLog, could be specialized, at least in core. So rather than runLog working for any F with a Monad[F] and Catchable[F], it would be defined just for a Process[Task,A]. Also, as @rossabaker points out, there could be Monad/Catchable-generic versions of the various runners in the scalaz binding.
The reason I suspected specializing the runners to Task might not be much of a limitation in expressiveness is that if you have a monad, G, that you are using for the F in Process[F,A], you can sometimes (often? always?) either run the Process[G,A] to get a G[Blah], and then convert the G to a Task, or you can call translate on the Process[G,A] to get a Process[Task,A], and then run that. @runarorama or anyone else, do you have a concrete G where that doesn't work out, or a general class of examples? If so that would be really useful to think about. Since G also has to be Catchable for all the runner functions, it's going to have to be something Task or IO-like.
The only examples I could think of are basically things that are isomorphic to Env => Task[A], which can be handled via translate (this is the strategy used in scodec-stream and in the tcp module), which can bind Env. But perhaps I am just not very creative at coming up with examples. :)
Well, I think for runners we can introduce a type class ProcessRunner and provide a Task instance in the core library, whereas the others can live in scalaz/xxx bindings?
i.e.
def runLog(implicit runner:ProcessRunner[F,O]):F[IndexedSeq[O]] = runner.runLog
object Task {
  implicit def runner[O]: ProcessRunner[Task,O] = ???
}
Isn't ProcessRunner just going to be basically Monad + Catchable, though? Either that or all the ProcessRunner implementations duplicate the same logic... which is rather error prone.
One might summon a ProcessRunner from a Monad and a Catchable. I admit to not having explored this technique outside a trivial REPL example: https://gist.github.com/rossabaker/bf76b4d3449636a18c12
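As an illustration of summoning a runner from a Monad and a Catchable, here is a minimal sketch that is independent of the linked gist. Monad, Catchable, Steps and ProcessRunner below are simplified stand-ins defined locally, Steps being a toy replacement for Process, and scala.util.Try plays the role of the effect so that no library is needed.

```scala
import scala.language.higherKinds
import scala.util.{Failure, Success, Try}

object RunnerSketch {
  // Hypothetical minimal type classes; stand-ins for scalaz's Monad and Catchable.
  trait Monad[F[_]] {
    def point[A](a: => A): F[A]
    def bind[A, B](fa: F[A])(f: A => F[B]): F[B]
  }
  trait Catchable[F[_]] {
    def attempt[A](fa: F[A]): F[Either[Throwable, A]]
    def fail[A](t: Throwable): F[A]
  }

  // Toy stand-in for Process[F, O]: a finite list of effectful steps.
  final case class Steps[F[_], O](steps: List[F[O]])

  trait ProcessRunner[F[_]] {
    def runLog[O](p: Steps[F, O]): F[IndexedSeq[O]]
  }

  object ProcessRunner {
    // Any F with a Monad and a Catchable gets a runner; the logic is written once.
    implicit def fromMonadCatchable[F[_]](
        implicit M: Monad[F], C: Catchable[F]): ProcessRunner[F] =
      new ProcessRunner[F] {
        def runLog[O](p: Steps[F, O]): F[IndexedSeq[O]] =
          p.steps.foldLeft(M.point(IndexedSeq.empty[O])) { (acc, step) =>
            M.bind(acc) { os =>
              M.bind(C.attempt(step)) {
                case Right(o) => M.point(os :+ o)
                case Left(t)  => C.fail[IndexedSeq[O]](t)
              }
            }
          }
      }
  }

  // Instances for Try, chosen only because Try ships with the standard library.
  implicit val tryMonad: Monad[Try] = new Monad[Try] {
    def point[A](a: => A): Try[A] = Try(a)
    def bind[A, B](fa: Try[A])(f: A => Try[B]): Try[B] = fa.flatMap(f)
  }
  implicit val tryCatchable: Catchable[Try] = new Catchable[Try] {
    def attempt[A](fa: Try[A]): Try[Either[Throwable, A]] = fa match {
      case Success(a) => Success(Right(a))
      case Failure(t) => Success(Left(t))
    }
    def fail[A](t: Throwable): Try[A] = Failure(t)
  }

  def runLog[F[_], O](p: Steps[F, O])(implicit r: ProcessRunner[F]): F[IndexedSeq[O]] =
    r.runLog(p)

  val ok: Try[IndexedSeq[Int]] = runLog(Steps[Try, Int](List(Try(1), Try(2))))

  val boom = new RuntimeException("boom")
  val failed: Try[IndexedSeq[Int]] =
    runLog(Steps[Try, Int](List(Try(1), Failure(boom))))
}
```

In this arrangement a core module could ship only a hand-written Task instance, while a binding module supplies the derived instance, so the runner logic is not duplicated per effect type.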
@pchiusano yes, exactly. However, we do not have Monad + Catchable in stream core, which is why we can introduce this. I don't think we need Monad in streams core, but perhaps Catchable is a reasonable type class to include there.
Closing. This is done in new design.
Hi, this is just an initial idea. I would like to explore whether we can remove the dependency on scalaz. This is driven mainly by the fact that I would like to have full control of the concurrent primitives (like Task, Future, and perhaps Actor and Strategy) in our code and not be dependent on the release cycles of scalaz for these.
What do you think, guys? I would like to see scalaz-concurrent in our code and perhaps the scalaz stuff in a separate module of scalaz-streams.