typelevel / fs2

Compositional, streaming I/O library for Scala
https://fs2.io

Blocking terminology #1097

Closed backuitist closed 6 years ago

backuitist commented 6 years ago

I find the use of "blocking" throughout the documentation confusing.

I would argue that the generally accepted definition of blocking is that a thread is waiting for some event. As we know, what a thread is can differ depending on the runtime. In my eyes, FS2 provides an environment in which "blocking" is cheap (like blocking a green thread), by means of continuations. But the underlying runtime isn't made of green threads, so blocking isn't the appropriate term, which led to the creation of a new term: semantically blocking.

As an FS2 user, working on the JVM I need to make conscious execution decisions (choosing an appropriate thread pool) depending on what kind of computation I'm dealing with (blocking or non-blocking). I believe that a semantically blocking operation is a foreign concept to most people working on the JVM, and documentation such as https://github.com/functional-streams-for-scala/fs2/blob/7ad702a21e4af95e188cff9a7352ff3bc17beabd/core/shared/src/main/scala/fs2/async/Promise.scala#L21 is confusing at best.

I'm therefore suggesting a different terminology: suspend.

mpilquist commented 6 years ago

Thanks for the feedback @backuitist! I tend to agree that use of "blocking" throughout FS2 documentation is confusing for folks working on the JVM. At the very least, I think we should commit to never mentioning "blocking" without the "semantic" modifier. I'm open to other names too, though I don't like suspend as I find that confusing with other uses of suspend in the library & upstream libraries.

backuitist commented 6 years ago

> I don't like suspend as I find that confusing with other uses of suspend in the library & upstream libraries.

Do the other uses differ in semantics?

https://github.com/functional-streams-for-scala/fs2/blob/7ad702a21e4af95e188cff9a7352ff3bc17beabd/core/shared/src/main/scala/fs2/async/Promise.scala#L36

SystemFw commented 6 years ago

Yes, they do. suspend there means "suspending side effects", nothing to do with concurrency.

backuitist commented 6 years ago

Wouldn't you say that you're "suspending a side effect" by suspending a computation?

https://github.com/typelevel/cats-effect/blob/c15b00fb947dc75c0304c43d3dc65cfe73a27d3c/core/shared/src/main/scala/cats/effect/IO.scala#L623

In other words, suspending a side effect is a by-product of suspending a computation.

My impression is that it is widely understood that:

Examples of this are:

On the other hand, I can't find that many references of "suspending side effects" or "semantically blocking" in the literature, but I'm happy to be proven wrong :)

SystemFw commented 6 years ago

No, these mean two different things. By suspending side effects we mean that you can have any Scala code block within an IO {..}, and the result is still referentially transparent. This is entirely orthogonal to semantic vs actual blocking: IO { Thread.sleep(1000); println("yo") } suspends side effects, but it's still blocking a thread. On the other hand, an async call in JavaScript, Kotlin or C# that calls a DB is not suspending side effects (it's not referentially transparent).
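
To make the distinction concrete, here is a minimal sketch using only the Scala standard library. MiniIO is a hypothetical stand-in written for this illustration; it is not the cats-effect IO implementation. It only shows that building the value runs nothing, and that running it still blocks the calling thread inside Thread.sleep.

```scala
// Hypothetical stand-in for an IO type; NOT the cats-effect implementation.
final class MiniIO[A](val unsafeRun: () => A) {
  def flatMap[B](f: A => MiniIO[B]): MiniIO[B] =
    new MiniIO(() => f(unsafeRun()).unsafeRun())
  def map[B](f: A => B): MiniIO[B] =
    new MiniIO(() => f(unsafeRun()))
}

object MiniIO {
  // Suspends the side effect: `body` is by-name, so nothing runs here.
  def apply[A](body: => A): MiniIO[A] = new MiniIO(() => body)
}

object SuspendDemo {
  def main(args: Array[String]): Unit = {
    var counter = 0

    // Building the value performs no side effect and does not sleep.
    val tick: MiniIO[Int] = MiniIO { Thread.sleep(100); counter += 1; counter }
    assert(counter == 0)

    // Referential transparency: reusing `tick` is the same as writing
    // the expression out twice. Still nothing has run.
    val twice = tick.flatMap(a => tick.map(b => (a, b)))
    assert(counter == 0)

    // Running executes the effect twice, and each run blocks the
    // calling thread in Thread.sleep (actual, not semantic, blocking).
    val result = twice.unsafeRun()
    assert(result == (1, 2))
    assert(counter == 2)
  }
}
```

The sketch says nothing about concurrency: suspension here is purely about when the side effect happens, which is the sense of "suspend" used in cats and cats-effect.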

I agree that "suspend" is often used in the context of coroutines: it's not a bad name, it's just that we already use it everywhere in cats/cats-effect to describe the difference between say, IO and Future.

On the other hand,

> blocking blocks the caller (a thread for instance) until something unblocks it (meanwhile it cannot do anything, things are halted)

But the crucial difference is between having the blocked computation unable to do anything, and having the underlying system thread be unable to do anything.

For example, in Haskell you talk about thread blocking liberally, even though it's only the underlying Haskell thread that's semantically blocked, and not the OS thread. See the MVar docs: https://hackage.haskell.org/package/base-4.10.1.0/docs/Control-Concurrent-MVar.html

Note that for us, "semantically blocking a thread" is a closer description than "a coroutine": from a user perspective, the computation is just blocked until someone else unblocks it (see the scaladoc for Promise). We add "semantically" to mean that the OS/JVM thread is still free, but the mental model is the same (just better: more scalable and performant).

I agree that the term is confusing and I'm open to changing it, but suspend unfortunately won't do due to the overlap mentioned above :(

In the meantime, if you spot places where we use "blocking" without "semantically", PRs welcome :) I'm also planning to add a longer writeup on fs2 concurrency in the docs (I already have talks about it in preparation), which should hopefully make things clearer.

backuitist commented 6 years ago

> This is entirely orthogonal to semantic vs actual blocking: IO { Thread.sleep(1000); println("yo") }, suspends side effects, but it's still blocking a thread.

The Thread.sleep is the computation being suspended and it does not block the thread doing the suspension.

> [we use it to describe] the difference between say, IO and Future

I understand that Future breaks the referential transparency as it does not "suspend side effects", still it does suspend a computation. From that perspective IO and Future are similar.

In the Async SIP suspend is also being used with this definition: http://docs.scala-lang.org/sips/pending/async.html

It seems that capture is also used as a synonym of suspend (e.g. https://github.com/typelevel/cats-effect/issues/78). All I'm saying is that this terminology is confusing, and perhaps capture should be used instead of suspend, so that suspend can be used instead of semantically blocking.

PS: your previous comment is missing the "crucial difference" but I can imagine that you wanted to illustrate how Future breaks the referential transparency.

SystemFw commented 6 years ago

I pressed comment too soon, updated now.

> The Thread.sleep is the computation being suspended and it does not block the thread doing the suspension.

I think we are talking about different things. When you do a Thread.sleep, a JVM thread is blocked and not available for other things to run on it. So if you have a FixedThreadPool of 10 threads and you call Thread.sleep(3000) on one of them, you have only 9 threads available for 3 seconds.
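
The arithmetic can be checked with a small sketch on plain java.util.concurrent. The names below (BlockedThreadDemo, started) are made up for the illustration, and the pool is built directly as a ThreadPoolExecutor with the same parameters a fixed pool of 10 would use, so that its active-thread count can be read. The sleep is shortened to 500 ms.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

object BlockedThreadDemo {
  def main(args: Array[String]): Unit = {
    // A fixed pool of 10 threads, constructed directly so we can query it.
    val pool = new ThreadPoolExecutor(
      10, 10, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable]())

    // Signals that the task is running before it starts sleeping.
    val started = new CountDownLatch(1)

    // This task occupies one pool thread for the whole sleep.
    pool.execute(() => { started.countDown(); Thread.sleep(500) })

    started.await()
    val busy = pool.getActiveCount

    // One thread is parked in Thread.sleep; 9 remain for other work.
    assert(busy == 1)
    assert(pool.getMaximumPoolSize - busy == 9)

    pool.shutdown()
    assert(pool.awaitTermination(5, TimeUnit.SECONDS))
  }
}
```

Wrapping the sleep in an IO or a Future would not change this: whichever pool thread eventually runs it is unavailable until the sleep returns.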

> I understand that Future breaks the referential transparency as it does not "suspend side effects", still it does suspend a computation. From that perspective IO and Future are similar.

I don't understand this. If you are talking about suspend as in suspending side effects, IO does and Future doesn't, so they're not similar. If you are talking about submitting things to a thread pool, Future does and IO doesn't, so they are also not similar.

Also, if you say that Future suspends a computation (and I don't precisely know what you mean), then we still can't use it as a synonym of semantically blocking, because Future does not do what we mean by semantic blocking. Semantic blocking is the equivalent of Await (note the capital A), but instead of blocking a thread (as in, there's one thread fewer in the thread pool until the thing being awaited completes), it leaves all the threads available while waiting.
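
As a rough illustration of "waiting without holding a thread", the sketch below uses scala.concurrent.Promise from the standard library, not the fs2 Promise, and a pool with a single thread. It is only an analogy: Future is eager and not referentially transparent, so this models just the "threads stay free" part of semantic blocking. The object and value names are invented for the example.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object SemanticBlockingDemo {
  def main(args: Array[String]): Unit = {
    // A pool with exactly one thread.
    val executor = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)

    val promise = Promise[Int]()

    // The consumer "waits" by registering a continuation on the promise.
    // No pool thread is parked while the promise is unfulfilled.
    val consumer: Future[Int] = promise.future.map(_ + 1)

    // Because the only pool thread is still free, the producer can run
    // on it and fulfil the promise.
    val producer: Future[Unit] = Future { promise.success(41); () }

    // Await on the main thread is only there to observe the result;
    // main is not one of the pool's threads.
    val result = Await.result(consumer, 5.seconds)
    assert(result == 42)

    executor.shutdown()
  }
}
```

Had the consumer instead called Await.result(promise.future, ...) inside a Future running on this pool, the single thread would be parked and the producer could never run. That is the thread-blocking behaviour the "semantic" qualifier is meant to rule out.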

backuitist commented 6 years ago

> I don't understand this.

I'm talking about control. When a Future suspends a computation it gives control back to the caller, i.e. the caller is not blocked.

Well, I believe I've said enough about this :) so feel free to do whatever with it, but as you're mostly targeting the JVM you should understand that adding a foreign/non-standard terminology does not help with the already steep learning curve. Again I'd love to read proof backing your claims. For instance in this paper https://www.doc.ic.ac.uk/~dorchard/publ/haskell14-effects.pdf they talk about capturing effects not suspending effects. I'll add a ticket to cats-effect with a reference to this one.

SystemFw commented 6 years ago

> I'm talking about control. When a Future suspends a computation it gives the control back to the caller, i.e the caller is not blocked.

If you are talking about suspending in that sense, then it's not a good term to describe what we mean by semantically blocking, because we want to express precisely the fact that the caller is waiting on a result.

However, your use of blocking is actually non-standard on the JVM, where blocking means blocking a thread (as in "JDBC is blocking", regardless of whether it is in a Future or not).

Furthermore,

> When a Future suspends a computation it gives the control back to the caller, i.e the caller is not blocked.

is not actually necessarily true. All a Future does is submit a Runnable to a thread pool. If the thread pool has only one thread, and the Runnable blocks that thread, then the previous Future is blocked as well. When running on JS, thinking about having only one thread is relevant (and again, Promise's semantic blocking also works on a single-thread thread pool).

SystemFw commented 6 years ago

As for the suspending effects, you are not going to find similar terminology, because Scala is the only language that does purely functional programming but isn't pure. In particular, there is no need to suspend side effects in Haskell, because Haskell has no side effects. So Haskell has no equivalent of Sync.delay. The paper you linked is not using capture in any way that is related to this discussion (it's talking about fine grained representation of effects in the type instead), and indeed no Haskell paper will, because suspending side effects is solving a problem that Haskell cannot have, by design.

backuitist commented 6 years ago

Man, I hoped I wouldn't get an answer on a Saturday ;)

> However, your use of blocking is actually non-standard on the JVM, where blocking means blocking a thread.

I never claimed otherwise. You misread my comment about Thread.sleep, it was in the context of your example IO { Thread.sleep }. Trust me I know what Thread.sleep does ;)

You mean:

for {
   a <- x.get // blocking
   b <- y(a) // blocked by the line above
} yield b

I mean

val a = x.get // suspend
println("hello") // executed immediately after

> Is not actually necessarily true. All a Future does is submitting a Runnable to a threadpool. If the threadpool has only one thread, and the Runnable blocks the thread, than the previous Future is blocked as well.

You meant if the "threadpool" (= ExecutionContext) is implemented as running in the current thread (what you call directEC, trampoline is doing that too) then you do not get the control back immediately. This is an example of how an ExecutionContext should not be implemented.

This is what is expected from Future:

val f = Future { Thread.sleep(1000) } // return control to the caller by "suspending" the computation
println("I have control") // not blocked

> no Haskell paper will, because suspending side effects is solving a problem that Haskell cannot have, by design.

In the language, but not in its implementation. Besides, I'm open to non-Haskell papers. I find it hard to believe that there isn't a standard terminology around effects.

SystemFw commented 6 years ago

> You mean:
>
> for {
>    a <- x.get // blocking
>    b <- y(a) // blocked by the line above
> } yield b
>
> I mean
>
> val a = x.get // suspend
> println("hello") // executed immediately after

Right, that's it actually. Your interpretation is not very helpful in a pure context, because IOs are just values, nothing is happening, and they have no side effects, just like returning 1 doesn't, so

def kill = IO(killJVM)

kill
println("still here")

Doesn't do anything, but if I'm writing docs for kill, I'd still say "kill kills the JVM", with the assumption that the reader is already familiar with the basics of purely functional programming. Similarly, the reason why:

val a = x.get
println("hello")

prints "hello" has nothing to do with promise, blocking, or threads, but just with the general fact that IOs are values, but in docs for Promise.get, it's far more useful to describe what happens when the resulting IO is bound (as in, in a flatMap), because that's how it's used, and in that case:

for {
   a <- x.get // blocking
   b <- y(a) // blocked by the line above
} yield b

holds.


> you do not get the control back immediately. This is an example of how an ExecutionContext should not be implemented.

It's also how JavaScript behaves, and the reason why things that actually block a thread (like IO.unsafeRunSync) cannot be used on Scala.js.


> I find hard to believe that there isn't a standard terminology around effects.

Side effects, not effects. You might find it hard to believe, but:

My point isn't so much that your alternative words (like capture) are wrong, but that we do have a standard meaning of "suspend" in fs2, cats, and cats-effect to mean "suspend evaluation of side effects" and not "return control to the caller"

SystemFw commented 6 years ago

> In the language, but not in its implementation.

The implementation there is completely different. Haskell has lazy evaluation, so the problem there is not so much having to suspend the side effects (they are already "suspended" by full laziness), but having to constrain evaluation so that things are evaluated in the correct order and are not shared between computations like normal lazy values are (this is what Haskell's IO, based on a fake state monad with RealWorld, does).

In Scala we have the opposite problem.

SystemFw commented 6 years ago

I'm now unclear on which aspect you want us to change :)

Is it that:

for {
   a <- x.get // blocking
   b <- y(a) // blocked by the line above
} yield b

doesn't block a thread, but only awaits semantically? (which is what we mean by semantic blocking) And the context here is that nothing happens one way or another until the IO is run.

or that a in

val a = x.get
println("hello")

Doesn't do anything?

I think the latter is equivalent to wanting us to document the side effects of get, in which case the docs would be the same for the whole of fs2: no side effects.

Again, I'm open to changing the former, but not with suspend, since it means something different :)

backuitist commented 6 years ago

I think we got a bit sidetracked here :)

My example of suspend was confusing... All I'm suggesting is that we rename "semantically blocking" to "suspend", and I completely understand why fs2 came up with this terminology.

It doesn't hurt my eyes to have

for {
  a <- promise.get // suspend the evaluation until the promise is fulfilled
  b <- somethingElse(a)
} yield b

Anyway, it doesn't seem that cats-effect will change its terminology, making my point here moot. Thanks for your input anyway.

SystemFw commented 6 years ago

Thank you for yours. If you can come up with another name for it, feel free to reopen :)

backuitist commented 6 years ago

I have in fact another name, coblocking (you just have to reverse the arrow).

Just kidding :) fs2 is great, thanks for the work ;)