typelevel / cats-effect

The pure asynchronous runtime for Scala
https://typelevel.org/cats-effect/
Apache License 2.0
2.03k stars 520 forks source link

Why no withMVar, modifyMVar, modifyMVar_? #451

Closed schernichkin closed 5 years ago

schernichkin commented 5 years ago

Mutating MVars properly is a common task in multithread programming. In Haskell such functions are absolutely necessary, because in presence of asyncronous exceptions you should be extremely carefull with masking and unmasking. But even in cats.effect I see modifyMVar_ as

def modifyMVar_[F[_], A](mv: MVar[F, A])(f: A => F[A])(implicit F: Bracket[F, _]): F[Unit] = 
    F.bracketCase(mv.take)(a => f(a) >>= mv.put)((prev, e) => e match {
        case ExitCase.Error(_) | ExitCase.Canceled => mv.put(prev)
        case ExitCase.Completed => ().pure[F]
    })

to handle cancelation and synchronous exceptions correctly. So, why not in API, is it intentional decision?

alexandru commented 5 years ago

The function you've described is not atomic. If that function exists in Haskell, it isn't atomic either, because MVar is described by put, take and read.

And this makes it dangerous.

Wouldn't such a use-case be better for Ref?

schernichkin commented 5 years ago

Yes, it's not atomic in presence of other producers (what stated in Haskell's documentation), however it widely used for synchronizing access to mutable structures. In particular when multiple writers want to update single variable they only allowed to do so via modifyMVar, no sole take and put operations allowed.

A practical example can be found here. As you may see all operations synchronized using withMVar/modifyMVar and no sole take/put functions.

SystemFw commented 5 years ago

Right, but we have Ref, which is a powered up version of atomicModifyIORef, and that's a better fit for that use case I think

schernichkin commented 5 years ago

But Ref not suitable for protecting mutable resources, while MVar allows it.

SystemFw commented 5 years ago

If you mean cases where you actually need a mutex, you can use Semaphore.withPermit, or build something with Ref + Deferred like this one: https://github.com/ovotech/fs2-kafka/blob/master/src/main/scala/fs2/kafka/internal/Synchronized.scala

schernichkin commented 5 years ago

If you mean cases where you actually need a mutex, you can use Semaphore.withPermit

With mutex I can build any synchronization primitive, by this logic I do not need MVar at all :) The idea was to express other primitives using MVars, because it easy to do.

or build something with Ref + Deferred like this one

It's so complicated that I can't even understand how it exactly works. Using MVar it could be:

def withMVar[B](mvar: )(f: A => F[B])(implicit F: Bracket[F, _]): F[B] = {
   F.bracket(mvar.take)(f)(mvar.put)
}

object Synchronized {
  def of[F[_]: Concurrent, A](a: A) = MVar.of(a).flatMap { mv =>  
      new Synchronized[F[_], A] {
         override def use[B](f: A => F[B]): F[B] = withMVar(mv)(f)
      }
    }
}
SystemFw commented 5 years ago

, by this logic I do not need MVar at all

In fact my opinion is exactly that you don't need MVar at all, and using Ref + Deferred is better (Semaphore itself is done with Ref + Deferred, for example).

t's so complicated that I can't even understand how it exactly works.

Ref + Deferred is a way simpler model than MVar, because there are fewer state transitions and fewer places you can block, therefore fewer places where you can deadlock, which is quite easy to do with MVar.

The idea was to express other primitives using MVars, because it easy to do.

I quite strongly disagree here, and this is evident when you try and build actual complicated structures like Queues and Topics, which is why I introduced Ref + Promise in fs2 instead of going with MVar (and they then made it into cats effect, and Promise was renamed to Deferred). Scalaz IO used to have MVar as well, and they also moved to a model based on Ref + Promise because it's way simpler. You just need to learn how to design things with it, just like you have to learn how to design things with MVar. I have done both, and that's why I don't like MVar anymore :) (although obviously there are always going to be cases where one thing is better than the other)

wedens commented 5 years ago

It's very easy to get a deadlock with withMVar: withMVar(mvar)(_ => mvar.take). This mvar.take can be somewhere deep in your f.

SystemFw commented 5 years ago

It's just very easy to get a deadlock with MVar in general: An MVar is actually quite a complicated primitive, because it has transitions in both ways (empty to full and full to empty), an unlimited number of times, and with (semantic) blocking in both directions. If you consider that most primitives require more than one MVar, you quickly need to keep in mind a pretty big state space of potentially blocking transitions. This is made ever more complex by the fact that some things that in theory don't require waiting, like concurrent mutation, are done via semantically blocking primitives in MVar, case in point modifyMVar done with put and take.

At the heart of the issue is the fact that MVar conflates two different things: concurrent mutation, and synchronisation.

The Ref + Deferred model instead is very orthogonal: you have the simplest primitive possible for concurrent mutation (Ref) and the simplest possible primitive for synchronisation (Deferred). Complex behaviour is then built compositionally using those two, and the state space is reduced: Deferred only has one state transition (empty to full, and only once), and one place you can block (get on empty). If you need more transitions, you manage them compositionally using Ref , which offers a simpler model for concurrent mutation: atomic modify with no waiting.

So basically everything becomes Ref[F, DataStructureWithDeferreds], which gets a lot more manageable, both in theory (given the argument about orthogonality I just gave) and in practice (given in depth anectodical experience designing and implementing primitives with both in fs2 and Haskell).

schernichkin commented 5 years ago

@SystemFw So, is your point is not to introduce additional utility functions to discourage people from using MVars? (btw thanks for Ref + Deferred pattern, I've finally got it)

@wedens MVars are not deadlock-free by design. One can easy deadlock with mvar.take >>= _ => mvar.take :)

SystemFw commented 5 years ago

Another thing with your Synchronisation implementation with MVar compared to the one I linked you to is the behaviour wrt cancelation. In the MVar version you cannot try to acquire the Synchronisation with a timeout (because bracket acquire is uncancellable, and you take in there), whereas the Ref + Deferred version allows you to do that.

Semaphore had the same problem, and it required an internal fix.

is your point is not to introduce additional utility functions to discourage people from using MVars

No, I'm not that strongly against them (even though fs2 doesn't use them and I wouldn't use them), but if the utility functions actually encourage problematic behaviour (like they do in this case) and we have alternatives (like we do in this case), then I'd rather not have them, so that I don't have to explain to users not to use them. Other people might have a different opinion, which is why MVar is in cats-effect in the first place.

But basically I'm just answering the question:

So, why not in API, is it intentional decision?

to which the answer is yes :)

schernichkin commented 5 years ago

and we have alternatives (like we do in this case)

So, what about bringing these alternatives to the library? Something like mentioned Synchronized with use[B](f: A => F[B]): F[B] implemented with Ref + Deferred? Or, even modify[B](f: A => F[(A, B)]): F[B]?

Because the first thing which newcomer will look at if he will need "functional mutex" will be MVar because many people know about them from Haskell. Not finding bracketed functions where, he will probably reimplement or ask here as I did.

SystemFw commented 5 years ago

Something like mentioned Synchronized with

Well, but in this case you can just do Semaphore.withPermit

alexandru commented 5 years ago

Btw, I'm looking at the given function and I was initially saying that it isn't atomic:

def modifyMVar_[F[_], A](mv: MVar[F, A])(f: A => F[A])(implicit F: Bracket[F, _]): F[Unit] = 
    F.bracketCase(mv.take)(a => f(a) >>= mv.put)((prev, e) => e match {
        case ExitCase.Error(_) | ExitCase.Canceled => mv.put(prev)
        case ExitCase.Completed => ().pure[F]
    })

However if take succeeds, then other actors are blocked from doing take. Other actors are only allowed to put. So this function is safe only as long as take >> put is the sequence all actors are using.

πŸ€”

alexandru commented 5 years ago

In other words, maybe it's not a bad function to have in some contexts. Maybe with big warnings in the ScalaDoc.

schernichkin commented 5 years ago

@SystemFw Don't you think that specialized implementation of modify[B](f: A => F[(A, B)]): F[B] on Ref + Deferred could be faster and simpler? I will also need separate variable to store mutable state with semaphore, I don't need more than one permit. Few minutes ago you criticized mvars but now suggest primitive which has even more state transitions:))

alexandru commented 5 years ago

Ah, related to what I said F.bracket(mvar.take)(...)(mvar.put) will not behave like Semaphore's withPermit because this will make mvar.take to be uninterruptible and this can be a problem, because take also waits for a value to be available.

SystemFw commented 5 years ago

Few minutes ago you criticized mvars but now suggest primitive which has even more state transitions:))

More variables β‰  more state transitions. If you have a Ref plus a Semaphore.withPermit there are no state transitions visible to you (other usage patterns of Semaphore are worse than MVar though, agreed on that). That being said, my preference is to build things with Ref + Deferred directly, which is why that Synchronisation thing looks the way it looks (I helped design it) and doesn't use Semaphore.

I guess the question is: do we want a Synchronisation thing in cats effect? Note that often, you can use Ref + Deferred directly to build whatever you need to build withSynchronisation instead of using that. The use case for that is literally synchronised access to a mutable thing (like a Kafka consumer), which isn't that common of a use case, and if you need to do mutation on an immutable thing, then it's Ref again :)

schernichkin commented 5 years ago

The use case for that is literally synchronised access to a mutable thing (like a Kafka consumer), which isn't that common of a use case

Why do you think it uncommon? I see following use cases:

It just capturing idea of acquiring lock, performing effectful action on locked resource and releasing log regardless of exception/cancelation. Just like Resource do with disposable resources.

SystemFw commented 5 years ago

Well, it's not like use cases don't exist, and in fact we do have Semaphore.withPermit, which is simple to use and behaves well wrt cancelation. Note that both your use cases are covered by it, although I'd argue case 1 is not that common (either you use Ref, or you don't have concurrency, or you have a data structure that already supports concurrency like ConcurrentMap).

schernichkin commented 5 years ago

The final question. MVars was introduced to Haskell as basic block for building complex syncronization structures (simulating mutexes, building chans, qsems, even I-Var implementations use MVars internally 1 2)

Since cats-effect provides Ref and Deferred naturally and most basic MVar patterns will not handle cancelation well (for instance, everything performing blocking operations in bracket's acquire/release blocks) is there any use for MVars at all? If there is no syncronization patterns which can be expressed by MVar but not by Ref + Deferred (I guess it's quite easy to prove by expressing MVar itself with Ref and Deferred) should not MVars be deprecated?

P.S. Digging to the origins of MVars I've found ID Language Reference Manual where it seems first was introduced. Original M-structures has slightly different semantics (they dont block on put, putting to non-empty M-structure result runtime error). But the most interesting thing is:

In typical usage of M-structure components, several concurrent computations share a resource. Each computation takes it, computes with it, and puts it back. The semantics guarantees that each computation has exclusive access to the resource. Note: for correct operation, every take must be followed by a put , i.e., it is the programmer's responsibility to make sure that these operations are balanced.

So, bracketing MVars with get/put operations to guarantee that every taken MVar will be putted back is exactly how MVar's initially supposed to be used. At since everyone here considers that these

functions actually encourage problematic behaviour

probably MVars just not suitable for cats-effect computation model?

jdegoes commented 5 years ago

@schernichkin What you want is ZIO's RefM, which looks almost exactly like Ref but the update / modify functions can be effectful. So you can use mutable data in the Ref or just effectfully change immutable data.

alexandru commented 5 years ago

@schernichkin

most basic MVar patterns will not handle cancelation well (for instance, everything performing blocking operations in bracket's acquire/release blocks) is there any use for MVars at all?

Of course there are, at the end of the day MVar is like a BlockingQueue(size=1) and you can model all sorts of producer/consumer pipelines with it.

As to the problems we are having with bracket, that's actually a problem with bracket that we'll have to solve either by introducing a version of bracket that doesn't execute acquire uninterruptibly, or by introducing versions of put / take that work well with this bracket (which is possible if you change their signatures) ... this later approach is what we did in Semaphore's withPermit, but it's for the moment internal.

If there is no syncronization patterns which can be expressed by MVar but not by Ref + Deferred (I guess it's quite easy to prove by expressing MVar itself with Ref and Deferred) should not MVars be deprecated?

So first of all you don't even need Deferred, you just need Ref. Stop thinking for a second at Haskell and think of mainstream languages instead. A Ref is nothing more than an AtomicReference, which in itself is nothing more than a variable that's being read with volatile semantics or modified via compareAndSet or getAndSet which get translated to platform intrinsics.

This is the primitive building block for doing synchronization on top of the JVM and most platforms similar to the JVM, being linked directly to the JVM's memory model (the JMM). You can build locks and semaphores with atomic references. You can do shared-transactional memory in terms of atomic references. You can do pretty much everything.

Forget Deferred, because in fact Deferred too can be expressed with Ref.

But the problem is that Ref is often too low level, plus solutions built on Ref can suffer from performance issues too. Building a concurrent queue on top of Ref for example is possible, but has poor performance characteristics compared with something built to take advantage of Java's memory model directly.

So no, we should not deprecated MVar. It's an awesome tool imo.


@jdegoes

@schernichkin What you want is ZIO's RefM, which looks almost exactly like Ref but the update / modify functions can be effectful. So you can use mutable data in the Ref or just effectfully change immutable data.

We are in the context of the Cats-Effect project πŸ˜‰ so if you'd like to propose RefM for Cats-Effect, I'd be happy to hear it.

SystemFw commented 5 years ago

I think we need to distinguish between implementation concerns, and what sort of model we'd like to propose for our users, for most of which our abstractions really appear as "primitives".

Implementation wise, yeah all you need is Ref (AtomicReference) and async/asyncF. Everything else can be implemented on top of it. In terms of the programming model however, I actually disagree with

So no, we should not deprecate MVar. It's an awesome tool imo.

I think it's just as low level as the alternative Ref + Deferred model, and way more error prone (although obviously there are going to be some exceptions in select use cases). So, if it was entirely up to me and we didn't have any concern about breaking code for our users, I wouldn't have any issues deprecating MVar. Fortunately, both these conditions don't hold :)

ut has poor performance characteristics compared with something built to take advantage of Java's memory model directly.

I think this point is fairly orthogonal to this discussion, and also various caveats about "poor" apply, although I'd definitely agree with "poorer.

We are in the context of the Cats-Effect project πŸ˜‰ so if you'd like to propose RefM for Cats-Effect, I'd be happy to hear it.

Agreed. Also, I think I like Synchronization's implementation strategy better, seems lighter weight (RefM is basically Actor, so you have a queue and a consumer fiber).

probably MVars just not suitable for cats-effect computation model?

@schernichkin Note that the recommendation in Haskell is to not use MVar by default, but STM, exactly because coordinating several MVars gets hard really quickly. MVar should be preferred for lower level things, especially when you need fairness, which is its main selling point (according to Simon Marlow)

jdegoes commented 5 years ago

Of course there are, at the end of the day MVar is like a BlockingQueue(size=1) and you can model all sorts of producer/consumer pipelines with it.

That's why MVar should be deleted IMO. It's a special case of a blocking queue. And people wouldn't think to (abuse) a "queue" to do some of the things they are tempted to do with MVar (e.g. modifyMVar).

We are in the context of the Cats-Effect project πŸ˜‰ so if you'd like to propose RefM for Cats-Effect, I'd be happy to hear it.

Maybe after cats-concurrent exists. πŸ˜†

schernichkin commented 5 years ago

@jdegoes Thanks, ZIO looks promising.

@alexandru

that's actually a problem with bracket that we'll have to solve either by introducing a version of bracket that doesn't execute acquire uninterruptibly

Haskell took this approach that’s why withMVar works fine in Haskell. But it ended up with mask which masks asynchronous exceptions but still can be interrupted if blocking function invoked and uninterruptibleMask which will unconditionally mask all exceptions. Not the best choice in my opinion, you may never know if computation uses blocking functions somewhere internally.

Stop thinking for a second at Haskell and think of mainstream languages instead.

Every programming language relies on synchronization primitives provided by OS to implement non-busy waiting, including Haskell. You could express locking primitives with sole CAS and sleep() but sleep() itself should be provided by OS. And I’m talking about something practical. From this point of view sole Ref is not enough.

SystemFw commented 5 years ago

and sleep() but sleep() itself should be provided by OS

That's not true. green threading systems like the ones in Haskell or cats-effect can express sleep, and semantic waiting in general, provided you have

Once you have Ref and asyncF/cancelableF, you can build Deferred, and from there you can build the rest.

EDIT: actually, strictly speaking, you don't even need a threadpool, since cats-effect concurrency works on scala.js too

schernichkin commented 5 years ago

How you going to pass control to OS when application has nothing to do? sleep() (or any other non-busy wait mechanics) should be hidden somewhere, possibly in thread scheduler, and on the OS level it should call something like HLT. Otherwise you will end up in spinlocks, boiling up processor.

SystemFw commented 5 years ago

How you going to pass control to OS when application has nothing to do? sleep() (or any other non-busy wait mechanics) should be hidden somewhere, possibly in thread scheduler, and on the OS level it should call something like HLT. Otherwise you will end up in spinlocks, boiling up processor.

Again, no. You are describing Thread.sleep, i.e. actual thread blocking, whereas IO.sleep or MVar.take or Deferred.get don't block a thread. The key idea is that what you see as blocking on one level is actually suspending on the level below.

So when your fiber "sleeps" it's literally giving control back to the thread pool after encountering an Async node in its runloop, giving a chance to other fibers to run . Maybe I misunderstood your question, you do need one thing that can schedule things, (for example a ScheduledThreadExecutor), but you do not need a Thread.sleep

schernichkin commented 5 years ago

Ah, I see.

alexandru commented 5 years ago

@SystemFw

Agreed. Also, I think I like Synchronization's implementation strategy better, seems lighter weight (RefM is basically Actor, so you have a queue and a consumer fiber).

The Synchronized sample escaped me. So 2 things:

  1. Mutexes are in general bad, a solution of last resort, because it prevents threads from making progress in case the scheduler pauses the OS thread that holds the lock; for this reason we want non-blocking, even wait-free algorithms, whenever possible
  2. This implementation itself has performance characteristics that are less than ideal β€” depending on use-case, it could use Java 9 platform intrinsics for spin-locking and it could be biased for single threaded uses

Point number 1 is important because it doesn't really matter how you implement a mutex, in terms of Ref or in terms of MVar, it's still going to represent a bottleneck.

Point number 2 is important because it shows that Ref is not all you need for building Synchronized ;-)

Just to be clear, I'm very much against deleting MVar. If nothing else, MVar is important for people like @schernichkin that want to migrate logic from Haskell. And Ref is cool, but you can barely go lower level than Ref on top of the JVM, implementations based on Ref being anything but elegant or simple.

So that said, our discussion does not address the main problem of the described function. We need to establish in Cats-Effect what we have to do for this particular pattern:

  1. do we introduce a bracket with an cancelable acquire?
  2. or do we introduce take and put versions with signatures that can work with the current bracket (what we did in Semaphore's withPermit internally)?

I have a feeling the solution is probably going to be number 2, because number 1 is a can of worms.

SystemFw commented 5 years ago

@alexandru

Mutexes are in general bad, a solution of last resort

Oh, I definitely agree with that. And in fact that Synchronisation is used only because there's no other way (that I know of) of dealing with the Kafka api in that situation (there's a ThreadId check).

Just to be clear, I'm very much against deleting MVar

As I said, I would prefer if MVar was not there, but I appreciate there's no consensus, and it's not a battle worth fighting, it can stay there :)

I have a feeling the solution is probably going to be number 2, because number 1 is a can of worms.

I agree. That being said, that Synchronisation does avoid that problem without requiring changes to the primitives, which is another point in favour of Ref + Deferred imho.

schernichkin commented 5 years ago

@alexandru Before introducing any "MVar-friendly" bracket I think your should consider a Haskellers experience of mixing of resource handling and cancelation: https://haskell-lang.org/tutorial/exception-safety Yep, it seems to be implementable (http://hackage.haskell.org/package/unliftio-0.2.10/docs/UnliftIO-Exception.html), but it defently not looks simple.

alexandru commented 5 years ago

@schernichkin thanks, that article looks useful.