Add an Alternative instance for Fiber/IO.Par

LukaJCB commented 6 years ago

Alternative basically allows us to combine two F into F[(A, B)]] or F[Either[A, B]] along with identities for each. This maps perfectly onto racing (where IO.never is the identity) and running in parallel (where IO.unit is the identity) An instance for IO.Par could allow us to make use of existing abstractions using the SemigroupK family. I.e. racing a list of IOs could be as simple list.foldMapK(f).

LukaJCB commented 6 years ago

Potentially I think the same can be done for Fiber :)

LukaJCB commented 6 years ago

Seems like Haskell already runs with this idea in its Alternative instance for Concurrently: http://hackage.haskell.org/package/async-2.2.1/docs/Control-Concurrent-Async.html#t:Concurrently

catostrophe commented 5 years ago

race depends on scheduling hence introduces indeterminism. How can we prove SemigroupK associativity law?

catostrophe commented 5 years ago

I implemented SemigroupK[IO.Par]/MonoidK[IO.Par] and the associativity law test failed:

[info] - IO.Par.monoidK.monoidK left identity
[info] - IO.Par.monoidK.monoidK right identity
[info] - IO.Par.monoidK.semigroupK associative *** FAILED ***
[info]   GeneratorDrivenPropertyCheckFailedException was thrown during property evaluation.
[info]    (BaseTestsSuite.scala:44)
[info]     Falsified after 1 successful property evaluations.
[info]     Location: (BaseTestsSuite.scala:44)
[info]     Occurred when passed generated values (
[info]       arg0 = IO(1936284316),
[info]       arg1 = <function1>,
[info]       arg2 = IO$1181956722
[info]     )
[info]     Label of failing property:
[info]       Expected: <function1>
[info]   Received: <function1>

catostrophe commented 5 years ago

Should we add a lawless instance in favor of convenience (e.g. for having foldMapK)?

LukaJCB commented 5 years ago

I wonder why it wouldn't be associative, conceptually it seems to me like race should definitely be associative 🤔

catostrophe commented 5 years ago

The choice depends on which task finishes first, and it's unpredictable and not reproducible in general. In contrast, parApplicative guarantees the same result irregardless of the execution speed & order.

catostrophe commented 5 years ago

IO(1) <+> IO(2) <+> IO(3) may result in any of 1, 2, 3. We can't say that (IO(1) <+> IO(2)) <+> IO(3) == IO(1) <+> (IO(2) <+> IO(3)) because the result of each side is affected by uncontrolled realworld randomness.

catostrophe commented 5 years ago

But I also find MonoidK instance useful. We probably need a special way to summon unlawful instances e.g. from a separate package.

catostrophe commented 5 years ago

If only we knew (could fix) the completion time for each IO a priori and if it was constant for each re-run, then we could prove the SemigroupK laws (with these conditions race is indeed associative).

Don't know how to translate it into a law test.

LukaJCB commented 5 years ago

parApplicative guarantees the same result irregardless of the execution speed & order.

Hmmm, I don't think I agree with you there, if we consider uncontrolled realworld randomness, how can we guarantee that IO is associative at all, if the same IOs could fail from one moment to the next? IMO laws can only really hold with strictly deterministic values. race will always cancel the task that's slower. Meaning (a <+> b) <+> c) will result in the same thing as (a <+> (b <+> c)) if a, b and c take the same time when they're run. This is to me the same as saying (a <+> b) <+> c) and (a <*> (b <*> c)) are the same if the same values fail both times they're run.

catostrophe commented 5 years ago

I meant something different: if IO result itself is deterministic, i.e. it's either fail or complete, then parApplicative's combinators guarantee the same result for each re-run. And we can easily test it with our frameworks (we do).

But when we consider race it's not enough. We need to generate IOs with significantly different execution time and hope for scheduling fairness.

catostrophe commented 5 years ago

A simple test that does work:

(IO(sleep(k)) <+> IO(sleep(m))) <+> IO(sleep(n)) == IO(sleep(k)) <+> (IO(sleep(m)) <+> IO(sleep(n)))

where k != m != n, in 1..10 seconds and sleep(k) just sleeps k seconds and returns k

catostrophe commented 5 years ago

I made some progress.

I created an instance with combineK as IO.race and empty as IO.never
I changed the generator for IO.Par so that it produces IOs with a random and almost unique sleep (up to hash collisions). It made parallel tasks properly sorted within the TestContext execution queue. Also cats.effect.laws.util.TestInstances#eqFuture was modified as to pass a sufficiently long fixed time interval to tick so that the delayed tasks get executed by ec.

This change didn't break any existing tests for IO.Par and helped to pass the tests for MonoidK[IO.Par] which depend on the order of execution.

However, we still have a problem with the Alternative semiringal laws:

Right absorption law ff <*> empty == empty breaks if ff is a failing IO: the lhs fails fast along with ff while the rhs (IO.never) keeps running.
Right distributivity law (ff <+> fg) <*> fa == (ff <*> fa) <+> (fg <*> fa) breaks when fa runs longer than ff and fg, because we encounter indeterminism again: the lhs firstly races between ff and fg, chooses the winner and only then applies it to fa, while the rhs races between the applied IOs, whose execution times are equal (to fa time), and the winner gets chosen randomly.

catostrophe commented 5 years ago

Another thing I thought of is how to make it smoothly work with TF encoding (as it does with Parallel/Par for applicative instances).

LukaJCB commented 5 years ago

I think we might have to scrap the Alternative instance. Even if we get rid of the race condition for distributivity, the two expressions also differ in the sense that in the right case fa is going to be run twice, which is a significant difference.

For something like Fiber this doesn't apply, since the join is idempotent.

LukaJCB commented 5 years ago

@djspiewak any reason for closing this? I think it's still a feature we should be thinking about at the very least. Alternative[IO.Par] is unlawful with how its laws are currently defined, but MonoidK[IO.Par] and Alternative[Fiber[F, ?]] should still be very much on the table IMO.

djspiewak commented 5 years ago

@LukaJCB Ah sorry! I closed it because it seemed like the discussion had run its course. Reopening.

marcin-rzeznicki commented 5 years ago

Hey, I think I can handle this as a part of the Sustainability program. I am skeptical about IO as much as the next guy, but an instance for Fiber should be slick :smile:

marcin-rzeznicki commented 5 years ago

So, I think that the most sensible thing I could come up with is:

  private case object Empty extends RuntimeException("Empty")

  implicit def fiberAlternative[F[_]](implicit F: Concurrent[F]): Alternative[Fiber[F, ?]] = new Alternative[Fiber[F, ?]] {
    private def cancelBoth[A,  B](f1: Fiber[F, A], f2: Fiber[F, B]) = F.map2(f1.cancel, f2.cancel)((_, _) => ())
//...
    final override def empty[A] = Fiber(F.raiseError(Empty), F.unit)
    final override def combineK[A](x: Fiber[F, A], y: Fiber[F, A]) = {
      val fa = F.attempt(x.join)
      val fb = F.attempt(y.join)
      Fiber(
        F.racePair(fa, fb).flatMap {
          case Left((Left(ex), fiberB))  => F.adaptError(F.rethrow(fiberB.join)) { case Empty => ex }
          case Left((Right(a), fiberB))  => F.map(fiberB.cancel)(_ => a)
          case Right((fiberA, Left(ey))) => F.adaptError(F.rethrow(fiberA.join)) { case Empty => ey }
          case Right((fiberA, Right(b))) => F.map(fiberA.cancel)(_ => b)
        },
        cancelBoth(x, y))
    }

This of course fails distributivity/associativity laws of Alternative, because of non-determinism in racePair (so I am hoping, or else I've made a blatant mistake). Could you please advise on what to do next?

marcin-rzeznicki commented 5 years ago

It passes the absorption laws tough ie. ff <*> empty == empty. In simpler version

    final override def empty[A] = Fiber(F.never, F.unit)
    final override def combineK[A](x: Fiber[F, A], y: Fiber[F, A]) = map(Fiber(F.race(x.join, y.join), cancelBoth(x, y)))(_.merge)

for a failing ff , ff <*> empty is failed and empty is not. But, of course, if we're to remain lawless anyway, the simpler version might be preferable

djspiewak commented 5 years ago

@marcin-rzeznicki I don't see how it fails the laws exactly. I can see why it fails the tests, which require determinism, but I think it's still lawful. Critically, associativity says that (a |+| b) |+| c must be substitutable for a |+| (b |+| c), which it is when |+| is racePair. You can't predict the winner in any association, and the probabilities are the same with each association (or at the very least, indistinguishable).

I'm not sure exactly what to do here. Algebraically, it seems lawful (I haven't done a rigorous proof), but we have no way to prove it via the unit testing tools we have access to. We would need something that ran an arbitrary number of trials, recorded the distribution of results, and compared against the same procedure on a different expression.

marcin-rzeznicki commented 5 years ago

Having read Oleg Kiselyov's essay on the laws of MonadPlus http://okmij.org/ftp/Computation/monads.html#monadplus I started to believe that these laws are over-specified in case of non-deterministic instances, such as this one. It'd be ideal to have a reduced set of laws for this. Currently, I think I can either make a PR with no tests of Alternative laws, or maybe I can somehow pick only a subset?

djspiewak commented 5 years ago

I don't know of a way to pick a subset. Given that implementing non-deterministic law testing tools is a much larger yak shave, I'd say that PRing with no Alternative tests (or having them but commented out perhaps?) is probably better.

marcin-rzeznicki commented 5 years ago

@djspiewak Great, I'll do that! One hacky solution that comes to mind is imposing a consistent ordering in execution speed on IOs obtained from a generator. If that's possible (is it?), you can at least test whether an implementation that uses non-det execution model (such as race) doesn't do something downright wrong

djspiewak commented 5 years ago

Oh, hmm. Actually this is, in a sense, what TestContext does. I wonder if there's a way to exploit that in this case.

catostrophe commented 5 years ago

@marcin-rzeznicki I did it almost the same way last year and it worked out. Read my comments above.

I stopped at the Alternative laws. They are semiringal in cats, while Alternative is actually a near-semiring or even a less lawful class. I didn't know what to do with it and gave up.

marcin-rzeznicki commented 5 years ago

@djspiewak Ah, so these tests fail because of this logic:

        val forExecution = {
          val arr = current.tasks.iterator.takeWhile(_.runsAt == firstTick).take(10).toArray
          arr(Random.nextInt(arr.length))
        }

All these tasks are scheduled at the same instant and are executed semi-randomly - I'll check and change if needed (UPDATE: Yeah, when you pick just the head to run, the only law that fails is the SemigroupK associativity law - I'll investigate further)

@catostrophe Right, I changed the way empty is represented to pass absorption laws, now I am trying to figure out how to exploit test context to pass remaining tests. But I do agree, having read Oleg's essay, that semiringal laws exclude a lot of useful non-deterministic instances.

djspiewak commented 5 years ago

@marcin-rzeznicki Ahhhh, I didn't realize that was in there. The randomness is kind of important due to fairness though. I'm not sure what to do exactly. Oleg's essay (which is very good) spends a lot of time talking about the observable equality, which is good because that's exactly what laws are all about. We don't have a good way to observe probabalistic behavior, and yet that behavior is essential to this situation.

We almost need a better ScalaCheck. :-)

marcin-rzeznicki commented 5 years ago

@djspiewak So I did

@@ -125,7 +125,7 @@ import scala.util.control.NonFatal
  *   assert(f.value == Some(Success(2))
  * }}}
  */
-final class TestContext private () extends ExecutionContext { self =>
+final class TestContext private (deterministic: Boolean) extends ExecutionContext { self =>
   import TestContext.{State, Task}

   private[this] var stateRef = State(
@@ -305,12 +305,11 @@ final class TestContext private () extends ExecutionContext { self =>
   private def extractOneTask(current: State, clock: FiniteDuration): Option[(Task, SortedSet[Task])] = {
     current.tasks.headOption.filter(_.runsAt <= clock) match {
       case Some(value) =>
-        val firstTick = value.runsAt
-        val forExecution = {
+        val forExecution = if (deterministic) value else  {
+          val firstTick = value.runsAt
           val arr = current.tasks.iterator.takeWhile(_.runsAt == firstTick).take(10).toArray
           arr(Random.nextInt(arr.length))
         }
-
         val remaining = current.tasks - forExecution
         Some((forExecution, remaining))

@@ -334,8 +333,8 @@ final class TestContext private () extends ExecutionContext { self =>

 object TestContext {
   /** Builder for [[TestContext]] instances. */
-  def apply(): TestContext =
-    new TestContext
+  def apply(deterministic: Boolean = false): TestContext =
+    new TestContext(deterministic)

   /** Used internally by [[TestContext]], represents the internal
    * state used for task scheduling and execution.

and then

@@ -36,8 +36,8 @@ class BaseTestsSuite extends AnyFunSuite with Matchers with Checkers with Discip
     test(name, tags:_*)(silenceSystemErr(f(TestContext())))(pos)
   }

-  def checkAllAsync(name: String, f: TestContext => Laws#RuleSet): Unit = {
-    val context = TestContext()
+  def checkAllAsync(name: String, f: TestContext => Laws#RuleSet, deterministic: Boolean = false): Unit = {
+    val context = TestContext(deterministic)
     val ruleSet = f(context)

     for ((id, prop) <- ruleSet.all.properties)

But I don't know why associativity is still failing :cry:

marcin-rzeznicki commented 5 years ago

Oh, I get it now So, even the deterministic TestContext schedules tasks in FIFO order (due to task ordering) - so in expressions like (a |+| b) |+| c that race between (a |+| b) and c and then, race between a and b - c will always win (TestContext state will look like: (race a b) (c) -> (c) (a) (b) -> c gets computed and wins) In general, the task of lowest depth in the imaginary "race tree" will always win. Concluding, expressions (a |+| b) |+| c and a |+| (b |+| c) will, in presence of only pure values, yield always c and a, respectively. To make the TestContext schedule tasks in LIFO order (well, not considering scheduling etc), I simply changed the ordering SortedSet.empty[Task](if (deterministic) Task.ordering.reverse else Task.ordering). With this change all Alternative laws pass. Well, this is cheating of course, but is it acceptable cheating? @djspiewak @catostrophe

marcin-rzeznicki commented 5 years ago

Thanks guys. I think the issue can be closed now

LukaJCB commented 5 years ago

Let's close this when we also add the MonoidK instance for IO.Par (or decide we don't want it)

marcin-rzeznicki commented 5 years ago

So actually these implementations will be more or less the same. So I could quickly write the instance using Alternative of io.start if this is fine with you (although it might be more logical to re-write Fibers in terms of IO)

marcin-rzeznicki commented 5 years ago

... disregard my previous comment. I didn't think it through. So, I attempted to write it similarly to how Alternative[Fiber] is implemented. Please check out #588

typelevel / cats-effect

Add an Alternative instance for Fiber/IO.Par #405