Closed LukaJCB closed 5 years ago
Potentially I think the same can be done for Fiber
:)
Seems like Haskell already runs with this idea in its Alternative
instance for Concurrently
:
http://hackage.haskell.org/package/async-2.2.1/docs/Control-Concurrent-Async.html#t:Concurrently
race
depends on scheduling hence introduces indeterminism. How can we prove SemigroupK
associativity law?
I implemented SemigroupK[IO.Par]
/MonoidK[IO.Par]
and the associativity law test failed:
[info] - IO.Par.monoidK.monoidK left identity
[info] - IO.Par.monoidK.monoidK right identity
[info] - IO.Par.monoidK.semigroupK associative *** FAILED ***
[info] GeneratorDrivenPropertyCheckFailedException was thrown during property evaluation.
[info] (BaseTestsSuite.scala:44)
[info] Falsified after 1 successful property evaluations.
[info] Location: (BaseTestsSuite.scala:44)
[info] Occurred when passed generated values (
[info] arg0 = IO(1936284316),
[info] arg1 = <function1>,
[info] arg2 = IO$1181956722
[info] )
[info] Label of failing property:
[info] Expected: <function1>
[info] Received: <function1>
Should we add a lawless instance in favor of convenience (e.g. for having foldMapK
)?
I wonder why it wouldn't be associative, conceptually it seems to me like race
should definitely be associative 🤔
The choice depends on which task finishes first, and it's unpredictable and not reproducible in general. In contrast, parApplicative
guarantees the same result irregardless of the execution speed & order.
IO(1) <+> IO(2) <+> IO(3)
may result in any of 1, 2, 3. We can't say that (IO(1) <+> IO(2)) <+> IO(3) == IO(1) <+> (IO(2) <+> IO(3))
because the result of each side is affected by uncontrolled realworld randomness.
But I also find MonoidK
instance useful. We probably need a special way to summon unlawful instances e.g. from a separate package.
If only we knew (could fix) the completion time for each IO a priori and if it was constant for each re-run, then we could prove the SemigroupK
laws (with these conditions race
is indeed associative).
Don't know how to translate it into a law test.
parApplicative
guarantees the same result irregardless of the execution speed & order.
Hmmm, I don't think I agree with you there, if we consider uncontrolled realworld randomness, how can we guarantee that IO is associative at all, if the same IOs could fail from one moment to the next?
IMO laws can only really hold with strictly deterministic values.
race
will always cancel the task that's slower.
Meaning (a <+> b) <+> c)
will result in the same thing as (a <+> (b <+> c))
if a
, b
and c
take the same time when they're run. This is to me the same as saying (a <+> b) <+> c)
and (a <*> (b <*> c))
are the same if the same values fail both times they're run.
I meant something different: if IO result itself is deterministic, i.e. it's either fail or complete, then parApplicative's combinators guarantee the same result for each re-run. And we can easily test it with our frameworks (we do).
But when we consider race
it's not enough. We need to generate IOs with significantly different execution time and hope for scheduling fairness.
A simple test that does work:
(IO(sleep(k)) <+> IO(sleep(m))) <+> IO(sleep(n)) == IO(sleep(k)) <+> (IO(sleep(m)) <+> IO(sleep(n)))
where k != m != n, in 1..10 seconds
and sleep(k)
just sleeps k seconds and returns k
I made some progress.
I created an instance with combineK
as IO.race
and empty
as IO.never
I changed the generator for IO.Par
so that it produces IOs with a random and almost unique sleep (up to hash collisions). It made parallel tasks properly sorted within the TestContext
execution queue. Also cats.effect.laws.util.TestInstances#eqFuture
was modified as to pass a sufficiently long fixed time interval to tick
so that the delayed tasks get executed by ec
.
This change didn't break any existing tests for IO.Par
and helped to pass the tests for MonoidK[IO.Par]
which depend on the order of execution.
However, we still have a problem with the Alternative
semiringal laws:
Right absorption law ff <*> empty == empty
breaks if ff
is a failing IO: the lhs fails fast along with ff
while the rhs (IO.never
) keeps running.
Right distributivity law (ff <+> fg) <*> fa == (ff <*> fa) <+> (fg <*> fa)
breaks when fa
runs longer than ff
and fg
, because we encounter indeterminism again: the lhs firstly races between ff
and fg
, chooses the winner and only then applies it to fa
, while the rhs races between the applied IOs, whose execution times are equal (to fa
time), and the winner gets chosen randomly.
Another thing I thought of is how to make it smoothly work with TF encoding (as it does with Parallel/Par
for applicative instances).
I think we might have to scrap the Alternative
instance. Even if we get rid of the race condition for distributivity, the two expressions also differ in the sense that in the right case fa
is going to be run twice, which is a significant difference.
For something like Fiber
this doesn't apply, since the join
is idempotent.
@djspiewak any reason for closing this? I think it's still a feature we should be thinking about at the very least. Alternative[IO.Par]
is unlawful with how its laws are currently defined, but MonoidK[IO.Par]
and Alternative[Fiber[F, ?]]
should still be very much on the table IMO.
@LukaJCB Ah sorry! I closed it because it seemed like the discussion had run its course. Reopening.
Hey, I think I can handle this as a part of the Sustainability program. I am skeptical about IO
as much as the next guy, but an instance for Fiber
should be slick :smile:
So, I think that the most sensible thing I could come up with is:
private case object Empty extends RuntimeException("Empty")
implicit def fiberAlternative[F[_]](implicit F: Concurrent[F]): Alternative[Fiber[F, ?]] = new Alternative[Fiber[F, ?]] {
private def cancelBoth[A, B](f1: Fiber[F, A], f2: Fiber[F, B]) = F.map2(f1.cancel, f2.cancel)((_, _) => ())
//...
final override def empty[A] = Fiber(F.raiseError(Empty), F.unit)
final override def combineK[A](x: Fiber[F, A], y: Fiber[F, A]) = {
val fa = F.attempt(x.join)
val fb = F.attempt(y.join)
Fiber(
F.racePair(fa, fb).flatMap {
case Left((Left(ex), fiberB)) => F.adaptError(F.rethrow(fiberB.join)) { case Empty => ex }
case Left((Right(a), fiberB)) => F.map(fiberB.cancel)(_ => a)
case Right((fiberA, Left(ey))) => F.adaptError(F.rethrow(fiberA.join)) { case Empty => ey }
case Right((fiberA, Right(b))) => F.map(fiberA.cancel)(_ => b)
},
cancelBoth(x, y))
}
This of course fails distributivity/associativity laws of Alternative
, because of non-determinism in racePair
(so I am hoping, or else I've made a blatant mistake). Could you please advise on what to do next?
It passes the absorption laws tough ie. ff <*> empty == empty
. In simpler version
final override def empty[A] = Fiber(F.never, F.unit)
final override def combineK[A](x: Fiber[F, A], y: Fiber[F, A]) = map(Fiber(F.race(x.join, y.join), cancelBoth(x, y)))(_.merge)
for a failing ff
, ff <*> empty
is failed and empty
is not. But, of course, if we're to remain lawless anyway, the simpler version might be preferable
@marcin-rzeznicki I don't see how it fails the laws exactly. I can see why it fails the tests, which require determinism, but I think it's still lawful. Critically, associativity says that (a |+| b) |+| c
must be substitutable for a |+| (b |+| c)
, which it is when |+|
is racePair
. You can't predict the winner in any association, and the probabilities are the same with each association (or at the very least, indistinguishable).
I'm not sure exactly what to do here. Algebraically, it seems lawful (I haven't done a rigorous proof), but we have no way to prove it via the unit testing tools we have access to. We would need something that ran an arbitrary number of trials, recorded the distribution of results, and compared against the same procedure on a different expression.
Having read Oleg Kiselyov's essay on the laws of MonadPlus http://okmij.org/ftp/Computation/monads.html#monadplus I started to believe that these laws are over-specified in case of non-deterministic instances, such as this one. It'd be ideal to have a reduced set of laws for this. Currently, I think I can either make a PR with no tests of Alternative
laws, or maybe I can somehow pick only a subset?
I don't know of a way to pick a subset. Given that implementing non-deterministic law testing tools is a much larger yak shave, I'd say that PRing with no Alternative tests (or having them but commented out perhaps?) is probably better.
@djspiewak Great, I'll do that!
One hacky solution that comes to mind is imposing a consistent ordering in execution speed on IOs obtained from a generator. If that's possible (is it?), you can at least test whether an implementation that uses non-det execution model (such as race
) doesn't do something downright wrong
Oh, hmm. Actually this is, in a sense, what TestContext
does. I wonder if there's a way to exploit that in this case.
@marcin-rzeznicki I did it almost the same way last year and it worked out. Read my comments above.
I stopped at the Alternative laws. They are semiringal in cats, while Alternative is actually a near-semiring or even a less lawful class. I didn't know what to do with it and gave up.
@djspiewak Ah, so these tests fail because of this logic:
val forExecution = {
val arr = current.tasks.iterator.takeWhile(_.runsAt == firstTick).take(10).toArray
arr(Random.nextInt(arr.length))
}
All these tasks are scheduled at the same instant and are executed semi-randomly - I'll check and change if needed (UPDATE: Yeah, when you pick just the head to run, the only law that fails is the SemigroupK associativity law - I'll investigate further)
@catostrophe Right, I changed the way empty
is represented to pass absorption laws, now I am trying to figure out how to exploit test context to pass remaining tests. But I do agree, having read Oleg's essay, that semiringal laws exclude a lot of useful non-deterministic instances.
@marcin-rzeznicki Ahhhh, I didn't realize that was in there. The randomness is kind of important due to fairness though. I'm not sure what to do exactly. Oleg's essay (which is very good) spends a lot of time talking about the observable equality, which is good because that's exactly what laws are all about. We don't have a good way to observe probabalistic behavior, and yet that behavior is essential to this situation.
We almost need a better ScalaCheck. :-)
@djspiewak So I did
@@ -125,7 +125,7 @@ import scala.util.control.NonFatal
* assert(f.value == Some(Success(2))
* }}}
*/
-final class TestContext private () extends ExecutionContext { self =>
+final class TestContext private (deterministic: Boolean) extends ExecutionContext { self =>
import TestContext.{State, Task}
private[this] var stateRef = State(
@@ -305,12 +305,11 @@ final class TestContext private () extends ExecutionContext { self =>
private def extractOneTask(current: State, clock: FiniteDuration): Option[(Task, SortedSet[Task])] = {
current.tasks.headOption.filter(_.runsAt <= clock) match {
case Some(value) =>
- val firstTick = value.runsAt
- val forExecution = {
+ val forExecution = if (deterministic) value else {
+ val firstTick = value.runsAt
val arr = current.tasks.iterator.takeWhile(_.runsAt == firstTick).take(10).toArray
arr(Random.nextInt(arr.length))
}
-
val remaining = current.tasks - forExecution
Some((forExecution, remaining))
@@ -334,8 +333,8 @@ final class TestContext private () extends ExecutionContext { self =>
object TestContext {
/** Builder for [[TestContext]] instances. */
- def apply(): TestContext =
- new TestContext
+ def apply(deterministic: Boolean = false): TestContext =
+ new TestContext(deterministic)
/** Used internally by [[TestContext]], represents the internal
* state used for task scheduling and execution.
and then
@@ -36,8 +36,8 @@ class BaseTestsSuite extends AnyFunSuite with Matchers with Checkers with Discip
test(name, tags:_*)(silenceSystemErr(f(TestContext())))(pos)
}
- def checkAllAsync(name: String, f: TestContext => Laws#RuleSet): Unit = {
- val context = TestContext()
+ def checkAllAsync(name: String, f: TestContext => Laws#RuleSet, deterministic: Boolean = false): Unit = {
+ val context = TestContext(deterministic)
val ruleSet = f(context)
for ((id, prop) <- ruleSet.all.properties)
But I don't know why associativity is still failing :cry:
Oh, I get it now So, even the deterministic TestContext
schedules tasks in FIFO order (due to task ordering) - so in expressions like (a |+| b) |+| c
that race between (a |+| b)
and c
and then, race between a
and b
- c
will always win (TestContext state will look like: (race a b) (c) -> (c) (a) (b) -> c gets computed and wins) In general, the task of lowest depth in the imaginary "race tree" will always win. Concluding, expressions (a |+| b) |+| c
and a |+| (b |+| c)
will, in presence of only pure values, yield always c and a, respectively. To make the TestContext
schedule tasks in LIFO order (well, not considering scheduling etc), I simply changed the ordering SortedSet.empty[Task](if (deterministic) Task.ordering.reverse else Task.ordering)
. With this change all Alternative
laws pass. Well, this is cheating of course, but is it acceptable cheating? @djspiewak @catostrophe
Thanks guys. I think the issue can be closed now
Let's close this when we also add the MonoidK
instance for IO.Par
(or decide we don't want it)
So actually these implementations will be more or less the same. So I could quickly write the instance using Alternative
of io.start
if this is fine with you (although it might be more logical to re-write Fibers in terms of IO)
... disregard my previous comment. I didn't think it through. So, I attempted to write it similarly to how Alternative[Fiber]
is implemented. Please check out #588
Alternative
basically allows us to combine twoF
intoF[(A, B)]]
orF[Either[A, B]]
along with identities for each. This maps perfectly onto racing (whereIO.never
is the identity) and running in parallel (whereIO.unit
is the identity) An instance forIO.Par
could allow us to make use of existing abstractions using theSemigroupK
family. I.e. racing a list of IOs could be as simplelist.foldMapK(f)
.