typelevel / fs2

Compositional, streaming I/O library for Scala
https://fs2.io
Other
2.37k stars 602 forks source link

Append is not associative with finalizers #333

Closed djspiewak closed 9 years ago

djspiewak commented 9 years ago
  property("associativity of append with finalizers") = secure {
    @volatile
    var finalized = false

    val p = emit(()) ++ (emit(()) onComplete eval_(Task delay { finalized = true }))
    // val p = (emit(()) ++ emit(())) onComplete eval_(Task delay { finalized = true })

    p.kill.run.run
    finalized
  }

In its current configuration, the above test fails. If I swap the commented lines, which is to say swap the associativity on the ++ operator, the test will pass. In other words, binding a finalizer to a non-halt process within an append breaks the guarantees about finalizer evaluation.

Even more seriously, the fact that there is an observable difference between the two p variants means that ++ is not associative. As an extension of this, flatMap is also non-associative. It was previously known that flatMap was non-associative in the presence of njoin (#274). This test case shows that njoin is unnecessary to demonstrate failure of associativity with monadic joining.

Needless to say, this is an undesirable state of affairs. It makes it difficult to reason about finalizers in general, and can mean that naively extracting a process definition into a variable (thus altering the associativity) can change the semantics. Even worse, a given process definition will have different semantics depending on the associativity with which it is composed into another process!

I believe all of these issues come down to a single design flaw: append and onComplete are implemented using the same mechanism (onHalt), which is overly general. We need to split up append from finalization. Doing this correctly should resolve the associativity of ++ issue described here (and by extension, flatMap). Resolving the njoin associativity issues would require we take the further step of restricting finalizer processes to be of type Process[F, Nothing] (in fact, we probably need to do this even for ++ associativity, but I haven't thought that through).

So the tl;dr here is that the monadic laws do not hold for Process in the presence of finalizers. Or rather, they do not hold in the presence of finalizers for a "strict" definition of process equality. The above assumes we define two processes to be equal if they a) produce the same elements in order, and b) perform the same effects in order. If we drop the second requirement, then magically the monadic laws (and ++ associativity!) holds once more. However, I would argue that this weaker definition of process equality is somewhat useless, given the primary use-cases for effectful streams.

At the end of the day, we want to use Process as a way of reasonably managing effects, and it is difficult to reason about a "monad" for which the monadic laws do not always hold.

djspiewak commented 9 years ago

BTW, what is the reasoning behind using Trampoline in Await and Append? All of the interpreters are already stackless, so this seems to be just adding overhead.

djspiewak commented 9 years ago

@pchlupacek I made the changes to stepAsync on my branch (djspiewak:bug/await-cleanup). However, it looks like wye is somehow broken by this? I'm not quite sure what's going on, especially since the test in question doesn't really rely on finalizers to my knowledge. Still working on things.

Edit It's because of my change to repeatEval, which is in turn because of the change to await. Dear god, why on earth would we create a situation where those two versions of repeatEval are different?!

pchlupacek commented 9 years ago

@djspiewak I quickly looked on your branch. What exactly is broken on wye?

pchlupacek commented 9 years ago

on the Trampoline question, I think the main reason for it was so we can get stack safety on nested rcv's in await And Append is just essentially collecting these. I think without them we may have troubles in pipe, tee, wye, but perhaps not in Process[Task,O].

However, the cln may be imho w/o trampoline hence it is guarded by trampolined rcv, and other than that we don't use it in any combinators.

djspiewak commented 9 years ago

@pchlupacek All tests passing, including some that hit the cleanup logic which is implemented as you specified. Probably some other things could be altered as well, but I didn't go through and do a thorough audit.

The biggest problem is what I referenced previously: Terminated(End) is a major issue. Specifically, due to the way it interacts with repeatEval and the way repeatEval interacts with this change (by creating an infinite Await body). I think that Terminated(End) is a terrible idea anyway, and hence I filed #369 to address it directly. In the meantime, I just excluded End from the toHalt conversion and added onHalt to the appropriate places.

The good news is that porting in the test for the mergeN hang (but not the njoin fix!) shows that the double-finalizer bug is in fact fixed by this branch, as expected.

on the Trampoline question, I think the main reason for it was so we can get stack safety on nested rcv's in await

As I mentioned, all of the interpreters are stackless, so it shouldn't actually matter with rcv. It's possible that Append still needs Trampoline (especially when dealing with Process0), but I'm quite certain that Await does not.

pchlupacek commented 9 years ago

@djspiewak excellent. I just quickly went through branch, and I think we have to remove Terminated possibly alltogether. Specifically I think we don't ned to have it in places where we have full control over the code like async._ stuff.

radekm commented 9 years ago

To me the original behaviour seems correct.

val p = a ++ (b onComplete f)

means that f is cleanup for b and when you kill p process b doesn't even start - so there is no need to execute f.

The motivation for this behaviour of kill is that the process tries to terminate as soon as possible when killed. For this reason even empty ++ p isn't equivalent to p (you can use kill to see the difference).

Even more seriously, the fact that there is an observable difference between the two p variants means that ++ is not associative.

Is there some example where a ++ (b ++ c) isn't same as (a ++ b) ++ c?

pchlupacek commented 9 years ago

@radekm I think the problem lies in code like this :


await(resource)(r => doSomethingWith(r) ++ doSomethingElseWith(r) onComplete release(r)) 

At first the code seems to be correct, but hence you ++ the two processes the onComplete is attached to 2nd instance and thus it may not be executed in case the doSomethingElse won't be executed.

To me the whole think is more about reasoning about code, obviously if you add the two parentheses, the code will be correct, but this is exactly where we may have unintended behaviours//bugs. I think however the problem is that we swallow cleanup code in ++. I have been shortly trying to replace ++ code, but so far without reasonable progress, but I think there is possibility this will work. Can you please look on that? I know you have been good on that :-)

jedws commented 9 years ago

The point could maybe be best expressed as that the types lie.

If we have a Process, we expect that if we add onComplete x then we must expect the finalizer x will get run, if it is effectively non-deterministic, then onComplete is a lie – it would better be called maybeOnComplete.

So, append is thus a lie, we cannot append in a lawful manner. Therefore we are not really appending, but munging them together, and we would be better served with:

trait Process[…] {
  def munge[…](next: Process[…]): MaybeCompletableProcess[…]
}

This more honest type shows that we cannot legally append, so there is no legal SemiGroup and therefore Monoid instance. This is unfortunate, although possibly solved with some weaker type that doesn't promise completability.

On 10 May 2015 at 16:41, Pavel Chlupacek notifications@github.com wrote:

@radekm https://github.com/radekm I think the problem aries in code like this :

await(resource)(r => doSomethingWith(r) ++ doSomethingElseWith(r) onComplete release(r))

At first the code seems to be correct, but hence you ++ the two processes the onComplete is attached to 2nd instance and thus it may not be executed in case the doSomethingElse won't be executed.

To me the whole think is more about reasoning about code, obviously if you add the two parentheses, the code will be correct, but this is exactly where we may have unintended behaviours//bugs. I think however the problem is that we swallow cleanup code in ++. I have been shortly trying to replace ++ code, but so far without reasonable progress, but I think there is possibility this will work. Can you please look on that? I know you have been good on that :-)

— Reply to this email directly or view it on GitHub https://github.com/scalaz/scalaz-stream/issues/333#issuecomment-100594610 .

radekm commented 9 years ago

if it is effectively non-deterministic, then onComplete is a lie – it would better be called maybeOnComplete.

IMO it's deterministic. q onComplete fin means that fin is executed after q completes. And in the original example which has the form p ++ (q onComplete fin) process q isn't even started so it can never complete so it seems fine to not run fin.

This behaviour is similar to the behaviour of Scala programs. q onComplete fin behaves as try { q } finally { fin } and p ++ q behaves as { p; q }. Now you can see that

{
  p;
  try { q }
  finally { fin }
}

is different from

try { p; q }
finally { fin }
pchlupacek commented 9 years ago

@djspiewak any status update on your branch? Can I take it and work from it? or is it complete?

pchlupacek commented 9 years ago

@radekm I fully agree with you, and I think this was our view why we didn't completely focus on this issue early enough. However unlike the normal program that usually cannot be killed or interrupted, processes can.

imagine that doSomethingWith(r) will get interrupted with Kill. In that case cleanup will never be called. However when Kill will be received after ++, then cleanup IS called. So I think @jedws is referring to this as nonDeterminism as you can't really tell if this will happen or not :-)

radekm commented 9 years ago

To me the whole think is more about reasoning about code

@pchlupacek This is a good point and it makes sense to me (mistakes should be obvious and this unfortunately isn't).

But I still don't see is why isn't ++ associative? Do we have a, b, c such that a ++ (b ++ c) doesn't behave as (a ++ b) ++ c?

is unsound precisely because of the non-associativity of append.

@djspiewak It may really happen that the following code allocates a resource and doesn't free it?

eval(allocate) flatMap { r => p onComplete cleanup(r) }

I would expect that this works because there is no ++ involved.

radekm commented 9 years ago

We need to update the algebra. I'm pretty sure we need to separate appends and finalizers.

@djspiewak Some time ago I created an alternative instruction set (https://github.com/radekm/scalaz-stream/issues/1) but I haven't implemented it yet (there are still many unclear things - eg. time complexity of some operations). In that proposal the process is only a sequence of instructions so append is trivially associative because it's append of sequences. Cleanups are solved by finally blocks (delimited by BeginFinally and EndFinally) - these blocks aren't skipped when the process is killed. Unfortunately there is no onHalt and you cannot catch exceptions (this is by design).

pchlupacek commented 9 years ago

@radekm ++ is obviously associative, but not with the termination handler, like onComplete.

radekm commented 9 years ago

++ is obviously associative, but not with the termination handler, like onComplete.

@pchlupacek Then I don't see how the monadic laws are violated - since for some monads you can implement try-finally (eg. bracket in Haskell).

jedws commented 9 years ago

It is non-deterministic in the sense that if you have any arbitrary Process p, you cannot determine whether adding an onComplete handler will actually result in that handler being run, you must know the provenance of p and if it is a composite process, and which one of the component processes is going to get killed.

With the current behaviour we are definitely in the same position as in the world of promiscuous side-effects that is normal Scala. This hinders reasoning about the program using the types alone as it causes the types to lie – which is why we aspire to the Scalazzi Safe Subset* and hope that libraries adhere to it.

* https://dl.dropboxusercontent.com/u/7810909/talks/parametricity/4985cb8e6d8d9a24e32d98204526c8e3b9319e33/parametricity.pdf

On 10 May 2015 at 19:17, radekm notifications@github.com wrote:

if it is effectively non-deterministic, then onComplete is a lie – it would better be called maybeOnComplete.

IMO it's deterministic. q onComplete fin means that fin is executed after q completes. And in the original example which has the form p ++ (q onComplete fin) process q isn't even started so it can never complete so it seems fine to not run fin.

This behaviour is similar to the behaviour of Scala programs. q onComplete fin behaves as try { q } finally { fin } and p ++ q behaves as { p; q }. Now you can see that

{ p; try { q } finally { fin } }

is different from

try { p; q }finally { fin }

— Reply to this email directly or view it on GitHub https://github.com/scalaz/scalaz-stream/issues/333#issuecomment-100609620 .

radekm commented 9 years ago

you must know the provenance of p and if it is a composite process, and which one of the component processes is going to get killed.

If you write p onComplete fin and if p starts then fin will run after p completes - no matter what p is composed of. Yes, ++ isn't associative with onComplete but this doesn't mean that onComplete doesn't work.

Proposed behaviour is probably better than the current behaviour but I still don't see how does the current behaviour violate the monadic laws (as noted above Haskell has bracket which IMO is the same thing as current onComplete) or how does this relate to parametricity?

jedws commented 9 years ago

The question is, do you expect p onComplete fin to run fin or not? If the answer is 'maybe?' then you cannot reason well about your program.

The lack of associativity means that the Monoid laws are broken.

This of course doesn't matter if we have a type that doesn't promise onComplete handling, as it is only that bit which doesn't append (afaik).

On 11 May 2015 at 17:29, radekm notifications@github.com wrote:

you must know the provenance of p and if it is a composite process, and which one of the component processes is going to get killed.

If you write p onComplete fin and if p starts then fin will run after p completes - no matter what p is composed of. Yes, ++ isn't associative with onComplete but this doesn't mean that onComplete doesn't work.

Proposed behaviour is probably better than the current behaviour but I still don't see how does the current behaviour violate the monadic laws (as noted above Haskell has bracket which IMO is the same thing as current onComplete) or how does this relate to parametricity?

— Reply to this email directly or view it on GitHub https://github.com/scalaz/scalaz-stream/issues/333#issuecomment-100797340 .

radekm commented 9 years ago

The question is, do you expect p onComplete fin to run fin or not? If the answer is 'maybe?'

As I said the answer is yes iff p is "started" and completes. If p isn't started or doesn't complete then the answer is no. Generally you want to run finalizer only if the resource was allocated - so I may ask the same - do you expect await(resource)(r => doSomethingWith(r) onComplete release(r)) to run release(r) or not? And of course the answer depends whether await runs and whether resource completes successfully.

The lack of associativity means that the Monoid laws are broken.

IMO monoid has only single associative operation - so I don't see why should be ++ associative with onComplete (these are two different operations). Also note that ++ doesn't have left neutral element.

djspiewak commented 9 years ago

@pchlupacek The status of my branch is "basically complete, but lacking the Terminated change". I need to finish the Terminated work first, at which point the revised Await branch will be ready to land.

jedws commented 9 years ago

As I said the answer is yes iff p is "started" and completes. If p isn't started or doesn't complete then the answer is no.

Right, you do not know.

IMO monoid has only single associative operation - so I don't see why

should be ++ associative with onComplete (these are two different operations).

Read the original report again. You can change behaviour of your program by simply factoring out expressions. This is fundamentally against core notions of the value of functional programming.

Also note that ++ doesn't have left neutral element.

I do not know what you mean, if you mean that Process.++ does not, then you are simply stating the fact that ++ does not obey the Monoid laws, and it should.

On 11 May 2015 at 19:04, radekm notifications@github.com wrote:

The question is, do you expect p onComplete fin to run fin or not? If the answer is 'maybe?'

As I said the answer is yes iff p is "started" and completes. If p isn't started or doesn't complete then the answer is no. Generally you want to run finalizer only if the resource was allocated - so I may ask the same - do you expect await(resource)(r => doSomethingWith(r) onComplete release(r)) to run release(r) or not? And of course the answer depends whether await runs and whether resource completes successfully.

The lack of associativity means that the Monoid laws are broken.

IMO monoid has only single associative operation - so I don't see why should be ++ associative with onComplete (these are two different operations). Also note that ++ doesn't have left neutral element.

— Reply to this email directly or view it on GitHub https://github.com/scalaz/scalaz-stream/issues/333#issuecomment-100824463 .

pchlupacek commented 9 years ago

@djspiewak cool. The problem with terminated is repeatEval currently right?

radekm commented 9 years ago

Read the original report again. You can change behaviour of your program by simply factoring out expressions.

I don't see anything like that in the original report. The behaviour was changed by changing parentheses - it's like complaining that (1 + 2) * 3 is different from 1 + (2 * 3).

I do not know what you mean, if you mean that Process.++ does not, then you are simply stating the fact that ++ does not obey the Monoid laws, and it should.

I know it doesn't and I agree that it should. I was only saying that the problem is not with associativity of ++ but in not existence of neutral element.

radekm commented 9 years ago

Right, you do not know.

Ok. But IMO you cannot eliminate this uncertainty - so I expect that you'll be able to define onComplete with similar behaviour even after the fix (at least for processes with Task).

djspiewak commented 9 years ago

@pchlupacek No, repeatEval is easy. The problem is through.

djspiewak commented 9 years ago

@radekm I missed a comment you made earlier. eval . flatMap most certainly is not safe and cannot provide the same guarantees as await since there is no lexical bounding. Under the surface, flatMap is using ++, as in fact you would expect from a join on a sequence.

pchlupacek commented 9 years ago

@djspiewak I think we sort of mix two issues together.I am not sure if ++ associativity with cleanup is achievable, because I think @radekm arguments are valid specifically the * explanation he made.

The bigger problem I see is the issue that flatMap is not resource safe, and I think that's something we perhaps can resolve. I see this more like incorrect flatMap implementation. The reason for flatMap being like this was mainly performance issues, but perhaps we can do better and make it more low-level in case of (emit).

How about start with 2-3 flatMap examples that are not valid and try to resolve them ?

What do you think?

djspiewak commented 9 years ago

@pchlupacek I do agree that the ++ problem is unresolvable. Unfortunately, I think that flatMap is either going to be the same or we're going to end up generating a flatMap which is inconsistent with ++, and the latter alternative is even scarier. I'm willing to try though!

Here's a simple example to work from:

val p = eval(acquire) flatMap { r =>
  doThings(r) onComplete release(r)
}

val interrupt = p stepAsync { _ => () }
// wait for acquire to start
interrupt(Kill)

// release(r) will not be run at this point, whereas it is guaranteed to run if we use await

Basically, take the await properties in ProcessSpec and just rewrite them to use eval . flatMap and they will obligingly fail.

pchlupacek commented 9 years ago

ok the

val p = eval(acquire) flatMap { r =>
  doThings(r) onComplete release(r)
}
//is in fact 
val p0 = Await(acquire, {
   case -\/(c) => Halt(c)
   case \/-(a) => 
   //Emit(Seq(a)).flatMap{ r => doThings(r) onComplete release(r) }
   doThings(a) onComplete release(a)
 }) 

I don't see where ++ is involved here. So I am not sure if this fails if this is really related to append. Any thoughts?

pchlupacek commented 9 years ago

Ah, I perhaps see here what do you mean, is this asynchronous release we have resolved with additional cleanup. Hm, I think for this sort of code we have only option to use await for now. I think resource has to be guarded by await where it is acquired, and we shall not do it in flatMap for now. I see your point tho.

pchlupacek commented 9 years ago

@djspiewak after more thinking of this how about introducing

def evalAsync[F[_], O](fo:F[O], onPreempt: O => Process[F,Nothing] = _ => halt):Process[F,O] = ???

I think this will resolve the issue with flatMap as you may be able to write

evalAsync(request,onPreempt).flatMap { o => ....}

we may perhaps add this onPreempt to eval, and eval_, but, I think explicit syntax is perhaps better.

Additionally, we may have to do a bit of implicit play and reincarnate your idea with Eval but just for infix syntax


case class Eval[F[_],O](fo:F[O], onPreempt:Option[O => Process[F,Nothing]])

object Eval {
 def eval2Process[F[_],O](in:Eval[F,O]):Process[F,O] = ???

 implicit class EvalSyntax[F[_],O](val self:Eval[F,O]) extends AnyVal {
    def onPreempt(f: O => Process[F,O]):Process[F,O] = ???
  }

}
djspiewak commented 9 years ago

@pchlupacek Actually given that we now have an onPreempt case within Await in master, and we no longer attempt to identify finalizers lexically within the Await body, there may not actually be a problem here any longer. There certainly shouldn't be an observable difference between the following renderings:

eval(acquire) flatMap { r => doThings(r) onComplete release(r) }

// ...and

await(acquire) { r => doThings(r) onComplete release(r) }

Let me think about this a bit and see if I can come up with an example that breaks flatMap despite the new preemption handling.

pchlupacek commented 9 years ago

I think you are right. The only difference is that await allows you to add the onPreempt whether eval does not. Hence at least in our code base eval is used more frequently then just await (except for P1s) I think eval deserves onPreempt as the special parameter or as syntax.

djspiewak commented 9 years ago

@pchlupacek I don't mind adding it as a special parameter. It would be less clunky than await. I would recommend using first-class overloading rather than a default parameter, since in my experience it is not uncomment to do something like p flatMap Process.eval and similar, which would break if eval had a second defaulted parameter.

Another option would be to come up with a name for the preempt-assigned version of eval. Not sure what would make sense though.

djspiewak commented 9 years ago

I'm going to close this, since I'm relatively certain we've resolved it.