Clean up suspend binding implementation

Munzey commented 3 years ago

Also add more real-world test cases and improve its related readme section.

Turns out that with the current implementation, if you launch coroutines inside the binding block and call bind inside one of those launched child coroutines then when the scope is cancelled these children coroutines aren't getting cancelled (they were basically being launched in the parent scope of binding so it had no effect).

Correct solution is to make the lambda block of binding have a receiver of CoroutineScope. In this case that meant making SuspendableResultBinding implement CoroutineScope.

This way we can launch the binding block inside an async coroutine where its context is used for SuspendableResultBinding (as its now also a CoroutineScope), so that when we cancel SuspendableResultBinding, we cancel the async coroutine and all its children.

Also discovered that suspendCancellableCoroutine makes for a handy way to get around bind needing to throw in the error case for no reason. Think its a lot cleaner now.

ragnese commented 3 years ago

Coroutines scare and confuse me, so take this with a grain of salt.

But, I think the only change needed for the current implementation to pass the two tests added by @Munzey is to make SuspendableResultBinding implement CoroutineScope so that the launchs in his test are attached to the correct scope.

After that, I think the tests pass and the issue in the topic is fixed, isn't it?

michaelbull commented 3 years ago

It definitely makes sense for SuspendableResultBinding to implement CoroutineScope, based on the docs of what CoroutineScope state and my understanding of it. That avoid having to make the caller wrap the binding in coroutineScope { } themselves. It's the other stuff I'm not sure of, especially launching new coroutines with async and launch when we're already in the context of a running coroutine. If we can just make the change for the CoroutineScope implementation that seems reasonable so far.

ragnese commented 3 years ago

Like I said- I'm not an expert by any means. I actually found your library because I was implementing my own for-comprehension syntax in a private work project and my implementation ended up looking very similar to yours!

But I agree- I don't see a reason to call async or any of that stuff.

Munzey commented 3 years ago

hmm @ragnese can you provide a snippet of what you are suggesting? for example:

return try {
        receiver = SuspendableResultBindingImpl(coroutineContext)
        with(receiver) { Ok(block()) }
    } catch (ex: BindCancellationException) {
        receiver.internalError
    }

will fail several of the tests.

likewise with

return try {
        receiver = SuspendableResultBindingImpl(coroutineContext + Job())
        with(receiver) { Ok(block()) }
    } catch (ex: BindCancellationException) {
        receiver.internalError
    }

which if you read the java doc for coroutineScope this is pretty much doing the same thing.

As I mentioned, if you rely on the parent scope and you only launch inside the binding block, you aren't guaranteeing that you will wait for all of the children in the block to complete. There is the other issue that the only way to catch a cancellation exception seems to be from an async builder (otherwise I think it gets propagated up to a global coroutine exception handler).

I think the issue here is we're bumbing into the fact that Roman designed structured concurrency with the intention that the end user of the api is purposefully starting corutines with async inside of a coroutineScope block that they define.

Here we are trying to provide a way to pass a lambda where we don't know if the user will just launch inside of the block. Does that make sense? We are trying to force structured concurrency because we want a binding block to await until we get a result or its cancelled. even if a user just calls launch.

I would be happy to remove the extra boilerplate if you can propose a code block that passes all the tests in their current form. Really appreciate the input on this. But right now im convinced the inner async is needed. Happy to be proven otherwise!

@michaelbull my question around the atomic reference would still stand regardless here

ragnese commented 3 years ago

I'm suggesting to leave the binding function exactly as it's written in the current master branch.

The only change you need (I think) to pass the two tests you added is to have SuspendableResultBinding interface inherit CororutineScope. Then the launch calls in your second test will be automatically joined to the correct scope (the scope of the Binding block).

This is the code I am suggesting:

public interface SuspendableResultBinding<E>: CoroutineScope { // < -- change here
    public suspend fun <V> Result<V, E>.bind(): V
}

@PublishedApi
internal class SuspendableResultBindingImpl<E> : SuspendableResultBinding<E> {

    private val mutex = Mutex()
    lateinit var internalError: Err<E>
    var coroutineScope: CoroutineScope? = null

    override suspend fun <V> Result<V, E>.bind(): V {
        return when (this) {
            is Ok -> value
            is Err -> {
                mutex.withLock {
                    if (::internalError.isInitialized.not()) {
                        internalError = this
                    }
                }
                coroutineScope?.cancel(BindCancellationException)
                throw BindCancellationException
            }
        }
    }

    override val coroutineContext: CoroutineContext // <-- change here to impl CoroutineScope
      get() = coroutineScope?.coroutineContext ?: error("Used as a CoroutineScope before being initialized")
}

ragnese commented 3 years ago

Also, regarding your point about propagating the cancellation exception, I agree that it's all very confusing. And frankly, I don't really understand how it works. I thought that putting the try {} catch {} outside of the coroutineScope {} was incorrect when I first tried to write my own version of this same mechanism. I thought that once you caught and completed the scope, that it wouldn't keep throwing up the chain, but I was wrong.

I also thought that you should use supervisorScope here instead of coroutineScope because the supervisor scope does not fail its parent when it fails. But in my tests, neither does the regular coroutine scope... So I really don't know.

Munzey commented 3 years ago

I thought that putting the try {} catch {} outside of the coroutineScope {} was incorrect when I first tried to write my own version of this same mechanism.

I hadn't tried wrapping the try catch outside! Thanks @ragnese for the suggestions. I don't really like the accessing of the coroutine scope with get so I kept how I was setting the coroutine context.

So I've reverted the suspendCancellableCoroutine part. I still think that part of the solution would be cleaner using that function with an atomic reference for the internal error; the fact that right now you have to cancel and throw the BindCancellation exception in the err case seems like a hack, but agree that launching a new coroutine for each failing bind that is racing is overkill.

michaelbull commented 3 years ago

So if, rightfully so, SuspendableResultBinding implements CoroutineScope, surely we should be able to remove our explicit call to coroutineScope?

Munzey commented 3 years ago

hmm tried there and I don't think its possible - without it the first launch to complete is what binding returns. I believe that we need the coroutines started in the block to run inside a separate Job, which is what coroutineScope provides. Doing this is not enough:

//        coroutineScope {
            receiver = SuspendableResultBindingImpl(coroutineContext + Job())
            with(receiver) { Ok(block()) }
//        }

There must be some magic to suspendCoroutineUninterceptedOrReturn that coroutineScope calls. i.e it makes sure that all the children complete or are cancelled.

ragnese commented 3 years ago

I still think that part of the solution would be cleaner using that function with an atomic reference for the internal error; the fact that right now you have to cancel and throw the BindCancellation exception in the err case seems like a hack, but agree that launching a new coroutine for each failing bind that is racing is overkill.

I don't see a way around cancelling and throwing. I tried a bunch of different things. We need that method to not return (return Nothing). Calling cancel on the coroutineScope cancels the "root" scope of this function, which will eventually cancel all children, but we also need to cancel this child right now by throwing. The only other thing I could think of is to call coroutineScope.cancel and then suspend. You'd then have to have a way to resume the function from outside of it, after the cancellation has taken effect. I'm not really sure how this could work without launching more independent tasks and creating more overhead. Just for note: I don't think you have to use BindCancellation in the call to cancel- just any CancellationException. It's the thrown one that has to be BindCancellation. ... Or maybe it was vice versa. In either case, it's better that they're both BindCancellation because it's an object so it doesn't allocate or build another stack trace, but it's not actually required for the correct functionality.

@michaelbull As @Munzey said, you do still need the call to coroutineScope{}. This creates a NEW scope in the hierarchy that has a new Job that is different from the environment where binding is called. You need to treat this new scope as the "root" of this function so you can call cancel in bind and only fail what happens in this function- not the entire scope that binding is called in.

Making SuspendableResultBinding implement CoroutineScope doesn't actually do anything, it just makes the launch calls inside the block use the correct scope for this. If you took away the call to coroutineScope{}, then implementing CoroutineScope wouldn't actually do anything at all since it would just be pointing to the outer scope in which the function is called. The whole point is that implementing the interface "tricks" the launch calls into spawning off of your new scope.

Munzey commented 3 years ago

I don't see a way around cancelling and throwing.

In my original solution in this pr, I used suspendCancellableCoroutine which means bind actually suspends the coroutine in which it is called, and all that was needed was to either call continuation.resume(value) or simply cancel the parent scope, no throw needed. That is what Im referrign to when I said "cleaner using that function with an atomic reference for the internal error". "That function" being suspendCancellableCoroutine.

ragnese commented 3 years ago

I think I tried something similar in my explorations. My guess is that your solution worked with the extra calls to async{} or whatever? That was my issue- that's extra runtime overhead. If you don't spawn an extra coroutine to do the resuming, the code can hang as the coroutine stays suspended forever.

So I chose the slightly less elegant approach of throwing because it didn't require as much overhead. That was my understanding, at least. I could very well be wrong.

Munzey commented 3 years ago

My guess is that your solution worked with the extra calls to async{} or whatever?

nope it didnt need async, check the first commit I added, f0005ae The only overhead is not having a simple way to atomically set the internal error

If you don't spawn an extra coroutine to do the resuming, the code can hang as the coroutine stays suspended forever.

pretty sure while a coroutine is suspended in this state, that means its at a cancellation point, so it does get cancelled as soon as the parent scope is cancelled.

ragnese commented 3 years ago

Your code in that commit has this:

private fun setError(error: Err<E>) {
        this.launch {
            mutex.withLock {
                if (::internalError.isInitialized.not()) {
                    internalError = error
                    this@SuspendableResultBindingImpl.cancel(BindCancellationException)
                }
            }
        }
    }

That launch is an extra coroutine. It's the "only overhead" as you said, but that's exactly why I said I think it's unavoidable to use throw and cancel. I think there's no other way without extra overhead.

Munzey commented 3 years ago

That launch is an extra coroutine.

yep please read again what i wrote above:

The only overhead is not having a simple way to atomically set the internal error

If we didnt care about racing threads that code works without launching a new coroutine. In fact its entirely doable if we imported atomicfu to not launch any additional coroutines

ragnese commented 3 years ago

Fair enough. I'm not familiar with atomicfu. How does it work? Does it just not suspend?

We could also use a ReadWriteLock from the Java side of things, which is actually better than a symmetric Mutex anyway (reads should be cheap/free).

Munzey commented 3 years ago

We could also use a ReadWriteLock from the Java side of things, which is actually better than a symmetric Mutex anyway (reads should be cheap/free).

yep thats what I wanted to do, but that won't work in common code. Atomicfu is a kotlin org lib written by Roman and co that basically provides a readwritelock for common code, so you're on the money there 😄

ragnese commented 3 years ago

Ah. Right- I forgot this project isn't JVM only. :)

michaelbull commented 3 years ago

I guess given atomicfu is an official Kotlin project we could consider adopting it as an implementation-only dependency. Would it reasonably clean things up for us, or at least make this more readible? I feel we are suffering right now with readability and everyone is getting confused each time we read over this bit of code, and if atomicfu helps alleviate that it might be worth adopting.

Edit: I've looked at AtomicFU and it seems quite a large thing to adopt, given it requires us to use a gradle plugin and isn't a simple drop-in library. I'd like to try and avoid it if we can.

ragnese commented 3 years ago

I honestly don't know how to make it less confusing. Coroutines are just confusing. There's just a lot of stuff we have to know/understand about how they work and the type system doesn't help at all in that regard. The only thing you can do, IMO, is write a big doc block explaining that a new coroutine scope is created so that all coroutines launched in the block will inherit from it by default. Then explain how the first sub-task to be canceled will trigger the "root" to be canceled as well, which will propagate to any other children in the block.

Munzey commented 3 years ago

Started looking at adding atomicfu over the weekend. The code certainly looked cleaner (no more mutex block, no more throwing) and the ide was happy with the dependency in the common code, but when I went to run the tests It fell over as it couldnt find the dep for java or js 😅

So will need to do some more exploring of how exactly to set this lib up - it mentions for common code you can just use the dependency but then it also mentions how you can use it as a gradle plugin (written in the old "apply" style so makes it hard to experiment with the new kotlin gradle dsl).

Will report back if I get it working.

I agree @ragnese that coroutines is clearly not as straight forward to work with as is preached once you try do anything beyond the shiny examples they give on the site. But here i think we are just trying to have doc that makes it clear for users how to use the function, and that the code itself is as straight forward as possible for any developers to follow. I would hope this pr will itself serve as pretty good documentation if someone were to want to come back to see what the design decisions were around it.

Munzey commented 3 years ago

@michaelbull sorry just saw your comment edit. Yep perhaps in the future it will be simpler to adopt.

Maybe the logic in its current state for this pr is enough then and we can just have an open issue to track ways to make it cleaner?

Munzey commented 3 years ago

ok I got it working there - trick was to apply it as the plugin only.

rather than pushing to this branch, heres the commit with the changes https://github.com/Munzey/kotlin-result/commit/be8e9cd1ebf962f09af65576c45103ea0ae6cfc6 if that seems no better we can just leave as is, otherwise I can push to here.

michaelbull commented 3 years ago

Yeah not much better to be honest, but a really useful comparison. Really appreciate everyone's insight in this discussion though.

Munzey commented 3 years ago

ran a couple benchmarks comparing the two implementations (mutex vs atomic); couldn't really discern any massive difference (between the different runs they averaged out around the same ops/ms throughput), so unfortunately can't say that helps the decision 😄

michaelbull / kotlin-result

Clean up suspend binding implementation #46