louthy / language-ext

C# functional language extensions - a base class library for functional programming

MIT License

6.48k stars 417 forks source link

Deadlock from calling Task.Result #317

Closed TysonMN closed 6 years ago

TysonMN commented 6 years ago

Example Code

Here is a test that exhibits a deadlock.

[Fact]
public void MTaskFold_InAsyncContextWithTaskWaitingForActivation_Halts() =>
  AsyncContext.Run(() => MTaskFold_WithTaskWaitingForActivation_Halts());

private static void MTaskFold_WithTaskWaitingForActivation_Halts() {
  var intTask = TimeSpan
    .FromMilliseconds(100)
    .Apply(Task.Delay)
    .ContinueWith(_ => 0);

  var actual = default(MTask<int>)
    .Fold(intTask, 0, (x, y) => 0)
    (Unit.Default);

  // execution terminates by reaching here
}

The type AsyncContext is from the NuGet package Nito.AsyncEx.

Expected Behavior

As stated in the test, the expected behavior is that the test terminates.

Actual Behavior

The test does not terminate. Here is the source code under test.

[Pure]
public Func<Unit, S> Fold<S>(Task<A> ma, S state, Func<S, A, S> f) => _ =>
{
  if (ma.IsFaulted) return state;
  return f(state, ma.Result);
};

The test does not terminate because of a deadlock that is caused when calling ma.Result.

Possible Fix

Stephen Cleary, an expert on this topic and author of the aforementioned NuGet package Nito.AsyncEx, explains why this deadlock occurs. To summarize, my understanding is that it is a best practice to not call Task.Result unless Task.Status == TaskStatus.RanToCompletion. This is not true in the example that I have provided. Instead it is in the state TaskStatus.WaitingForActivation.

Otherwise, the best practice is to await the task after calling Task.ConfigureAwait(false). I don't see how you can do this without changing the return type of Fold. If I may adjust the return type slightly from Func<Unit, S> to Func<Unit, Task<S>>, then I would implement Stephen's best practice suggestion like this.

[Pure]
public Func<Unit, Task<S>> Fold<S>(Task<A> ma, S state, Func<S, A, S> f) => async _ =>
{
    if (ma.IsFaulted) return state;
    return f(state, await ma.ConfigureAwait(false));
};

This new return type seems more natural to me. My understanding of monads is that you are not allowed to just extract the underlying value of the monad. Unfortunately, Fold is doing just that when it calls Task.Result. However, I have only recently started using monads and your library, so I know that I still have a great deal to learn and that it is very possible that I am misunderstanding something here.

Thank you for considering my bug report. I am interested to hear your thoughts about it.

TysonMN commented 6 years ago

Reducing to `Match`

We can simply the problem slightly by defining Fold<> in terms of Match<>.

[Pure]
public Func<Unit, S> Fold<S>(Task<A> ma, S state, Func<S, A, S> f) => _ =>
    default(MTask<A>).Match(ma, a => f(state, a), () => state);

This version of Fold seems bug free to me while Match<> has the same issues as the current version of Fold. Maybe both of these methods behave as intended, and I should be using FoldAsync instead.

Considering `FoldAsync`

I can imagine a potential reply saying to use FoldAsync instead of Fold. It is true that the test passes when using FoldAsync. However, I created the above test while trying to simplify this one.

[Fact]
public void SequenceOnTaskSeq_InAsyncContexAndWithSingletonSeq_Terminates() =>
  AsyncContext.Run(() => SequenceOnTaskSeq_WithSingletonSeq_Terminates());

private static void SequenceOnTaskSeq_WithSingletonSeq_Terminates() {
  var source = 0
    .AsTask()
    .Apply(SeqOne)
    .Sequence();

  source.Sequence();

  // execution terminates by reaching here
}

This test eventually calls Fold in a manner similar to the test that I originally gave. I don't see any method like SequenceAsync that would eventually call FoldAsync instead of Fold. Maybe this is the real bug?...That the code in this test should eventually call FoldAsync instead of Fold?

Task Status Canceled

On a related note, this test passes by throwing a TaskCanceledException. Is that the expected behavior?

[Fact]
public async Task MTaskFoldAsync_CancelledTask_ThrowTaskCanceledException() {
  var cts = new CancellationTokenSource();
  cts.Cancel();
  var intTask = Task.Run(() => 0, cts.Token);

  var f = default(MTask<int>)
    .FoldAsync(intTask, 0, (x, y) => 0);

  await Assert.ThrowsAsync<TaskCanceledException>(() => f(Unit.Default));
}

faeriedust commented 6 years ago

I don't think I would call it a bug so much as just bad practice to call that overload of Fold. You're asynchronously getting values in a synchronous method, so a deadlock seems pretty natural to occur there, unless I'm missing something.

The fact that calling Sequence twice on a Seq<Task<int>> deadlocks does seem like a bug however.

I went ahead and made a repo that demonstrates this issue: https://github.com/faeriedust/languageExtTests

louthy commented 6 years ago

You're both right for different reasons. Yes, the quick fix would be to asyncify the code. But what I've actually started doing is looking into breaking Monad apart into Monad and MonadAsync. The attempt to unify the two has lead to a slightly messy interface and compromises like this implementation of Fold for MTask; it should be FoldAsync that returns a Task.

I have a busy few days coming up, but hopefully I should have something by the end of the week.

faeriedust commented 6 years ago

Yeah, I tried to fix the bug in the repo and immediately ran into many issues...

I think breaking Monad into Monad and MonadAsync does seem likely the much better approach.

I'm looking forward to this. :)

louthy commented 6 years ago

@faeriedust @bender2k14 Just a heads up on the progress of this. The work is being done on the async-refactor branch, it's becoming quite a large refactor that will almost certainly lead to a significant improvement for the async story.

Here's the first commit to give you an idea of the scope of the changes: https://github.com/louthy/language-ext/commit/96e83efbbd398575b5b5cab1f763c7b6b1edbe88

The unfortunate knock-on effect is with the transformer types (nested monads). Synchronous and asynchronous monads are not directly compatible with each other now, and so I've split the HKT t4 templates to generate the extension methods for async-async pairs and sync-sync pairs.

The main issue with that is something like Lst<TryAsync<A>> won't get the transformer extension methods. This will definitely cause problems for users of the library, so I'm just looking into ways to still pair up sync and async types where it makes sense. It should be a solvable problem, just needs a bit more thinking time.

louthy commented 6 years ago

@bender2k14 @faeriedust The latest release deals with the issue above in that default(MTask<int>).Fold does not exist any more and won't exist for any asynchronous type. Calling default(MTask<int>).FoldAsync exhibits the behaviour you'd expect.

TysonMN commented 6 years ago

I see that you have removed all the calls to properties of Task<>, such as Result and IsFaulted. That is an extremely good improvement just by itself.

I have two questions about he new design.

First Question

On the interface FunctorAsync<FA, FB, A, B>, there are two methods with signatures

FB MapAsync(FA ma, Func<A, B> f) and
FB MapAsync(FA ma, Func<A, Task> f).

Can you explain the reasoning for calling them both MapAsync? In your alpha release notes, you say

To reduce the problems of method resolution a large number of methods have been renamed to have Async suffixes...

Is this reasoning used on these method names as well? Instead of calling them both MapAsync, I was wondering if there would be any problems calling them both Map or calling them Map and MapAsync respectively. Do you know if either of these other choices allows for method resolution problems?

Second Question

If there is no technical limitation to calling the first method above just Map, then the possibility exists to having the interface FunctorAsync<FA, FB, A, B> extend the interface Functor<FA, FB, A, B>.

Would this inheritance be good or bad?

louthy commented 6 years ago

@bender2k14 Actually that's an oversight. I started by making all async interfaces have ...Async methods. But then after hitting some issues where the C# compiler just can't work out the difference between an A and a Task<A> I started using ...Async just for higher-order functions that have a asynchronous Funcs or return a Task<A>.

I've been so knee deep in the transformer system that I will probably need to review all the naming before pushing it to beta status. Initially I wanted to avoid issues with a type implementing both the sync and async interfaces for say Monad or Functor.... the requirements slightly changed as I was going along.

I think the final policy will be:

If the method returns a Task then append Async. That allows for say a synchronous int Sum() and an asynchronous Task<int> SumAsync() for example.
If the method is a higher-order method and the delegate argument returns a Task then Async is appended.

Possible policy:

Use a naming convention for delegate arguments, so devs could specify the name of the argument they're providing, if it's unique then it will trump the type system (something like Sync or Async). i.e. value.Map(Sync: x => x + 1);. This would help for methods like BiMap where there are four possible variants. Something like: value.BiMap(LeftSync: x => x + 1, RightAsync: y => y + 2);

This will be a be consistency issue throughout the code-base I expect (from pre-existing async functionality). So, it will need some time to iron all of that out. It's a good time to get this stuff done and out of the way (because of the other breaking changes).

On your second point - you could be right, I don't have the code in front of me at the moment (about to hit the hay after looking at a screen for too long). If they unify then that would certainly be a benefit.

TysonMN commented 6 years ago

If the method returns a Task then append Async. That allows for say a synchronous int Sum() and an asynchronous Task<into> SumAsync() for example.

The example is good. Specifically, if there are two methods that do the same thing except one returns A and the other returns Task<A>, then appending Async is good.

However, I don't like the idea that the name of any method that returns a Task should end in Async. For example, I don't think the extension method AsTask that lifts an A to an Task<A> should be called AsTaskAsync.

I like the idea of appending Async only if there is also a synchronous variant. This idea may well have exceptions though.

This would help for methods like BiMap where there are four possible variants.

I would typically think of handing that by lifting one delegate to return a Task<> to match the other delegate and then calling the overload that takes two delegates that returns Task<>. Or await when able before calling a synchronous function. Do we have to be worried about the performance tradeoff from a additional method call like that?

If they unify then that would certainly be a benefit.

The only difference is the name of the method (Map vs MapAsync). My current understanding is that they should be unified.

louthy commented 6 years ago

I would typically think of handing that by lifting one delegate to return a Task<> to match the other delegate and then calling the overload that takes two delegates that returns Task<>.

There's an additional allocation cost which would be saved by using named arguments. Where possible I try to reduce unnecessary allocations - not that this library is particularly light on the allocations front, but I try to minimise them where I can.

I think also there's the real-world use-case to consider. Often with choice types where they're bi-mappable, or bi-foldable, the 'alternative' value is used for the error case (not always, obviously, but mostly). I suspect that the most common usage is for choice 1 to be asynchronous where choice 2 would be synchronous (perhaps just giving context to an error value for example). Choice 1 is the 'happy path' where all the heavy lifting is happening.

There is also already precedent for using named arguments for Match and the like, so I'm comfortable with using them to aid the selection of the correct overload.

However, having said all of that, I wanted to clear some ideas through my head first, so I'm not just arguing from a preference point-of-view, but from a 'good for the library' point-of-view.

Think about this situation:

    static async Task<int> AddValue(int x) =>
        x + await GetValue();

    static Error AddContextToError(Error err) =>
        err.AddSomeContext();

     ma = ma.BiMapAsync(RightAsync: AddValue, Left: AddContextToError);

That has a certain amount of elegance to it because it's not cluttered with lambda grammar. It also has no additional allocation costs over what the AddValue and AddContextToError functions use in their implementations.

If BiMapAsync only took Task<R> and Task<L> then the Left side would have to be lifted:

     ma = ma.BiMapAsync(Right: AddValue, Left: err => AddContextToError(err).AsTask());

That instantly becomes much less elegant. There is also an allocation for the lambda and for the Task. Finally, there's the cost of constructing a Task - probably negligible, but but if the Left type is a value-type then we're essentially boxing something that doesn't need boxing.

I took the thought experiment a bit further to see if I could provide some improvements on the elegance front, by providing some common lifting functions:

    static Func<Task<A>> liftTask<A>(Func<A> f) =>
        () => f().AsTask();

    static Func<A, Task<B>> liftTask<A, B>(Func<A, B> f) =>
        a => f(a).AsTask();

    // ... etc ...

These would work like fun or act but instead they would lift the synchronous function so it could be used in the various async methods. However, it suffers from the exact same problems as fun and act and that is C#'s terrible handling of method-groups. The type system refuses to infer the correct method, even when there is only one possible candidate.

And so, what should be this:

    ma = ma.BiMapAsync(AddValue, liftTask(AddContextToError));

Which has the elegance. Instead ends up like this:

    ma = ma.BiMapAsync(AddValue, liftTask<Error, Error>(AddContextToError));

Which is ugly. It also suffers from the same allocation problem as providing a lambda.

Hopefully one day C# will treat methods in method-groups as first-class citizens and provide a decent inference story, but I'm not holding my breath. In the meantime I feel the named arguments approach is likely to make life easier for devs.

However, I don't like the idea that the name of any method that returns a Task should end in Async. For example, I don't think the extension method AsTask that lifts an A to an Task should be called AsTaskAsync.

I think I mostly agree here. Context is important. For example, Match in EitherAsync returns a Task<R> but has no delegates that return a Task. The type EitherAsync by default makes all operations on it asynchronous. So, it makes sense for Match to not have Async appended.

However, that general rule falls apart slightly for the ToString and GetHashCode overrides on EitherAsync. They can't have their names or signatures changed (because they're inherited from Object) and they will force synchronous evaluation. I have provided ToStringAsync and GetHashCodeAsync, but that's immediately breaks the rules set above.

Ideally the rules wouldn't have too many caveats so they can be easily followed and maintained. I'm working through the interfaces this evening, so I'll try and come up with a concrete set that works for all types.

TysonMN commented 6 years ago

I like your change from last night (hash d3c2ccb). Specifically,

the Map-like method the accepts a Func<A, B> is called Map
the Map-like method the accepts a Func<A, Task> is called MapAsync.

I think it is consistent with the monad design pattern to call the methods as you have because the method name varies with the argument types but not with the monad type.

Recall, for a monad M<T>, that

the method that accepts a Func<A, B> is called Map and
the method that accepts a Func<A, M> is called Bind.

So this method MapAsync is lies on the interval between Map and Bind. I suppose any combined monad would face a similar naming issue. Looking at TryOption<> though, I don't see any method that accepts just one argument that is either Func<A, Option> or Func<A, Try>.

louthy commented 6 years ago

I like your change from last night

Yeah, it's starting to feel right. I still need to go through the other async types (TryOptionAsync, TryAsync) and update them as well as the async extensions for Option and TryOption. This is one of those 'by convention' things that will just need a bit of time and care to refine.

So this method MapAsync is lies on the interval between Map and Bind.

I know why you say that, but I see the Async variants more as artefacts of C#'s strict evaluation and less of a 'sliding scale of Map and Bind'.

If values were lazy like Haskell then I wouldn't need to litter the code with Task<...> and the implementations of Map and Bind could deal with the asynchronous behaviour transparently. Unfortunately C# doesn't work that way and so we end up with async polluting the function signatures too. So, although MapAsync and BindAsync isn't a purist definition of a functor map and monad bind operation, the end result is still a functor map operation and monad bind operation, just with asynchronous behaviour.

Looking at TryOption<> though, I don't see any method that accepts just one argument that is either Func<A, Option> or Func<A, Try>.

I wouldn't expect that though. They are not the definition of Map or Bind. TryOption has:

TryOption Map<A, B>(this TryOption<A> self, Func<A, B> f)
TryOption Bind<A, B>(this TryOption<A> self, Func<A, TryOption> f)

I would expect the dev to lift any Option or Try into a TryOption to use with the standard Bind operation.

   var x = TryOption(Some(123));
   var y = Try(123).ToTryOption();

TysonMN commented 6 years ago

I know why you say that, but I see the Async variants more as artefacts of C#'s strict evaluation and less of a 'sliding scale of Map and Bind'.

If values were lazy like Haskell then I wouldn't need to litter the code with Task<...> and the implementations of Map and Bind could deal with the asynchronous behaviour transparently. Unfortunately C# doesn't work that way and so we end up with async polluting the function signatures too. So, although MapAsync and BindAsync isn't a purist definition of a functor map and monad bind operation, the end result is still a functor map operation and monad bind operation, just with asynchronous behaviour.

This makes a lot of sense to me. In particular, synchronous computation is a special case of asynchronous computation. This is clear from C#. Asynchronous computation can easily call synchronous computation but there is no completely correct way for synchronous computation to call asynchronous computation.

However, that general rule falls apart slightly for the ToString and GetHashCode overrides on EitherAsync. They can't have their names or signatures changed (because they're inherited from Object) and they will force synchronous evaluation. I have provided ToStringAsync and GetHashCodeAsync, but that's immediately breaks the rules set above.

I think this exception to the rule is acceptable. Another method with this problem is IEnumerable<T>.GetEnumerator, which, in your design, is on every monad.

I see that you have removed all the calls to properties of Task<>, such as Result and IsFaulted. That is an extremely good improvement just by itself.

I said that 4 days ago, presumably in reference to state of your branch at commit 39b7400. Somehow I made a mistake. There are still many calls to Result and IsFaulted in the current commit.

Do you intend to remove them?

I don't see how your asynchronous monad types can correctly implement synchronous methods like ToString, Equals, GetHashCode, and GetEnumerator.

faeriedust commented 6 years ago

However, that general rule falls apart slightly for the ToString and GetHashCode overrides on EitherAsync. They can't have their names or signatures changed (because they're inherited from Object) and they will force synchronous evaluation. I have provided ToStringAsync and GetHashCodeAsync, but that's immediately breaks the rules set above.

Why do you have to evaluate for these methods? Just use Object.ToString() and Object.GetHashcode(), no?

TysonMN commented 6 years ago

I said that 4 days ago, presumably in reference to state of your branch at commit 39b7400. Somehow I made a mistake. There are still many calls to Result and IsFaulted in the current commit.

Do you intend to remove them?

I just reviewed the current version (which is commit 966d9bfecaeb84668338564c3a841ea0cd611ace) of your async-refactor branch. I think you (@louthy) have fixed (almost) all the possible deadlocks that could happen by calling Task.Result. The only methods in production code (I didn't check tests or sample code) that could deadlock are EitherAsync.ToString and EitherAsync.GetHashCode because they both call Task.Result on the Task returned from their async counterparts. I think you did that intentionally, and I think it is reasonable to do so.

Good work!

TysonMN commented 6 years ago

Oops. Didn't mean to do that.

faeriedust commented 6 years ago

I disagree that it is reasonable. Task.ToString() is not the same as Task.Result, nor should it be. The same applies to async types in this library, they should not be the same thing, imo.

Suppose my task is going to take 10 minutes, why should I wait for 10 minutes, or even deadlock my program, just to see the text in the debugger or similar?

If it's asynchronous, it's specifically because I chose to delay evaluation, so don't evaluate it for me, unless I need the value, which .ToString() semantically does not require in other CLR types, such as Task

TysonMN commented 6 years ago

Previously in this issue we were discussing the naming of methods like Map and Bind depending on whether or not their argument types include Task<>.

@louthy, I really like the idea you shared in your v3.0 release notes. There, when talking about this naming, you gave this example.

Match(SomeAsync, None)
Match(Some, NoneAsync)
Match(SomeAsync, NoneAsync)

said that sometimes the method would keep its normal name (in this case, Match) and one can explicitly disambiguate among the overloads by explicitly naming the arguments.

I just wanted to say that I think this is a great idea. This approach should allow the name to be simple (i.e. just Match instead of something like MatchAsync) in most cases without any problems (i.e. the compiler will select the indented overload). Then in the rare cases where things are ambiguous to the compiler, we have a simple way to clarify the ambiguity (by explicitly naming one or more arguments). Furthermore, we don't have to come with an exponential number (relative to the number of parameters) of different method names in order to avoid ambiguous situations for the compiler.

P.S. Since the original purpose of this issue was a bug that is now fixed, I think this issue can now be close.

Great work @louthy! :)

louthy / language-ext

Deadlock from calling Task.Result #317

Example Code

Expected Behavior

Actual Behavior

Possible Fix

Reducing to Match

Considering FoldAsync

Task Status Canceled

First Question

Second Question

Reducing to `Match`

Considering `FoldAsync`