jwiegley / async-pool

mapTasks_ does not work #2

Open spl opened 9 years ago

spl commented 9 years ago

Sorry about the vague title. I haven't looked into the cause, but I thought it was significant enough to report.

I have the following code:

liftIO $ withTaskGroup 50 $ \tg -> mapTasks_ tg tasks

This was the first time I tried async-pool, so when I ran it, I was quite surprised that the tasks did not run. I swapped in mapTasks for mapTasks_:

liftIO $ withTaskGroup 50 $ \tg -> void $ mapTasks tg tasks

And the tasks ran just fine.

Since the problem seems to be pretty clearly due to mapTasks_, I hope it's an easy fix for you. Otherwise, if you need a test case or something, let me know.

jwiegley commented 9 years ago

@spl Thanks for the report! Also, I see your point about nested mapTasks, although I'm not sure what to do there. async-pool is creating all the tasks immediately, but only spawning threads for a few jobs. But as you say, within those jobs is another mapTasks, which will indeed deadlock waiting for a free slot.
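For anyone following along, the nested scenario looks roughly like this — a minimal sketch, assuming mapTasks :: TaskGroup -> [IO a] -> IO [a], consistent with the calls above:

import Control.Monad (void)
import Control.Concurrent.Async.Pool

-- With a single slot, the outer task occupies it and then waits on an
-- inner mapTasks against the same group, which can never be scheduled.
main :: IO ()
main = withTaskGroup 1 $ \tg ->
    void $ mapTasks tg
        [ void $ mapTasks tg [print (1 :: Int)] ]  -- nested wait: deadlock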

How is this for a solution: each time we spawn a thread within a task group, we remember the threadId used for that job. Then, whenever a new job is enqueued for the same task group from within that threadId, and we will be waiting on the result, we allow those jobs to additionally execute within that thread (serially at worst, but if slots free up they can run concurrently again).

I'm not sure exactly how this will work in the code, but we do need some way for the blocked worker thread to make progress in these situations. I'll create a test to reproduce the deadlock and then see what I can do.

Thanks for trying async-pool! If you think it's ready for prime-time, I was thinking of announcing it to Haskell-Cafe this weekend.

jwiegley commented 9 years ago

As for mapTasks_: The main difference here is that it does not wait before returning. Perhaps this makes it appear as though the tasks aren't running, simply because they haven't had a chance to yet? mapTasks does wait, which is perhaps too different a semantics to justify simply dropping the _.

jwiegley commented 9 years ago

To further clarify the deadlock-resolving solution:

When calling wait on a job, and that job was enqueued within the current threadId, replace the job in the task group with a thunk waiting on a TMVar and immediately execute the corresponding IO action in that thread, putting to the TMVar on completion.

This way we can commandeer execution to avoid the delay, while not upsetting anyone depending on the result via the task group.
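A hedged sketch of that replacement idea, using plain STM primitives rather than the real async-pool internals (commandeer is a hypothetical helper, not part of the library):

import Control.Concurrent.STM

-- Split a job into (a) a placeholder that other waiters can block on via
-- the task group, and (b) an action the waiting thread runs inline itself.
commandeer :: IO a -> IO (STM a, IO ())
commandeer action = do
    result <- newEmptyTMVarIO
    let placeholder = readTMVar result   -- what stays in the task group
        runHere = action >>= atomically . putTMVar result
    return (placeholder, runHere)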

spl commented 9 years ago

@jwiegley Thanks for the response. I don't have time to answer fully right now, but there are a couple of things I wanted to mention.

I did see the code containing mapTasks_ complete, which makes sense given that it doesn't wait. Each of the tasks forked by mapTasks_ performed database updates, and they were not waiting on anything other than free slots. I never saw those database updates happen. I'm not sure if those threads were somehow killed or what.

I ran the new mapTasks-based code on a production migration last night. Unfortunately, something made it get stuck in the middle. No idea what it was.

jwiegley commented 9 years ago

@spl Ok, I really want to get to the bottom of this, so anything I can do to help, let me know.

I tried implementing the deadlock avoidance that I described above, but it is not possible: At the point where the deadlock happens, I don't have access to IO, only STM.

So what about this for an alternate solution: Waiting, within an active task, on a task spawned from that thread throws an exception, with a suggestion to use a secondary task group?
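The detection half of that idea might look like this (a hedged sketch; NestedWait and guardNestedWait are hypothetical names, not async-pool API):

import Control.Concurrent (ThreadId, myThreadId)
import Control.Exception (Exception, throwIO)
import Control.Monad (when)

data NestedWait = NestedWait deriving Show
instance Exception NestedWait

-- Throw if the waiting thread is the very thread that enqueued the job,
-- i.e. a wait that can never be satisfied from within the pool.
guardNestedWait :: ThreadId -> IO ()
guardNestedWait enqueuedFrom = do
    me <- myThreadId
    when (me == enqueuedFrom) $ throwIO NestedWait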

spl commented 9 years ago

I feel I'm getting a bit lost. There are two independent issues, right?

  1. mapTasks_ semantics and implementation
  2. Nested use of a TaskGroup

To keep the discussion focused, I suggest creating a new GitHub issue for the second issue. I think it's much more difficult to resolve. At this point, I would add documentation that functions such as mapTasks create all tasks immediately, so you do not want to have one of those tasks using the same TaskGroup.

As for the first issue, were you able to reproduce my problem? If not, I'll try to make a small test case. It should be easy to do using my app+database setup, but I can try to do it without the database, perhaps with a threadDelay or something.

That said, time to respond to some of your comments:

As for mapTasks_: The main difference here is that it does not wait before returning. [...] mapTasks does wait, which is perhaps too different of a semantic to justify simply dropping the _.

I do think it is unexpected to have the _ version of a function be so radically different from the non-_ version. I glanced at the implementation of mapTasks_ when I ran into the problem. I was, admittedly, surprised to see that it wasn't mapTasks with void, but I didn't dig into it further. If it is the case that mapTasks_ will be so different, it should be documented in bold.

So what about this for an alternate solution: Waiting, within an active task, on a task spawned from that thread throws an exception, with a suggestion to use a secondary task group?

This sounds like some form of deadlock detection. How certain can you be that the exception doesn't get thrown for a “legitimate” wait?

jwiegley commented 9 years ago

My apologies for confusing this ticket. I'll continue discussing mapTasks_, but will open a new issue for the deadlock problem.

jwiegley commented 9 years ago

As for mapTasks_, a reproducible test case would be a great help. As for the semantic difference, I think we should just underscore it in bold in the documentation for now.

jwiegley commented 9 years ago

@spl After thinking about it a lot more, I think that mapTasks_ tg xs should be equivalent to void $ mapTasks tg xs. If one wishes to start a group of tasks without waiting for any result, they can use mapM_ (async tg) xs.
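In code, the proposal amounts to something like this (hedged: it assumes async :: TaskGroup -> IO a -> IO (Async a), and mapTasksVoid/fireAndForget are illustrative names):

import Control.Monad (void)
import Control.Concurrent.Async.Pool

-- Proposed semantics: mapTasks_ is just mapTasks with the results dropped.
mapTasksVoid :: TaskGroup -> [IO a] -> IO ()
mapTasksVoid tg = void . mapTasks tg

-- Fire-and-forget alternative: submit every action, wait on nothing.
fireAndForget :: TaskGroup -> [IO a] -> IO ()
fireAndForget tg = mapM_ (async tg)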

spl commented 9 years ago

That makes sense, and it would be good to document the async alternative.

Sorry I haven't gotten back to you with the test case. I've been rather busy lately. I've got a deadline next weekend, so I'm busy this weekend as well.

Do you still want the test case for mapTasks_?

jwiegley commented 9 years ago

@spl Whenever you have the time, I'd really like to know what's going on. Thanks!

clrnd commented 9 years ago

This is a great library; I'd love it if this one got fixed. Maybe I can help somehow?

jwiegley commented 9 years ago

@alvare That would be great! If you could write some tests to demonstrate the exact problem, it shouldn't be too hard to find the issue and fix it.

Unrelated, but Simon Marlow has pointed out a potential race condition that simply can't be fixed -- we should have tests to demonstrate it, at the very least as documentation. For example, if you have a task group with 1 slot, and a task A that manually depends on B (using an MVar, instead of the dependency logic in async-pool), and A gets started first by the scheduler, we'll deadlock without detection.

The answer in this case is to make use of the dependency graph in async-pool itself, but it does raise the point that manually using blocking constructions can lead to undetected deadlocks. This doesn't happen with regular async, because every thread is allowed to start right away.
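A hedged sketch of the scenario described above, with an MVar dependency the scheduler cannot see:

import Control.Concurrent.MVar
import Control.Concurrent.Async.Pool
import Control.Monad (void)

-- One slot; taskA manually depends on taskB through an MVar. If the
-- scheduler happens to start taskA first, it blocks the only slot
-- forever, and async-pool has no way to detect the cycle.
main :: IO ()
main = withTaskGroup 1 $ \tg -> do
    done <- newEmptyMVar
    let taskA = takeMVar done >>= print   -- waits on taskB's result
        taskB = putMVar done (42 :: Int)
    void $ mapTasks tg [taskA, taskB]     -- deadlocks if taskA runs first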

clrnd commented 9 years ago

@jwiegley well that's easy:

https://gist.github.com/alvare/67c5717affe6bba891f3

Replace mapTasks_ with mapTasks and it prints the numbers. mapTasks_ is just not waiting for its tasks to finish; add a simple threadDelay and you see the numbers get printed.
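For readers who can't reach the gist, its shape, reconstructed from the description above (pool size and task count are guesses), is roughly:

import Control.Concurrent (threadDelay)
import Control.Concurrent.Async.Pool

main :: IO ()
main = withTaskGroup 4 $ \tg -> do
    mapTasks_ tg [print i | i <- [1 .. 10 :: Int]]  -- prints nothing
    threadDelay 1000000  -- with this delay in place, the numbers do print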

spl commented 9 years ago

@alvare Thanks for doing that.

@jwiegley I'm sorry I didn't get around to writing something up. I had moved on to other problems and never made time for this one.

dzhus commented 8 years ago

Is this problem with mapConcurrently the same as the one being discussed?

λ> withTaskGroup 2 $ \g -> mapConcurrently g (const $ getCurrentTime) [1, 2, 3]
*** Exception: thread blocked indefinitely in an STM transaction

neongreen commented 6 years ago

It looks like the latest version on GitHub doesn't have this bug anymore (at least the example above definitely works), but I'm not sure whether other functions are still buggy.

0xd34df00d commented 5 years ago

mapConcurrently works for me too on the current HEAD, but not with 0.9.0.2 on Hackage, using a more involved usage pattern. Perhaps worth a version bump?

jchia commented 4 months ago

What's the status of the mapTasks_ issue? I have a test program (but not a 'test'): https://github.com/jchia/async-pool-test. The expected output is 12, but the actual output is 2. It repros every time for me.